Most Marketing Databases are built using a relational database management system. That means somewhere, buried deep beneath all those sexy screens in whatever application you choose to employ, lurks a confusing and cantankerous beast known as Structured Query Language, more commonly known as SQL. Originally invented by IBM in the 1970's, SQL has become accepted as the industry standard database access language. For marketers, it is one of the few four letter words in existence with only three letters.
My case in point for today is the SQL term used to describe the absence of data: null. According to relational purists, if you don't have data for a field in your database, then that field contains null. Null means nothing, empty, zippo, nada. It does NOT mean blank (a hard space) or zero. So if you think your checking account balance became null after Christmas, you'll be relieved to know that this is not so. In fact, purists will argue that if a field is blank or zero it means you DO know something about it, whereas if it's null it means that you don't. Post holiday spending, you'll be happy to hear that zero only means you are out of money. Less than zero means you're in trouble. Null means you don't even know if you have a checking account, or worse yet, a bank.
The problem with nulls is that they don't jive with what most marketers want or expect in terms of answering marketing questions. According to SQL, a null value can't be tested against anything but the keyword NULL. So let's say you have a field called gender in your database, and it generally contains three values: B for businesses, F for female and M for male, as well as null when you don't know. Let's further suppose that you want to run a campaign targeting every customer except for the ones you know are businesses. You might think you can do this by entering something like gender <> 'B' (gender not equal to 'B'). Wrong. Guess what? Null is neither equal nor unequal to B. So to get what you want, you have to say gender <> 'B' OR gender is null. Get it? Me neither.
Another confusing fact is that null fields don't equal each other; i.e., field1 = field2 fails if either or both fields are null. This is in direct contrast to what Mr. McAlpine taught me in my two years of 7th grade Algebra: if A = B and B = C, then A = C. With due apologies to Mr. McAlpine, when fields contain nulls you have to ask the question this way:
field1 = field2 OR field1 is null AND field2 is null
If you're still reading this article we're about to get to the fun part. You may have noticed that in both scenarios I've described so far, to get correct results I had to add an OR something is null condition. Therein lies a subtle but nasty problem: the way OR's are evaluated inside databases. Just for fun, let's put both of our examples into a single query: gender <> 'B' AND field1 = field2. Based on what we've discussed, you've learned that to get the correct answer you need to ask the question this way:
gender <> 'B' OR gender is null AND field1 = field2 OR field1 is null AND field2 is null
And the answer is wrong again! With SQL databases (as well as programming languages), AND conditions are evaluated first, followed by OR conditions. Here how the database interprets your question, with parenthesis added for clarity:
(gender <> 'B') OR (gender is null AND field1 = field2) OR (field1 is null AND field2 is null)
As you can see, this is clearly not what we thought we were asking. Unfortunately the database doesn't care what we mean, it obnoxiously does what we say instead. To get what we want we have to ask it this way:
(gender <> 'B' OR gender is null) AND (field1 = field2 OR field1 is null AND field2 is null)
Here's my inflammatory anti-relational theory advice for today: now that you know how not to ask for nothing (or not nothing), don't do it. Don't use nulls in your marketing database unless you truly feel that you have to. Instead, use blank for character fields (like gender) and zero for numerics. Trust me, ninety-nine percent of the time you'll get what you want by doing this. In over 10 years of designing marketing databases, the only area where I've seen a need for using null values is with date fields. While you can assign a default value for dates (we like to use January 1, 1900), it seems bad form to store a deceased date for someone who's still alive. On the bright side, it does cut down on your mail volume a lot if you exclude people with a deceased date and everyone in your database has one.
Some of the more technical readers know that SQL omits null values from calculations, and may object to filling null numeric fields with zero due to the impact it has on arithmetic. For marketers, this most often translates into computing sums and averages of dollar amounts, because by far the most common fields that we like to total or average involve money. If we don't know how much money somebody paid us for something, one might argue that assuming it is zero will have an adverse impact on computing their average purchase. While this is true, my advice is that if you want to get accurate answers, you can never truly escape having some knowledge about your data. If there are certain items (such as premiums) that have no cost, you are better off excluding them specifically than you are relying on them being automatically excluded because they are null. And it's a heck of a lot more intuitive to say price <> 0 than it is to say price <> 0 and price is not null.
In closing, remember that in the battle of good versus evil, there are two kinds of fields in an SQL database: those that allow nulls and those that don't. Generally speaking, you're better off sticking with those that don't.
Copyright 2002, PRIMEDIA Business Magazines & Media Inc. All rights reserved.
|