Excellent quality data is an elusive thing. Only 3% of all companies have databases that conform to even basic data quality standards. This is an extremely low number, but perhaps it shouldn't be surprising.
There are six key features that make up excellent quality data, and all six must be present in order for data to qualify.
Finding Where You Stand
We've compiled a relatively simple test to determine whether or not your records meet the standards of true data quality based on the six fundamental characteristics that all great quality data must possess.
Below is a set of questions to ask and things to look for in your own data for each of the six data quality components.
If your records meet each standard as described, you've earned a point.
You should keep track of how many you've earned as you go.
The Six Characteristics of Excellent Data Quality
The first thing that comes to mind when most people think of data quality is accuracy, and rightly so.
It's a fundamental part of building a set of data that people can actually make sense of (and then, of course, go on to make use of).
- All of your records must accurately represent the information in question
- All names should be written out in full
- All addresses should be real places where contacts actually reside
- All email addresses should exist and have an active user on the other end
You get the idea?
You should also screen for basic language and typographical errors, issues which are thought to occur about 1.2% of the time in human typing.
Are there any issues with homophones? Is everything spelt correctly, with no jumbled, extra or missing letters?
Don't assume this is so unless you know for sure; many first and last names and street names have more than one spelling, and the difference between the variations does matter.
Unless all of these things are in order, you won't be able to give yourself the point for data accuracy.
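Some of these checks can be partly automated. The sketch below is only a rough first pass, and the field names (name, email) and the accuracy_flags helper are made up for illustration. It flags names that look abbreviated and email addresses that are obviously malformed. It cannot tell you whether an address is a real place or whether anyone reads the inbox; that still takes verification against an outside source.

```python
import re

# Deliberately loose pattern: something@something.tld. It catches obvious typos only.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def accuracy_flags(profile: dict) -> list:
    """Return reasons a profile needs manual review; an empty list proves nothing."""
    flags = []
    # Treat "J. Smith" like "J Smith" so that initials are easy to spot.
    name_parts = str(profile.get("name") or "").replace(".", " ").split()
    if len(name_parts) < 2 or any(len(part) == 1 for part in name_parts):
        flags.append("name may not be written out in full")
    if not EMAIL_SHAPE.match(str(profile.get("email") or "")):
        flags.append("email address is malformed")
    return flags

print(accuracy_flags({"name": "J. Smith", "email": "jsmith@example"}))
```

A profile that comes back with no flags has only passed the easy part of the test, so treat this as a way to find problems faster rather than a way to award yourself the point.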
Valid data is data that conforms to the format in which all similar data in the set is being presented.
This sounds more complicated than it really is.
Dates, for instance, can follow many different formats, with YYYY/MM/DD being one common configuration. Alternatively, you could use YYYY/DD/MM, MM/DD/YYYY, DD/MM/YY, or several other variations.
Whichever of these you choose to go with, you need to stick with the same configuration for every piece of date-related information on every profile.
You'd be surprised how easy it is to switch between different date configurations when dealing with large sets of data. Any date which does not conform to the established standard risks not being read properly when you attempt to analyze it; therefore, it is not considered valid data and will cost you your point for validity.
This goes for any other sets of standardized data as well.
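As a minimal sketch of what a validity check might look like, the Python below assumes you have settled on YYYY/MM/DD as your standard. The DATE_FORMAT constant and both function names are illustrative choices, not part of any particular tool.

```python
from datetime import datetime

# Assumed house standard for this sketch; use whichever format your database follows.
DATE_FORMAT = "%Y/%m/%d"

def is_valid_date(value: str, fmt: str = DATE_FORMAT) -> bool:
    """Return True only if value parses under fmt and round-trips exactly."""
    try:
        parsed = datetime.strptime(value, fmt)
    except ValueError:
        return False
    # Round-trip so that unpadded forms such as "2021/3/7" are rejected too.
    return parsed.strftime(fmt) == value

def invalid_dates(values):
    """Collect the entries that do not conform to the chosen format."""
    return [v for v in values if not is_valid_date(v)]

sample = ["2021/03/07", "07/03/2021", "2021/3/7", "2021/13/01"]
print(invalid_dates(sample))
```

Here only the first sample entry passes. The other three are flagged for using a different field order, missing the zero padding, and naming a thirteenth month.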
In order to be easily searched through, data must be consistent in syntax and terminology.
When faced with inputting a large amount of data, many people try to take shortcuts that become troublesome to deal with later. For example, is 'Highland Crt' supposed to represent 'Highland Court' or 'Highland Crescent'? Only the person who entered it knows for sure.
This problem is even more common when you are using imported data that was individually entered by each of your contacts, who have no way of knowing what standard setup you prefer.
If you're doing your data quality management right, you will have already picked up on and corrected these issues, though, so you shouldn't find any in your current database records.
If you do, you can't give yourself this point.
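One way to surface these shortcuts is to compare each entry against a list of approved terms. The sketch below is a hypothetical example: the approved street types, the abbreviation map, and the check_street helper are all assumptions made for illustration. It expands abbreviations that can only mean one thing and sets aside anything ambiguous for a person to resolve.

```python
# Hypothetical house style: the only street-type spellings allowed in stored addresses.
APPROVED_STREET_TYPES = {"Street", "Avenue", "Court", "Crescent", "Road", "Drive"}

# Abbreviations assumed to map to exactly one approved term, so safe to expand.
UNAMBIGUOUS = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Dr": "Drive"}

def check_street(address: str):
    """Return (status, address), where status is 'ok', 'expanded', or 'review'."""
    words = address.split()
    if not words:
        return ("review", address)
    last = words[-1]
    if last in APPROVED_STREET_TYPES:
        return ("ok", address)
    key = last.rstrip(".")  # treat "St." the same as "St"
    if key in UNAMBIGUOUS:
        return ("expanded", " ".join(words[:-1] + [UNAMBIGUOUS[key]]))
    # Anything else (such as 'Crt') could mean more than one thing: leave it for a human.
    return ("review", address)

print(check_street("12 Highland Crt"))
print(check_street("5 Main St."))
```

The design choice worth noting is that the code never guesses. 'Highland Crt' is returned unchanged with a 'review' status, because picking Court over Crescent would trade a consistency problem for an accuracy problem.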
Completeness mostly speaks for itself, but it is still often overlooked.
Critical fields like names, phone numbers and email addresses should always be filled in for every profile in your database.
Some other fields, like contacts' middle names or purchase history, may be left blank in some circumstances (not every contact will have applicable information for you to input).
Others still may not be in every contact database, but should be filled in for every profile if they are included - this is where information like a physical address falls.
Even if you don't have information for an optional field and don't plan on acquiring any, it's better to enter some indicator of that (simply entering 'N/A' will usually suffice) than to leave it blank. This shows that you have left the field open intentionally and that doing so was not an accidental oversight.
If something like this was not done, you should not count your data as complete.
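The rules above translate fairly directly into a check. This is a rough sketch with hypothetical field names; which fields count as required or optional is an assumption you would replace with your own schema.

```python
# Field names are hypothetical; substitute the columns your own database uses.
REQUIRED_FIELDS = ("name", "phone", "email")
OPTIONAL_FIELDS = ("middle_name", "purchase_history")
PLACEHOLDER = "N/A"  # explicit marker for "intentionally left open"

def completeness_problems(profile: dict) -> list:
    """List every field that keeps this profile from counting as complete."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = str(profile.get(field) or "").strip()
        # A placeholder is not acceptable in a critical field.
        if not value or value == PLACEHOLDER:
            problems.append(f"{field}: required value missing")
    for field in OPTIONAL_FIELDS:
        value = str(profile.get(field) or "").strip()
        # Optional fields may hold the placeholder, but must not be blank.
        if not value:
            problems.append(f"{field}: blank (enter a value or {PLACEHOLDER!r})")
    return problems

profile = {"name": "Ada Lovelace", "phone": "555-0100", "email": "", "middle_name": "N/A"}
print(completeness_problems(profile))
```

In this example the profile fails on two counts: the email address is empty, and the purchase history was left blank rather than marked 'N/A'.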
Change is a constant, even in the world of data.
If your contact profiles aren't being consistently updated on a regular schedule, you can't expect to have good, usable data. Any information you store away in your database and never touch again is actually losing relevance at a rate of 2.1% per month, or roughly 25% per year.
This is usually a more pertinent concern for certain types of data, like email addresses, but nothing should be taken for granted in this area.
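For anyone who wants to check the arithmetic behind the monthly and yearly figures, here is a quick sketch. The post doesn't say whether the 2.1% compounds, so both readings are shown; which one applies to your data is an open assumption.

```python
MONTHLY_DECAY = 0.021  # the 2.1% monthly figure quoted above

# Simple reading: add up twelve months of 2.1% each.
simple_annual = MONTHLY_DECAY * 12

# Compounded reading: each month, 2.1% of what is still relevant goes stale.
compounded_annual = 1 - (1 - MONTHLY_DECAY) ** 12

print(f"simple:     {simple_annual:.1%}")
print(f"compounded: {compounded_annual:.1%}")
```

The simple reading comes to about 25.2% a year and the compounded one to about 22.5%. Either way, something like a quarter of untouched records can be expected to go stale within twelve months.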
Have your profiles been reviewed recently (at least once in the past 3 months or so), even if nothing has changed? Is there anything listed there that looks suspiciously out of date? Because not all data will change that often, this component of good data can be hard to pin down, but it's important to make an effort.
If in doubt, assume that no review of the data has taken place and do not add this point to your score.
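If your profiles carry a record of when they were last reviewed, this check is easy to sketch. The last_reviewed field below is hypothetical, and the 90-day window is just one way of reading "the past 3 months or so".

```python
from datetime import date, timedelta

# "At least once in the past 3 months or so", approximated here as 90 days.
REVIEW_WINDOW = timedelta(days=90)

def stale_profiles(profiles, today=None):
    """Return profiles whose last review is missing or older than the window."""
    today = today or date.today()
    stale = []
    for profile in profiles:
        reviewed = profile.get("last_reviewed")  # hypothetical field holding a date
        # If in doubt, assume no review took place.
        if reviewed is None or today - reviewed > REVIEW_WINDOW:
            stale.append(profile)
    return stale

profiles = [
    {"name": "Recent", "last_reviewed": date(2024, 5, 1)},
    {"name": "Old", "last_reviewed": date(2024, 1, 1)},
    {"name": "Unknown"},
]
print([p["name"] for p in stale_profiles(profiles, today=date(2024, 6, 1))])
```

Profiles with no review date at all are treated as stale, which mirrors the "if in doubt" rule.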
Uniqueness can only be properly assessed once all five of the other characteristics have been addressed.
Once they have, look over your profiles again; are all of them unique?
To qualify as such, a profile must represent a combination of different data points that appears nowhere else in your system.
Some overlap with other profiles is expected (lots of people live in the same geographical area or share a name, after all), but the profile in its totality should be unlike any other.
If you're noticing a lot of problems with this dimension of data quality in your own records, you may not be collecting enough differentiating data. The more likely cause, though, is that a duplicate entry has sneaked into your records. In terms of negative effects, one or two duplicates usually won't have much of an impact and will often be noticed and deleted as time passes.
However, their presence is still worrying because of what it signals about the frequency and thoroughness of your data quality management efforts.
Even one duplicate must be considered a failure in this area since consistent data quality management would have already uncovered the issue.
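A basic duplicate sweep might look like the sketch below. The choice of name, email and phone as the identifying fields is an assumption made for illustration; in practice you would compare whichever combination of data points defines a unique profile in your system.

```python
from collections import defaultdict

# Hypothetical fields used to decide whether two profiles describe the same contact.
KEY_FIELDS = ("name", "email", "phone")

def normalize(value) -> str:
    """Lower-case and collapse whitespace so trivial differences don't hide a match."""
    return " ".join(str(value or "").lower().split())

def find_duplicates(profiles):
    """Group profiles sharing the same normalized key; return groups larger than one."""
    groups = defaultdict(list)
    for profile in profiles:
        key = tuple(normalize(profile.get(f)) for f in KEY_FIELDS)
        groups[key].append(profile)
    return [group for group in groups.values() if len(group) > 1]

contacts = [
    {"name": "Jane Smith", "email": "jane@example.com", "phone": "555-0100"},
    {"name": "jane  smith", "email": "Jane@Example.com", "phone": "555-0100"},
    {"name": "Jane Smith", "email": "j.smith@example.org", "phone": "555-0199"},
]
print(len(find_duplicates(contacts)))
```

The first two contacts are caught as one duplicate group despite the differences in case and spacing. The third shares a name but differs elsewhere, so it is left alone, which matches the point that some overlap between profiles is expected. This is also why the check works best after the other five: inconsistent or invalid entries make true duplicates harder to match.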
The All-Or-Nothing Rule
Now that we've covered all six characteristics, take a moment to review the findings for your own data.
How well did you score?
If your data is in great shape, you scored a perfect 6 out of 6.
This means that your contact data is largely of good quality and you can be confident that it is safe and reliable to use in your business.
This doesn't mean that there isn't room for improvement, but it's a good start.
Is good enough close enough?
All it takes is one missing characteristic to substantially impact data reliability and usefulness.
Data can only be considered to be of quality when it meets every single one of these six benchmarks.
You must pass all six checks in order to meet contact data quality management standards.
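To make the all-or-nothing rule concrete, here is a small scoring sketch. The characteristic labels are shorthand chosen for this example ("timeliness" in particular is our own label for the regular-updates check), and the results dictionary stands in for whatever your own review produced.

```python
# Shorthand labels for the six checks; the names are illustrative.
CHARACTERISTICS = ("accuracy", "validity", "consistency",
                   "completeness", "timeliness", "uniqueness")

def score(results: dict):
    """Return (points, passes); passes is True only when all six checks pass."""
    points = sum(1 for name in CHARACTERISTICS if results.get(name, False))
    return points, points == len(CHARACTERISTICS)

results = {name: True for name in CHARACTERISTICS}
results["uniqueness"] = False  # e.g. a single duplicate was found
print(score(results))
```

A score of 5 out of 6 still returns False for the overall result, since one missing characteristic is enough to fail.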
Becoming Part of the 3%
Just because excellent quality data is rare doesn't mean it isn't necessary.
Not many companies may have achieved quality data yet, but every single one should be aiming for it.
If your data isn't where you want it to be, now is the time to act. Consider your data-related weaknesses, make it a priority to get your data quality management strategy in order, and you'll soon see data that meets every one of these critical benchmarks.
In our next post, you'll learn just how pervasive concern over shaky data can be and hear more about what you can do to keep your data quality standards in line. Join your peers to make sure you don't miss it.