Storing the results of Data Quality tests can be very helpful in determining the overall quality of the data over time and in identifying poorly performing sources and processes. This quality data is also instrumental for In-Line quality tests, which ensure that data of insufficient quality doesn't make it to Production, and for Off-Line tests, which can automatically roll back issues in Production.
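As a rough illustration of what storing test results could look like, here is a minimal sketch that uses Python's built-in sqlite3 module. The table name dq_results, its columns, and the helper functions are all hypothetical, chosen only for this example; a real platform would likely use its own warehouse and schema.

```python
import sqlite3


def create_store(conn):
    """Create a (hypothetical) table holding one row per executed quality test."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dq_results (
               run_date     TEXT    NOT NULL,
               source       TEXT    NOT NULL,
               test_name    TEXT    NOT NULL,
               metric_value REAL,
               passed       INTEGER NOT NULL
           )"""
    )


def record_result(conn, run_date, source, test_name, metric_value, passed):
    """Append the outcome of a single test run."""
    conn.execute(
        "INSERT INTO dq_results VALUES (?, ?, ?, ?, ?)",
        (run_date, source, test_name, metric_value, int(passed)),
    )


def pass_rate_by_source(conn):
    """Return (source, pass rate) pairs, worst-performing source first."""
    return conn.execute(
        "SELECT source, AVG(passed) FROM dq_results "
        "GROUP BY source ORDER BY AVG(passed), source"
    ).fetchall()
```

With a history like this in place, finding the sources that fail most often becomes a single query, and the same rows can serve as the baseline for variance checks.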
Example:
Customer ABC had more than 12,000 addresses over the last month. Today customer ABC has fewer than 10,800 addresses (a drop of more than 10%). All files have been processed successfully, and all other QA tests have passed.
This variance could trigger a Rollback with a notification to Data Engineering and the Business to review. Maybe customer ABC closed over 10% of their locations, or maybe a change to the source system introduced a special character in the business key that is causing addresses to be dropped incorrectly.
The Rollback keeps the data for the customer in its last known good state until the variance can be reviewed and confirmed by engineering and the business.
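The variance check in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the VarianceDecision type, and the 10% default threshold are hypothetical, and the rollback and notification steps themselves are left out because they depend on the platform.

```python
from dataclasses import dataclass


@dataclass
class VarianceDecision:
    drop_pct: float   # percentage drop relative to the baseline
    rollback: bool    # True when the drop exceeds the allowed threshold


def check_row_count_variance(baseline_count, current_count, max_drop_pct=10.0):
    """Compare today's count with a stored baseline (hypothetical helper).

    A drop larger than max_drop_pct flags the load for rollback and review.
    """
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    drop_pct = 100.0 * (baseline_count - current_count) / baseline_count
    return VarianceDecision(drop_pct=drop_pct, rollback=drop_pct > max_drop_pct)


# Illustrative counts only: a baseline of 12,500 addresses and 10,500 today.
decision = check_row_count_variance(12_500, 10_500)
if decision.rollback:
    print(f"Drop of {decision.drop_pct:.1f}% exceeds threshold: roll back and notify")
```

Keeping the decision separate from the action makes the same check usable both In-Line (block the load) and Off-Line (roll back what is already in Production).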
DATA QUALITY SUMMARY
Data Quality is a highly complex topic that is critical to the success of any data platform. How you handle data quality can make or break your data platform. We know issues with data quality will arise.
How you deal with them is the challenge.
DATA QUALITY BLOG SERIES
A new Data Quality blog post will be released each day at 8:45 AM.
DATA QUALITY - Part 1 January 6th
DATA QUALITY CONCEPTS - Part 2 January 7th
DATA QUALITY FOR EVERYONE - Part 3 January 10th
DATA QUALITY FRAMEWORK - Part 4 January 11th
DATA QUALITY DEVELOPMENT - Part 5 January 12th
QUALITY DATA - Part 6 January 13th