Data platforms do not spring forth fully built and operational; they take time to develop and build out. The same is true for a Data Quality practice. To ensure you have the appropriate Data Quality checks on your platform, add QA elements to your Quality Framework for each Feature, HotFix, and Defect.
Feature: Every feature added to a data environment should include the tests and validations needed to confirm the feature was delivered successfully. Define the QA tests when creating each new feature so Quality Engineering, Data Engineering, and Analytics Engineering all understand the feature's scope and complexity. Validation tests should be a baseline requirement for every code deployment.
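As an illustration, a baseline validation step might reconcile row counts and required columns between source and target. This is a minimal sketch; the function and field names are hypothetical, not part of any specific framework.

```python
def validate_feature_load(source_rows: int, target_rows: int,
                          required_columns: set, target_columns: set) -> list:
    """Collect validation failures for a feature deployment (sketch)."""
    failures = []
    # Row counts should reconcile between source and target.
    if target_rows != source_rows:
        failures.append(
            f"row count mismatch: {source_rows} source vs {target_rows} target"
        )
    # Every column the feature promises must exist in the target.
    missing = required_columns - target_columns
    if missing:
        failures.append(f"missing columns: {sorted(missing)}")
    return failures  # an empty list means the deployment passes this gate
```

A wrapper like this can run on every deployment, making the validation a blocking step rather than an afterthought.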
As you design features, it's essential to understand the quality requirements, the acceptable variance within the system, and the appropriate actions to take when an issue arises.
Some data has zero tolerance for variance; other data can carry a degree of variance and still be acceptable.
Examples:
Batch loads from a live system of record will show variance when validated against the source, because the source keeps changing after the batch load starts.
Large numbers of transactions added or multiplied together can create variance due to rounding differences between systems. Enough records carrying '.00000000012' cents can add up.
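To see how rounding differences between systems accumulate, consider one system that rounds each record to cents before summing while another sums raw values and rounds once. The amounts below are made up purely for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

amount = Decimal("0.3333333")  # hypothetical per-record amount with sub-cent precision
n = 100_000
cents = Decimal("0.01")

# System A rounds every record to cents, then sums.
system_a = amount.quantize(cents, rounding=ROUND_HALF_UP) * n
# System B sums the raw values, then rounds once.
system_b = (amount * n).quantize(cents, rounding=ROUND_HALF_UP)

print(system_a, system_b)  # 33000.00 vs 33333.33 -- a variance of 333.33
```

Neither system is "wrong"; they simply round at different points, which is exactly why an agreed variance tolerance matters.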
Establishing expectations for reasonable variance levels is vital at the start of any feature.
These are the recommended thresholds and responses:
0-1%: Feature passes QA.
1-3%: Feature has a Level 1 Defect. Log the Defect in the backlog; the business approves or rejects the move to Production.
3-5%: Feature has a Level 2 Defect. Log the Defect in the backlog; the business approves or rejects the move to Production. A warning is added to the Defect flagging a potentially significant issue. If this issue is detected in Production, follow the rollback/3x-retry, then manual-review process: alert engineering, quality, and the business of a potential problem.
5%+: Feature fails QA and is sent back to engineering as a bug; the code cannot move to Production. If detection occurs in Production, alerts are sent to engineering, quality, the business, and any change review process, warning of the issue and the need for a HotFix. The case is logged on the production quality dashboard.
HotFix: Issues with upstream data quality will always occur. Upstream changes impact data consumption, forcing HotFix changes to realign the data as soon as possible.
If engineering ever needs to build and implement a HotFix that was not triggered by a QA test, then QA must add quality tests when the HotFix change request goes live. This ensures that:
1. QA has 'skin in the game' and is working closely with the business and engineering.
2. Any defect addressed by the HotFix has quality tests associated with it, so the issue is blocked or engineering is alerted if it recurs.
3. QA is encouraged to keep robust data quality checks in place to minimize HotFix engagements (no one wants to respond to issues in the dead of night).
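For instance, if a HotFix patched NULL customer IDs arriving from an upstream feed (a made-up scenario), QA would ship a permanent gate alongside the change request so the defect is caught if it ever recurs:

```python
def check_no_null_customer_ids(rows: list) -> list:
    """Regression gate added with a (hypothetical) HotFix: return the
    positions of records missing a customer_id so the load can be
    blocked pre-Production or alerted on in Production."""
    return [i for i, row in enumerate(rows) if row.get("customer_id") is None]

# A non-empty result blocks the deployment (or fires an alert in Production).
bad = check_no_null_customer_ids([
    {"customer_id": "C001", "amount": 10.0},
    {"customer_id": None, "amount": 5.0},   # the defect the HotFix addressed
])
print(bad)  # [1]
```

The test outlives the HotFix itself, which is the point: the one-off repair becomes a standing guarantee.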
Defect: In every complex system, imperfections will exist. When a Defect is identified and engineering develops a fix, quality should build tests to ensure the issue is resolved correctly. Treat Defects the same way Features are treated.
DATA QUALITY BLOG SERIES
A new Data Quality blog post will be released each day at 8:45 AM.
DATA QUALITY - Part 1 January 6th
DATA QUALITY CONCEPTS - Part 2 January 7th
DATA QUALITY FOR EVERYONE - Part 3 January 10th
DATA QUALITY FRAMEWORK - Part 4 January 11th
DATA QUALITY DEVELOPMENT - Part 5 January 12th
QUALITY DATA - Part 6 January 13th