
QUALITY DATA - Part 6

Christopher Wagner • January 13, 2022

Let's wrap this up!

Storing the results of Data Quality tests can be very helpful for tracking the overall quality of the data over time and for identifying poorly performing sources and processes. This quality data is also instrumental for In-Line quality tests, which ensure that bad data doesn't make it to Production, and for Off-Line tests, which automatically roll back issues in Production.
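
To make this concrete, here is a minimal sketch in Python of what persisting test results might look like. The record fields, names, and in-memory log are hypothetical stand-ins; in practice you would write to a database table so trends can be queried over time.

from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical record for one Data Quality test run; the field names
# are illustrative, not from any specific framework.
@dataclass
class DQTestResult:
    test_name: str       # e.g. "address_count"
    source: str          # which feed or process produced the data
    metric_value: float  # the measured value (row count, % nulls, etc.)
    passed: bool
    run_at: datetime

def store_result(result: DQTestResult, results_log: list) -> None:
    # Append to a running history of results; a real implementation
    # would insert into a quality-results table instead.
    results_log.append(result)

history: list = []
store_result(
    DQTestResult("address_count", "customer_abc_feed", 11_000, True,
                 datetime.now(timezone.utc)),
    history,
)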

 

Example: 

 

Customer ABC has averaged more than 12,000 addresses over the last month. Today, customer ABC has only 11,000 addresses, a drop of more than 8%. All files processed successfully, and all other QA tests passed.

 

This variance could trigger a Rollback, with notification to Data Engineering and the Business for review. Maybe customer ABC really did close that many locations, or maybe a change to the source system introduced a special character into the business key that is incorrectly dropping addresses.

 

The Rollback keeps the customer's data in its last known good state until the variance can be reviewed and confirmed by Engineering and the Business.
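
As a rough sketch of how this check might work, the function below compares today's count against last month's baseline and flags a rollback when the drop exceeds a threshold. The names and the 8% threshold are assumptions drawn from the example above, not a prescribed implementation.

# Hypothetical variance check: hold the load in its last known good
# state when today's count drops too far below the recent baseline.
def should_rollback(baseline_count: int, today_count: int,
                    max_drop_pct: float) -> bool:
    if baseline_count <= 0:
        return False  # no baseline to compare against
    drop_pct = (baseline_count - today_count) / baseline_count * 100
    return drop_pct > max_drop_pct

# Customer ABC: ~12,000 addresses last month vs. 11,000 today.
if should_rollback(12_000, 11_000, max_drop_pct=8.0):
    # In practice: keep the prior snapshot live and notify
    # Data Engineering and the Business for review.
    print("Variance threshold exceeded - rolling back and notifying.")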

 

DATA QUALITY SUMMARY 

 

Data Quality is a highly complex topic that is critical to the success of any data platform; how you handle it can make or break that platform. We know data quality issues will arise.


How you deal with them is the challenge. 



DATA QUALITY BLOG SERIES

Each post in the Data Quality blog series is released at 8:45 AM.


DATA QUALITY - Part 1 January 6th

DATA QUALITY CONCEPTS - Part 2 January 7th

DATA QUALITY FOR EVERYONE - Part 3 January 10th

DATA QUALITY FRAMEWORK - Part 4 January 11th

DATA QUALITY DEVELOPMENT - Part 5 January 12th

QUALITY DATA - Part 6 January 13th




CHRIS WAGNER, MBA, MVP

Analytics Architect, Mentor, Leader, and Visionary

Chris has been working in the Data and Analytics space for nearly 20 years and has dedicated his professional career to making data and information accessible to the masses. A significant part of making data available is continually learning new things and teaching others from those experiences. To help people keep up with this ever-changing landscape, Chris frequently posts on LinkedIn and to this blog.