Share

Securing Data: Column Encryption vs. Dynamic Masking

Christopher Wagner • June 16, 2019

Storing data in the cloud requires additional security.

CAUTION : When moving to the cloud, in order to ensure you have addressed all of the necessary security concerns, please make sure you partner with a cloud security professional to address all of the potential issues . They will help you understand and talk through the myriad of different security controls you need to address while in the cloud. This post only addresses some of the aspects that need to be addressed to secure your data. This is by no means everything done to secure our environment or our data.


Two years ago we began our journey migrating from on-prem server environment running our Microsoft Business Intelligence (MSBI) solution out to the cloud. We had a classic MSBI environment, DB server, SSIS server, SSAS server and an SSRS server (more on this in another post).

There are a number of considerations we had to take into account to ensure that this entire ecosystem was locked down and secured.

Protecting our data against both external and internal threats was the most important aspect of this process.

Our True-North guiding principal around data security was that Data needed to ALWAYS be encrypted.

Encrypted connections? Check.

Encrypted tunnel? Check.

Encrypted server? Check.

Encrypted device access? Check.

Encrypted in-memory and processing? Hmmmm... this is a potential challenge.

Challenge: Encryption turns the data into gibberish that can only be used with a very specific key. Encryption at the in-memory and process level protects data from a hacker watching the data flow through the processor or in the memory. This greatly increases processing time and may be necessary in certain situations, but should be limited where ever possible.


Column Level Encryption

In order to protect our most sensitive data, yet get good performance, we opted to go with Column Level Encryption for our most sensitive fields. This ensured that our data was protected to the appropriate level to enable us to move online, while maximizing our performance.

Our ETL and ELT processes where changed to ADO connections, and secured columns were encrypted upon pull and stored with as an encrypted value.

Account number '1234567A' would be translated into qwerytuadfs1234zxcvbnmghjktyui678r4567ghjklkjfdghtre encrypted value.

This is good, the data was secure. We were confident that no one could make sense of this data and we could migrate to the cloud.


Challenges Start to Emerge

Over the first few months, we started to encounter some challenges with column level encryption.

Encrypted Challenges- Calculations

As we started to work with the encrypted fields, we began to encounter a number of issues with working with encrypted fields. Encrypted date fields could not be worked with to perform date logic. Doing something as simple as calculating a Birthday was impossible with an encrypted date field, so queries that returned people who turned 21 in the last week could not be done.

Encrypted Challenges- Integration

Integrating data from a variety of different sources offered another challenge. Especially when the account number was encrypted.

Account number '1234567A' would be translated into qwerytuadfs1234zxcvbnmghjktyui678r4567ghjklkjfdghtre encrypted value.

BUT

Account number '1234567a' would be translated into a different asfdzxvcvbnmw456r5678fghjdwerty5 encrypted value.

Crap- one of our sources for data was not properly storing the account number with all capital letters. Normaly, no big deal. Something that would have been caught in the integration development. With the encrypted values, this was something that was missed.

While not unworkable, this created a number of challenges when it came to developing and building out new data integrations. Error rates would increase, and time to defect resolution started to take longer and longer.


ADO Connection and Load Times

As we continued on, we noticed that our load times were growing at an increasing rate. While our data was growing previously, the time to load was only growing by a few minutes each month. With these new ADO connections, we started to notice that load times were growing by minutes each week. This was rapidly going to grow beyond our SLA. We tried many different load optimizations, but it all turned back to the ADO connection necessary for column level encryption. Something would need to be done.


What to do?!!?


Mitigating Security Controls

When looking at security, it was important for us to understand the risks and address any risks with an appropriate means of control.

We reached out to our security partners to determine what other controls could be added, or used to justify the removal of column level encryption.

While we were working in the cloud and uncovering these challenges, our partners in security were continuing to increase the security of our entire environment through a variety of new methods.

Our security partners were able to confirm that the advancements that they had made in perimeter and environment protection were enough to address the risk of external entities accessing the data, but that didn't cover the issue of internal resources being able to access fields that they should not have access to.


Dynamic Masking

The advancement in perimeter and environment protection allowed us to revisit other means of protecting the data. Dynamic Masking was an option that we really liked as part of our initial security review, due to the flexibility of the solution, but because it does not actually encrypt the data, it had to be eliminated from our initial deployment. Once the need for column level encryption was removed, we were able to add Dynamic Masking back into our security strategy.

To address internal threats, we replaced the column level encryption with dynamic masking of data. This allowed us to build out role based access to allow some users to see the values in the columns, and others to NOT be able to see the values in the columns.

This ensured that users in the secured role had access to the data could run all of the necessary queries to do their research and data exploration. Then users who were not in the secured role were still able to run the queries, but saw masked values instead of the secured information.

Summary
Now our data is in the cloud in a highly secured environment and stored in a manner that loads fast, is easy to use, and protected from internal potential risks.

I am proud to say that our lead security engineer proclaimed our environment 'the most secure data environment in the company'.

We continue to partner with our security teams working to proactively address any new changes to security and continue to evolve our security profile. While no digital environment is ever completely secure, I have faith that we will continue to set the bar for data security within our organization.


TDL
Partner with Security
Use column encryption sparingly
Dynamic Masking works well



CHRIS WAGNER, MBA MVP

Analytics Architect, Mentor, Leader, and Visionary

Chris has been working in the Data and Analytics space for nearly 20 years. Chris has dedicated his professional career to making data and information accessible to the masses. A significant component in making data available is continually learning new things and teaching others from these experiences. To help people keep up with this ever-changing landscape, Chris frequently posts on LinkedIn and to this blog.
By Christopher Wagner September 3, 2024
Your guide to becoming a Data Engineer.
By Christopher Wagner August 19, 2024
Compare Microsoft Fabric and Databricks, two leading data platforms. Highlights their features, strengths, and unique offerings across various domains like data engineering, data analytics, data science, DevOps, security, integration with other tools, cost management, and governance. Microsoft Fabric is noted for its low-code/no-code solutions and seamless integration with Microsoft tools, making it accessible for users with varying technical skills. Databricks is praised for its high-performance capabilities in big data processing and collaborative analytics, offering flexibility and control for experienced data teams.
By Christopher Wagner November 15, 2023
In a dynamic data engineering scenario, Sam, a skilled professional, adeptly navigates urgent requests using Microsoft Fabric. Collaborating with Data Steward Lisa and leveraging OneLake, Sam streamlines data processes, creating a powerful collaboration between engineering and stewardship. With precision in Azure Data Factory and collaboration with a Data Scientist, Sam crafts a robust schema, leading to a visually appealing Power BI report.
By Christopher Wagner April 28, 2023
NOTE: This is the first draft of this document that was assembled yesterday as a solo effort. If you would like to contribute or have any suggestions, check out my first public GIT repository - KratosDataGod/LakehouseToPowerBI: Architectural design for incorporating a Data Lakehouse architecture with an Enterprise Power BI Deployment (github.com) This article is NOT published, reviewed, or approved by ANYONE at Microsoft. This content is my own and is what I recommend for architecture and build patterns.
Show More
Share by: