Over the last 20 years, data sizes have exploded and technology has continually adjusted to come up with new ways to deal with these large data sets. Databases (SQL Server, Oracle, Sybase, DB2) with hundreds of GB of data were augmented by Data Warehouses (Netezza, Teradata, SAP HANA, Redshift, Snowflake, SQL DW) with hundreds of Terabytes of data. These Data Warehouses then added Data Data lakes (ADLg2, S3) with Petabytes of data to the ecosystem.
Power BI reports and dashboards are designed to be able to deal with any size data. You just have to know when and where to use certain techniques to make these work. To do this, you need to understand how Power BI works.
Power BI Engine Basics
Power BI Import Mode
Power BI was designed to run off of in-memory cached data. Cached data can execute tens of thousands of queries across hundreds of millions of records with a sub-second response. This allows users of reports to be able to really interact with the data in real-time. Because the data is in the cache, the time it takes for the engine to move through the data is insanely fast.
Import Mode PROS:
Import Mode CONS:
Power BI Direct Query Mode
Direct Query addresses the issue with the size of data by not having to cache the data. This enables Power BI to report on data that exceeds the size of data that can be cached.
Direct Query Mode PROS:
Direct Query Mode CONS:
Even high performing platforms (Redshift, Snow, SQL DW, SAP HANA, Netezza and Teradata) can really struggle under the number of queries Power BI will fire at it. Poorly written queries and poorly designed data sets can crash Netezza, make Redshift go non-responsive, and fail to return data from SAP HANA.
The Challenge
Well, shoot- we are often asked to deal with data sets that are much larger than what we can cram into Power BI. At the same time, NO ONE wants a slow-performing report.
So how the heck are you going to deal with these Massive amounts of data AND make sure that it performs???
Power BI Aggregations the C-C-C-COMBO BREAKER!!!
The best solution for large data sets is to leverage Power BI's new Aggregations and Dual mode where you can have some imported data, and some direct queries for detailed data. You import aggregated data at a grain that allows interaction within the dashboard but does a direct query of detailed data.
Example:
Imported Aggregate Data :
SalesPersonID, Date, ClientID, ProductType, SUM(Amt), SUM(QTY)
Direct Query Detail Data:
SalesPersonID, Date, ClientID, OrderNumber, ProductType, ProductID, Product Description, Amt, QTY
This example would allow you to build interactive reports off of all of the Sales Person attributes, Client attributes, Product type attributes, and date.
For any details, you then drill into the data for subsets of the data on another tab of the report.
This really changes the way we look at building out our reports and dashboards. No longer do we have to do ALL IMPORT or ALL DIRECT QUERY.
More Information
Josh Caplan (Power BI Principal Group Program Manager) announced these features going to GA on July 18, 2019.
https://powerbi.microsoft.com/en-us/blog/aggregations-for-petabyte-scale-bi-is-generally-available/
Technical Details:
https://docs.microsoft.com/en-us/power-bi/desktop-aggregations