GCP – BigQuery delivers a modern view of materialized views
Today, we are excited to announce the general availability (GA) of BigQuery materialized views. Materialized views (MV’s) are precomputed views that periodically cache results of a query to provide customers increased performance and efficiency. They significantly boost performance and lowers costs of workloads that have the characteristic of common and repeated queries and are ideal for the following use cases:
-
Faster query performance: If you have a raw table and perform online analytical processing (OLAP)- style aggregation that requires significant processing, and you have predictable and repeated queries, such as from extract, transform, load (ETL) or business intelligence (BI) pipelines, you should use BigQuery materialized views.
-
Aggregation of real-time data: If you want access to data for decisions in real-time, BigQuery natively supports streaming capabilities. Materialized views integrate with BigQuery’s streaming architecture and perform aggregations in real-time to provide up-to-date information.
New enhancements available in GA
Now, users can now create MV’s within different datasets and projects – 20 per dataset, 100 per project and 500 globally. Additional aggregation functions, like ANY_VALUE
, COUNTIF
, BIT_AND
, BIT_OR
, BIT_XOR
, LOGICAL_AND
, LOGICAL_OR
are now available, as well as support for CROSS/LEFT JOIN UNNEST
, INFORMATION_SCHEMA
, and Customer Managed Encryption Keys (CMEK).
A modern approach to materialized views
Materialized views are a standard optimization feature within databases and have been in existence for many years. BigQuery materialized views offer a more modern approach and avoids the pain points of traditional approaches. Key design principles includes:
-
Zero maintenance: A materialized view is recomputed in the background every time the base table changes. All incremental data changes from the base tables are automatically added to the materialized view. No user inputs are required.
-
Always fresh: A materialized view is always consistent with the base table, including BigQuery streaming tables. If a base table is modified via update, merge, partition truncation, or partition expiration, BigQuery will invalidate the impacted portions of the materialized view and fully re-read the corresponding portion of the base table. For an unpartitioned materialized view, BigQuery will invalidate the entire materialized view and re-read the entire base table. For a partitioned materialized view, BigQuery will invalidate the affected partitions of the materialized view and re-read the entire corresponding partitions from the base table. Partitions that are append-only are not invalidated and are read in delta mode. In other words, there will never be a situation when querying a materialized view results in stale data.
-
Smart tuning: If a query or part of a query against the source table can instead be resolved by querying the materialized view, BigQuery will intelligently rewrite (reroute) the query to use the materialized view for better performance and/or efficiency.
BigQuery customers have been testing materialized views and have derived great value in gaining cost efficiency and improved performance:
“BigQuery materialized views are easy to set-up, effortless to use, and best of all it’s real-time. It immediately unlocked accelerated performance and cost savings within our environment. LOVE this feature addition to BigQuery.” -Scott Schaen, VP of Analytics, Wunderkind
“Not only does it come with great cost reduction, but it also hugely improves performance. Query latency is critical in our case where reporting data is directly consumed by the UI which processes ~8,000 SQL queries per day with each query needing to complete in under a second. This has allowed us to successfully migrate our Vertica Reporting cluster to BigQuery.” -Adrian Witas, SVP Chief Architect at Viant
Using BigQuery materialized views
Let’s look at an example to see how materialized views work. In the following example, I’m using the TPC-H orders table to demonstrate the power that materialized views can have on BI workloads. The orders table below is 1.5TB in size and is unpartitioned and unclustered. The simple query below attempts to look at the total sales by day , which is a pretty common question many businesses ask on a daily basis.
From the BigQuery console, we can see that the query scanned 436GB of data and took 8.4 seconds to run.
The majority of time spent on the query is the initial scan of the 29 billion rows and initial aggregation. That first step is taking the majority of the time. Let’s see the impact that a materialized view can have on this query.
We take the original query and create a materialized view from it. Next we re-run the original query, still pointing at the original base table. Remember, a key tenant of a materialized view is the optimizer re-writing the query to take advantage of it. We can optionally point at the materialized view if we need to but the BigQuery optimizer should select it in most cases.
The original query now runs in 2.3 seconds and only processes 2.5MB of data! Compare that to the original query needing to scan nearly 500GB. The job execution metrics look even better.
The first input stage for compute and write IO have been virtually eliminated. Now the bulk of the query is spent actually writing the results to the temp table! Pretty nice improvements. The best part of all is you don’t have to refresh the data when using materialized views with BigQuery! The data returned by an MV is always up-to-date even if the base table contains modifications. BigQuery does that heavy lifting for you automatically. As your underlying data changes, BigQuery refreshes your data on an automated basis so you don’t have to manually trigger refreshes (though you can if you want!).
Understand that this is a very simplistic example but it showcases the power that BigQuery materialized views can have on performance as well as making your queries much more efficient for both costs and time.
Next steps with BigQuery materialized views
BigQuery materialized views is now GA and provides query cost-optimization and increased performance. Getting started with BigQuery materialized views couldn’t be easier. You can learn more about them in our comprehensive documentation and start taking advantage of them in your analytics projects right away. And, if you’re new to BigQuery, check out our BigQuery sandbox for a frictionless way to get started.
Read More for the details.