GCP – BigLake evolved: Build open, high-performance, enterprise Iceberg-native lakehouses
Data management is changing. Enterprises need flexible, open, and interoperable architectures that allow multiple engines to operate on a single copy of data. Apache Iceberg has emerged as the leading open table format, but in real-world deployments, customers often face a dilemma: embrace the openness of Apache Iceberg but compromise on fully managed, enterprise-grade storage management, or choose managed storage but sacrifice the flexibility of open formats.
This week, we announced innovations in BigLake, a storage engine that provides the foundation for building open data lakehouses on Google Cloud. These innovations bring the best of Google’s infrastructure to Apache Iceberg, eliminating the trade-off between open-format flexibility and high-performance, enterprise-grade managed storage. They include:
- Open interoperability across analytical and transactional systems: Formerly known as BigQuery metastore, the fully managed, serverless, scalable BigLake metastore, now generally available (GA), simplifies runtime metadata management and works across BigQuery as well as other Iceberg-compatible engines. Powered by Google’s planet-scale metadata management infrastructure, it removes the need to manage custom metastore deployments. We are also introducing support for the Iceberg REST Catalog API (Preview). The BigLake metastore provides the foundation for interoperability, allowing you to access all your Cloud Storage and BigQuery storage data across multiple runtimes, including BigQuery, AlloyDB (Preview), and open-source, Iceberg-compatible engines such as Spark and Flink.
- New, high-performance Iceberg-native Cloud Storage: We are simplifying lakehouse management with automatic table maintenance (including compaction and garbage collection) and integration with Google Cloud Storage management tools, including auto-class tiering and encryption. Supercharge your lakehouse by combining open formats with BigQuery’s highly scalable, real-time metadata through the general availability (GA) of BigLake tables for Apache Iceberg in BigQuery, enabling high-throughput streaming, auto-reclustering, multi-table transactions (coming soon), and native integration with Vertex AI, so that you can harness the power of Google Cloud AI with your lakehouse.
- AI-powered governance across Google Cloud: These BigLake updates are natively supported in Dataplex Universal Catalog, providing unified, fine-grained access controls across all supported engines and enabling end-to-end governance, complete with comprehensive lineage, data quality, and discoverability capabilities.
With these changes, we’re evolving BigLake into a comprehensive storage engine for building open, high-performance, enterprise-grade lakehouses on Google Cloud, using Google Cloud services alongside open-source and third-party Iceberg-compatible engines. By eliminating the trade-off between open and managed solutions, BigLake helps accelerate your data and AI innovation.
“We wanted teams across the organization to access data in a consistent and secure way — no matter where it lived or what tools they were using. Google’s BigLake was a natural choice. It provides a unified layer to access data and fully managed experience with enterprise capabilities via BigQuery — whether it’s in open table formats like Apache Iceberg or traditional tables — all without the need to move or duplicate data. Metadata quality is essential as we continue to explore potential gen AI use cases. We are utilizing BigLake Metastore and Data Catalog to help maintain high quality metadata.” – Zenul Pomal, Executive Director, CME Group
Open and interoperable
The BigLake metastore is central to BigLake’s interoperability, providing two primary catalog interfaces to connect your data across Cloud Storage and BigQuery Storage:
- The Iceberg REST Catalog (Preview) provides a standard REST interface for wider compatibility. This allows Spark users, for instance, to use the BigLake metastore as a serverless Iceberg catalog.
- The Custom Iceberg Catalog (GA) enables Spark and other open-source engines to work with BigLake tables for Apache Iceberg and interoperate with BigQuery. Its implementation is directly integrated with public Iceberg libraries, removing the need for extra JAR files (see the sketch after the REST example below).
```python
# Spark session configured to use the Iceberg REST Catalog (preview)
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-rest-catalog")
    # ... other Spark configurations ...
    .config("spark.sql.catalog.iceberg.type", "rest")
    .config(
        "spark.sql.catalog.iceberg.uri",
        "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
    )
    # ... authentication and project configurations ...
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS my_namespace")
spark.sql("CREATE TABLE IF NOT EXISTS my_namespace.my_table (id int, data string) USING iceberg")
spark.sql("INSERT INTO my_namespace.my_table VALUES (1, 'example')")
spark.sql("SELECT * FROM my_namespace.my_table").show()
```
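For the custom Iceberg catalog path, a Spark session instead points Iceberg’s SparkCatalog at the BigLake metastore. The following is a minimal sketch: the implementation class and property names reflect the public documentation at the time of writing and may evolve, and the project, location, and warehouse values are placeholders.

```python
# Minimal sketch: Spark with the BigLake metastore as a custom Iceberg catalog (GA).
# Class and property names follow current public docs and may change; values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("biglake-custom-catalog")
    .config("spark.sql.catalog.blms", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.blms.catalog-impl",
            "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog")
    .config("spark.sql.catalog.blms.gcp_project", "my-project")   # placeholder project ID
    .config("spark.sql.catalog.blms.gcp_location", "us")          # placeholder location
    .config("spark.sql.catalog.blms.warehouse", "gs://my_lake_bucket/warehouse")
    .getOrCreate()
)

# Tables created through this catalog are registered in the BigLake metastore
# and are visible to BigQuery.
spark.sql("CREATE NAMESPACE IF NOT EXISTS blms.my_namespace")
spark.sql(
    "CREATE TABLE IF NOT EXISTS blms.my_namespace.orders "
    "(id BIGINT, item STRING) USING iceberg"
)
```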
BigLake tables for Apache Iceberg created within BigQuery can be queried by open-source and third-party engines using native Apache Iceberg libraries. To enable this, BigLake automatically generates an Apache Iceberg V2 specification-compliant metadata snapshot. This snapshot is registered in the BigLake metastore, allowing open-source engines to query the data through the custom Iceberg catalog integration. Importantly, these metadata snapshots are kept current by automatically refreshing after any table modification, for example DML operations, data loads, streaming updates, or optimizations, helping to ensure that external engines work with the latest data.
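Because the snapshot is standard Iceberg V2 metadata, engines with native Iceberg libraries can read it without going through BigQuery. As one hedged illustration, here is a PyIceberg read through the REST catalog endpoint shown above; the token handling is simplified, and the namespace and table names are carried over from the earlier example.

```python
# Sketch: reading the latest committed snapshot with PyIceberg via the
# Iceberg REST Catalog (preview). The bearer token below is an assumption;
# obtain one through your preferred Google OAuth2 credential flow.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "biglake",
    **{
        "type": "rest",
        "uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
        "token": "<oauth2-access-token>",  # placeholder credential
    },
)

table = catalog.load_table("my_namespace.my_table")
print(table.scan().to_arrow())  # reflects the automatically refreshed metadata
```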
A key aspect of this enhanced interoperability is bridging analytical and transactional workloads, which is particularly powerful for AlloyDB users. You can now seamlessly consume your analytical BigLake tables for Apache Iceberg directly within AlloyDB (Preview). PostgreSQL users can combine this rich analytical data with up-to-the-second transactional data from AlloyDB, powering AI-driven applications and real-time operational use cases with advanced AlloyDB features like semantic search, natural language interfaces, and an integrated AI query engine. This unified approach across BigQuery, AlloyDB, and open-source engines unlocks the platform value of your Iceberg data.
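The registration flow for this preview is documented separately; as a sketch of the end state, once an Iceberg table is visible to AlloyDB it can be queried over the standard PostgreSQL protocol and joined with live transactional tables. The connection details and table names below are hypothetical.

```python
# Hypothetical sketch: joining a BigLake Iceberg table surfaced in AlloyDB (Preview)
# with a native transactional table. AlloyDB speaks the PostgreSQL protocol,
# so a standard driver works; all identifiers and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="10.0.0.5", dbname="postgres", user="postgres", password="..."
)
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT o.id, o.amount, i.qty
        FROM live_orders AS o      -- native AlloyDB table (hypothetical)
        JOIN inventory_bq AS i     -- BigLake Iceberg table (hypothetical)
          ON o.item = i.item_id
        """
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```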
BigLake metastore

| Supported tables | BigLake tables for Apache Iceberg | BigLake tables for Apache Iceberg in BigQuery | BigQuery tables |
| --- | --- | --- | --- |
| Storage | Cloud Storage | Cloud Storage | BigQuery |
| Management | Google-managed | Google-managed | Google-managed |
| Read/write capabilities (R/W) | OSS engines (R/W); BigQuery (R) | BigQuery (R/W); OSS engines (R/W) using BigQuery Storage API; OSS engines (R) using Iceberg libraries | BigQuery (R/W); OSS engines (R/W) using BigQuery Storage API |
| Use cases | Open lakehouse | Open lakehouse with enterprise-grade storage for advanced analytics, streaming, and AI | Enterprise-grade storage for advanced analytics, streaming, and AI |
New high-performance Iceberg-native storage
BigLake tables for Apache Iceberg deliver an Iceberg-native storage experience directly on Cloud Storage. Whether these tables are created with open-source engines like Spark or directly from BigQuery, they extend Cloud Storage management capabilities to your Iceberg data, simplifying lakehouse management with advanced features such as auto-class tiering and Customer-Managed Encryption Keys (CMEK). To take full advantage of these capabilities, refer to our best practices guide.
```sql
-- Use Spark to create a BigLake table for Apache Iceberg, registered in the BigLake metastore
CREATE TABLE orders_spark (id BIGINT, item STRING, amount DECIMAL(10,2))
USING iceberg
LOCATION 'gs://my_lake_bucket/orders_spark_data';

INSERT INTO orders_spark VALUES (1, 'Laptop', 1200.00);
```

```bash
# Optimize GCS storage costs for your Iceberg data (CLI)
gsutil autoclass set on gs://my_lake_bucket
```
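The CMEK capability mentioned above can likewise be set as a bucket default so that newly written Iceberg data files are encrypted with your key. The following is a minimal sketch with the google-cloud-storage Python client; the bucket and key names are placeholders.

```python
# Sketch: default CMEK on the lake bucket (bucket and key names are placeholders).
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my_lake_bucket")
bucket.default_kms_key_name = (
    "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
)
bucket.patch()  # new objects in the bucket are now encrypted with this key
```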
Beyond the foundational Cloud Storage integration, you can leverage BigLake tables for Apache Iceberg in BigQuery. These tables, now generally available, combine open formats with BigQuery’s highly scalable, real-time metadata. This powerful combination unlocks a suite of advanced capabilities, including:
- High-throughput streaming ingestion from various sources (like Spark, Flink, Dataflow, Pub/Sub, and Kafka) via BigQuery’s Write API, scaling to tens of GiB per second with zero-latency reads (sketched after the DML example below)
- Native integration with Vertex AI
- Automated table management features like compaction and garbage collection
- Performance optimizations such as auto-reclustering
- Fine-grained DML and multi-table transactions (coming soon in preview)
This enterprise-ready, fully managed table experience, familiar to BigQuery users, maintains the openness and interoperability of Apache Iceberg to deliver the best of both worlds.
```sql
-- Create a BigLake table for Apache Iceberg in BigQuery, stored on GCS
CREATE OR REPLACE TABLE my_lake_ds.inventory_bq (item_id STRING, qty INT64)
WITH CONNECTION `us.my_bl_connection`
OPTIONS (
  storage_uri = 'gs://my_lake_bucket/inventory_bq_data',
  table_format = 'ICEBERG',
  file_format = 'PARQUET'
);

INSERT INTO my_lake_ds.inventory_bq VALUES ('Laptop', 50);
UPDATE my_lake_ds.inventory_bq SET qty = 49 WHERE item_id = 'Laptop';

-- Perform multi-table transactions
BEGIN TRANSACTION;
  -- Example: Record a new order
  INSERT INTO my_lake_ds.orders_bq (id, item, amount) VALUES (2, 'Mouse', 25.00);
  -- Example: Update inventory for the ordered item
  UPDATE my_lake_ds.inventory_bq SET qty = qty - 1 WHERE item_id = 'Mouse';
COMMIT TRANSACTION;
```
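For the high-throughput streaming path listed above, one option is Spark Structured Streaming through the spark-bigquery-connector, which writes via BigQuery’s Storage Write API. This is a sketch under assumptions: the rate source is a stand-in for a real stream, and the writeMethod option follows the connector’s documentation.

```python
# Sketch: streaming into the BigLake Iceberg table created above using the
# spark-bigquery-connector. The rate source is illustrative; option names
# follow the connector docs and are assumptions here.
events = (
    spark.readStream.format("rate")          # emits (timestamp, value) rows
    .option("rowsPerSecond", 1000)
    .load()
    .selectExpr("CAST(value AS STRING) AS item_id", "CAST(1 AS BIGINT) AS qty")
)

query = (
    events.writeStream.format("bigquery")
    .option("table", "my_project.my_lake_ds.inventory_bq")
    .option("writeMethod", "direct")         # stream via the Storage Write API
    .option("checkpointLocation", "gs://my_lake_bucket/checkpoints/inventory")
    .start()
)
query.awaitTermination()
```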
AI-powered governance across Google Cloud
BigLake integrates natively with Dataplex Universal Catalog, helping to ensure that governance policies defined centrally in Dataplex are consistently enforced across multiple engines. This integration supports table-level access control for direct Cloud Storage access. Fine-grained access control is automatically available for queries within BigQuery; for open-source engines, it can be achieved using Storage API connectors.
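As an illustration of the open-source path, a Spark read through the BigQuery Storage API connector returns only the rows and columns the caller is authorized to see. The table name is carried over from the earlier examples.

```python
# Sketch: an OSS engine reading through the BigQuery Storage API connector,
# with centrally defined fine-grained policies enforced on the read.
df = spark.read.format("bigquery").load("my_project.my_lake_ds.inventory_bq")
df.show()  # unauthorized rows/columns are filtered by policy
```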
Beyond access management, BigLake’s Dataplex integration significantly enriches overall governance for BigQuery tables and BigLake tables for Apache Iceberg (created via the custom Iceberg catalog). Key capabilities include:
- Comprehensive data understanding: Native support for search, discovery, profiling, data quality checks, and end-to-end data lineage within a multi-runtime architecture.
- AI-powered exploration: Dataplex simplifies data exploration with AI-powered semantic search. Its knowledge graph also automatically suggests relevant questions using AI-generated insights for your BigQuery and Iceberg data, helping to jumpstart analysis.
Crucially, Dataplex’s end-to-end governance benefits apply to your Iceberg data seamlessly through BigLake’s native integration, without requiring separate registration or enablement steps.
What’s next
At Google Cloud Next ‘25, we demonstrated how fine-grained DML, multi-statement transactions, and change data capture support let you simplify your Apache Iceberg lakehouse for advanced data-processing use cases. These features will launch soon, and support for the remaining capabilities will continue to roll out in the coming months. In the meantime, explore BigLake capabilities and watch the latest demos on our webpage, or get started with BigLake tables for Apache Iceberg and the BigLake metastore using this guide.