Introducing data products in Dataplex Universal Catalog for curated data and context
Many organizations are overwhelmed with large amounts of fragmented data and are unclear on how it impacts their business objectives. This creates a critical disconnect: data consumers — analysts or data scientists that need the data to generate insights — can’t easily discover, access, or trust the data they need, while data producers — the teams who own these data assets — can’t enable consumers with self-service access to that data. Without easy access to reliable, context-rich data, organizations can struggle to adopt AI and agent technologies.
To help organizations overcome these challenges, we are introducing data products in Dataplex Universal Catalog, Google Cloud’s unified, intelligent data-to-AI governance solution. A data product is a curated, ready-to-use package of data assets, documentation, and governance controls, all purposefully assembled to solve a specific business problem. More than just data, data products can help you demonstrate business value and fuel AI innovation within your organization. Data products are available now in preview.
Understanding data products
At its core, a data product is a logical unit of distribution that models how a group of assets addresses a business problem. We think of it as a product on a shelf, complete with a label, instructions, and quality guarantees, but for data. It abstracts raw data assets into trusted, discoverable, and valuable resources for the entire organization. This allows data teams to more efficiently:
- Define expectations: Instead of answering the same ad-hoc questions again and again, data producers can catalog information about data quality, freshness, and intended use cases directly within the data product’s documentation and contracts.
- Reduce management toil: Data products allow you to group assets logically by use case. This simplifies access management, reducing the manual effort required to manage individual assets.
- Demonstrate value: By linking data assets directly to the business use cases they serve, data teams can clearly demonstrate the value they create and justify their budget spend based on impact, not just history.
How to use data products
At a high level, data products deliver the following foundational capabilities:
- Design for use case: Identify the business problem and model the data product to solve for a use case.
- Establish ownership: Define the ownership of the data product to ensure accountability.
- Democratize context: Document the problems the product addresses with usage examples and expectations.
- Define contracts: Provide trust to consumers and communicate contractual guarantees.
- Govern assets: Administer who can view the product and regulate access to the data assets.
- Enable discovery: Help data consumers easily discover and request access for data products.
- Evolve offerings: Iterate and evolve the product to address consumer needs.
What does that mean in practice? Imagine you’re a data producer for a marketing team. Data consumers such as data scientists in your organization consistently need to analyze quarterly campaign performance to recommend future adjustments. Here’s how you can empower them with a “Marketing campaign analysis” data product:
- Create the data product: Start by creating a new data product named “Marketing campaign analysis.” You assign yourself as the owner and provide contact details.
- Curate cross-project assets: You then add the relevant assets needed for the analysis. For example, you can include BigQuery tables and views: ad_spend_daily, customer_conversions, website_traffic_logs.
- Define roles and permissions: To govern and manage access on the assets in your data product, create a data_scientist group and grant this group viewer access on all of the assets.
- Establish a data contract: To build trust, specify the refresh frequency of the data product and communicate the contract terms.
- Add rich documentation: Finally, add a detailed description explaining that this data product is the single source of truth for campaign analysis. You include examples of SQL queries and links to additional artifacts.
Image 1: Data Producers create data products by packaging assets, permissions, contract and context
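To make the producer steps above concrete, here is a minimal sketch that models the “Marketing campaign analysis” data product as plain Python dataclasses. It is not the Dataplex Universal Catalog API: the DataProduct and DataContract classes, the matches helper, the owner address, the "daily" refresh value, and the sample query are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataContract:
    # Hypothetical contract term: how often the producer commits to refreshing the data.
    refresh_frequency: str


@dataclass
class DataProduct:
    name: str
    owner: str
    description: str = ""
    assets: list[str] = field(default_factory=list)
    # Group name -> role granted on every asset in the product.
    access: dict[str, str] = field(default_factory=dict)
    contract: Optional[DataContract] = None
    sample_queries: list[str] = field(default_factory=list)

    def matches(self, query: str) -> bool:
        """Naive keyword match over name and description, standing in for catalog search."""
        haystack = f"{self.name} {self.description}".lower()
        return all(word in haystack for word in query.lower().split())


# 1. Create the data product and assign an owner (placeholder address).
product = DataProduct(name="Marketing campaign analysis", owner="producer@example.com")

# 2. Curate the assets named in the walkthrough.
product.assets.extend(["ad_spend_daily", "customer_conversions", "website_traffic_logs"])

# 3. Grant the data_scientist group viewer access on all assets.
product.access["data_scientist"] = "viewer"

# 4. Establish a contract; "daily" is an assumed value, not taken from the article.
product.contract = DataContract(refresh_frequency="daily")

# 5. Add documentation and an illustrative sample query.
product.description = "Single source of truth for campaign analysis."
product.sample_queries.append("SELECT * FROM ad_spend_daily LIMIT 10")

# A consumer searching for "campaign analysis" would find this product.
found = product.matches("campaign analysis")
```

Keeping assets, access, contract, and documentation on one object mirrors the idea of a data product as a single logical unit of distribution, rather than a set of individually managed tables.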
Now, the data scientist can simply search for “campaign analysis,” find this data product, request access, and immediately have everything they need to do their job, confident in the data’s quality and origin.
Image 2: Data consumers discover data products, understand context and request access to them
“With Google Cloud’s Data Products, we are advancing our data sharing capabilities at Virgin Media O2. This evolution allows us to treat our data as a true product, complete with the rich context of data contracts, clear data lineage, and the enhanced metadata. This streamlined workflow empowers our teams to get the data they need faster, accelerating our data-driven decision-making.” – Jonathan Ford, Director, Data Applications, Virgin Media O2
How data products fuel AI and agents
We believe data products are foundational to the adoption of AI and agentic technologies, helping organizations go from managing individual data assets to delivering value-driven logical units. Data products fuel AI and agentic innovation by:
- Providing high-quality, business-ready data: Because data products are pre-curated with assets that solve a specific business problem, they can help ensure that AI agents are trained and operate on data that has already been cleansed, organized, and aligned with business objectives.
- Grounding agents with rich context for accurate insights: A key challenge in AI is grounding, the ability to connect the model’s output to verifiable sources of information, reducing the likelihood of generating false or misleading content. Data producers can enrich data products with a wealth of contextual information, including comprehensive documentation, contracts and other metadata, providing a solid foundation for AI agents to base their responses on.
- Powering conversational AI and actionable insights: By combining high-quality data and rich context, data products can fuel conversational AI. When a user interacts with an AI agent that leverages a data product, the agent can provide more relevant and nuanced responses.
Returning to the “Marketing campaign analysis” example above, a data product that includes detailed documentation with business context, sample queries, and schema definitions for ad_spend_daily and customer_conversions allows an AI agent to “read” this context and provide more accurate responses.
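One way to picture an agent “reading” this context is to flatten the data product’s documentation into text that is placed ahead of the user’s question. The sketch below assumes a plain dictionary representation; the build_grounding_context and build_prompt functions, the column names, and the sample query are hypothetical, and no real agent or Dataplex API is called.

```python
def build_grounding_context(product: dict) -> str:
    """Flatten a data product's documentation into plain text an agent can read."""
    lines = [
        f"Data product: {product['name']}",
        f"Description: {product['description']}",
        "Schemas:",
    ]
    for table, columns in product["schemas"].items():
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append("Sample queries:")
    for query in product["sample_queries"]:
        lines.append(f"- {query}")
    return "\n".join(lines)


def build_prompt(product: dict, question: str) -> str:
    """Prepend the grounding context to the user's question."""
    return f"{build_grounding_context(product)}\n\nQuestion: {question}"


product = {
    "name": "Marketing campaign analysis",
    "description": "Single source of truth for campaign analysis.",
    # Column names are hypothetical; the article does not list the schemas.
    "schemas": {
        "ad_spend_daily": ["campaign_id", "spend_date", "spend_usd"],
        "customer_conversions": ["campaign_id", "conversion_date", "conversions"],
    },
    "sample_queries": ["SELECT * FROM ad_spend_daily LIMIT 10"],
}

prompt = build_prompt(product, "Which campaign had the highest spend last quarter?")
```

The resulting prompt carries the table names, columns, and example queries alongside the question, which is the kind of verifiable context the grounding discussion above refers to.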
Ready to get started? To learn more about data products in Dataplex Universal Catalog, check out the documentation here.
