GCP – What’s new with BigQuery AI and ML?
At Next ’25, we introduced several new innovations within BigQuery, the autonomous data to AI platform. BigQuery ML provides a full range of AI and ML capabilities, enabling you to easily build generative AI and predictive ML applications with BigQuery. The new AI and ML capabilities from BigQuery ML include:
-
a new state-of-the-art pre-trained forecasting model (TimesFM) which drastically simplifies forecasting problems
-
support for generating or extracting structured data with large language models (LLMs)
-
a set of new row-wise inference functions enabling you to mix gen AI processing with standard SQL
-
expanded model choice with Gemini and OSS models
-
the general availability of the Contribution Analysis feature, useful for explaining changes in your business metrics
Let us explore these new capabilities.
- aside_block
- <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3e6ec4519910>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>
1. TimesFM forecasting model in BigQuery
Accurate time series forecasting is essential for many business scenarios such as planning, supply chain management, and resource allocation. BigQuery now embeds TimesFM, a state-of-the-art (SOTA) pre-trained model from Google Research, enabling powerful forecasting via the simple AI.FORECAST
function. Trained on over 100 billion real-world time-points, TimesFM provides impressive zero-shot forecasting accuracy across various real world domains and at different granularities without requiring you to train or tune on your data.
Key benefits of TimesFM in BigQuery include:
-
Managed and scalable: A fully managed, highly scalable forecasting engine within BigQuery.
-
Easy forecasting: Generate forecasts for one or millions of time series in a single query – no model training required.
Here’s a basic example of creating a forecast using the new AI.FORECAST
function with TimesFM:
SQL
SELECT * FROM AI.FORECAST(
TABLE dataset.table,
data_col => "data",
timestamp_col => "timestamp",
model => "TimesFM 2.0",
horizon => 30
)
This query forecasts the “data” column for the next 30 time units, using “timestamp” as the time identifier. Please see the documentation for more details.
2. Structured data extraction and generation with LLMs
Extracting structured information consistently from unstructured data such as customer reviews, emails, logs etc. can be complex. BigQuery’s new AI.GENERATE_TABLE
function simplifies structured data extraction/generation using the constrained decoding capabilities of LLMs. This function takes a model, a table of input data and an output_schema
as inputs and outputs a table whose schema is determined by the output_schema
parameter.
Here’s how you can use AI.GENERATE_TABLE
:
SQL
SELECT * FROM AI.GENERATE_TABLE(
MODEL project_id.dataset.model,
(SELECT medical_transcripts as prompt from table),
STRUCT("age INT64, medications ARRAY<STRING>" AS output_schema)
)
In this example, the output table has ‘age’ and ‘medications’ columns — no complex parsing required. The output is written as a BigQuery temporary table. To materialize the results to a permanent table, the above query can be used in a DDL statement:
CREATE TABLE project_id.dataset.my_structured_table
AS <AI.GENERATE_TABLE subquery>
Please see the documentation for more details.
3. Row-wise (Scalar) LLM functions
The first wave of BigQuery’s LLM functions focused on table-valued functions (TVFs) that output entire tables. We are now introducing row-wise AI functions for LLM inference for more flexible and expressive data manipulation and analysis. These scalar functions enhance the usability of LLMs within BigQuery, as they can be used anywhere a value is needed, such as in SELECT
, WHERE
, JOIN
, and GROUP BY
clauses. Let’s go though some of the capabilities we are adding: a) Basic text generation with AI.GENERATE
First, let’s see how the new AI.GENERATE()
can be used for convenient row-wise LLM inference:
SELECT
city,
AI.GENERATE(
('Give a short, one sentence description of ', city),
connection_id => 'us.test_connection',
endpoint => 'gemini-2.0-flash').result
FROM mydataset.cities;
b) Structured output with AI.GENERATE
In addition, the structured output generation capabilities introduced above also extend to row-wise AI functions. In this example, the query generates state capitals for a list of states, using the output_schema argument to set two custom fields in the output struct — state and capital:
SQL
SELECT
state,
AI.GENERATE(
('What is the capital of ', state, '?'),
connection_id => 'us.example_connection',
endpoint => 'gemini-2.0-flash',
output_schema => 'state STRING, capital STRING').capital
FROM mydataset.states;
c) Type-specific functions (e.g., AI.GENERATE_BOOL)
For common tasks requiring specific data types like boolean, integer, or float, BigQuery now offers simple, type-specific functions. For instance, you can use AI.GENERATE_BOOL
for classification or validation tasks:
SQL
SELECT city.name, AI.GENERATE_BOOL(
("Is", city.name, "in the state of WA?"),
connection_id => "us.example_connection",
endpoint => 'gemini-2.0-flash').result
FROM city
Additional type-specific functions, namely AI.GENERATE_INT
and AI.GENERATE_DOUBLE
, are also available for generating integer and floating-point results. Please see the documentation for more details.
4. Expanded model choice: Gemini, OSS and third-party
BigQuery ML allows you to use LLMs to perform tasks such as entity extraction, sentiment analysis, translation, text generation, and more on your data using familiar SQL syntax. In addition to first-party Gemini models, BigQuery supports inference with open-source and third-party models, which comes in two flavors:
-
Customer-managed endpoints for open source models (previously announced): You can host any open source model of your choice on a Vertex AI Model Garden endpoint and then use it from BigQuery.
-
Model as a service integrations: Access fully managed model endpoints directly through BigQuery. This already included models like Anthropic’s Claude, and we are excited to announce newly added support for Llama and Mistral models, further expanding model choice available to developers.
5. Contribution analysis now generally available
Businesses constantly need to answer questions like “Why did our sales drop last month?” or ” For what user, device, demographics combination was our marketing campaign most effective?” Answering these “why” questions accurately is vital, but often involves complex manual analysis. The BigQuery contribution analysis feature automates this analysis and helps you pinpoint the key factors (or combinations of factors) responsible for the most significant changes in a metric between the control and test groups you define.
Now generally available, the BigQuery ML contribution analysis release includes enhancements focused on improved interpretability and performance, including:
-
A new summable by category metric to analyze the sum of a numerical measure of interest normalized by a categorical variable
-
Top-K Insights by Apriori Support option to automatically fetch k insights with the largest segment size
-
A redundant insight pruning option, which improves result readability by returning only unique insights
Let’s say you want to understand what drove changes in the average sales per user across various vendors and payment types between the control and test data. To answer this with a contribution analysis model, you tell BigQuery which factors (dimensions) to investigate (dimension_id_cols
), what metric you care about (contribution_metric
), and which column identifies your test/control groups (is_test_col
).
SQL
-- Define the contribution analysis task
CREATE MODEL bqml_tutorial.contribution_analysis_model
OPTIONS (
model_type = 'CONTRIBUTION_ANALYSIS',
dimension_id_cols = ['vendor', 'month', 'payment_type'],
contribution_metric = 'sum(sales)/count(distinct user_id)',
is_test_col = 'is_test_col',
top_k_insights_by_apriori_support = 25,
pruning_method = 'PRUNE_REDUNDANT_INSIGHTS'
) AS
SELECT * FROM dataset.input_data;
Once the model is created, you can use a SQL query like the following to generate insights:
SELECT * FROM ML.GET_INSIGHTS (MODEL bqml_tutorial.contribution_analysis_model);
BigQuery returns a prioritized list showing which combinations of factors (e.g., “Users paying via Amex Credit Card from Vendor”) had the most significant impact on the average sales per user between your control and test groups.
Bring AI into your data
The latest BigQuery ML updates bring powerful AI/ML capabilities directly into your data workflows. Between forecasting with TimesFM, automated root-cause analysis with contribution analysis, flexible row-wise LLM functions, streamlined structured data generation, and expanded model choice, you can move faster from data to insights and impactful outcomes.
You can also view the Next ‘25 breakout session, including a demo showcasing these capabilities.
Read More for the details.