GCP – Power up your data analysis: The Data Science Agent now supports BigQuery ML, DataFrames, and Spark
We recently announced an AI-first Colab Enterprise notebook experience in BigQuery and Vertex AI to help you simplify and transform your data science and analytics workflows. Colab Enterprise notebooks come with a built-in Data Science Agent that accelerates data science development with agentic capabilities for data exploration, transformation, and machine learning modeling. From a single prompt, the agent generates a detailed plan for your workflow, from data loading and cleaning to model training and evaluation.
Today, we’re introducing powerful new features in the Data Science Agent to further simplify and scale your analytical journeys, especially with large and open-format datasets.
Generate BigQuery ML, BigQuery DataFrames, and Spark code
You can now harness the power of BigQuery Machine Learning (ML), BigQuery DataFrames (BigFrames), and Spark for large-scale data processing directly within the Data Science Agent. BigQuery ML and BigQuery DataFrames allow you to scale up data transformation, model training, and inference by running them directly on BigQuery. And with Serverless for Apache Spark, you can perform distributed data processing on large datasets, allowing you to work with data that is too large to fit into memory on a single machine.
To invoke these tools, simply include the following keywords in your prompt:
- For BigQuery ML: use “BigQuery ML”, “BQML”, or “SQL”
- For BigQuery DataFrames: specify “BigQuery DataFrames” or “BigFrames”
- For PySpark: include “Spark” or “PySpark”
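As a quick way to check which framework a prompt is likely to trigger, the keyword list above can be written as a small lookup. This is only an illustrative sketch: the `FRAMEWORK_KEYWORDS` mapping and the `frameworks_mentioned` helper are hypothetical names, the case-insensitive substring match is an assumption, and none of this reflects how the agent itself parses prompts.

```python
# Keywords taken from the list above. The matching logic is an assumption
# made for illustration; it is not the Data Science Agent's implementation.
FRAMEWORK_KEYWORDS = {
    "BigQuery ML": ["BigQuery ML", "BQML", "SQL"],
    "BigQuery DataFrames": ["BigQuery DataFrames", "BigFrames"],
    "PySpark": ["Spark", "PySpark"],
}


def frameworks_mentioned(prompt: str) -> list[str]:
    """Return the frameworks whose keywords appear in the prompt.

    Uses a simple case-insensitive substring check, so a word such as
    "MySQL" would also count as a match for "SQL".
    """
    lowered = prompt.lower()
    return [
        framework
        for framework, keywords in FRAMEWORK_KEYWORDS.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]


if __name__ == "__main__":
    print(frameworks_mentioned("Use Spark to run PCA on my customer table"))
```

If a prompt matches none of the keywords, the helper returns an empty list, which is a hint to add one of the keywords explicitly.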
In the future, the Data Science Agent will be able to pick the relevant framework for your use case — e.g., based on the size of your selected datasets or the contents of your Notebook.
In the meantime, here are some sample prompts to get you started:
- “Build a high-quality forecasting model using BigQuery SQL on project_id.dataset_id.table_id to predict stock needs. Present the model’s evaluation metrics and visualize the forecast with a 95% confidence interval.”
- “Using BigQuery DataFrames, train and evaluate a gradient boosted tree model to predict housing prices from the table project_id.dataset_id.table_id. Before training, one-hot encode the neighborhood column.”
- “I want to group similar customers together for targeted marketing campaigns, but first I need to do dimensionality reduction using a PCA model. Use Spark to do this on table project_id.dataset_id.table_id.”
Limitation: the Data Science Agent currently generates Spark 4.0 code. The agent can help you upgrade your code to Spark 4.0. However, if you need to use an earlier version of Spark, we recommend not using the Data Science Agent for PySpark for now.
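To give a sense of what the forecasting prompt above might translate to, here is a rough sketch of BigQuery ML statements for a time-series model with a 95% confidence interval. It is not the agent's actual output. The `build_forecast_sql` helper, the model name, and the column names `order_date` and `units_sold` are hypothetical, and `ARIMA_PLUS` is an assumed model choice; the prompt itself does not name one.

```python
def build_forecast_sql(table_id, model_id, timestamp_col, value_col, horizon=30):
    """Build BigQuery ML statements to train, evaluate, and forecast.

    Returns three SQL strings; nothing is sent to BigQuery here.
    """
    # Train a time-series model (ARIMA_PLUS is an assumed choice).
    create_model = f"""
CREATE OR REPLACE MODEL `{model_id}`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = '{timestamp_col}',
  time_series_data_col = '{value_col}'
) AS
SELECT {timestamp_col}, {value_col}
FROM `{table_id}`
""".strip()

    # Inspect the fitted model's evaluation metrics.
    evaluate = f"SELECT * FROM ML.ARIMA_EVALUATE(MODEL `{model_id}`)"

    # Forecast with a 95% prediction interval.
    forecast = f"""
SELECT *
FROM ML.FORECAST(
  MODEL `{model_id}`,
  STRUCT({horizon} AS horizon, 0.95 AS confidence_level)
)
""".strip()

    return [create_model, evaluate, forecast]


if __name__ == "__main__":
    statements = build_forecast_sql(
        table_id="project_id.dataset_id.table_id",
        model_id="project_id.dataset_id.stock_forecast",  # hypothetical model name
        timestamp_col="order_date",  # hypothetical column
        value_col="units_sold",  # hypothetical column
    )
    print(";\n\n".join(statements))
```

The forecast query returns lower and upper prediction-interval bounds alongside each forecast value, which is what a confidence-interval plot would be drawn from.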
Add data using context and @ mentions
We are also making it easier to bring your data into the conversation. The Data Science Agent can now automatically retrieve your BigQuery tables and their metadata. This means you can describe a table directly in your prompt and let the Data Science Agent search for the most relevant table on your behalf.
Further, you can now search for BigQuery tables within your current project using an @ mention. This familiar, industry-standard mechanism allows you to build your prompt with the relevant context — without your hands ever leaving the keyboard.
Limitation: The @ mention currently only searches for BigQuery tables in your current project. For broader searches across projects or to add files from session storage and local uploads, please continue to use the “+” button.
Try the Data Science Agent today
Under the hood, we’ve also optimized the Data Science Agent so it will start up faster after your first message. Less waiting, faster insights. Similar improvements for Colab Enterprise in Vertex AI are coming soon.
We’re committed to evolving the AI-powered data science experience and can’t wait to show you what we’re building next. To get started, check out the resources below:
- Access:
  - BigQuery: Navigate to Google Cloud Console > BigQuery > Notebook
  - Vertex AI: Navigate to Google Cloud Console > Vertex AI > Colab Enterprise (Note: the BigQuery ML, BigQuery DataFrames, and Spark improvements mentioned here are not yet available in Vertex AI, but are coming soon.)
- Documentation: