GCP – Understanding new Gemini model inference, tuning, grounding and safety settings in BigQuery
The proliferation of digital devices and platforms such as social media, mobile devices, and IoT sensors has led to an exponential growth in unstructured data such as images, audio files, videos, and documents. To help organizations unlock valuable insights from your data, BigQuery, Google’s AI-ready cloud data platform, is integrated with Vertex AI, Google’s fully-managed AI development platform for building gen AI apps, enabling you to leverage a variety of generative AI models (e.g., Gemini) and AI services (including Document AI and Translation AI) to work with unstructured data in BigQuery object tables.
BigQuery already lets you analyze your data using a variety of powerful large language models (LLMs) hosted in Vertex AI, including Gemini 1.0 Pro, and Gemini 1.0 Pro Vision. These models excel at tasks like text summarization and sentiment analysis, often requiring only prompt engineering.
And for scenarios where prompt engineering alone isn’t sufficient, BigQuery allows you to fine-tune the text-bison model using LoRA techniques. This additional customization is valuable when the desired model behavior is difficult to define concisely in a prompt, or when prompts don’t consistently produce the expected results. Fine-tuning enables the model to learn specific response styles, adopt new behaviors (like answering as a specific persona), and stay up-to-date with the latest information.
Recently, we added support for the latest Gemini models to BigQuery, as well as safety enhancements and grounding support:
The ML.GENERATE_TEXT SQL function now supports Gemini 1.5 Pro and Gemini 1.5 Flash foundation models. BigQuery users could already leverage Gemini 1.0 Pro for various natural language processing (NLP) tasks (e.g., advanced text generation and sentiment analysis) on text, and use Gemini 1.0 Pro Vision for visual captioning, visual Q&A, and other vision-language tasks on images and videos. Gemini 1.5, Google’s next-generation multimodal foundation model, enables users to perform not only the aforementioned NLP and vision tasks with enhanced quality but also analyze audio and PDF files (e.g., audio transcription and PDF summarization) — all from a single model.
We’re enhancing the ML.GENERATE_TEXT SQL function to support grounding with Google search and customizable safety settings for responsible AI (RAI) responses. Grounding allows the model to incorporate additional information from the internet to generate more accurate and factual responses. Safety settings enable users to define blocking thresholds for different harm categories (hate speech, dangerous content, harassment, etc.), ensuring the model filters out content that violates these settings.
We’re extending the CREATE MODEL DDL and ML.EVALUATE SQL function to enable Gemini 1.0 model tuning and evaluation. BigQuery users can already tune and evaluate text-bison PaLM models. Now, they can also fine-tune and evaluate Gemini 1.0 Pro models, further tailoring their AI capabilities.
In the following sections, we will look deeper into these new features.
BigQuery ML and Gemini 1.5
To use Gemini 1.5 Pro in BigQuery, first create the remote model that represents a hosted Vertex AI Gemini endpoint. This step typically takes just a few seconds. Once the model is created, use the model to generate text, combining data directly with your BigQuery tables.
<ListValue: [StructValue([(‘code’, “CREATE MODEL `mydataset.gemini_1_5_pro`rnREMOTE WITH CONNECTION `us.bqml_llm_connection`rnOPTIONS(endpoint = ‘gemini-1.5-pro’);”), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5fac5471c0>)])]>
With Gemini 1.5, the ML.GENERATE_TEXT() function can accept a BigQuery managed table as an input, automatically appending your PROMPT statement to each database record to customize the prompt for each row. The “temperature” prompt parameter controls the randomness of the generated responses.
<ListValue: [StructValue([(‘code’, ‘SELECT *rnFROMrn ML.GENERATE_TEXT(rn MODEL mydataset.gemini_1_5_pro,rn (rn SELECT CONCAT(rn ‘Create a descriptive paragraph of no more than 25 words for a product with in a department named ‘, department,rn ‘, category named “‘, category, ‘”‘,rn ‘and the following name: ‘, namern )rn AS promptrn FROM mydataset.my_tablern ),rn STRUCT(0.8 AS temperature));’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5fac5478b0>)])]>
With Gemini 1.5 models, the ML.GENERATE_TEXT() function can now also process object tables as input, allowing you to process unstructured data such as images, videos, audio files, and documents. When using Object tables, the prompt is a single string placed in the STRUCT option. This prompt is then applied individually to each object in the table, row by row.
<ListValue: [StructValue([(‘code’, ‘SELECTrn uri,rn ml_generate_text_llm_resultrnFROMrn ML.GENERATE_TEXT( MODEL `mydataset.gemini_1p5_pro`,rn (rn SELECTrn *rn FROMrn `mydataset.my_audio_files`rn WHERErn content_type=”audio/mp3″ ),rn STRUCT(TRUE AS flatten_json_output,rn 1024 AS max_output_tokens,rn “Please transcribe the audio, then translate it to English.” AS prompt))’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5fac547040>)])]>
Let’s examine a sample result. Since the generated text is extensive, we’ll preview it in JSON format directly from the BigQuery UI to better showcase the long output text. You’ll observe that, as instructed by the prompt, the model first transcribes the audio into Japanese and then translates it into English.
BQML and LLM grounding and safety settings
<ListValue: [StructValue([(‘code’, ‘SELECTrn ml_generate_text_llm_result,rn ml_generate_text_grounding_result,rn promptrnFROMrn ml.generate_text(MODEL `mydataset.gemini_1p5_pro`,rn (rn SELECTrn concat(questions, “. Please only have answers and add no additional information.”) as promptrn FROMrn `mydataset.history_questions`rn LIMITrn 10),rn STRUCT(TRUE AS flatten_json_output,rn 8192 AS max_output_tokens,rn TRUE AS ground_with_google_search,rn [STRUCT(‘HARM_CATEGORY_HATE_SPEECH’ AS category,rn ‘BLOCK_LOW_AND_ABOVE’ AS threshold),rn STRUCT(‘HARM_CATEGORY_DANGEROUS_CONTENT’ AS category,rn ‘BLOCK_MEDIUM_AND_ABOVE’ AS threshold)] AS safety_settings ))’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5fac5479d0>)])]>
Let’s examine an example to illustrate the grounding feature. When grounding is activated, users gain access to detailed grounding results from Google Search, including the various search entry points and the confidence scores associated with each search.
Let’s see how safety settings will affect the LLM output. As an example, we take the prompt “Who was the French king executed in the French revolution? Please only have answers and add no additional information.” If we have a safety setting BLOCK_LOW_AND_ABOVE for categories HARM_CATEGORY_HARASSMENT and HARM_CATEGORY_DANGEROUS_CONTENT, the result is blocked.
Sometimes, we do want the LLM to output the results and check the contents ourselves. By changing the safety setting to BLOCK_MEDIUM_AND_HIGH for category HARM_CATEGORY_HARASSMENT, the LLM does less filtering and provides a result.
BQML and Gemini 1.0 LLM Tuning
BQML now supports LORA fine-tuning for Gemini 1.0 models. To fine-tune the model, designate the gemini-1.0-pro-002 endpoint as your base model and provide supplemental training data with a “prompt” and “label” column. In the following example, we demonstrate how to tackle a text classification problem. We use a medical transcription dataset and instruct our model to categorize each transcript into one of 17 distinct categories, such as”Chiropractic,” “Dentistry,” or “Dermatology.”
Create a fine-tuned model as follows:
<ListValue: [StructValue([(‘code’, ‘– Fine tune a textbison modelrnrnCREATE OR REPLACE MODELrn `bqml_tutorial.text_bison_001_medical_transcript_finetuned` REMOTErnWITH CONNECTION `LOCATION. ConnectionID`rnOPTIONS (endpoint=”gemini-1.0-pro-002″,rn max_iterations=300,rn data_split_method=”no_split”) ASrnSELECTrn CONCAT(“Please assign a label for the given medical transcript from among these labels [Autopsy, Bariatrics, Chiropractic, Dentistry, Dermatology,. “, input_text) AS prompt,rn output_text AS labelrnFROMrn `bqml_tutorial.medical_transcript_train`’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5fac547d90>)])]>
Once the model is trained, you can leverage ML.EVALUATE to assess its performance and ML.GENERATE_TEXT for model inference. For a comprehensive example, please refer to this tutorial: https://cloud.google.com/bigquery/docs/tune-evaluate
Join us for the future of data and generative AI
To delve deeper into these exciting new features, we encourage you to explore the documentation. For a hands-on demonstration of how to build a comprehensive data analytics and AI application directly within BigQuery that harnesses the power of Gemini, check out this video.
Additionally, this recent webcast showcases the latest advancements to BigQuery and demonstrates how you can utilize BigQuery ML to effortlessly create and deploy models using simple SQL.
Have feedback on these new features or have additional feature requests? Let us know at bqml-feedback@google.com.
Googlers Tianxiang Gao, Eric Hao, Jasper Xu and Manoj Gunti contributed to this blog post. Many Googlers contributed to make these features a reality.
Read More for the details.