GCP – Hands on with Gemini in BigQuery: Decoding sentiment in customer reviews
Sentiment analysis is a powerful tool that uses natural language processing (NLP) to uncover the underlying emotions (positive, negative, neutral) within text such as customer reviews. This analysis can offer valuable insight into how customers perceive your products, services, and brand overall. Furthermore, by utilizing techniques like topic modeling and keyword extraction, you can identify recurring themes. These themes could highlight specific product features, aspects of customer service, or general pain points — providing a roadmap for improvement and a way to address customer needs more effectively.
In BigQuery you can use ML.GENERATE_TEXT function that lets you directly utilize powerful large language models (LLMs) from Google’s Vertex AI within your SQL queries to analyze text in a BigQuery table. This means you can perform sophisticated text generation and analysis tasks on data stored in your BigQuery tables without needing to move data or write complex code outside the BigQuery environment. ML.GENERATE_TEXT function can also be used to generate text that describes visual content using a remote model based on a gemini-pro-vision multimodal model. Some key benefits are:
Ease of use: BigQuery’s SQL integration lets you tap into advanced language model capabilities without requiring separate machine learning pipelines or specialized coding expertise.
Scalability: BigQuery’s strengths in handling massive datasets pair nicely with LLMs to process customer reviews or other text sources at scale.
Insight generation: ML.GENERATE_TEXT assists in tasks such as:
Sentiment analysis: Determine the overall emotional tone (positive, negative, neutral) in customer feedback.
Theme extraction: Identify common topics and trends within reviews
Summarization: Condense lengthy reviews into key points.
Text completion & generation: Get help with responses, ad copy, or creative writing based on existing reviews.
ML.GENERATE_TEXT in action
Now, let’s take a look at how to use ML.GENERATE_TEXT, taking a hypothetical rideshare company as an example.
Setup instructions:
Before starting, choose your GCP project, link a billing account, and enable the necessary API; full instructions here.
Create a cloud resource connection and get the connection’s service account; full guide here.
Grant access to the service account by following the steps here.
Load data. To oad from public storage account, using the following command:
Please replace ‘[PROJECT_ID.DATASET_ID]’ with your project_id, and enter a name for your dataset
The command will create a table named ‘customer_review’ in your dataset
<ListValue: [StructValue([(‘code’, “CREATE SCHEMA IF NOT EXISTS `[PROJECT_ID.DATASET_ID]` OPTIONS (location=’us’);rnrnrnLOAD DATA OVERWRITE `[PROJECT_ID.DATASET_ID].customer_review`rnFROM FILES ( format = ‘PARQUET’,rnuris = [‘gs://data-analytics-golden-demo/rideshare-lakehouse-raw-bucket/rideshare_llm_export/v1/raw_zone/customer_review/000000000000.parquet’]);”), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e803a152700>)])]>
Sentiment analysis
Let’s walk through an example of performing sentiment analysis.
1. Create a model
Create a remote model in BigQuery that utilizes a Vertex AI foundation model.
Syntax:
<ListValue: [StructValue([(‘code’, “CREATE OR REPLACE MODELrn`PROJECT_ID.DATASET_ID.MODEL_NAME`rnREMOTE WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`rnOPTIONS (ENDPOINT = ‘ENDPOINT’);”), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e803a152220>)])]>
Code example:
Please replace ‘[PROJECT_ID.DATASET_ID.MODEL_NAME]’ with your project_id, dataset_id and model name
Please replace ‘[PROJECT_ID.REGION.CONNECTION_ID]’ with your project_id, region and connection_id
<ListValue: [StructValue([(‘code’, “CREATE OR REPLACE MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`rnREMOTE WITH CONNECTION `[PROJECT_ID.REGION.CONNECTION_ID]`rnOPTIONS (ENDPOINT = ‘gemini-pro’);”), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e803a152460>)])]>
2. Generate text
With just a few lines of SQL, you can analyze text or visual content in your BigQuery table using that model and the ML.GENERATE_TEXT function.
ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model targets. Read the documentation to understand all the parameters of the ML.GENERATE_TEXT function.
Syntax:
<ListValue: [StructValue([(‘code’, ‘ML.GENERATE_TEXT(rnMODEL project_id.dataset.model,rn{ TABLE project_id.dataset.table | (query_statement) },rnSTRUCT(rn [max_output_tokens AS max_output_tokens]rn [, top_k AS top_k]rn [, top_p AS top_p]rn [, temperature AS temperature]rn [, flatten_json_output AS flatten_json_output]rn [, stop_sequences AS stop_sequences])rn)’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e803a1521f0>)])]>
Code example:
Please replace ‘[PROJECT_ID.DATASET_ID]’ with your project_id and dataset_id
Please replace ‘[PROJECT_ID.REGION.CONNECTION_ID]’ with your project_id, region and connection_id
<ListValue: [StructValue([(‘code’, “CREATE OR REPLACE TABLE `[PROJECT_ID.DATASET_ID].review_sentiment_analysis` ASrnWITH PROMPT AS (rn SELECT CONCAT (‘For the given review classify the sentiment as Positive, Neutral or Negative.’,rnrnrn’\n input: The driver was able to make some small talk, but he didn\’t go overboard. I liked that he was friendly and chatty, but he also knew when to leave me alone. The trunk fit my belongings, and the car was clean and comfortable. Overall, it was a good ride.’,rnrnrn’\n output: \n Positive – Trunk fit my belongings, friendly, chatty’,rnrnrn’\n input: I took a rideshare last night and it was an okay experience. The car was adequately clean, but it was a bit warm for my liking. The driver was able to make some small talk, but I wasn\’t really in the mood to talk. Overall, it was a fine ride.’,rnrnrn’\n output: Neutral – Clean, A bit warm, fun ride’,rn ‘\n input: ‘, customer_review_text,rn’\n output: ‘rn) AS prompt, customer_id, rncustomer_review_textrn FROM `[PROJECT_ID.DATASET_ID].customer_review`rn LIMIT 20rn),rnREVIEW_RESPONSE_GENERATION AS (rn SELECT *rn FROMrn ML.GENERATE_TEXT(rn MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`,rn (SELECT * FROM PROMPT),rn STRUCT(rn 200 AS max_output_tokens,rn 0.5 AS temperature,rn 40 AS top_k,rn 1.0 AS top_p,rn TRUE AS flatten_json_output))rn)rnSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM REVIEW_RESPONSE_GENERATION;rnrnrnSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM `[PROJECT_ID.DATASET_ID].review_sentiment_analysis`;”), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e803a152190>)])]>
3. Result:
In the prompt, we provided the model with context and two examples, clearly demonstrating our desired output format. You can validate that the outputs generated are aligned to the examples we provided to the model as part of a few-shot prompting approach.
In few-shot prompting, including a few examples of reviews with their corresponding sentiment labels is crucial for guiding the model’s behavior. To help ensure the model’s effectiveness in various situations, it’s essential to offer a sufficient number of well-structured examples covering diverse review scenarios.
Then, by performing sentiment analysis on the customer reviews, we can have insights into their preferences and pain points regarding our products. By identifying key themes in the reviews, we can effectively communicate valuable feedback to the product team, enabling them to make informed, data-driven decisions and improvements.
In the table above, you can see the results of ML.GENERATE_TEXT including the input table along with the following columns:
ml_generate_text_result: This is the JSON response and the generated text is in the text element.
ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
ml_generate_text_rai_result: a STRING value that contains the safety attributes. This column is returned when flatten_json_output is TRUE.
ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Extracting themes
Now, let’s extract the theme from the reviews using the model we created above:
Please replace ‘[PROJECT_ID.DATASET_ID]’ with your project_id and dataset_id
<ListValue: [StructValue([(‘code’, ‘CREATE OR REPLACE TABLE `[PROJECT_ID.DATASET_ID].extract_themes` ASrnrnrnWITH PROMPT AS (rn SELECT CONCAT(rn”””rnClassify the text as one or more of the following categories and return in the below json format.rn- “trunk space small”rn- “trunk space large”rn- “driving too fast”rn- “driving too slow”rn- “clean car”rn- “dirty car”rn- “car too hot”rn- “car too cold”rn- “driver likes conversation”rn- “driver likes no conversation”rn- “driver likes music”rn- “driver likes no music”rn- “distracted driver”rnrnrnJSON format: [ “value” ]rnSample JSON Response: [ “dirty car”, “car too cold” ]rnrnrnText:rn”””, customer_review_text) AS prompt, customer_id, customer_review_textrn FROM `[PROJECT_ID.DATASET_ID].customer_review`rn LIMIT 10rn),rnEXTRACT_THEMES AS (rn SELECT *rn FROMrn ML.GENERATE_TEXT(rn MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`,rn (SELECT * FROM PROMPT),rn STRUCT(rn 1024 AS max_output_tokens,rn 0 AS temperature,rn 1 AS top_k,rn 0 AS top_p,rn TRUE AS flatten_json_output))rn)rnSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM EXTRACT_THEMES;rnrnrnSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM `[PROJECT_ID.DATASET_ID].extract_themes`;’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e803a152250>)])]>
Result:
Using ML.GENERATE_TEXT function and SQL from the BigQuery console, we can efficiently identify key themes within our customer reviews. This gives us deeper insight into customer perceptions and can provide actionable data to improve our products.
ML.GENERATE_TEXT is tightly integrated with the Gemini model, which is designed for higher input/output scale and better result quality across a wide range of tasks, like text processing including classification and summarization, sentiment analysis, and code generation.
Analyzing the themes
Now that we’ve identified the themes in our reviews, let’s dive deeper with data canvas in BigQuery, an AI-centric experience to reimagine data analytics that we introduced at Next ‘24. BigQuery data canvas lets you discover, transform, query, and visualize data using natural language. It also provides a graphical interface that lets you work with data sources, queries, and visualizations in a directed acyclic graph (DAG), giving you a view of your analysis workflow that maps to your mental model.
Given that our themes are stored in the ‘extract_themes’ table, let’s create a data canvas to analyze them further. Click the down arrow next to the ‘+’ icon and choose ‘Create Data canvas’
You will be brought to a screen where you can search for the ‘extract themes’ table and get started.
Once you’ve selected a table, you’ll see it on a canvas where you can query it directly or join it with other tables.
To create a bar chart of themes, click on the ‘Query’ button and type ‘bar chart for the most common theme and remove the null values and limit the result to top 10 values’. The AI understands your request and automatically generates the correct query, even though the ‘themes’ aren’t in a dedicated column — the AI recognizes that the themes are found within the ‘ml_genertae_text_llm_result’ column. Finally, click ‘Run’ to see the query result.
Your theme data is ready! Click ‘Visualize’ to instantly see your bar chart.
You now have a bar chart of the extracted themes from your customer reviews, plus automatically generated helpful insights based on the data and explanations of your findings.
In short, BigQuery data canvas lets you analyze your data from start to finish with simple natural language commands: Discover relevant data, merge it with customer information, find key insights, collaborate with your team members, and create reports — all in one place. Plus, you can save these results or combine them with other data for further analysis or extract it into a notebook.
Curious to learn more? The official ML.GENERATE_TEXT documentation has all the details.
Read More for the details.