GCP – Tutorial: How to use the Gemini Multimodal Live API for QA
The Gemini Multimodal Live API is a powerful tool that allows developers to stream data, such as video and audio, to a generative AI model and receive responses in real-time. Unlike traditional APIs that require a complete data upload before processing can begin, this “live” or “streaming” capability enables a continuous, two-way conversation with the AI, making it possible to analyze events as they unfold.
This real-time interaction unlocks a new class of applications, transforming AI from a static analysis tool into a dynamic, active participant in live workflows. The ability to process and reason over multiple data types (e.g., seeing video, reading text, understanding audio) simultaneously allows for complex, context-aware tasks that were previously impossible to automate.
An example of how the Live API can work is in high-speed manufacturing. This tutorial will show you how to leverage the Gemini API to build an automated quality inspection system that overcomes the common challenges of manual QA.
In this blog, you will learn how to create a system that uses a standard camera feed to:
- Analyze products on a production line in real-time.
- Identify products by reading barcodes or QR codes.
- Detect, classify, and measure visual defects simultaneously.
- Generate structured reports for every defect.
- Trigger instant alerts for severe issues.
Prerequisites
- A Google Cloud Platform (GCP) account with billing enabled.
- Familiarity with basic cloud concepts and services like Cloud Run and BigQuery.
- Basic knowledge of Python.
- A Gemini API key.
System architecture
The architecture for this system is designed to be serverless, scalable, and resilient, built on the robust foundation of Google Cloud services. It is composed of two primary microservices running on Cloud Run.
Here is the step-by-step workflow:
1. Data ingestion: A standard IP camera positioned over the assembly line streams video of products passing by. This feed is directed to the primary Cloud Run service.
2. Inspection service (Cloud Run): This containerized application is the brain of the operation.
- Gemini Multimodal Live API: The service streams the video data to Gemini. It uses a dynamic prompt that might be pulled from a database, allowing it to apply different inspection criteria for different products on the same line. Gemini processes the stream, executing the multimodal task of reading the product ID and performing the visual inspection in real time.
- Output formatting: The service then formats the rich output from Gemini (including product ID, defect type, measurements, and location) into a structured JSON object for downstream processing.
3. Alerting & logging service (Cloud Run): This second Cloud Run service ingests the structured JSON data and handles all reporting, logging, and notification tasks.
- Data logging: The service immediately writes the detailed defect record to BigQuery. This creates a historical, queryable database of all quality events, essential for long-term analytics.
- Gemini 2.5 Flash: This model provides an additional layer of reasoning. It can take the raw defect data and summarize it into a concise, human-readable alert message, and more advanced logic can be applied, such as correlating recent events to spot emerging trends.
- Secret Manager: All API keys and credentials for notification services are securely stored and managed here, adhering to best practices for security.
- Notification APIs (e.g., Gmail API, Google Chat API): Based on the severity and rules processed by Gemini, the service calls the appropriate APIs to dispatch alerts to the right people at the right time.
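The structured JSON object handed from the inspection service to the alerting & logging service might look like the following sketch. Every field name here is an illustrative assumption, not a schema from this article; align them with whatever your BigQuery table expects.

```python
import json

def build_defect_record(sku: str, line_id: str, defects: list[dict]) -> dict:
    """Package Gemini's inspection output for the downstream service.

    Field names are hypothetical placeholders for illustration.
    """
    return {
        "product_sku": sku,
        "line_id": line_id,
        "defect_count": len(defects),
        "defects": defects,  # each entry: type, location, dimensions_mm
    }

record = build_defect_record(
    "SKU-XT789-BLK",
    "Line 4",
    [{"type": "Scratch", "location": "top edge", "dimensions_mm": 3.2}],
)
payload = json.dumps(record)  # body of the POST to the alerting & logging service
```

Keeping this hand-off as plain JSON is what lets the two Cloud Run services scale and fail independently.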
Step-by-step implementation
Step 1: Set up the inspection service
This service is the core of our system. It’s a containerized application deployed on Cloud Run that receives the video feed and interacts with Gemini. The key is the prompt sent to the Gemini API. It instructs the model to perform multiple tasks on a single video frame. This leverages Gemini’s powerful multimodal capabilities.
Sample prompt:
```
You are a quality control inspector for high-end electronics. In this video frame:
1. Identify the product SKU by decoding the QR code.
2. Inspect the brushed aluminum casing for any defects, specifically looking for:
   - Scratches longer than 2mm
   - Dents or dings
   - Discoloration or blemishes
3. For each defect found, provide its type, location on the casing, and estimated dimensions in millimeters.
4. Return your findings as a single, structured JSON object.
```
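Wiring a prompt like this into the Live API with the google-genai SDK looks roughly like the sketch below. The model name, config keys, and session method names reflect one version of the SDK and are assumptions; treat this as an outline of the flow, not a drop-in implementation.

```python
import asyncio

def build_prompt(product_family: str) -> str:
    """Render the dynamic, per-product prompt (e.g., pulled from a database)."""
    return (
        f"You are a quality control inspector for {product_family}. "
        "In this video frame, identify the product SKU from its QR code, "
        "inspect the casing for defects, and return your findings as a "
        "single structured JSON object."
    )

async def inspect_frame(jpeg_bytes: bytes, product_family: str) -> str:
    # Third-party SDK: pip install google-genai (imported here so the
    # helper above stays importable without it).
    from google import genai

    client = genai.Client()  # reads GOOGLE_API_KEY from the environment
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",
        config={"response_modalities": ["TEXT"]},
    ) as session:
        # Stream the current frame, then ask for the inspection.
        await session.send_realtime_input(
            media={"data": jpeg_bytes, "mime_type": "image/jpeg"}
        )
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": build_prompt(product_family)}]}
        )
        chunks = []
        async for message in session.receive():
            if message.text:
                chunks.append(message.text)
        return "".join(chunks)

if __name__ == "__main__":
    with open("frame.jpg", "rb") as f:
        print(asyncio.run(inspect_frame(f.read(), "high-end electronics")))
```

In production this loop would run continuously over the camera feed rather than on a single frame.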
When a defect is found, Gemini doesn’t just flag it; it provides quantitative data (e.g., length of a scratch). This data can be used to calculate a severity score based on weighted parameters like defect size, type, and location, removing human subjectivity from the process.
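A weighted severity score of this kind takes only a few lines; the weights and threshold below are invented for illustration and would be tuned per product line.

```python
# Illustrative weights: these values are assumptions, not from the article.
DEFECT_TYPE_WEIGHT = {"Scratch": 1.0, "Dent": 2.0, "Housing Crack": 5.0}
LOCATION_WEIGHT = {"edge": 1.0, "face": 1.5, "connector area": 3.0}

def severity_score(defect_type: str, location: str, size_mm: float) -> float:
    """Weighted severity: larger defects in critical locations score higher."""
    return (
        DEFECT_TYPE_WEIGHT.get(defect_type, 1.0)
        * LOCATION_WEIGHT.get(location, 1.0)
        * size_mm
    )

def is_critical(score: float, threshold: float = 10.0) -> bool:
    """Defects at or above the threshold trigger the high-severity path."""
    return score >= threshold

crack = severity_score("Housing Crack", "connector area", 1.2)  # 5.0 * 3.0 * 1.2
scratch = severity_score("Scratch", "edge", 2.5)                # 1.0 * 1.0 * 2.5
```

Because the score is derived entirely from Gemini's measurements, two inspectors (or two shifts) can no longer disagree about the same defect.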
Step 2: Configure the alerting & logging service
This second Cloud Run service acts on the data from the inspection service.
- Logging: Upon receiving the structured JSON output from the first service, it immediately writes the full record to a BigQuery table. This creates a powerful, queryable history of all quality events.
- Intelligent Alerting: For high-severity defects, the service uses a model like Gemini Flash to perform an additional reasoning step. It can summarize the technical data into a clear alert or even correlate events to identify trends.
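The logging step above can be sketched as follows. The table ID and row fields are placeholder assumptions, while `insert_rows_json` is the streaming-insert method of the google-cloud-bigquery client.

```python
import datetime

def build_row(defect: dict) -> dict:
    """Flatten one defect record into a row for the quality-events table."""
    return {
        "event_ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "product_sku": defect["product_sku"],
        "line_id": defect["line_id"],
        "defect_type": defect["defect_type"],
        "severity": defect.get("severity", 0.0),
    }

def log_defect(defect: dict, table_id: str = "my-project.qa.defect_events") -> None:
    # Third-party client: pip install google-cloud-bigquery
    from google.cloud import bigquery

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, [build_row(defect)])
    if errors:
        raise RuntimeError(f"BigQuery streaming insert failed: {errors}")
```

Streaming inserts make each defect queryable within seconds, which is what enables the trend correlation in the alerting prompt below.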
Sample prompt:
```
Given the following defect data, generate a concise, critical alert message for a line supervisor. Correlate this event with the provided history of recent defects.

Defect Data:
{
  "product_sku": "SKU-XT789-BLK",
  "line_id": "Line 4",
  "machine_id": "M-7",
  "defect_type": "Housing Crack",
  ...
}

Recent History:
[
  {"timestamp": "2025-07-15T14:20:11Z", "defect_type": "Housing Crack"},
  {"timestamp": "2025-07-15T14:15:32Z", "defect_type": "Housing Crack"}
]
```
Generated alert: "CRITICAL ALERT: 3rd 'Housing Crack' defect detected on Line 4 in the last 10 minutes. Possible systemic issue with molding machine M-7."

Based on the alert's severity, the service calls the appropriate APIs (e.g., Gmail API, Google Chat API) to send the message to the right stakeholders instantly.
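Dispatching such an alert to Google Chat can be done through an incoming webhook, which accepts a simple JSON payload. The severity-routing rule below is a placeholder stand-in for the rules Gemini applies, and the webhook URL would come from Secret Manager.

```python
import json
import urllib.request

def route_alert(severity: str) -> str:
    """Choose a destination by severity (a stand-in for Gemini-driven rules)."""
    return "supervisor-webhook" if severity == "CRITICAL" else "qa-log-webhook"

def send_chat_alert(webhook_url: str, message: str) -> None:
    """POST a text message to a Google Chat incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": message}).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
    )
    urllib.request.urlopen(request)
```

For email, the same routing decision would instead call the Gmail API with credentials fetched from Secret Manager.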
Get started
By leveraging Gemini’s multimodal capabilities on a scalable Google Cloud architecture, you can build a powerful quality intelligence platform. Check out these links to get started:
- Read the full Live API Capabilities guide for key capabilities and configurations, including Voice Activity Detection and native audio features.
- Read the Tool use guide to learn how to integrate Live API with tools and function calling.
- Read the Session management guide for managing long-running conversations.