GCP – Create chatbots that speak different languages with Gemini, Gemma, Translation LLM, and Model Context Protocol
Your customers might not all speak the same language. If you operate internationally or serve a diverse customer base, you need your chatbot to meet them where they are – whether they’re searching for something in Spanish or Japanese. If you want to give your customers multilingual support with chatbots, you’ll need to orchestrate multiple AI models to handle diverse languages and technical complexities intelligently and efficiently. Customers expect quick, accurate answers in their language, from simple requests to complex troubleshooting.
To get there, developers need a modern architecture that can leverage specialized AI models – such as Gemma and Gemini – and a standardized communication layer so your LLMs can speak the same language, too. Model Context Protocol, or MCP, is a standardized way for AI systems to interact with external data sources and tools. It allows AI agents to access information and execute actions outside their own models, making them more capable and versatile. Let’s explore how we can build a powerful multilingual chatbot using Google’s Gemma, Translation LLM, and Gemini models, orchestrated via MCP.
The challenge: Diverse needs, one interface
Building a truly effective support chatbot is challenging for a few reasons:
- Language barriers: Support needs to be available in multiple languages, requiring high-quality, low-latency translation.
- Query complexity: Questions range from simple FAQs (handled easily by a basic model) to intricate technical problems demanding advanced reasoning.
- Efficiency: The chatbot needs to respond quickly without getting bogged down, especially when dealing with complex tasks or translations.
- Maintainability: As AI models evolve and business needs change, the system must be easy to update without requiring a complete overhaul.
Trying to build a single, monolithic AI model to handle everything is often inefficient and complex. A better approach? Specialization and smart delegation.
MCP architecture for harnessing different LLMs
The key to making these specialized models work together effectively is MCP. MCP defines how an orchestrator (like our Gemma-powered client) can discover available tools, request specific actions (like translation or complex analysis) from other specialized services, pass necessary information (the “context”), and receive results back. It’s the essential plumbing that allows our “team” of AI models to collaborate. Here’s a framework for how it works with the LLMs:
- Gemma: The chatbot uses a versatile LLM like Gemma to manage conversations, understand user requests, handle basic FAQs, and determine when to utilize specialized tools for complex tasks via MCP.
- Translation LLM server: A dedicated, lightweight MCP server exposing Google Cloud’s Translation capabilities as a tool. Its sole focus is high-quality, fast translation between languages, callable via MCP (a minimal server sketch follows this list).
- Gemini: A specialized MCP server uses Gemini Pro or a similar LLM for complex technical reasoning and problem-solving when invoked by the orchestrator.
- Model Context Protocol: This protocol allows Gemma to discover and invoke the Translation and Gemini “tools” running on their respective servers.
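To make the server side concrete, here is a minimal sketch of what the Translation LLM server could look like, assuming the MCP Python SDK (FastMCP) and the Cloud Translation API. The server name, the `translate_text` tool name, and the project ID are placeholders; the sample repository may structure this differently.

```python
# Hypothetical Translation LLM MCP server sketch (names and project ID are placeholders).
from google.cloud import translate_v3 as translate
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("translation-server")            # server name is an assumption
PROJECT_ID = "your-gcp-project"                # placeholder project ID
PARENT = f"projects/{PROJECT_ID}/locations/global"
client = translate.TranslationServiceClient()

@mcp.tool()
def translate_text(text: str, target_language_code: str = "en") -> str:
    """Translate text into the target language using Cloud Translation."""
    response = client.translate_text(
        parent=PARENT,
        contents=[text],
        mime_type="text/plain",
        target_language_code=target_language_code,
    )
    return response.translations[0].translated_text

if __name__ == "__main__":
    mcp.run(transport="stdio")                 # expose the tool to the MCP client over stdio
```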
How it works
Let’s walk through an example non-English language scenario:
- A technical question arrives: A customer types a technical question into the chat window, but it’s in French.
- Gemma receives the text: The Gemma-powered client receives the French text. It recognizes the language isn’t English and determines translation is needed.
- Gemma calls on Translation LLM: Gemma uses the MCP connection to send the French text to the Translation LLM Server, requesting an English translation.
- Text is translated: The Translation LLM Server performs the translation via its MCP-exposed tool and sends the English version back to the client.
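Here is a hedged sketch of that client-side flow, assuming the MCP Python SDK’s stdio client and the `translate_text` tool from the server sketch above. The language-detection check and the server script path are placeholders; in practice, Gemma itself (via tool calling) would decide when translation is needed.

```python
# Hypothetical client-side sketch: the Gemma-powered orchestrator calls the
# translation tool over MCP when it detects non-English input.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder path to the translation server script from the previous sketch.
SERVER = StdioServerParameters(command="python", args=["translation_server.py"])

def needs_translation(text: str) -> bool:
    # Placeholder heuristic: in practice Gemma (or a language-detection call) decides this.
    return not text.isascii()

async def handle_user_message(text: str) -> str:
    if not needs_translation(text):
        return text
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "translate_text",
                {"text": text, "target_language_code": "en"},
            )
            # Tool results come back as content blocks; take the first text block.
            return result.content[0].text

if __name__ == "__main__":
    english = asyncio.run(handle_user_message("Comment réinitialiser mon mot de passe ?"))
    print(english)
```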
This architecture offers broad applicability. For example, imagine a financial institution’s support chatbot where all user input, regardless of the original language, must be preserved in English in real time for fraud detection. Here, Gemma operates as the client, while Translation LLM, Gemini Flash, and Gemini Pro run as server-side tools. In this configuration, the client-side Gemma manages multi-turn conversations for routine inquiries and intelligently directs complex requests to specialized tools. As depicted in the architectural diagram, Gemma manages all user interactions within a multi-turn chat. A tool leveraging Translation LLM can translate user queries and concurrently save them for immediate fraud analysis. Meanwhile, Gemini Flash and Pro models generate responses based on the user’s requests: Gemini Pro handles intricate financial inquiries, while Gemini Flash addresses less complex questions.
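As an illustration of that routing decision, here is a small sketch of how the client might choose between a Gemini Flash tool and a Gemini Pro tool. The keyword rule and tool names are assumptions for clarity; in a real setup, Gemma’s own tool-calling would make this choice rather than a hard-coded rule.

```python
# Hypothetical routing sketch: tool names and the keyword rule are illustrative
# assumptions, not the sample repository's actual implementation.
COMPLEX_HINTS = ("wire transfer", "dispute", "chargeback", "regulation", "escalate")

def pick_reasoning_tool(english_query: str) -> str:
    """Return the MCP tool name to call for this query."""
    if any(hint in english_query.lower() for hint in COMPLEX_HINTS):
        return "ask_gemini_pro"    # intricate financial inquiries
    return "ask_gemini_flash"      # routine, less complex questions

# Example: route a translated query, then invoke the chosen tool over MCP
# exactly as in the translation example above.
print(pick_reasoning_tool("I want to dispute a chargeback on my account"))  # -> ask_gemini_pro
```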
Let’s look at this sample GitHub repo that illustrates how this architecture works.
Why this is a winning combination
This is a powerful combination because it’s designed for both efficiency and adaptability.
The main idea is splitting up the work. The Gemma-based client that users interact with stays light, handling the conversation and routing requests where they need to go. Tougher jobs, like translation or complex reasoning, are sent to separate LLMs built specifically for those tasks. This way, each piece does what it’s best at, making the whole system perform better.
A big plus is how this makes things easier to manage and more flexible. Because the parts connect with a standard interface (the MCP), you can update or swap out one of the specialized LLMs – maybe to use a newer model for translation – without having to change the Gemma client. This makes updates simpler, reduces potential headaches, and lets you try new things more easily. You can use this kind of setup for things like creating highly personalized content, tackling complex data analysis, or automating workflows more intelligently.
Get started
Ready to build your own specialized, orchestrated AI solutions?
- Explore the code: Clone the GitHub repository for this project and experiment with the client and server setup.
- Learn more about the models and MCP: Read the documentation for Gemma, Gemini, Translation LLM, and the Model Context Protocol for more details.