GCP – Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes
Integrating context from tools and data sources into LLMs can be challenging, which impacts ease-of-use in the development of AI agents. To address this challenge, Anthropic introduced the Model Context Protocol (MCP), which standardizes how applications provide context to LLMs. Imagine you want to build an MCP server for your API to make it available to fellow developers so they can use it as context in their own AI applications. But where do you deploy it? Google Cloud Run could be a great option.
Drawing directly from the official Cloud Run documentation for hosting MCP servers, this guide shows you the straightforward process of setting up your very own remote MCP server. Get ready to transform how you leverage context in your AI endeavors!
MCP Transports
MCP follows a client-server architecture and, for a while, only supported running the server locally using the stdio transport.
MCP has evolved and now supports two remote access transports: streamable-http and sse. Server-Sent Events (SSE) has been deprecated in favor of Streamable HTTP in the latest MCP specification but is still supported for backwards compatibility. Both transports allow MCP servers to run remotely.
With Streamable HTTP, the server operates as an independent process that can handle multiple client connections. This transport uses HTTP POST and GET requests.
The server MUST provide a single HTTP endpoint path (hereafter referred to as the MCP endpoint) that supports both POST and GET methods. For example, this could be a URL like https://example.com/mcp.
You can read more about the different transports in the official MCP docs.
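To make the single-endpoint idea concrete, here is a minimal sketch of the first POST a client might send to the MCP endpoint. It only builds the request and does not send it. The endpoint URL, the client name, and the protocol version string are illustrative assumptions, not values taken from this guide.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL of your own MCP server.
MCP_ENDPOINT = "https://example.com/mcp"


def build_initialize_request(endpoint: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC "initialize" POST for the MCP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use the one your client supports.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The client signals that it accepts a JSON body or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request(MCP_ENDPOINT)
print(request.get_method(), request.full_url)
```

In practice a client library such as FastMCP handles this handshake for you; the sketch is only meant to show that everything flows through one URL.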
Benefits of running an MCP server remotely
Running an MCP server remotely on Cloud Run can provide several benefits:
- Scalability: Cloud Run is built to rapidly scale out to handle all incoming requests. Cloud Run will scale your MCP server automatically based on demand.
- Centralized server: You can share access to a centralized MCP server with team members through IAM privileges, allowing them to connect to it from their local machines instead of all running their own servers locally. If a change is made to the MCP server, all team members will benefit from it.
- Security: Cloud Run provides an easy way to force authenticated requests. This allows only secure connections to your MCP server, preventing unauthorized access.
IMPORTANT: The security benefit is critical. If you don’t enforce authentication, anyone on the public internet can potentially access and call your MCP server.
Prerequisites
- Python 3.10+
- uv (for package and project management; see the uv docs for installation)
- Google Cloud SDK (gcloud)
Installation
Create a folder, mcp-on-cloudrun, to store the code for our server and deployment:

```bash
mkdir mcp-on-cloudrun
cd mcp-on-cloudrun
```
Let’s get started by using uv to create a project. uv is a powerful and fast package and project manager.

```bash
uv init --name "mcp-on-cloudrun" --description "Example of deploying a MCP server on Cloud Run" --bare --python 3.10
```
After running the above command, you should see the following pyproject.toml:

```toml
[project]
name = "mcp-on-cloudrun"
version = "0.1.0"
description = "Example of deploying a MCP server on Cloud Run"
requires-python = ">=3.10"
dependencies = []
```
Next, let’s create the additional files we will need: a server.py for our MCP server code, a test_server.py that we will use to test our remote server, and a Dockerfile for our Cloud Run deployment.

```bash
touch server.py test_server.py Dockerfile
```
Our file structure should now be complete:

```
├── mcp-on-cloudrun
│   ├── pyproject.toml
│   ├── server.py
│   ├── test_server.py
│   └── Dockerfile
```
Now that we have our file structure taken care of, let’s configure our Google Cloud credentials and set our project:

```bash
gcloud auth login
export PROJECT_ID=<your-project-id>
gcloud config set project $PROJECT_ID
```
Math MCP Server
LLMs are great at non-deterministic tasks: understanding intent, generating creative text, summarizing complex ideas, and reasoning about abstract concepts. However, they are notoriously unreliable for deterministic tasks – things that have one, and only one, correct answer.
Giving LLMs access to deterministic tools (such as math operations) through MCP is one example of how tools can supply reliable context and improve the results LLMs produce.
We will use FastMCP to create a simple math MCP server that has two tools: add and subtract. FastMCP provides a fast, Pythonic way to build MCP servers and clients.
Add FastMCP as a dependency to our pyproject.toml:

```bash
uv add fastmcp==2.6.1 --no-sync
```
Copy and paste the following code into server.py for our math MCP server:

```python
import asyncio
import logging
import os

from fastmcp import FastMCP

logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)

mcp = FastMCP("MCP Server on Cloud Run")


@mcp.tool()
def add(a: int, b: int) -> int:
    """Use this to add two numbers together.

    Args:
        a: The first number.
        b: The second number.

    Returns:
        The sum of the two numbers.
    """
    logger.info(f">>> Tool: 'add' called with numbers '{a}' and '{b}'")
    return a + b


@mcp.tool()
def subtract(a: int, b: int) -> int:
    """Use this to subtract two numbers.

    Args:
        a: The first number.
        b: The second number.

    Returns:
        The difference of the two numbers.
    """
    logger.info(f">>> Tool: 'subtract' called with numbers '{a}' and '{b}'")
    return a - b


if __name__ == "__main__":
    # Cloud Run supplies PORT as a string, so convert it to an integer.
    port = int(os.getenv("PORT", "8080"))
    logger.info(f"MCP server started on port {port}")
    # Could also use 'sse' transport; host="0.0.0.0" is required for Cloud Run.
    asyncio.run(
        mcp.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=port,
        )
    )
```
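To give a feel for what happens when a client invokes one of these tools, here is a rough conceptual sketch of dispatching a JSON-RPC "tools/call" message to a plain Python function. This is not how FastMCP is implemented; the TOOLS registry and the handle_tools_call helper are hypothetical names used only for illustration.

```python
import json
from typing import Callable, Dict


def add(a: int, b: int) -> int:
    return a + b


def subtract(a: int, b: int) -> int:
    return a - b


# Hypothetical registry standing in for what the @mcp.tool() decorator records.
TOOLS: Dict[str, Callable[..., int]] = {"add": add, "subtract": subtract}


def handle_tools_call(message: dict) -> dict:
    """Dispatch a JSON-RPC "tools/call" request to the matching Python function."""
    params = message["params"]
    value = TOOLS[params["name"]](**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": message["id"],
        # Tool results travel back as a list of content items; here, one text item.
        "result": {"content": [{"type": "text", "text": str(value)}], "isError": False},
    }


request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 1, "b": 2}},
}
response = handle_tools_call(request)
print(json.dumps(response["result"]))
```

The text content item in the result is what a client later reads when it inspects the first element of a tool call's result.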
Transport
We are using the streamable-http transport for this example as it is the recommended transport for remote servers, but you can still use sse if you prefer, as it remains supported for backwards compatibility.
If you want to use sse, update the transport argument in the run_async call at the bottom of server.py to transport="sse".
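If you expect to switch between the two transports, one option is to choose the transport from an environment variable instead of editing the code. The sketch below is one possible approach, not part of the original example: the MCP_TRANSPORT variable and the resolve_run_settings helper are assumptions made for illustration, while the /mcp and /sse client paths follow the endpoints mentioned in this guide.

```python
import os
from typing import Mapping, Tuple

# Client-facing paths for each transport, as used in this guide.
ENDPOINT_PATHS = {"streamable-http": "/mcp", "sse": "/sse"}


def resolve_run_settings(env: Mapping[str, str]) -> Tuple[dict, str]:
    """Return keyword arguments for mcp.run_async() and the path clients should use."""
    # MCP_TRANSPORT is a hypothetical variable name chosen for this sketch.
    transport = env.get("MCP_TRANSPORT", "streamable-http")
    if transport not in ENDPOINT_PATHS:
        raise ValueError(f"Unsupported transport: {transport!r}")
    run_kwargs = {
        "transport": transport,
        "host": "0.0.0.0",  # Listen on all interfaces, as Cloud Run requires.
        "port": int(env.get("PORT", "8080")),  # Environment values are strings.
    }
    return run_kwargs, ENDPOINT_PATHS[transport]


run_kwargs, client_path = resolve_run_settings(os.environ)
print(run_kwargs, client_path)
```

The returned dictionary could then be unpacked into the run call, for example mcp.run_async(**run_kwargs).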
Deploying to Cloud Run
Now let’s deploy our simple MCP server to Cloud Run.
Copy and paste the below code into our empty Dockerfile; it uses uv to run our server.py:

```dockerfile
# Use the official Python lightweight image
FROM python:3.13-slim

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Install the project into /app
COPY . /app
WORKDIR /app

# Allow statements and log messages to immediately appear in the logs
ENV PYTHONUNBUFFERED=1

# Install dependencies
RUN uv sync

EXPOSE $PORT

# Run the FastMCP server
CMD ["uv", "run", "server.py"]
```
You can deploy directly from source, or by using a container image.
For both options we will use the --no-allow-unauthenticated flag to require authentication.
This is important for security reasons. If you don’t require authentication, anyone can call your MCP server and potentially cause damage to your system.
Option 1 – Deploy from source

```bash
gcloud run deploy mcp-server --no-allow-unauthenticated --region=us-central1 --source .
```
Option 2 – Deploy from a container image
Create an Artifact Registry repository to store the container image.

```bash
gcloud artifacts repositories create remote-mcp-servers \
  --repository-format=docker \
  --location=us-central1 \
  --description="Repository for remote MCP servers" \
  --project=$PROJECT_ID
```
Build the container image and push it to Artifact Registry with Cloud Build.

```bash
gcloud builds submit --region=us-central1 --tag us-central1-docker.pkg.dev/$PROJECT_ID/remote-mcp-servers/mcp-server:latest
```
Deploy our MCP server container image to Cloud Run.

```bash
gcloud run deploy mcp-server \
  --image us-central1-docker.pkg.dev/$PROJECT_ID/remote-mcp-servers/mcp-server:latest \
  --region=us-central1 \
  --no-allow-unauthenticated
```
Once you have completed either option, if your service has successfully deployed you will see a message like the following:

```
Service [mcp-server] revision [mcp-server-12345-abc] has been deployed and is serving 100 percent of traffic.
```
Authenticating MCP Clients
Since we specified --no-allow-unauthenticated to require authentication, any MCP client connecting to our remote MCP server will need to authenticate.
The official Cloud Run documentation on hosting MCP servers provides more information on this topic, depending on where you are running your MCP client.
For this example, we will run the Cloud Run proxy on our local machine to create an authenticated tunnel to our remote MCP server.
By default, the URL of a Cloud Run service requires all requests to be authorized with the Cloud Run Invoker (roles/run.invoker) IAM role. This IAM policy binding ensures that a strong security mechanism is used to authenticate your local MCP client.
Make sure that you and any team members trying to access the remote MCP server have the roles/run.invoker IAM role bound to their IAM principal (Google Cloud account).
NOTE: The following command may prompt you to download the Cloud Run proxy if it is not already installed. Follow the prompts to download and install it.

```bash
gcloud run services proxy mcp-server --region=us-central1
```
You should see the following output:

```
Proxying to Cloud Run service [mcp-server] in project [<YOUR_PROJECT_ID>] region [us-central1]
http://127.0.0.1:8080 proxies to https://mcp-server-abcdefgh-uc.a.run.app
```
All traffic to http://127.0.0.1:8080 will now be authenticated and forwarded to our remote MCP server.
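For comparison, a client that calls the service URL directly, without the proxy, would need to supply credentials itself. With Cloud Run this is typically an identity token sent in the Authorization header. The following is a minimal sketch under that assumption; it relies on the gcloud CLI being installed and logged in, the helper names are hypothetical, and the service URL is the placeholder from the proxy output above.

```python
import subprocess
import urllib.request


def fetch_identity_token() -> str:
    """Ask the gcloud CLI for an identity token for the active account."""
    completed = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the token as a bearer credential."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; call fetch_identity_token() to obtain a real token.
example = build_authenticated_request(
    "https://mcp-server-abcdefgh-uc.a.run.app/mcp", "<identity-token>"
)
print(example.get_header("Authorization"))
```

The proxy remains the simpler choice for local development, since it saves you from handling tokens yourself.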
Testing the remote MCP server
Let’s test the remote MCP server by using the FastMCP client to connect to http://127.0.0.1:8080/mcp (note the /mcp at the end, as we are using the Streamable HTTP transport) and call the add and subtract tools.
Add the following code to the empty test_server.py file:

```python
import asyncio

from fastmcp import Client


async def test_server():
    # Test the MCP server using streamable-http transport.
    # Use "/sse" endpoint if using sse transport.
    async with Client("http://localhost:8080/mcp") as client:
        # List available tools
        tools = await client.list_tools()
        for tool in tools:
            print(f">>> Tool found: {tool.name}")
        # Call add tool
        print(">>> Calling add tool for 1 + 2")
        result = await client.call_tool("add", {"a": 1, "b": 2})
        print(f"<<< Result: {result[0].text}")
        # Call subtract tool
        print(">>> Calling subtract tool for 10 - 3")
        result = await client.call_tool("subtract", {"a": 10, "b": 3})
        print(f"<<< Result: {result[0].text}")


if __name__ == "__main__":
    asyncio.run(test_server())
```
NOTE: Make sure you have the Cloud Run proxy running before running the test server.
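If you are unsure whether the proxy is up, a quick check is to see whether anything is accepting connections on the local port. Below is a minimal sketch using only the standard library; the is_listening helper is a hypothetical name, and the host and port match the proxy output shown earlier.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_listening("127.0.0.1", 8080):
        print("Something is listening on 127.0.0.1:8080")
    else:
        print("Nothing is listening on 127.0.0.1:8080; start the Cloud Run proxy first")
```

This only confirms that some process is listening on the port, not that it is the Cloud Run proxy specifically.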
In a new terminal run:

```bash
uv run test_server.py
```
You should see the following output:

```
>>> Tool found: add
>>> Tool found: subtract
>>> Calling add tool for 1 + 2
<<< Result: 3
>>> Calling subtract tool for 10 - 3
<<< Result: 7
```
You’ve done it! You have successfully deployed a remote MCP server to Cloud Run and tested it using the FastMCP client.
Want to learn more about deploying AI applications on Cloud Run? Check out this blog from Google I/O to learn the latest on Easily Deploying AI Apps to Cloud Run!