GCP – Build with more flexibility: New open models arrive in the Vertex AI Model Garden
In our ongoing effort to provide businesses with the flexibility and choice they need to build innovative AI applications, we are expanding the catalog of open models available as Model-as-a-Service (MaaS) offerings in Vertex AI Model Garden. Following the addition of Llama 4 models earlier this year, we are announcing that DeepSeek R1 is now available to everyone through our MaaS offering. This expansion reinforces our commitment to an open AI ecosystem, ensuring our customers can access a diverse range of powerful models and find the one best suited to their specific use case.
Deploying and managing today’s large-scale models presents operational and financial challenges. For instance, a large model such as DeepSeek R1 can require eight advanced H200 GPUs just to run inference. For many organizations, procuring and managing such resources is a major undertaking that can divert focus from core application development.
Vertex AI’s MaaS offering is designed to remove this complexity. By providing these models as fully managed, serverless APIs, we eliminate the need for customers to provision or manage the underlying infrastructure. This allows your teams to bypass the complexities of GPU management and focus directly on building and innovating. With Vertex AI, you benefit from a secure, enterprise-grade platform with built-in data privacy and compliance, all under a flexible, pay-as-you-go pricing model that scales with your needs.
Getting started
Below is a step-by-step guide to using the open models available through MaaS, with DeepSeek R1 on Vertex AI as an example. The model can be accessed both via the UI and via the API.
1. Enable the DeepSeek API Service
Navigate to the DeepSeek API Service in the Vertex AI Model Garden and click on the tile to open the model card. Then, enable access to the DeepSeek API Service. It may take a few minutes for permissions to propagate after enablement.
DeepSeek API Service from the Vertex AI Model Garden
2. Try out the model via the UI
Navigate to the DeepSeek API Service from the Vertex AI Model Garden and click on the tile to open the model card. You can use the UI in the sidebar to test the service.
DeepSeek API Service with UI sidebar to test the service
3. Try out the model via Vertex AI API
To integrate DeepSeek R1 into your applications, you can use either the REST API or the OpenAI Python API client library. Note: to protect your data, the DeepSeek MaaS endpoint does not have any outbound internet access.
Get Predictions via the REST API
You can make API requests via curl from Cloud Shell, or from any machine with gcloud credentials configured. Remember to replace the placeholders in this code:
```shell
export PROJECT_ID=<ENTER_PROJECT_ID>
export REGION_ID=<ENTER_REGION_ID>

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION_ID}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION_ID}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "deepseek-ai/deepseek-r1-0528-maas",
    "max_tokens": 200,
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "which is bigger - 9.11 or 9.9"
      }
    ]
  }'
```
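If you are scripting the REST call from Python rather than curl, the same URL and request body can be assembled ahead of time. This is a minimal sketch that assumes a `us-central1` region for illustration; the send itself is left commented out because it requires valid gcloud credentials and an enabled project:

```python
import json

# Placeholder values, mirroring the curl example above.
PROJECT_ID = "ENTER_PROJECT_ID"
REGION_ID = "us-central1"  # assumed region for illustration

# Same chat completions endpoint the curl command targets.
url = (
    f"https://{REGION_ID}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION_ID}/endpoints/openapi/chat/completions"
)

payload = {
    "model": "deepseek-ai/deepseek-r1-0528-maas",
    "max_tokens": 200,
    "stream": True,
    "messages": [{"role": "user", "content": "which is bigger - 9.11 or 9.9"}],
}

# With credentials configured, the request can be sent much like the curl
# call, e.g. with the requests library (commented out here because it needs
# a valid access token and an enabled project):
# import requests, subprocess
# token = subprocess.check_output(
#     ["gcloud", "auth", "print-access-token"], text=True
# ).strip()
# resp = requests.post(
#     url,
#     headers={"Authorization": f"Bearer {token}"},
#     json=payload,
#     stream=True,
# )

print(url)
```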
Get Predictions via the OpenAI Python API Client Library
Install the OpenAI Python API Library:
```shell
pip install openai
```
Initialize the client and configure the endpoint URL. To get the access token to use as an API key, you can read more here. If run from a local machine, GOOGLE_APPLICATION_CREDENTIALS will authenticate your requests.
```python
import os
import openai

PROJECT_ID = "ENTER_PROJECT_ID"
LOCATION = "us-central1"
MODEL_ID = "deepseek-ai/deepseek-r1-0528-maas"
API_KEY = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]  # or add output from gcloud auth print-access-token

deepseek_vertex_endpoint_url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/"
    f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi"
)

client = openai.OpenAI(
    base_url=deepseek_vertex_endpoint_url,
    api_key=API_KEY
)
```
Make completions requests via the client:
```python
response = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1-0528-maas",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How many r's are in strawberry?"},
    ],
    stream=False,
)

print(response.choices[0].message.content)

# ChatCompletion(
#     id="",
#     choices=[
#         Choice(
#             finish_reason="length",
#             index=0,
#             logprobs=None,
#             message=ChatCompletionMessage(
#                 content="<think>\nFirst, the question is: \"How many r's are in strawberry?\" I need to count the number of times the letter 'r' appears in the word \"strawberry\".\n\nLet me write down the word: S-T-R-A",
#                 refusal=None,
#                 role="assistant",
#                 annotations=None,
#                 audio=None,
#                 function_call=None,
#                 tool_calls=None,
#             ),
#         )
#     ],
#     created=,
#     model="deepseek-ai/deepseek-r1-0528-maas",
#     object="chat.completion",
#     service_tier=None,
#     system_fingerprint="",
#     usage=CompletionUsage(
#         completion_tokens=50,
#         prompt_tokens=18,
#         total_tokens=68,
#         completion_tokens_details=None,
#         prompt_tokens_details=None,
#     ),
# )
```
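The curl example earlier set `stream` to true; the OpenAI client supports the same via `stream=True`, yielding chunks whose `choices[0].delta.content` carries incremental text. Below is a minimal sketch of accumulating those deltas; the chunk handling is shown against small stub objects so it runs offline, whereas with a live client you would iterate over `client.chat.completions.create(..., stream=True)`:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the content deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # some frames carry only a role or are empty
            parts.append(delta.content)
    return "".join(parts)

# Stub chunks mimicking the shape of OpenAI streaming responses,
# so this sketch runs without network access.
def make_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

chunks = [make_chunk("9.9 is "), make_chunk(None), make_chunk("bigger than 9.11")]
print(collect_stream(chunks))  # prints: 9.9 is bigger than 9.11
```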
What’s next?
Vertex AI Model Garden opens up new possibilities for building applications that require state-of-the-art foundation models. Here are some next steps:
- Review the documentation guide for DeepSeek R1 MaaS here and Llama MaaS here
- Review pricing here for both models
- Explore the Model Garden: discover other models available as managed services
- Build a proof-of-concept: start with a small project to understand the model’s capabilities
- Join the community: share your experiences and learn from others in the Google Cloud AI Community