GCP – More choice, more control: self-deploy proprietary models in your VPC with Vertex AI
Building the best AI applications requires both the freedom to choose the most powerful, specialized model for the task at hand and a platform that can handle them all. This flexibility is core to the Vertex AI platform, and today, we’re taking a significant step forward in our commitment to giving you unparalleled choice and control.
We are excited to announce that you can now securely deploy a growing selection of leading proprietary models from industry partners, including AI21 Labs, CAMB.AI, CSM, Mistral AI, Qodo, and Virtue AI, with models from Contextual AI and WRITER coming soon. You can deploy these models — including closed-source models and those with restricted commercial licenses — directly into your own Virtual Private Cloud (VPC).
You will find all of these models in the Vertex AI Model Garden, our central gateway to over 200 foundation models, including Google’s versatile Gemini family, leading open models, and third-party models. We provide a single, curated catalog where you can discover, test, and deploy the ideal model for your application.
Announcing self-deployable proprietary models in your VPC
For organizations that require maximum control over their data and infrastructure, you can now self-deploy powerful proprietary models from leading AI model builders directly within your VPC. With this new capability, you can acquire commercial licenses via Google Cloud Marketplace and deploy models securely within your environment, all while meeting Google Cloud’s high standards for security and compliance. Self-deploying proprietary models with Google Cloud gives you a number of benefits:
- Deploy models within your VPC with full adherence to your VPC Service Controls (VPC-SC) policies, providing the highest assurance that your proprietary business data never leaves your environment. You can evaluate and deploy third-party models to production on a trusted platform.
- You can optimize for performance or cost by selecting from a range of available machine types. Scale your replica count up or down manually to meet workload demands, or configure auto-scaling policies for hands-free management. Deploy to specific Google Cloud regions of your choice to achieve data compliance in your target markets, or select locations for low-latency delivery to your customers (see the deployment sketch after this list).
- Discover, license, and deploy a curated selection of proprietary models from industry-leading providers, all in one place. We’re launching with models from eight partners: AI21 Labs, CAMB.AI, Contextual AI, CSM, Mistral AI, Qodo, Virtue AI, and WRITER. These models cover a wide range of use cases and specializations. This is just the beginning, and you’ll see us continue to expand our catalog with the latest generative AI models.
- Go from model discovery to production with ease. You can procure commercial licenses and deploy the models with just a few clicks directly from the Model Garden console. Our fully managed AI inference service handles the underlying infrastructure, so you’re free to focus on building your application.
- Get started with simple pay-as-you-go pricing, so you only pay for what you use. You control your costs by scaling your deployment to meet your needs, without dealing with artificial limits or quota caps. You can further optimize costs by applying your existing Google Cloud committed-use discounts (CUDs) or reservations.
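To make the machine-type, replica, and region options above concrete, here is a minimal, hypothetical sketch using the google-cloud-aiplatform Python SDK. It assumes the partner model appears as a Model resource in your project after you enable its license; the project, region, model ID, and machine shape are placeholders, and the console’s one-click deployment described below remains the primary supported path.

```python
# Hedged sketch: configuring machine type, replica counts, and region for a
# Vertex AI endpoint deployment. Resource names below are placeholders; partner
# models acquired through Model Garden may expose a different deployment flow
# (e.g., the console's one-click deploy), so treat this as an illustration of
# the available knobs rather than the exact partner-model API.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",      # placeholder project ID
    location="europe-west4",   # pick a region that satisfies your data-residency needs
)

# Placeholder: a Model resource available in your project after the license is enabled.
model = aiplatform.Model(
    "projects/my-project/locations/europe-west4/models/MODEL_ID"
)

endpoint = model.deploy(
    machine_type="g2-standard-12",  # choose a machine type to balance cost and performance
    accelerator_type="NVIDIA_L4",   # optional accelerator, if the model requires one
    accelerator_count=1,
    min_replica_count=1,            # scale manually, or...
    max_replica_count=4,            # ...let autoscaling add replicas up to this cap under load
)
print(endpoint.resource_name)
```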
Meet our launch models
Explore the new models available today for self-deployment in your VPC:
- AI21 Labs – Jamba Large 1.6: Delivers leading model quality at high speed, making it an excellent choice for private enterprise deployment.
- CAMB.AI – MARS7: Enables you to ship production-ready voice applications with hyper-realistic, multilingual text-to-speech (TTS) outputs featuring optional voice cloning and fine-grained emotional control.
- (Coming soon!) Contextual AI – Reranker: Designed to significantly enhance the relevance and quality of Retrieval-Augmented Generation (RAG) systems.
- CSM – Cube: A generative AI model that transforms 2D images into detailed 3D models with remarkable precision and efficiency.
- Mistral AI – Codestral (25.01): Explicitly designed for code generation tasks, helping developers write and interact with code through a shared instruction and completion API endpoint.
- Qodo – Embed-1: A suite of large-scale code embedding models that enhance search accuracy for RAG by enabling efficient code and text retrieval.
- Virtue AI – VirtueGuard: An enterprise-ready AI guardrail model that enables real-time content security, policy enforcement, and regulatory compliance with multilingual support for generative AI systems.
- (Coming soon!) WRITER – Palmyra X4: Enterprise-grade LLM that combines a 128K-token context window with a suite of capabilities, including advanced reasoning, tool calling, LLM delegation, built-in RAG, code generation, structured outputs, multimodality, and multilingual support.
How to get started
You can deploy these new models in three simple steps:
- Visit the Vertex AI Model Garden. On the left-hand navigation tab, under “Model Collections,” select “Self-deploy partner models”.
- Select the partner model you want to deploy. To use the selected model, purchase a license by clicking “Enable”.
- Your license is active in a few seconds. Simply click “Deploy” to configure and deploy the model endpoint in your VPC using Model Garden’s one-click deployment workflow. Once the endpoint is running, you can call it programmatically, as in the sketch below.
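For reference, here is a hedged sketch of calling a deployed endpoint with the google-cloud-aiplatform Python SDK. The project, region, endpoint ID, and request payload are placeholders; each partner model documents its own request and response schema.

```python
# Hedged sketch: sending an online prediction request to an endpoint deployed
# from Model Garden. The endpoint ID and instance payload are placeholders;
# consult the partner model's documentation for its actual input format.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder project/region

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/ENDPOINT_ID"
)

# Example payload for a text-generation model; field names vary by partner model.
response = endpoint.predict(
    instances=[{"prompt": "Summarize our Q3 incident report.", "max_tokens": 256}]
)
print(response.predictions)
```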
We are committed to providing the most open and flexible AI platform for the enterprise. With even more choice and the fine-grained control and security of your own environment, you have everything you need to innovate responsibly. Explore the new models in Model Garden and start building today!