Introducing Vertex AI RAG Engine: Scale your Vertex AI RAG pipeline with confidence
Closing the gap between impressive model demos and real-world performance is crucial for successfully deploying generative AI in the enterprise. Despite the technology’s incredible capabilities, this gap remains a barrier for many developers and enterprises trying to “productionize” AI. This is where retrieval-augmented generation (RAG) becomes non-negotiable: it strengthens your enterprise applications by building trust in their AI outputs.
Today, we’re sharing the general availability of Vertex AI RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your own data and methods. With Vertex AI RAG Engine, you can:
- Adapt to any architecture: Choose the models, vector databases, and data sources that work best for your use case. This flexibility ensures RAG Engine fits into your existing infrastructure rather than forcing you to adapt to it.
- Evolve with your use case: Adding new data sources, updating models, or adjusting retrieval parameters happens through simple configuration changes. The system grows with you, maintaining consistency while accommodating new requirements.
- Evaluate in simple steps: Set up multiple RAG engines with different configurations to find what works best for your use case, as sketched below.
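To make the evaluation point concrete, here is a minimal sketch that queries two corpora side by side, assuming the vertexai.preview.rag API used later in this post; the corpus names and the sample query are placeholders, and the exact retrieval_query signature may vary across SDK versions.

from vertexai.preview import rag
import vertexai

vertexai.init(project="PROJECT_ID", location="us-central1")  # placeholders

# Two corpora ingested with different configurations (hypothetical resource names).
CORPORA = [
    "projects/PROJECT_ID/locations/us-central1/ragCorpora/CORPUS_A",
    "projects/PROJECT_ID/locations/us-central1/ragCorpora/CORPUS_B",
]

QUERY = "How do I rotate my API keys?"  # sample evaluation question

for corpus in CORPORA:
    response = rag.retrieval_query(
        rag_resources=[rag.RagResource(rag_corpus=corpus)],
        text=QUERY,
        rag_retrieval_config=rag.RagRetrievalConfig(top_k=5),
    )
    print(f"--- {corpus} ---")
    for ctx in response.contexts.contexts:
        # Compare which chunks each configuration retrieves for the same query.
        print(ctx.source_uri, ctx.text[:80])

Running the same query set against each configuration makes it easy to compare retrieval quality before you commit to one setup.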
Introducing Vertex AI RAG Engine
Vertex AI RAG Engine is a managed service that lets you build and deploy RAG implementations with your data and methods. Think of it as having a team of experts who have already solved complex infrastructure challenges such as efficient vector storage, intelligent chunking, optimal retrieval strategies, and precise augmentation — all while giving you the controls to customize for your specific use case.
Vertex AI’s RAG Engine offers a vibrant ecosystem with a range of options catering to diverse needs.
- DIY capabilities: DIY RAG empowers you to tailor a solution by mixing and matching different components. With an easy-to-get-started API, it works well for low- to medium-complexity use cases, enabling fast experimentation, proofs of concept, and RAG-based applications with a few clicks.
- Search functionality: Vertex AI Search stands out as a robust, fully managed solution. It supports a wide variety of use cases, from simple to complex, with high out-of-the-box quality, ease of getting started, and minimal maintenance.
- Connectors: A rapidly growing list of connectors helps you quickly connect to various data sources, including Cloud Storage, Google Drive, Jira, Slack, or local files. RAG Engine handles the ingestion process (even for multiple sources) through an intuitive interface; see the ingestion sketch after this list.
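Here is a minimal sketch of that multi-source ingestion flow using the same SDK; the bucket, Drive folder, and file names are hypothetical, and import signatures may differ slightly across SDK versions.

from vertexai.preview import rag
import vertexai

vertexai.init(project="PROJECT_ID", location="us-central1")  # placeholders

corpus = rag.create_corpus(display_name="my-knowledge-base")  # hypothetical name

# Import from Cloud Storage and Google Drive in one call;
# RAG Engine handles parsing, chunking, and embedding.
rag.import_files(
    corpus.name,
    paths=[
        "gs://my-bucket/docs/",                              # hypothetical bucket
        "https://drive.google.com/drive/folders/FOLDER_ID",  # hypothetical folder
    ],
)

# Upload a single local file directly.
rag.upload_file(
    corpus_name=corpus.name,
    path="./handbook.pdf",  # hypothetical local file
    display_name="handbook.pdf",
)

print([f.display_name for f in rag.list_files(corpus_name=corpus.name)])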
Customization
One of the defining strengths of Vertex AI’s RAG Engine is its capacity for customization. This flexibility allows you to fine-tune various components to perfectly align with your data and use case.
- Parsing: When documents are ingested into an index, they are split into chunks. RAG Engine lets you tune chunk size and chunk overlap, and supports different chunking strategies for different types of documents; see the first sketch after this list.
- Retrieval: You might already be using Pinecone, or perhaps you prefer the open-source capabilities of Weaviate. Maybe you want to leverage Vertex AI Vector Search. RAG Engine works with your choice, or, if you prefer, can manage the vector storage entirely for you. This flexibility ensures you’re never locked into a single approach as your needs evolve; see the second sketch after this list.
- Generation: You can choose from hundreds of LLMs in Vertex AI Model Garden, including Google’s Gemini, Meta’s Llama, and Anthropic’s Claude.
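For instance, here is a minimal sketch of tuning chunking at import time, assuming the TransformationConfig and ChunkingConfig classes available in recent SDK versions; the corpus name and bucket are placeholders.

from vertexai.preview import rag
import vertexai

vertexai.init(project="PROJECT_ID", location="us-central1")  # placeholders

corpus = rag.create_corpus(display_name="manuals-corpus")  # hypothetical name

# Smaller chunks with some overlap often suit dense reference documents;
# exact field names may vary across SDK versions.
rag.import_files(
    corpus.name,
    paths=["gs://my-bucket/manuals/"],  # hypothetical bucket
    transformation_config=rag.TransformationConfig(
        chunking_config=rag.ChunkingConfig(chunk_size=512, chunk_overlap=100),
    ),
)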
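And a sketch of bringing your own vector database at corpus-creation time; the Pinecone index name and the Secret Manager resource holding its API key are placeholders, and the exact config classes may differ by SDK version.

from vertexai.preview import rag
import vertexai

vertexai.init(project="PROJECT_ID", location="us-central1")  # placeholders

# Point the corpus at an existing Pinecone index; the API key is read from
# Secret Manager (both resource names below are hypothetical).
pinecone_config = rag.Pinecone(
    index_name="my-pinecone-index",
    api_key="projects/PROJECT_ID/secrets/PINECONE_KEY/versions/latest",
)

corpus = rag.create_corpus(
    display_name="pinecone-backed-corpus",
    vector_db=pinecone_config,
)
print(corpus.name)

Omit the vector database configuration entirely and RAG Engine provisions and manages the vector store for you.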
Use Vertex AI RAG as a tool in Gemini
Vertex AI’s RAG Engine is natively integrated with the Gemini API as a tool, so you can create grounded conversations that use RAG to provide contextually relevant answers. Simply initialize a RAG retrieval tool, configured with settings such as the number of documents to retrieve and an optional LLM-based ranker, then pass the tool to a Gemini model:
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = f"projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME = "MODEL_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    )
)

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config
        ),
    )
)

rag_model = GenerativeModel(
    model_name=MODEL_NAME, tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("Why is the sky blue?")
print(response.text)
# Example response:
# The sky appears blue due to a phenomenon called Rayleigh scattering.
# Sunlight, which contains all colors of the rainbow, is scattered
# by the tiny particles in the Earth's atmosphere....
# ...
Use Vertex AI Search as a retriever
Vertex AI Search provides a solution for retrieving and managing data within your Vertex AI RAG applications. By using Vertex AI Search as your retrieval backend, you can improve performance, scalability, and ease of integration.
- Enhanced performance and scalability: Vertex AI Search is designed to handle large volumes of data with exceptionally low latency. This translates to faster response times and improved performance for your RAG applications, especially when dealing with complex or extensive knowledge bases.
- Simplified data management: Import your data from various sources, such as websites, BigQuery datasets, and Cloud Storage buckets, which streamlines your data ingestion process.
- Seamless integration: Vertex AI provides built-in integration with Vertex AI Search, which lets you select Vertex AI Search as the corpus backend for your RAG application. This simplifies the integration process and helps to ensure optimal compatibility between components.
- Improved LLM output quality: By using the retrieval capabilities of Vertex AI Search, you can help to ensure that your RAG application retrieves the most relevant information from your corpus, which leads to more accurate and informative LLM-generated outputs.
from vertexai.preview import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
DISPLAY_NAME = "DISPLAY_NAME"
ENGINE_NAME = "ENGINE_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

# Create a corpus
vertex_ai_search_config = rag.VertexAiSearchConfig(
    serving_config=f"{ENGINE_NAME}/servingConfigs/default_search",
)

rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME,
    vertex_ai_search_config=vertex_ai_search_config,
)

# Check the corpus just created
new_corpus = rag.get_corpus(name=rag_corpus.name)
print(new_corpus)
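Continuing from the snippet above, the Search-backed corpus plugs straight into the Gemini tool pattern shown earlier; a short sketch, with the model name as a placeholder:

from vertexai.preview.generative_models import GenerativeModel, Tool

# Reuse the corpus created above as the retrieval backend for Gemini.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[rag.RagResource(rag_corpus=rag_corpus.name)],
        ),
    )
)

model = GenerativeModel(model_name="MODEL_NAME", tools=[rag_retrieval_tool])
print(model.generate_content("What do our docs say about data retention?").text)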
Get started today
You can access Vertex AI RAG Engine through Vertex AI Studio. Visit the Google Cloud console to begin, or reach out to us for a guided proof of concept. For hands-on guidance, see our RAG quickstart documentation or our Vertex AI RAG Engine GitHub repository.