GCP – Boost your Search and RAG agents with Vertex AI’s new state-of-the-art Ranking API
The AI era has supercharged expectations: users now issue more complex queries and demand pinpoint results, and there’s an 82% chance of losing a customer who can’t quickly find what they need. AI agents, likewise, require ultra-relevant context for reliable task execution. However, when traditional search methods deliver noise, with up to 70% of retrieved passages lacking a true answer, both agentic workflows and user experiences suffer from untrustworthy, unreliable results.
To help businesses meet these rising expectations, we’re launching our new state-of-the-art Vertex AI Ranking API. It makes it easy to boost the precision of information surfaced within search, agentic workflows, and retrieval-augmented generation (RAG) systems. This means you can elevate your legacy search system and AI application in minutes, not months.
Go beyond simple retrieval
First-stage retrieval casts a wide net, and this is where precise ranking becomes essential. Think of the Vertex AI Ranking API as the precision filter at the crucial final stage of your retrieval pipeline. It intelligently sifts through the initial candidate set, identifying and elevating only the most pertinent information. This refinement step is key to unlocking higher-quality, more trustworthy, and more efficient AI applications.
Vertex AI Ranking API acts as this powerful, yet easy-to-integrate, refinement layer. It takes the candidate list from your existing search or retrieval system and re-orders it based on deep semantic understanding, ensuring the best results rise to the top. Here’s how it helps you uplevel your systems:
- Upgrade legacy search systems: Easily add state-of-the-art relevance scoring to existing search outputs, improving user satisfaction and business outcomes on commercial searches without overhauling your current stack.
- Strengthen RAG systems: Send fewer, more relevant documents to your generative models. This improves answer trustworthiness while reducing latency and operating costs by optimizing context window usage.
- Support intelligent agents: Guide AI agents with highly relevant information, streamlining their context and traces, and significantly improving the success rate of task completion.
Figure 1: Ranking API usage in a typical search and retrieval flow
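To make Figure 1 concrete, here is a minimal request sketch. It assumes the google-cloud-discoveryengine Python client and placeholder project values; the records stand in for whatever your existing retriever returns, and your exact setup may differ.

```python
# Minimal sketch: rerank candidates from an existing retriever with the Ranking API.
# Assumes the google-cloud-discoveryengine client and a placeholder project ID.
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.RankServiceClient()
ranking_config = (
    "projects/your-project-id/locations/global/rankingConfigs/default_ranking_config"
)

# Candidate documents as returned by your existing lexical or vector retrieval step.
records = [
    discoveryengine.RankingRecord(
        id="1", title="Return policy", content="Items may be returned within 30 days of delivery."
    ),
    discoveryengine.RankingRecord(
        id="2", title="Shipping", content="Standard shipping takes 3-5 business days."
    ),
    discoveryengine.RankingRecord(
        id="3", title="Warranty", content="All products include a one-year limited warranty."
    ),
]

response = client.rank(
    request=discoveryengine.RankRequest(
        ranking_config=ranking_config,
        model="semantic-ranker-default-004",  # or semantic-ranker-fast-004 for latency-critical paths
        query="how long do I have to return an item?",
        records=records,
        top_n=2,  # pass only the most relevant passages to the next stage
    )
)

for record in response.records:
    print(record.id, record.score, record.title)
```

The reranked records come back with relevance scores, so downstream code can simply keep the top few and discard the rest.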
What’s new in Ranking API
Today, we’re launching our new semantic reranker models:
- semantic-ranker-default-004 – our most accurate model for any use case
- semantic-ranker-fast-004 – our fastest model for latency-critical use cases
Our models establish a new benchmark for ranking performance:
- State-of-the-art ranking: Based on evaluations using the industry-standard BEIR dataset, our model leads in accuracy among competitive standalone reranking API services. nDCG (normalized discounted cumulative gain) is a metric that evaluates the quality of a ranking by assessing how well the ranked items align with their actual relevance, rewarding systems that place the most relevant results at the top. We’ve published our evaluation scripts to ensure reproducibility of the results; a short nDCG computation sketch follows this list.
Figure 2: semantic-ranker-default-004 leads in nDCG@5 on BEIR datasets compared to other rankers.
- Industry-leading low latency: Our default model (semantic-ranker-default-004) is at least 2x faster than competitive reranking API services at any scale. Our fast model (semantic-ranker-fast-004) is tuned for latency-critical applications and typically exhibits 3x lower latency than our default model.
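For readers unfamiliar with the metric, here is a small illustration of how nDCG@5 is typically computed, using the common linear-gain formulation; the exact BEIR setup is defined in the published evaluation scripts.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each item's relevance, discounted by its rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """nDCG@k: DCG of the produced ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels of the top five results, in the order the ranker returned them.
print(round(ndcg_at_k([3, 2, 3, 0, 1]), 2))  # 0.97: relevant items sit mostly at the top
```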
We’re also launching long-context ranking with a limit of 200k total tokens per API request. Providing longer documents to the Ranking API allows it to better understand nuanced relationships between queries and information, such as customer reviews or product specifications in retail.
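If you rerank whole documents rather than short snippets, it helps to keep each request under that 200k-token ceiling. Below is a rough sketch; the 4-characters-per-token heuristic and the record format are assumptions, so substitute a real tokenizer where precision matters.

```python
def trim_to_token_budget(records, budget_tokens=200_000, chars_per_token=4):
    """Keep leading records while the approximate total stays under the per-request token limit."""
    kept, used = [], 0
    for record in records:
        approx_tokens = len(record["content"]) // chars_per_token  # crude estimate
        if used + approx_tokens > budget_tokens:
            break  # send the remainder in a follow-up request
        kept.append(record)
        used += approx_tokens
    return kept
```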
Real-world impact across domains
The benefits aren’t just theoretical. Benchmarks on industry-specific datasets demonstrate that integrating the Ranking API can significantly boost the quality of search results across diverse high-value domains such as retail, news, finance, and healthcare.
Figure 3: nDCG@5 performance improvement with semantic-ranker-default-004 in various high-value domains based on internal datasets. Lexical & Semantic search baseline uses the best result of Vertex AI text-embedding-004 and BM25 based retrieval.
Elevate your search results in minutes
We designed the Vertex AI Ranking API for seamless integration. Adding this powerful relevance layer is straightforward, with several options:
- Try it live: Experience the difference on real-world data by enabling our Ranking API in the interactive Vertex AI Vector Search demo (link)
- Build with Vertex AI: Integrate directly into any existing system for maximum flexibility (link); a minimal end-to-end sketch follows this list
- Enable it in RAG Engine: Select the Ranking API in your RAG Engine configuration to get more robust and accurate answers from your generative AI applications (link)
- Use it in AlloyDB: For a truly streamlined experience, use the built-in ai.rank() SQL function directly within AlloyDB, a novel integration that simplifies search use cases (link)
- Use it with AI frameworks: Take advantage of our native integrations with popular AI frameworks such as Genkit and LangChain (link)
- Use it in Elasticsearch: Quickly boost accuracy with our built-in Ranking API integration in Elasticsearch (link)
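As an illustration of the “Build with Vertex AI” option, here is a compact retrieve-then-rerank-then-generate sketch that grounds the answer on only the top reranked passages. The toy in-memory corpus, project ID, and Gemini model name are placeholders; check the linked docs for the exact client setup in your environment.

```python
# Illustrative RAG flow: retrieve -> rerank -> generate, grounded only on the reranked context.
# Assumes google-cloud-discoveryengine and google-cloud-aiplatform; names below are placeholders.
from google.cloud import discoveryengine_v1 as discoveryengine
import vertexai
from vertexai.generative_models import GenerativeModel

PROJECT = "your-project-id"
RANKING_CONFIG = f"projects/{PROJECT}/locations/global/rankingConfigs/default_ranking_config"

# Stand-in for your existing lexical or vector retriever.
CORPUS = {
    "1": "Items may be returned within 30 days of delivery for a full refund.",
    "2": "Standard shipping takes 3-5 business days within the continental US.",
    "3": "All products include a one-year limited warranty against defects.",
}

def answer(query: str, top_n: int = 2) -> str:
    # 1. Retrieve candidates (here: the whole toy corpus).
    records = [
        discoveryengine.RankingRecord(id=doc_id, content=text)
        for doc_id, text in CORPUS.items()
    ]

    # 2. Rerank with the Ranking API and keep only the best passages.
    ranker = discoveryengine.RankServiceClient()
    ranked = ranker.rank(
        request=discoveryengine.RankRequest(
            ranking_config=RANKING_CONFIG,
            model="semantic-ranker-fast-004",
            query=query,
            records=records,
            top_n=top_n,
        )
    )
    context = "\n\n".join(CORPUS[r.id] for r in ranked.records)

    # 3. Ground the generative model on the reranked context only.
    vertexai.init(project=PROJECT, location="us-central1")
    model = GenerativeModel("gemini-1.5-flash")  # placeholder model name
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return model.generate_content(prompt).text

print(answer("How long do I have to return an item?"))
```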
Read more for the details.