Build generative AI and similarity search applications at virtually unlimited scale with Spanner
Spanner, Google Cloud’s fully managed, highly available distributed database service, combines virtually unlimited horizontal scalability with relational semantics for both relational and non-relational workloads — all with a 99.999% availability SLA. As data volumes grow and applications demand more from their operational databases, customers need a database that scales with them. We recently announced support for searching vector embeddings with exact k-nearest neighbor (KNN) search, now in preview, helping businesses build generative AI applications at virtually unlimited scale. All of these capabilities are available within Spanner, so you can perform vector search on your transactional data without moving it to another database, maintaining operational simplicity.
In this blog, we discuss how vector search can enhance gen AI applications, how Spanner’s underlying architecture supports extremely large-scale vector search deployments, and the operational benefits of using Spanner instead of a dedicated vector database.
Generative AI and vector embeddings
Generative AI is enabling all kinds of new applications, from virtual assistants that can have personalized conversations to generating new content from simple text prompts. Pre-trained large language models (LLMs), on which gen AI relies, open the door for the broader developer community to easily build gen AI applications, even without specialized machine learning expertise. But because LLMs sometimes hallucinate and provide incorrect responses, combining LLMs with vector search and operational databases can help build gen AI applications that are grounded in contextual, domain-specific, and real-time data, delivering high-quality AI-assisted user experiences.
Imagine a financial institution with a virtual assistant that helps customers answer questions about their accounts, performs account management, and recommends financial products that best fit a customer’s unique needs. In complex scenarios, a customer’s decision-making process can spread across multiple chat sessions with the virtual assistant. Performing vector search over the conversation history can help the virtual assistant find the most relevant content, enabling a high-quality, highly relevant, and informative chat experience.
Vector search relies on vector embeddings — numerical representations of content such as text, images, or video, generated by embedding models — and helps the gen AI application identify the most relevant data to include in LLM prompts, thereby customizing and improving the quality of the LLM’s responses. Vector search is performed by computing the distance between vector embeddings: the closer the embeddings are in the vector space, the more similar their content.
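To make this concrete, here is a minimal sketch in Python (using NumPy, with made-up four-dimensional embeddings; real embedding models produce vectors with hundreds of dimensions) showing how distance reflects similarity:

```python
import numpy as np

# Toy embeddings for illustration only; a real embedding model would
# produce much higher-dimensional vectors.
query = np.array([0.9, 0.1, 0.3, 0.4])
doc_a = np.array([0.8, 0.2, 0.3, 0.5])    # content similar to the query
doc_b = np.array([-0.6, 0.9, -0.2, 0.1])  # unrelated content

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity: smaller values mean more similar content."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Straight-line distance between two points in the vector space."""
    return float(np.linalg.norm(u - v))

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name,
          round(cosine_distance(query, doc), 3),
          round(euclidean_distance(query, doc), 3))
# doc_a comes out much closer to the query than doc_b under both metrics.
```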
Bring virtually unlimited scale to vector search with Spanner
Vector workloads that need to support a large number of users can easily reach a very large scale, as in the financial virtual assistant example described above. Large-scale vector search workloads can involve a large number of vectors (e.g., more than 10 billion), a very high query rate (e.g., millions of queries per second), or both. Not surprisingly, this can be challenging for many database systems. But many of these searches are highly partitionable, where each search is constrained to the data associated with a particular user. These workloads are a great fit for Spanner KNN search because Spanner efficiently reduces the search space to provide accurate, real-time results with low latency. Spanner’s horizontally scalable architecture lets it support vector search over trillions of vectors for highly partitionable workloads.
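As a sketch of what a partitioned vector search can look like in practice, the following query restricts the search to a single user’s rows before ranking by distance. The instance, database, and schema names (a ChatHistory table with UserId and Embedding columns) are hypothetical; COSINE_DISTANCE is one of the distance functions in Spanner’s KNN vector search support (see the documentation for the full list):

```python
from google.cloud import spanner
from google.cloud.spanner_v1 import param_types

# Hypothetical instance, database, and schema names.
client = spanner.Client()
database = client.instance("my-instance").database("my-database")

# Restrict the search to one user's partition, then rank by distance.
sql = """
    SELECT MessageId, Content,
           COSINE_DISTANCE(Embedding, @query_embedding) AS distance
    FROM ChatHistory
    WHERE UserId = @user_id
    ORDER BY distance
    LIMIT 10
"""

with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        sql,
        params={
            "user_id": "user-123",
            # In practice, generated by the same embedding model used for
            # the stored embeddings (e.g., via Vertex AI or ML.PREDICT).
            "query_embedding": [0.9, 0.1, 0.3, 0.4],
        },
        param_types={
            "user_id": param_types.STRING,
            "query_embedding": param_types.Array(param_types.FLOAT64),
        },
    )
    for message_id, content, distance in rows:
        print(message_id, content, distance)
```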
Spanner also lets you query and filter vector embeddings using SQL, maintaining application simplicity. Using SQL, you can easily join vector embeddings with operational data, and combine regular queries with vector search. For example, you can use secondary indexes to efficiently filter rows of interest before performing a vector search. Spanner’s vector search queries return fresh, real-time data as soon as transactions are committed, just like any other query on your operational data.
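For instance, a query can filter candidates through a secondary index and join the embeddings back to operational data before ranking. In this sketch, the table, column, and index names are illustrative; @{FORCE_INDEX=...} is Spanner’s standard table hint for requesting a specific index:

```python
# Illustrative schema: ProductEmbeddings holds the vectors, Products holds
# operational data, and ProductEmbeddingsByCategory is a secondary index
# on the Category column. Execute with execute_sql() as shown earlier.
filtered_knn_sql = """
    SELECT p.ProductId, p.Name, p.Price,
           EUCLIDEAN_DISTANCE(e.Embedding, @query_embedding) AS distance
    FROM ProductEmbeddings@{FORCE_INDEX=ProductEmbeddingsByCategory} AS e
    JOIN Products AS p ON p.ProductId = e.ProductId
    WHERE e.Category = @category  -- narrow the search space first
    ORDER BY distance             -- then rank by vector distance
    LIMIT 5
"""
```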
Operational simplicity and resource efficiency with Spanner
Further, Spanner’s in-database vector search capabilities eliminate the cost and complexity of managing a separate vector database, streamlining your operational workflow. In Spanner, vector embeddings and operational data are stored together and managed the same way, so vector embeddings benefit from all of Spanner’s enterprise features, including 99.999% availability, managed backups, point-in-time recovery (PITR), security and access control features, and change streams. Compute resources are shared between operational and vector queries, enabling better resource utilization and cost savings. These capabilities are also supported by Spanner’s PostgreSQL interface, giving users coming from PostgreSQL a familiar and portable interface.
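As a sketch, the same kind of query through the PostgreSQL interface might look like the following, connecting with the standard psycopg driver through PGAdapter (Spanner’s PostgreSQL-protocol proxy). The endpoint and schema here are hypothetical, and spanner.cosine_distance is assumed to be the PostgreSQL-dialect counterpart of COSINE_DISTANCE, following the spanner-namespace convention for Spanner-specific functions; verify names against the current documentation:

```python
import psycopg  # standard PostgreSQL driver, connecting via PGAdapter

# Hypothetical local PGAdapter endpoint and schema.
with psycopg.connect("host=localhost port=5432 dbname=my-database") as conn:
    rows = conn.execute(
        """
        SELECT message_id, content,
               spanner.cosine_distance(embedding, %s::float8[]) AS distance
        FROM chat_history
        WHERE user_id = %s
        ORDER BY distance
        LIMIT 10
        """,
        ([0.9, 0.1, 0.3, 0.4], "user-123"),
    ).fetchall()
    for message_id, content, distance in rows:
        print(message_id, distance)
```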
Spanner is also integrated with popular AI developer tools, including LangChain’s Vector Store, Document Loader, and Memory, helping developers easily build AI applications with their preferred tooling.
Getting started
The rise of gen AI has spurred new interest in vector search capabilities. With support for KNN vector search on top of its virtually unlimited scale, Spanner is well-suited to support your large-scale vector search needs, all on the same platform that you already rely on for your demanding, distributed workloads. To learn more about Spanner and its vector search capabilities (and get started for free), check out the following resources:
- Spanner: Database Unlimited – an overview of Spanner’s top use cases, and deep dives into how it delivers unlimited scale, strong consistency, and up to 99.999% availability
- Spanner vector search documentation
- Vector embeddings and how you can use Spanner’s vector search in retail
- How to use Spanner’s ML.PREDICT SQL function for in-database vector embedding generation (tutorial), LLM queries (tutorial), and online inference with custom models (tutorial) served by Vertex AI
- Spanner’s AI ecosystem integrations, including LangChain and the open-source spanner-analytics package, which facilitates common data-analytic operations in Python and includes integrations with Jupyter Notebooks