Getting started with retrieval-augmented generation on BigQuery with LangChain
The ability of large language models (LLMs) to process and generate human language continues to revolutionize many aspects of business. But an LLM’s knowledge is limited to the data it was trained on, which is a drawback when dealing with specific company information or nuanced industry contexts. Retrieval-augmented generation (RAG) addresses this limitation by connecting LLMs to your own data sources so they can pull from internal knowledge bases, enabling new business processes grounded in the specifics of your data.
BigQuery now allows you to generate embeddings and execute powerful vector search at scale, enabling RAG workflows within BigQuery. By leveraging LangChain, a framework designed for developing applications with LLMs, you can seamlessly build RAG applications tailor-made for your business needs.
In this blog, we’ll walk through a practical implementation of RAG using BigQuery and LangChain, and give you a framework for getting started with your own data.
Limitations of LLMs
Imagine we want to ask questions about the 2024 Cymbal Starlight, a fictional automobile. We might ask, “How many miles until I need to change my oil?” or “I broke down on the highway; where can I get help?” Traditionally, we would consult the owner’s manual and page through it until we found an answer.
We could also simply pose a question to an LLM:
```python
from vertexai.generative_models import GenerativeModel

model_id = "gemini-1.5-pro-preview-0409"
model = GenerativeModel(model_id)

query = "How many miles can I drive the 2024 Google Starlight until I need an oil change?"
response = model.generate_content(query)

print(response.text)
```

```
Unfortunately, as an AI language model, I don’t have access to real-time information, including specific details about the 2024 Google Starlight’s maintenance schedule. The recommended oil change interval can vary depending on several factors…
```
Unfortunately, this response doesn’t answer our question. This is no surprise: the 2024 Cymbal Starlight is a fictional vehicle, and its owner’s manual wasn’t included in the LLM’s training data. To work around this constraint, we can use retrieval-augmented generation, which augments the LLM with proprietary or first-party data, like the 2024 Cymbal Starlight owner’s manual!
Enter retrieval augmented generation (RAG)
LLMs are powerful tools, but can be limited by their internal knowledge. RAG addresses this by incorporating data from external sources, allowing LLMs to access relevant information in real-time and without having to fine-tune or retrain a model. A simple RAG pipeline has two main components:
Data preprocessing:
Input data like documents are split into smaller chunks, converted into vector embeddings, and sent to a vector store for later retrieval
Query and retrieval:
A user asks a question in natural language. The question is turned into an embedding, and relevant context is retrieved through vector search
The context is provided to an LLM to augment its knowledge
The LLM generates a response that weaves together retrieved chunks with its pretrained knowledge and summarization capabilities
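The two components above can be sketched end to end in a few lines of plain Python. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `vector_store` list stands in for BigQuery, and the sample sentences and all function names are made up for this sketch.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Data preprocessing: chunk -> embed -> store (sample chunks are invented).
chunks = [
    "Change the engine oil every 5,000 miles or 6 months.",
    "Roadside assistance is available 24 hours a day.",
    "Check the tire pressure once a month.",
]
vector_store = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question: str, k: int = 1) -> list[str]:
    """Query and retrieval: embed the question, return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(vector_store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


top_chunks = retrieve("How many miles until I need an oil change?")
print(top_chunks)
```

In a real pipeline, the retrieved chunks would then be placed into the LLM prompt as context; the rest of this post does that with LangChain and BigQuery instead of the toy pieces used here.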
LangChain
LangChain is an open source orchestration framework to work with LLMs, enabling developers to quickly build generative AI applications on their data. Google Cloud contributed a new LangChain integration with BigQuery that can make it simple to pre-process your data, generate and store embeddings, and run vector search, all using BigQuery.
In this demo, we’ll handle both the pre-processing and runtime steps with LangChain. Let’s take a look!
Building a RAG pipeline with BigQuery and LangChain
This blog post highlights a few of the major steps in building a simple RAG pipeline using BigQuery and LangChain. To follow along, go more in depth, and view additional steps, you can make a copy of the notebook, Augment Q&A Generation using LangChain and BigQuery Vector Search, which lets you run the following example in Colab using your own Google Cloud environment.
Data preprocessing
We begin by reading our document, the 2024 Cymbal Starlight Owner’s Manual, into memory using PyPDFLoader, a LangChain document loader, which here reads the PDF from Google Cloud Storage.
Once loaded, we split the document into smaller chunks. Chunking makes RAG more efficient: smaller chunks allow more targeted retrieval of relevant information and reduce computational load, which improves both the accuracy of generated responses and response time. We use LangChain’s RecursiveCharacterTextSplitter, which splits text based on rules we define.
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
)
doc_splits = text_splitter.split_documents(documents)

# Add chunk number to metadata
for idx, split in enumerate(doc_splits):
    split.metadata["chunk"] = idx
```
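To build intuition for the `chunk_size` and `chunk_overlap` settings, here is a deliberately simplified fixed-width chunker. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break on the separators we supplied); the function name `chunk_with_overlap` is hypothetical and exists only to show how consecutive chunks share characters.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


pieces = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary is more likely to appear intact in at least one chunk, at the cost of storing and embedding a little duplicated text.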
With the text chunks stored in doc_splits, we now need to generate embeddings for each chunk and store them in BigQuery. To do so, we’ll first initialize a LangChain vector store using the new BigQueryVectorSearch class. This requires details about your Google Cloud and BigQuery environments, as well as an embedding model. We’ll use a textembedding-gecko model from Vertex AI.
Lastly, we call the vector store (bq_vector_cars_manual) and pass it all of the document chunks. LangChain facilitates turning these chunks into embeddings and sending them to BigQuery.
```python
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_community.vectorstores import BigQueryVectorSearch

embedding_model = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

bq_vector_cars_manual = BigQueryVectorSearch(
    project_id=PROJECT_ID,
    dataset_name=DATASET,
    table_name=TABLE,
    location=REGION,
    embedding=embedding_model,
)

bq_vector_cars_manual.add_documents(doc_splits)
```
We can inspect the BigQuery table and confirm that it contains the document metadata, content, and text embedding.
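One way to do this inspection from Python is sketched below. The column names (`doc_id`, `content`, `metadata`, `text_embedding`) are assumed to be the defaults used by the LangChain integration and should be checked against your own table; the project, dataset, and table names are placeholders, and both helper functions are hypothetical. Running `preview_table` requires the `google-cloud-bigquery` package and valid credentials.

```python
def build_preview_query(project_id: str, dataset: str, table: str, limit: int = 5) -> str:
    """Build a SQL query that previews a few rows of the vector store table.

    Column names are assumptions; adjust them to match your table's schema.
    """
    return (
        "SELECT doc_id, content, metadata, "
        "ARRAY_LENGTH(text_embedding) AS embedding_dim\n"
        f"FROM `{project_id}.{dataset}.{table}`\n"
        f"LIMIT {int(limit)}"
    )


def preview_table(project_id: str, dataset: str, table: str, limit: int = 5) -> list[dict]:
    """Run the preview query and return rows as dictionaries."""
    # Imported here so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    sql = build_preview_query(project_id, dataset, table, limit)
    return [dict(row.items()) for row in client.query(sql).result()]


# Placeholder identifiers; replace with your own values.
sql_text = build_preview_query("my-project", "my_dataset", "cars_manual")
print(sql_text)
```

Selecting `ARRAY_LENGTH(text_embedding)` rather than the embedding itself keeps the preview readable while still confirming that each row has a vector.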
Query and retrieval
Now that our text embeddings exist in BigQuery, we can search for relevant chunks and use them to ground our generated answers, which is the retrieval and generation half of RAG. We’ll begin by initializing a Vertex AI LLM and a LangChain retriever to fetch documents using BigQuery Vector Search.
```python
from langchain_google_vertexai import VertexAI
from langchain.chains import RetrievalQA

llm = VertexAI(model_name="gemini-pro")

retriever = bq_vector_cars_manual.as_retriever()
```
For Q&A chains, our retriever is passed directly to the chain and can be used without further configuration. When we ask a question, the following happens behind the scenes:
Our question is turned into a text embedding
A vector search runs in BigQuery and retrieves the relevant document chunks
These chunks are then passed to the prompt used by the LLM to augment its knowledge and generate a concise answer.
Let’s take a look at a basic example using LangChain’s RetrievalQA Chain.
```python
search_query = "How many miles can I drive the 2024 Google Starlight until I need an oil change?"

retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever
)
retrieval_qa.invoke(search_query)
```

```
{'query': 'How many miles can I drive the 2024 Google Starlight until I need an oil change?',
 'result': 'You can drive the 2024 Google Starlight up to 5,000 miles before you need an oil change or 6 months, whichever comes first.'}
```
The LLM now provides us with a concrete answer! We should change the oil every 5,000 miles on this vehicle.
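The chain_type="stuff" setting used above means that all retrieved chunks are placed ("stuffed") into a single prompt alongside the question. The sketch below illustrates the idea only: the prompt wording is invented and is not LangChain’s actual template, the function name `stuff_prompt` is hypothetical, and the second sample chunk is made up.

```python
def stuff_prompt(question: str, chunks: list[str]) -> str:
    """Build one prompt containing every retrieved chunk plus the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Illustrative chunks, as if returned by the retriever.
retrieved = [
    "Change the engine oil every 5,000 miles or 6 months, whichever comes first.",
    "Use only the oil grade listed in the maintenance section.",
]
prompt = stuff_prompt("How often should I change the oil?", retrieved)
print(prompt)
```

Stuffing is the simplest strategy and works well when the retrieved chunks fit comfortably in the model’s context window; with many or very long chunks, the prompt can grow too large.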
Now let’s take a slightly more sophisticated example. We will use the ConversationalRetrievalChain. This still uses BigQuery Vector Search, but it also keeps the previous conversation history in memory and adds it as context to the LLM prompt. This gives you a conversational interface to your data.
```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
conversational_retrieval = ConversationalRetrievalChain.from_llm(
    llm=llm, retriever=retriever, memory=memory
)

search_query = "Does the Cymbal Starlight have roadside assistance?"

conversational_retrieval.invoke(search_query)["answer"]
```

```
Yes, the Cymbal Starlight 2024 comes with 24/7 emergency roadside assistance.
```
We can then ask a follow-up question without needing to provide much additional context, because the last question and answer are already passed through.
```python
new_query = "What number do I call?"

result = conversational_retrieval.invoke(new_query)

print(result["answer"])
```

```
To contact emergency roadside assistance, call the following number: 1-800-555-1212.
```
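A short follow-up like “What number do I call?” would retrieve poorly on its own, since it never mentions roadside assistance. Conversational chains typically handle this by first asking the LLM to rewrite the follow-up as a standalone question using the chat history, then running retrieval on the rewritten question. The sketch below shows only how such a rewriting prompt might be assembled; the wording and both function names are hypothetical, not LangChain’s own.

```python
def format_history(history: list[tuple[str, str]]) -> str:
    """Render (question, answer) pairs as a simple transcript."""
    lines = []
    for question, answer in history:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)


def condense_prompt(history: list[tuple[str, str]], follow_up: str) -> str:
    """Build a prompt asking the LLM to turn a follow-up into a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{format_history(history)}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


# The first exchange from the example above.
chat_history = [
    (
        "Does the Cymbal Starlight have roadside assistance?",
        "Yes, the Cymbal Starlight 2024 comes with 24/7 emergency roadside assistance.",
    )
]
condense = condense_prompt(chat_history, "What number do I call?")
print(condense)
```

With the history included, the model has enough context to produce a question that names roadside assistance explicitly, and that rewritten question is what gets embedded for the vector search.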
Recall that initially, the LLM was unable to answer any questions about the 2024 Cymbal Starlight. But in a few steps, we used BigQuery Vector Search and LangChain to build a simple RAG Q&A application that provides us with useful information grounded in our own documents!
Get started
Google Cloud offers many tools to store embeddings and run vector search. BigQuery Vector Search is optimized for large-scale analytical workloads and incorporates many of the features you expect from BigQuery. It’s fully managed and serverless, scaling up and down without requiring you to manage infrastructure, and it includes capabilities like governance and fine-grained access control.
Get started building a RAG application today with BigQuery and LangChain! Check out the sample notebook to follow the example above with greater depth, or read the new BigQuery Vector Search LangChain documentation to begin building an application on your data.
For additional approaches and resources on building RAG applications on Google Cloud, check out the following:
Learn more about RAG and related Google Cloud services
In-depth overview of RAG applications using Vertex AI
Unsure which vector store to use? Check out this decision tree!