2024 11 22

GCP – Build an AI agent for trip planning with Gemini 1.5 Pro: A step-by-step guide

Gemini 1.5 Pro is creating new possibilities for developers to build AI agents that streamline the customer experience. In this post, we’ll focus on a practical application that has emerged in the travel industry – building an AI-powered trip planning agent. You’ll learn how to connect your agent to external data sources like event APIs, enabling it to generate personalized travel itineraries based on real-time information.

Understanding the core concepts

Function calling: Allows developers to connect Gemini models (all Gemini models except Gemini 1.0 Pro Vision) with external systems, APIs, and data sources. This enables the AI to retrieve real-time information and perform actions, making it more dynamic and versatile.
Grounding: Enhances a Gemini model’s ability to access and process information from external sources like documents, knowledge bases, and the web, leading to more accurate and up-to-date responses.

By combining these features, we can create an AI agent that can understand user requests, retrieve relevant information from the web, and provide personalized recommendations.

Step-by-step: Function calling with grounding

Let’s run through a scenario. Say you’re an AI engineer tasked with creating an AI agent that helps users plan trips by finding local events and potential hotels to stay at. Your company has given you full creative freedom to build a minimum viable product using Google’s generative AI products, so you’ve chosen to use Gemini 1.5 Pro and loop in other external APIs.

The first step is to define potential queries that any user might enter into the Gemini chat. This will help clarify development requirements and ensure the final product meets the standards of both users and stakeholders. Here are some examples:

“I’m bored, what is there to do today?”
“I would like to take me and my two kids somewhere warm because spring break starts next week. Where should I take them?”
“My friend will be moving to Atlanta soon for a job. What fun events do they have going on during the weekends?”

From these sample queries, it looks like we’ll need to use an events API and a hotels API for localized information. Next, let’s set up our development environment.

Notebook setup

To use Gemini 1.5 Pro for development, you’ll need to either create or use an existing project in Google Cloud. Follow the official instructions that are linked here before continuing. Working in a Jupyter notebook environment is one of the easiest ways to get started developing with Gemini 1.5 Pro. You can either use Google Colab or follow along in your own local environment.

First, you’ll need to install the latest version of the Vertex AI SDK for Python, import the necessary modules, and initialize the Gemini model:

1. Add a code cell to install the necessary libraries. This demo notebook requires the use of the google-cloud-aiplatform>=1.52 Python module.

    !pip3 install --upgrade --user "google-cloud-aiplatform>=1.52"
    !pip3 install vertexai

2. Add another code cell to import the necessary Python packages.

    import vertexai
    from vertexai.preview.generative_models import GenerativeModel, FunctionDeclaration, Tool, HarmCategory, HarmBlockThreshold, Content, Part

    import requests
    import os
    from datetime import date

3. Now we can initialize Vertex AI with your exact project ID. Enter your project ID and location between the quotes so you can reuse them. Uncomment the gcloud authentication commands if necessary.

    PROJECT_ID = "" #@param {type:"string"}
    LOCATION = "" #@param {type:"string"}

    # !gcloud auth login
    # !gcloud config set project $PROJECT_ID
    # !gcloud auth application-default login

    vertexai.init(project=PROJECT_ID, location=LOCATION)

API key configuration

For this demo, we will also use an additional API to retrieve information about events and hotels. We’ll use SerpApi, a third-party service that returns Google search results, for both, so be sure to create an account and select a subscription plan that fits your needs. This demo can be completed using their free tier. Once that’s done, you’ll find your unique API key in your account dashboard.

Separately, if you use the Google AI Python SDK (google.generativeai) with a Gemini API key instead of Vertex AI, you can pass that key to the SDK in one of two ways:

Put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there)
Pass the key using genai.configure(api_key = . . .)

Navigate to https://serpapi.com and replace the contents of the variable below between the quotes with your specific API key:

    # Read the key from the SERP_API_KEY environment variable, falling back to the placeholder string.
    SERP_API_KEY = os.environ.get("SERP_API_KEY", "your_serp_api_key")
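A placeholder fallback makes a missing key easy to overlook until the first request fails. As a minimal sketch, assuming you would rather fail early, a hypothetical helper named `load_serp_api_key` (not part of the original notebook) could read the key from an environment variable and raise if it is absent. The variable name `SERP_API_KEY` is an assumption; use whatever name you actually set.

```python
import os


def load_serp_api_key(env_var: str = "SERP_API_KEY") -> str:
    """Return the SerpApi key from the environment, or raise if it is missing.

    Hypothetical helper; the environment variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your SerpApi key.")
    return key
```

Calling `load_serp_api_key()` once at the top of the notebook surfaces a configuration problem before any model or API call is made.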

Defining custom functions for function calling

In this step, you’ll define custom functions in order to pass them to Gemini 1.5 Pro and incorporate the API outputs back into the model for more accurate responses. We’ll first define a function for the events API.

To use function calling, pass your function declarations to the model through the tools parameter when creating a generative model. The model uses each function’s name, description, parameters, and parameter types to decide whether it needs the function to best answer a prompt.

    def event_api(query: str, htichips: str = "date:today"):
        URL = f"https://serpapi.com/search.json?api_key={SERP_API_KEY}&engine=google_events&q={query}&htichips={htichips}&hl=en&gl=us"
        response = requests.get(URL).json()
        return response["events_results"]
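The events function places the query text directly into the URL string. A minimal sketch of an alternative, assuming you want the parameters percent-encoded explicitly, uses the standard library’s `urllib.parse.urlencode`. The helper name `build_events_url` is hypothetical; the endpoint and parameter names are the ones already used in `event_api`.

```python
from urllib.parse import urlencode

SERP_SEARCH_URL = "https://serpapi.com/search.json"


def build_events_url(api_key: str, query: str, htichips: str = "date:today") -> str:
    """Build the google_events request URL with percent-encoded parameters."""
    params = {
        "api_key": api_key,
        "engine": "google_events",
        "q": query,
        "htichips": htichips,
        "hl": "en",
        "gl": "us",
    }
    # urlencode escapes spaces, commas, and colons so the query survives intact.
    return f"{SERP_SEARCH_URL}?{urlencode(params)}"


print(build_events_url("your_serp_api_key", "Events in Austin, TX"))
```

The same dictionary could instead be passed as `requests.get(SERP_SEARCH_URL, params=params)`, which performs the encoding itself.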

Now we will follow the same format to define a function for the hotels API.

    # check_out_date is a date string like check_in_date, so it is annotated as str.
    def hotel_api(query: str, check_in_date: str, check_out_date: str, hotel_class: int = 3, adults: int = 2):
        URL = f"https://serpapi.com/search.json?api_key={SERP_API_KEY}&engine=google_hotels&q={query}&check_in_date={check_in_date}&check_out_date={check_out_date}&adults={int(adults)}&hotel_class={int(hotel_class)}&currency=USD&gl=us&hl=en"
        response = requests.get(URL).json()

        return response["properties"]
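Because the model supplies the dates as strings, it may help to check them before calling the hotels API. The sketch below uses a hypothetical helper named `validate_stay`, which is not part of the original notebook. It parses both dates with the `%Y-%m-%d` pattern and rejects a check-out date that is not after the check-in date.

```python
from datetime import datetime


def validate_stay(check_in_date: str, check_out_date: str) -> int:
    """Return the number of nights, or raise ValueError for bad input."""
    # strptime raises ValueError when a string does not match the pattern.
    check_in = datetime.strptime(check_in_date, "%Y-%m-%d").date()
    check_out = datetime.strptime(check_out_date, "%Y-%m-%d").date()
    nights = (check_out - check_in).days
    if nights <= 0:
        raise ValueError("check_out_date must be after check_in_date")
    return nights


print(validate_stay("2024-04-30", "2024-05-01"))  # prints 1
```

Catching the `ValueError` and returning its message to the model is one way to let the agent ask the user for corrected dates.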

Declare the custom function as a tool

The function declaration below describes the function for the events API. It lets the Gemini model know this API retrieves event information based on a query and optional filters.

    event_function = FunctionDeclaration(
        name = "event_api",
        description = "Retrieves event information based on a query and optional filters.",
        parameters = {
            "type":"object",
            "properties": {
                "query":{
                    "type":"string",
                    "description":"The query you want to search for (e.g., 'Events in Austin, TX')."
                },
                "htichips":{
                    "type":"string",
                    "description":"""Optional filters used for search. Default: 'date:today'.

                    Options:
                    - 'date:today' - Today's events
                    - 'date:tomorrow' - Tomorrow's events
                    - 'date:week' - This week's events
                    - 'date:weekend' - This weekend's events
                    - 'date:next_week' - Next week's events
                    - 'date:month' - This month's events
                    - 'date:next_month' - Next month's events
                    - 'event_type:Virtual-Event' - Online events
                    """,
                }
            },
            "required": [
                "query"
            ]
        },
    )

Again, we will follow the same format for the hotels API.

    hotel_function = FunctionDeclaration(
        name="hotel_api",
        description="Retrieves hotel information based on location, dates, and optional preferences.",
        parameters= {
            "type":"object",
            "properties": {
                "query":{
                    "type":"string",
                    "description":"Parameter defines the search query. You can use anything that you would use in a regular Google Hotels search."
                },
                "check_in_date":{
                    "type":"string",
                    "description":"Check-in date in YYYY-MM-DD format (e.g., '2024-04-30')."
                },
                "check_out_date":{
                    "type":"string",
                    "description":"Check-out date in YYYY-MM-DD format (e.g., '2024-05-01')."
                },
                "hotel_class":{
                    "type":"integer",
                    "description":"""hotel class.

                    Options:
                    - 2: 2-star
                    - 3: 3-star
                    - 4: 4-star
                    - 5: 5-star

                    For multiple classes, separate with commas (e.g., '2,3,4')."""
                },
                "adults":{
                    "type": "integer",
                    "description": "Number of adults. Only integers, no decimals or floats (e.g., 1 or 2)"
                }
            },
            "required": [
                "query",
                "check_in_date",
                "check_out_date"
            ]
        },
    )
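The `required` list and the `type` fields in these declarations describe what the model should send, but it can still be worth checking the arguments before invoking your function. The following is a minimal sketch, assuming the parameter schema is available as a plain dictionary shaped like the ones in the declarations. `check_args` is a hypothetical helper, not part of the Vertex AI SDK, and it only handles the `string` and `integer` types used in this post.

```python
HOTEL_PARAMETERS = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "check_in_date": {"type": "string"},
        "check_out_date": {"type": "string"},
        "hotel_class": {"type": "integer"},
        "adults": {"type": "integer"},
    },
    "required": ["query", "check_in_date", "check_out_date"],
}


def check_args(schema: dict, args: dict) -> list:
    """Return a list of problems; an empty list means the args look fine."""
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected parameter: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"{name} should be a string")
        elif spec["type"] == "integer":
            # Assumption: numeric arguments may arrive as floats such as 2.0,
            # so whole-number floats are accepted; bools and fractions are not.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not (is_number and float(value).is_integer()):
                problems.append(f"{name} should be a whole number")
    return problems


print(check_args(HOTEL_PARAMETERS, {"query": "Midtown Atlanta", "adults": 2.5}))
```

A non-empty list can be sent back to the model as the function result so that it asks the user for the missing details.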

Consider configuring safety settings for the model

Safety settings in Gemini exist to prevent the generation of harmful or unsafe content. They act as filters that analyze the generated output and block or flag anything that might be considered inappropriate, offensive, or dangerous. Configuring them is good practice whenever you’re developing with generative AI.

    generation_config = {
        "max_output_tokens": 128,
        "temperature": .5,
        "top_p": .3,
    }

    safety_settings = {
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    }

Pass the tool and start a chat

Here we’ll wrap both function declarations in a tool, pass it to the model, and start a chat with Gemini. With chat.send_message("..."), you can send messages to the model in a conversation-like structure.

    tools = Tool(function_declarations=[event_function, hotel_function])

    model = GenerativeModel(
        model_name = 'gemini-1.5-pro-001',
        generation_config = generation_config,
        safety_settings = safety_settings,
        tools = [tools])
    chat = model.start_chat()
    response = chat.send_message("Hello")
    print(response.text)

Build the agent

Next, we’ll create a dictionary that maps each tool name to its Python function so that the agent can call it by name. We’ll also use prompt engineering (a mission prompt) to guide how the model handles user input and to give the model the current date.

    # Map each declared tool name to the Python function that implements it.
    CallableFunctions = {
        "event_api": event_api,
        "hotel_api": hotel_api
    }

    today = date.today()

    def mission_prompt(prompt: str):
        # Only the event and hotel APIs are declared as tools, so only those are listed.
        return f"""
        Thought: I need to understand the user's request and determine if I need to use any tools to assist them.
        Action:

        - If the user's request needs one of the following available APIs: event, hotel, and I have all the required parameters, call the corresponding API.
        - Otherwise, if I need more information to call an API, I will ask the user for it.
        - If the user's request doesn't need an API call or I don't have enough information to call one, respond to the user directly using the chat history.
        - Respond with the final answer only

        [QUESTION]
        {prompt}

        [DATETIME]
        {today}

        """.strip()


    def Agent(user_prompt):
        prompt = mission_prompt(user_prompt)
        response = chat.send_message(prompt)
        tools = response.candidates[0].function_calls
        # Keep calling tools until the model stops requesting function calls.
        while tools:
            for tool in tools:
                function_res = CallableFunctions[tool.name](**tool.args)
                # Send the API output back so the model can use it in its answer.
                response = chat.send_message(Content(role="function_response", parts=[Part.from_function_response(name=tool.name, response={"result": function_res})]))
            tools = response.candidates[0].function_calls
        return response.text
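The agent loop assumes every requested call names a known function and that the API call succeeds. As a hedged sketch of a more defensive dispatch step, the hypothetical `run_tool` helper below looks up the function, converts whole-number floats to integers (an assumption about how numeric arguments may arrive), and returns an error message instead of raising, so the model can respond to the failure. The stub `fake_hotel_api` stands in for the real `hotel_api` so that the example runs offline.

```python
def fake_hotel_api(query: str, check_in_date: str, check_out_date: str, adults: int = 2):
    """Offline stand-in for hotel_api; returns a canned result."""
    return [{"name": f"Sample hotel for {query}", "adults": adults}]


CALLABLE_FUNCTIONS = {"hotel_api": fake_hotel_api}


def run_tool(name: str, args: dict) -> dict:
    """Call the named tool and wrap its output, or an error message, in a dict."""
    function = CALLABLE_FUNCTIONS.get(name)
    if function is None:
        return {"error": f"unknown tool: {name}"}
    # Convert values such as 2.0 to 2 so integer parameters are passed as ints.
    cleaned = {
        key: int(value) if isinstance(value, float) and value.is_integer() else value
        for key, value in args.items()
    }
    try:
        return {"result": function(**cleaned)}
    except Exception as exc:  # report the failure instead of ending the chat loop
        return {"error": f"{name} failed: {exc}"}
```

The returned dictionary could be passed as the `response` argument of `Part.from_function_response` in place of `{"result": function_res}`.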

Test the agent

Below are some sample queries you can try to test the chat capabilities of the agent. Don’t forget to test out a query of your own!

    response1 = Agent("Hello")
    print(response1)

    response2 = Agent("What events are there to do in Atlanta, Georgia?")
    print(response2)

    response3 = Agent("Are there any hotels available in Midtown Atlanta for this weekend?")
    print(response3)
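While testing, you may find that the raw search results are long, even though only a few fields are needed for a recommendation. A minimal sketch, assuming each event is a dictionary with `title`, `date`, and `address` keys (an assumption about the payload shape; check the response you actually receive), is a hypothetical `summarize_events` helper that keeps a handful of fields from the first few results before they are sent back to the model.

```python
def summarize_events(events: list, limit: int = 5) -> list:
    """Keep only a few assumed fields from the first `limit` events."""
    summary = []
    for event in events[:limit]:
        summary.append({
            "title": event.get("title", ""),
            # "date" is assumed to be a dict with a "when" string; default to "".
            "when": (event.get("date") or {}).get("when", ""),
            # "address" is assumed to be a list of address lines.
            "address": ", ".join(event.get("address") or []),
        })
    return summary


sample = [
    {"title": "Jazz Night", "date": {"when": "Fri, Nov 22, 8 PM"}, "address": ["Sample Venue", "Atlanta, GA"]},
    {"title": "No details"},
]
print(summarize_events(sample))
```

Shorter tool results leave more of the model’s context for the conversation itself, which may matter with a small `max_output_tokens` setting.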

Wrapping up

That’s all! Gemini 1.5 Pro’s function calling and grounding features enhance its capabilities, enabling developers to connect to external tools and improve model results. This integration enables Gemini models to provide up-to-date information while minimizing hallucinations.

If you’re looking for more hands-on tutorials and code examples, check out some of Google’s Codelabs (such as How to Interact with APIs Using Function Calling in Gemini) to guide you through examples of building a beginner function calling application.