GCP – Operationalizing generative AI apps with Apigee
Generative AI is now well beyond the hype and into the realm of practical application. But while organizations are eager to build enterprise-ready gen AI solutions on top of large language models (LLMs), they face challenges in managing, securing, and scaling these deployments, especially when it comes to APIs. As part of the platform team, you may already be building a unified gen AI platform. Some common questions you might have are:
- How do you ensure security and safety for your organization? As with any API, LLM APIs represent an attack vector. What are the LLM-specific considerations you need to worry about?
- How do you stay within budget as your LLM adoption grows, while ensuring that each team has the LLM capacity it needs to keep innovating and making your business more productive?
- How do you put the right observability capabilities in place to understand your usage patterns, help troubleshoot issues, and capture compliance data?
- How do you give end users of your gen AI applications the best possible experience, i.e., provide responses from the most appropriate models with minimal downtime?
Apigee, Google Cloud’s API management platform, has enabled our customers to address API challenges like these for over a decade. Here is an overview of the AI-powered digital value chain leveraging Apigee API Management.
Figure 1: AI-powered Digital Value chain
Gen AI, powered by AI agents and LLMs, is changing how customers interact with businesses, creating a large opportunity for any business. Apigee streamlines the integration of gen AI agents into applications by bolstering their security, scalability, and governance through features like authentication, traffic control, analytics, and policy enforcement. It also manages interactions with LLMs, improving security and efficiency. Additionally, Application Integration, an Integration-Platform-as-a-Service solution from Google Cloud, offers pre-built connectors that allow gen AI agents to easily connect with databases and external systems, helping them fulfill user requests.
This blog details how Apigee’s customers have been using the product to address challenges specific to LLM APIs. We’re also releasing a comprehensive set of reference solutions that enable you to get started on addressing these challenges yourself with Apigee. You can also view a webinar on the same topic, complete with product demos.
Apigee as a proxy for agents
AI agents leverage capabilities from LLMs to accomplish tasks for end-users. These agents can be built using a variety of tools — from no-code and low-code platforms, to full-code frameworks like LangChain or LlamaIndex. Apigee acts as an intermediary between your AI application and its agents. It enhances security by allowing you to defend your LLM APIs against the OWASP Top 10 API Security risks, manages user authentication and authorization, and optimizes performance through features like semantic caching. Additionally, Apigee enforces token limits to control costs and can even orchestrate complex interactions between multiple AI agents for advanced use cases.
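To make this concrete, here is a minimal sketch of an AI application calling its agent through an Apigee-managed endpoint. The host, path, header name, and response schema are illustrative assumptions, not values from the reference solutions; your Apigee proxy and API product define the real ones.

```python
# Minimal sketch: an app calling its agent via an Apigee proxy endpoint.
# URL, header, and response fields are hypothetical placeholders.
import requests

APIGEE_PROXY_URL = "https://api.example.com/v1/agents/support-agent"  # hypothetical
API_KEY = "your-apigee-api-key"  # issued when the app is registered in Apigee

def ask_agent(prompt: str) -> str:
    response = requests.post(
        APIGEE_PROXY_URL,
        headers={"x-apikey": API_KEY},   # credential checked by the Apigee proxy
        json={"prompt": prompt},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["answer"]     # hypothetical response schema

if __name__ == "__main__":
    print(ask_agent("What is the status of my order #1234?"))
```

Because the app only ever talks to the proxy, the security, caching, and token-limit policies described above apply to every call without any change to application code.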
Apigee as a gateway between LLM application and models
Depending on the task at hand, your AI agents might need to tap into the power of different LLMs. Apigee simplifies this by intelligently routing and managing failover of requests to the most suitable LLM using Apigee’s flexible configurations and templates. It also streamlines the onboarding of new AI applications and agents while providing robust access control for your LLMs. Beyond LLMs, agents often need to connect with databases and external systems to fully address users’ needs. Apigee’s robust API Management platform enables these interactions via managed APIs, and for more complex integrations, where custom business logic is required, you can leverage Google Cloud’s Application Integration platform.
It’s important to remember that these patterns aren’t one-size-fits-all. Your specific use cases will influence the architecture pattern for an agent and LLM interaction. For example, you might not always need to route requests to multiple LLMs. In some scenarios, you could connect directly to databases and external systems from the Apigee agent proxy layer. The key is flexibility — Apigee lets you adapt the architecture to match your exact needs.
Now let’s break down, one by one, the specific areas where Apigee helps:
AI safety
For any API managed with Apigee, you can call out to Model Armor, Google Cloud’s model safety offering, to inspect every prompt and response. This protects you against potential prompt attacks and helps your LLMs respond within the guardrails you set. For example, you can specify that your LLM application does not provide answers about financial or political topics.
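The flow looks roughly like the sketch below: screen the prompt before it ever reaches the model, and reject it if it violates your guardrails. The screening function here is a hypothetical stand-in for a callout to Model Armor (or any safety service), not the real Model Armor API.

```python
# Guardrail flow sketch: screen the prompt, forward only if it passes.
# `screen_with_model_armor` is a hypothetical placeholder, not a real API.
from dataclasses import dataclass

@dataclass
class ScreeningResult:
    allowed: bool
    reason: str = ""

def screen_with_model_armor(text: str) -> ScreeningResult:
    # Placeholder for a callout from Apigee to a safety endpoint that flags
    # prompt injection or disallowed topics (e.g. financial or political advice).
    blocked_topics = ("stock tip", "who should i vote for")
    for topic in blocked_topics:
        if topic in text.lower():
            return ScreeningResult(False, f"blocked topic: {topic}")
    return ScreeningResult(True)

def call_llm(prompt: str) -> str:
    return f"(model response to: {prompt})"  # stub for illustration

def handle_prompt(prompt: str) -> str:
    verdict = screen_with_model_armor(prompt)
    if not verdict.allowed:
        # Apigee would return a 4xx here instead of forwarding to the LLM.
        return f"Request rejected: {verdict.reason}"
    return call_llm(prompt)
```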
Latency and cost
Model response latency continues to be a major factor when building LLM-powered applications, and this will only get worse as more reasoning happens during inference. With Apigee, you can implement a semantic cache that allows you to cache responses to any model for semantically similar questions. This dramatically reduces the time end users need to wait for a response.
In this solution, Vertex AI Vector Search and the Vertex AI Embeddings API process your prompts and help you identify similar prompts, for which you can then retrieve a response from Apigee’s cache. See the Semantic Cache in Apigee reference solution to get started.
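The core lookup logic is sketched below. To keep the flow easy to follow, the embedding call and the vector index are replaced with in-memory stand-ins; in the reference solution those roles are played by the Vertex AI Embeddings API and Vertex AI Vector Search, and names like `embed` and `SIMILARITY_THRESHOLD` are assumptions rather than the solution's identifiers.

```python
# Semantic-cache sketch: embed the prompt, look for a similar cached prompt,
# and only call the model on a cache miss.
import math

SIMILARITY_THRESHOLD = 0.95
_cache: list[tuple[list[float], str]] = []   # (prompt embedding, cached response)

def embed(text: str) -> list[float]:
    # Stand-in for the Vertex AI Embeddings API: a fixed-length toy vector.
    vec = [0.0] * 32
    for i, ch in enumerate(text.lower()):
        vec[i % 32] += ord(ch)
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def call_llm(prompt: str) -> str:
    return f"(model response to: {prompt})"   # stub for illustration

def cached_completion(prompt: str) -> str:
    query_vec = embed(prompt)
    # Nearest-neighbour lookup (Vertex AI Vector Search in the real solution).
    for cached_vec, cached_response in _cache:
        if cosine(query_vec, cached_vec) >= SIMILARITY_THRESHOLD:
            return cached_response             # cache hit: skip the model entirely
    response = call_llm(prompt)                 # cache miss: call the model
    _cache.append((query_vec, response))        # populate the cache for next time
    return response
```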
Performance
Different models are good at different things. For example, Gemini Pro models provide the highest quality answers, while Gemini Flash models excel at speed and efficiency. You can route users’ prompts to the best model for the job, depending on the use case or application.
You can decide which model to use by specifying it in your API call, and Apigee routes the request to your desired model while keeping a consistent API contract. See this reference solution to get started.
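Conceptually, the routing decision is just a mapping from the model named in the request to a backend target, as in the sketch below. The model names and target URLs are illustrative assumptions; in Apigee this mapping lives in the proxy's routing configuration rather than in application code.

```python
# Routing sketch: one API contract, many model backends.
MODEL_TARGETS = {
    "gemini-pro":   "https://backend.example.com/gemini-pro",    # highest quality
    "gemini-flash": "https://backend.example.com/gemini-flash",  # speed and efficiency
    "default":      "https://backend.example.com/gemini-flash",
}

def route(model: str, payload: dict) -> str:
    target = MODEL_TARGETS.get(model, MODEL_TARGETS["default"])
    # The request and response shapes stay identical regardless of which
    # backend ultimately serves the call.
    return f"forwarding {payload} to {target}"

print(route("gemini-pro", {"prompt": "Summarize this contract."}))
```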
Distribution and usage limits
With Apigee you can create a unified portal with self-service access to all the models in your organization. You can also set up usage limits by individual apps and developers to maintain capacity for those who need it, while also controlling overall costs. See how you can set up usage limits in Apigee using LLM token counts here.
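The quota logic amounts to counting tokens per app against a budget, as in the minimal sketch below. The window, limits, and app IDs are illustrative assumptions; in Apigee this is enforced by quota policies keyed on LLM token counts rather than request counts.

```python
# Token-budget sketch: track per-app token usage against a daily limit.
from collections import defaultdict

TOKEN_LIMIT_PER_DAY = {"mobile-app": 100_000, "support-bot": 1_000_000}  # hypothetical
_tokens_used: dict[str, int] = defaultdict(int)

def check_and_record(app_id: str, tokens_in_response: int) -> bool:
    """Return True if the app is still within its daily token budget."""
    limit = TOKEN_LIMIT_PER_DAY.get(app_id, 10_000)        # default budget
    if _tokens_used[app_id] + tokens_in_response > limit:
        return False                                        # the gateway would reject with 429
    _tokens_used[app_id] += tokens_in_response              # record actual usage
    return True
```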
Availability
Due to the high computational demands of LLM inference, model providers regularly restrict the number of tokens you can use in a certain time window. If you reach a model limit, requests from your applications will get throttled, which could lead to your end users being locked out of the model. In order to prevent this, you can implement a circuit breaker in Apigee so that requests are re-routed to a model with available capacity. See this reference solution to get started.
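The failover behavior is sketched below: if the primary model signals that its capacity is exhausted, the request is re-sent to a fallback model. The endpoints and the use of HTTP 429 as the throttling signal are illustrative assumptions; the reference solution implements this routing inside the Apigee proxy.

```python
# Circuit-breaker sketch: fall back to a second model when the first is throttled.
import requests

PRIMARY = "https://backend.example.com/gemini-pro"     # hypothetical targets
FALLBACK = "https://backend.example.com/gemini-flash"

def generate(prompt: str) -> dict:
    for target in (PRIMARY, FALLBACK):
        resp = requests.post(target, json={"prompt": prompt}, timeout=30)
        if resp.status_code == 429:      # capacity exhausted: try the next target
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("All model backends are at capacity")
```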
Reporting
As a platform team, you need visibility into usage of the various models you support, as well as which apps are consuming how many tokens. You might want to use this data for internal cost reporting or to optimize usage. Whatever your motivation, with Apigee you can build dashboards that let you see usage based on actual token counts — the currency of LLM APIs. This way you can see the true usage volume across your applications. See this reference solution to get started.
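A reporting pipeline like this starts by extracting token usage from each model response and attaching it to the app and model that produced it, as sketched below. The field names follow the common `usageMetadata` shape returned by Gemini-style APIs, but treat them as assumptions for your model provider; in Apigee these values are captured as analytics variables.

```python
# Reporting sketch: turn a model response into a token-usage record.
def usage_record(app_id: str, model: str, llm_response: dict) -> dict:
    usage = llm_response.get("usageMetadata", {})   # assumed response field
    return {
        "app_id": app_id,
        "model": model,
        "prompt_tokens": usage.get("promptTokenCount", 0),
        "completion_tokens": usage.get("candidatesTokenCount", 0),
        "total_tokens": usage.get("totalTokenCount", 0),
    }
```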
Auditing and troubleshooting
Perhaps you need to log all interactions with LLMs (prompts, responses, RAG data) to meet compliance or troubleshooting requirements. Or perhaps you want to analyze response quality to continue to improve your LLM applications. With Apigee you can safely log any LLM interaction with Cloud Logging, de-identify it, and inspect it from a familiar interface. Get started here.
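A minimal sketch of that logging step is shown below, assuming the google-cloud-logging client library. The `redact_pii` helper is a hypothetical placeholder for a real de-identification step (for example, a call to Cloud DLP), and the log name and record shape are illustrative assumptions.

```python
# Logging sketch: de-identify an LLM interaction and write it to Cloud Logging.
from google.cloud import logging as cloud_logging

def redact_pii(text: str) -> str:
    # Hypothetical placeholder for a real de-identification step (e.g. Cloud DLP).
    return text.replace("@", "[at]")

def log_interaction(app_id: str, prompt: str, response: str) -> None:
    client = cloud_logging.Client()
    logger = client.logger("llm-interactions")   # hypothetical log name
    logger.log_struct({
        "app_id": app_id,
        "prompt": redact_pii(prompt),
        "response": redact_pii(response),
    })
```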
Security
With APIs increasingly seen as an attack surface, security is paramount to any API program. Apigee can act as a secure gateway for LLM APIs, allowing you to control access with API keys, OAuth 2.0, and JWT validation. This helps you enforce enterprise security standards when authenticating the users and applications that interact with your models. Apigee can also help prevent abuse and overload by enforcing rate limits and quotas, safeguarding LLMs from malicious attacks and unexpected traffic spikes.
In addition to these security controls, you can also use Apigee to control which model providers and models can be used. You can do this by creating policies that define which models can be accessed by which users or applications. For example, you could create a policy that only allows certain users to access your most powerful LLMs, or a policy that only allows certain applications to access your LLMs for specific tasks. This gives you granular control over how your LLMs are used, so they are only used for their intended purposes. The sketch below shows what such an allow-list looks like in its simplest form.
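The app IDs and model names below are illustrative assumptions; in Apigee, this kind of allow-list is typically modelled with API products and app attributes rather than code.

```python
# Access-policy sketch: a per-application model allow-list.
ALLOWED_MODELS = {
    "research-assistant": {"gemini-pro", "gemini-flash"},
    "public-chatbot":     {"gemini-flash"},   # no access to the larger model
}

def is_model_allowed(app_id: str, model: str) -> bool:
    return model in ALLOWED_MODELS.get(app_id, set())

assert is_model_allowed("research-assistant", "gemini-pro")
assert not is_model_allowed("public-chatbot", "gemini-pro")
```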
But Apigee can offer even more advanced protection with its Advanced API Security functionality. This allows you to defend your LLM APIs against the OWASP Top 10 API Security vulnerabilities.
By integrating Apigee with your LLM architecture, you create a secure and reliable environment for your AI applications to thrive.
Ready to unlock the full potential of gen AI?
Explore Apigee’s comprehensive capabilities for operationalizing AI and start building secure, scalable, and efficient gen AI solutions today! Visit our Apigee generative AI samples page to learn more and get started, watch a webinar with more details, or contact us here!