GCP – Palo Alto Networks’ journey to productionizing gen AI
At Google Cloud, we empower businesses to accelerate their generative AI innovation cycle by providing a path from prototype to production. Palo Alto Networks, a global cybersecurity leader, partnered with Google Cloud to develop an innovative security posture control solution that can answer complex “how-to” questions on demand, provide deep insights into risk with just a few clicks, and guide users through remediation steps.
Using advanced AI services, including Google’s Gemini models and managed Retrieval Augmented Generation (RAG) services such as Google Cloud’s Vertex AI Search, Palo Alto Networks had an ideal foundation for building and deploying gen AI-powered solutions.
The end result was Prisma Cloud Co-pilot, Palo Alto Networks' Prisma Cloud gen AI offering. It simplifies cloud security management by providing an intuitive, AI-powered interface that helps users understand and mitigate risks.
Technical challenges and surprises
The Palo Alto Networks Prisma Cloud Co-pilot journey began in 2023, and the product launched in October 2024. During this time, Palo Alto Networks witnessed Google’s AI models evolve rapidly, from Text Bison (PaLM) to Gemini 1.5 Flash. That rapid pace of innovation meant that each iteration brought new capabilities, necessitating a development process that could quickly adapt to the evolving landscape.
To effectively navigate the dynamic landscape of evolving gen AI models, Palo Alto Networks established robust processes that proved invaluable to their success:
- Prompt engineering and management: Palo Alto Networks used Vertex AI to help manage prompt templates and built a diverse prompt library to generate a wide range of responses. To rigorously test each new model’s capabilities, limitations, and performance across various tasks, the Palo Alto Networks and Google Cloud teams systematically created and updated prompts for each submodule. Additionally, Vertex AI’s Prompt Optimizer helped streamline the tedious trial-and-error process of prompt engineering.
- Intent recognition: Palo Alto Networks used the Gemini 1.5 Flash model to develop an intent recognition module, which efficiently routed user queries to the relevant co-pilot component. This approach provided users with many capabilities through a unified and lightweight user experience.
- Input guardrails: Palo Alto Networks created guardrails as a first line of defense against unexpected, malicious, or simply incorrect queries that could compromise the functionality and experience of the chatbot. These guardrails maintain the chatbot’s intended functionality by preventing known prompt-injection attacks, such as attempts to circumvent system instructions, and by restricting chatbot usage to its intended scope. The guardrails detect whether user queries fall within the predefined domain of general cloud security, risks, and vulnerabilities; topics outside this scope received no response from the chatbot. Additionally, because the chatbot was designed to generate proprietary code for querying Palo Alto Networks internal systems, requests for general-purpose code generation similarly received no response.
- Evaluation dataset curation: A robust and representative evaluation dataset serves as the foundation for accurately and quickly assessing the performance of gen AI models. The Palo Alto Networks team took great care to choose high-quality evaluation data and keep it relevant by constantly refreshing it with representative questions and expert-validated answers. The evaluation data itself was sourced and validated directly by Palo Alto Networks subject matter experts, ensuring its accuracy and reliability.
- Automated evaluation: In collaboration with Google Cloud, Palo Alto Networks developed an automated evaluation pipeline using Vertex AI’s gen AI evaluation service. This pipeline allowed Palo Alto Networks to rigorously scale their assessment of different gen AI models, and benchmark those models using custom evaluation metrics while focusing on key performance indicators such as accuracy, latency, and consistency of responses.
- Human evaluator training and red teaming: Palo Alto Networks invested in training their human evaluation team to identify and analyze specific loss patterns and provide detailed answers on a broad set of custom rubrics. This allowed them to pinpoint where a model’s response was inadequate and provide insightful feedback on model performance, which then guided model selection and refinement.
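The intent-recognition and input-guardrail steps described above can be sketched in plain Python. This is a minimal illustration only: the keyword heuristic below stands in for the real Gemini-based classifier, and the intent labels and handler names are hypothetical, not Palo Alto Networks' actual components.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical intent labels; the production module would call a Gemini
# model with a classification prompt instead of this keyword heuristic.
IN_SCOPE_INTENTS = {"risk_insight", "remediation", "how_to"}

@dataclass
class RoutedQuery:
    intent: str
    handled: bool
    response: str

def classify_intent(query: str) -> str:
    """Stand-in for an LLM-based intent classifier (keyword heuristic)."""
    q = query.lower()
    if "remediate" in q or "fix" in q:
        return "remediation"
    if "risk" in q or "vulnerab" in q:
        return "risk_insight"
    if q.startswith("how"):
        return "how_to"
    return "out_of_scope"

def route(query: str, handlers: Dict[str, Callable[[str], str]]) -> RoutedQuery:
    """Guardrail first: refuse anything outside the cloud-security scope,
    then dispatch in-scope queries to the matching co-pilot component."""
    intent = classify_intent(query)
    if intent not in IN_SCOPE_INTENTS:
        return RoutedQuery(intent, False,
                           "Sorry, I can only help with cloud security topics.")
    return RoutedQuery(intent, True, handlers[intent](query))

handlers = {
    "risk_insight": lambda q: f"[risk module] {q}",
    "remediation": lambda q: f"[remediation module] {q}",
    "how_to": lambda q: f"[how-to module] {q}",
}
```

The key design point is that the guardrail runs before any handler: an out-of-scope query (for example, a general-purpose code-generation request) never reaches a downstream component.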
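The automated-evaluation idea above can also be sketched without the real service. The following is an assumed, simplified pipeline shape, not Vertex AI's gen AI evaluation service API: the eval set, `fake_model`, and `match_fn` are all placeholders, with the model call stubbed so the loop is self-contained.

```python
import statistics
import time

# Hypothetical curated evaluation set: questions paired with
# expert-validated reference answers (see "Evaluation dataset curation").
EVAL_SET = [
    {"question": "Which service manages IAM policies?", "reference": "IAM"},
    {"question": "Where are audit logs stored?", "reference": "Cloud Logging"},
]

def evaluate(model_fn, eval_set, match_fn):
    """Run every eval question through the model and aggregate
    accuracy (per a custom match function) and mean latency."""
    correct, latencies = 0, []
    for case in eval_set:
        start = time.perf_counter()
        answer = model_fn(case["question"])
        latencies.append(time.perf_counter() - start)
        if match_fn(answer, case["reference"]):
            correct += 1
    return {
        "accuracy": correct / len(eval_set),
        "mean_latency_s": statistics.mean(latencies),
    }

# Stand-in for a real Gemini call; a production pipeline would invoke
# the model through the Vertex AI SDK here.
def fake_model(question: str) -> str:
    return "IAM" if "IAM" in question else "unknown"

report = evaluate(fake_model, EVAL_SET,
                  match_fn=lambda ans, ref: ref.lower() in ans.lower())
```

Because `match_fn` is pluggable, the same loop can score exact-match accuracy, substring containment, or an LLM-judged rubric, which is how custom metrics like those mentioned above can coexist in one pipeline.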
The team also conducted red teaming exercises focused on key areas, including:
- Manipulating the co-pilot: Can the co-pilot be tricked into giving bad advice by feeding it false information?
- Extracting sensitive data: Can the co-pilot be manipulated into revealing confidential information or system details?
- Bypassing security controls: Can the co-pilot be used to craft attacks that circumvent existing security measures?
- Load testing: To ensure the gen AI solutions met real-time demands, Palo Alto Networks actively load tested them, working within the pre-defined QPM (queries per minute) and latency parameters of Gemini models. They simulated user traffic scenarios to find the optimal balance between responsiveness and scalability using provisioned throughput, which helped ensure a smooth user experience even during peak usage.
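Staying within a QPM budget during a load test typically means pacing requests on the client side. Below is a minimal sliding-window throttle sketch; the class name and window logic are illustrative assumptions, not part of any Google Cloud SDK.

```python
import time
from collections import deque

class QPMThrottle:
    """Client-side pacing: block until a request can be sent without
    exceeding the provisioned queries-per-minute budget."""

    def __init__(self, qpm: int, window_s: float = 60.0):
        self.qpm = qpm
        self.window_s = window_s
        self.sent = deque()  # timestamps of requests in the current window

    def acquire(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the sliding window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) >= self.qpm:
            # At budget: wait until the oldest request leaves the window.
            time.sleep(self.window_s - (now - self.sent[0]))
            now = time.monotonic()
        self.sent.append(now)
```

A load-test driver would call `acquire()` before each simulated query, letting the test push traffic right up to, but not past, the provisioned QPM limit.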
Operational and business challenges
Operationalizing gen AI can introduce complex challenges across multiple functions, especially for compliance, legal, and information security. Evaluating ROI for gen AI solutions also requires new metrics. To address these challenges, Palo Alto Networks implemented the following techniques and processes:
- Data residency and regional ML processing: Since many Palo Alto Networks customers need a regional approach to ML processing, we prioritized regional machine learning processing to help enable customer compliance with data residency needs and regional regulations, where applicable. Where Google does not offer an AI data center that matches Prisma Cloud data center locations, customers could choose to have their data processed in the U.S. before gaining access to the Prisma Cloud Co-pilot. We implemented strict data governance policies and used Google Cloud’s secure infrastructure to help safeguard sensitive information and uphold user privacy.
- Deciding KPIs and measuring success for gen AI apps: The dynamic and nuanced nature of gen AI applications demands a bespoke set of metrics tailored to capture their specific characteristics and comprehensively evaluate their efficacy. No standard set of metrics works for all use cases. The Prisma Cloud AI Co-pilot team relied on both technical and business metrics to measure how well the system was operating.
- Technical metrics, such as recall, measured how thoroughly the system fetched relevant URLs when answering questions from documents, which helped increase the accuracy of prompt responses and provide source information for users.
- Customer experience metrics, such as helpfulness, relied on explicit feedback and telemetry data analysis. This provided deeper insights into the user experience, resulting in increased productivity and cost savings.
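As a concrete illustration of the recall metric described above, one common formulation is the fraction of expert-labeled relevant URLs that the retrieval step actually returned. The URLs and labels below are hypothetical examples, not Prisma Cloud data.

```python
def recall(retrieved: list, relevant: set) -> float:
    """Fraction of expert-labeled relevant URLs that the retrieval
    step actually returned for a given question."""
    if not relevant:
        return 1.0  # nothing to find counts as fully recalled
    return len(set(retrieved) & relevant) / len(relevant)

# Hypothetical example: the RAG retrieval step returned two of the
# three URLs that experts marked relevant for this question.
retrieved = ["docs/iam", "docs/vpc", "blog/launch"]
relevant = {"docs/iam", "docs/vpc", "docs/audit"}
```

Tracked per question and averaged across the evaluation dataset, this gives a single number that can be compared across model versions and retrieval configurations.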
- Collaborating with security and legal teams: Palo Alto Networks brought in legal, information security, and other critical stakeholders early in the process to identify risks and create guardrails for issues including, but not limited to: information security requirements, elimination of bias in the dataset, appropriate functionality of the tool, and data usage in compliance with applicable law and contractual obligations.
Given customer concerns, enterprises must prioritize clear communication around data usage, storage, and protection. By collaborating with legal and information security teams early on to create transparency in marketing and product communications, Palo Alto Networks was able to build customer trust and help ensure they have a clear understanding of how and when their data is being used.
Ready to get started with Vertex AI?
The future of generative AI is bright, and with careful planning and execution, enterprises can unlock its full potential. Explore your organization’s AI needs through practical pilots in Vertex AI, and rely on Google Cloud Consulting for expert guidance.
- Learn more about Vertex AI customer use cases and stories.
- Dive into our generative AI repository and explore tuning notebooks and samples.