A few months back, we kicked off Network Performance Decoded, a series of whitepapers sharing best practices for network performance and benchmarking. Today, we’re dropping the second installment – and this one’s about some of those pesky performance limiters you’re bound to run into. In this blog, we’re giving you the inside scoop on how to tackle these issues head-on.
First up: A Brief Look at Network Performance Limiters
Mbit/s isn’t everything: It’s not just about raw speed. How you package your data (packet size) seriously impacts throughput and CPU usage.
Bigger packets, better results: Larger packets mean less overhead per packet, which translates to better throughput and less strain on your CPU.
Offload for a TCP boost: TCP Segmentation Offload (TSO) and Generic Receive Offload (GRO) let your network interface card do some of the heavy lifting, freeing up your CPU and giving your TCP throughput a nice bump — even with smaller packets.
Watch out for packet-per-second limits: Smaller packets can sometimes hit a bottleneck because of how many packets your system can handle per second.
At a constant bitrate, bigger packets are more efficient: Even with a steady bitrate, larger packets mean fewer packets overall, which leads to less CPU overhead and a more efficient network.
Get a handle on these concepts and you’ll be able to fine-tune your network for better performance and efficiency, no matter the advertised speed.
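To make that last point concrete, here’s a quick back-of-the-envelope sketch in Python (the bitrate and packet sizes are illustrative values, not figures from the whitepaper) showing how packet size alone determines the packet-per-second load at a fixed bitrate:

```python
# At a constant bitrate, packet size determines how many packets per second
# the sender and receiver must process.
BITRATE_BPS = 10 * 10**9  # assume a 10 Gbit/s flow for illustration

for packet_bytes in (256, 1500, 8896):  # small, standard-MTU, and jumbo-sized packets
    packets_per_second = BITRATE_BPS / (packet_bytes * 8)
    print(f"{packet_bytes:>5}-byte packets -> {packets_per_second / 1e6:6.2f} Mpps")

# Larger packets mean far fewer packets for the same bitrate, and therefore less
# per-packet CPU work (interrupts, header processing, lookups). That is why a flow
# can hit a packets-per-second ceiling long before it reaches the advertised Mbit/s.
```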
Next: A Brief Look at Round Trip Time
This whitepaper dives into TCP Round Trip Time (RTT) — a key network performance metric. You’ll learn how it’s measured, what can throw it off, and how to use that info to troubleshoot network issues like a pro. We’ll show you how the receiving application’s behavior can mess with RTT measurements, and call out some important nuances to consider. For example, TCP RTT measurements do not include the time TCP may spend resending lost segments, which your applications see as latency. Lastly, we’ll show how you can use tools like netperf (also included in our PerfKit Benchmarker toolkit) to get an end-to-end picture.
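As a rough illustration, here’s a minimal sketch of driving a netperf TCP_RR test from Python (the target host is a placeholder, netperf must already be installed, and netserver must be running on the remote side). A request/response test like this reflects latency as the application experiences it, including time lost to retransmissions that TCP’s internal RTT samples exclude:

```python
import subprocess

TARGET_HOST = "10.128.0.2"  # placeholder: a host running netserver

# Run a 30-second TCP request/response test and print netperf's summary output.
result = subprocess.run(
    ["netperf", "-H", TARGET_HOST, "-t", "TCP_RR", "-l", "30"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```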
Finally: A Brief Look at Path MTU Discovery
Last but not least, this whitepaper breaks down Path MTU discovery, a process that helps prevent IP fragmentation. Understanding how networks handle packet sizes can help you optimize your network setup, avoid those frustrating fragmentation issues, and troubleshoot effectively. We’ll even walk you through common problems — like those pesky ICMP blocks leading to large packets being dropped without any notification to the sender — and how to fix them. Plus, you’ll learn the difference between Maximum Transmission Unit (MTU) and Maximum Segment Size (MSS) — knowledge that’ll come in handy when you’re configuring your network and troubleshooting packet size problems.
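As a small sketch of the MTU/MSS relationship (the MTU value and target host are placeholders, the header sizes are the standard option-free IPv4 and TCP values, and the ping probe is Linux-specific):

```python
import subprocess

MTU = 1460          # placeholder MTU, e.g., the default on many Google Cloud VPC networks
IPV4_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20     # bytes, TCP header without options

# MSS is the largest TCP payload that fits into a single packet of this MTU.
mss = MTU - IPV4_HEADER - TCP_HEADER
print(f"MTU {MTU} -> TCP MSS {mss}")

# Probe the path with the Don't Fragment bit set (Linux ping). The -s payload
# excludes the 8-byte ICMP header and 20-byte IP header, so MTU - 28 is the
# largest payload that should survive without fragmentation.
subprocess.run(["ping", "-M", "do", "-s", str(MTU - 28), "-c", "3", "example.com"])
```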
Stay tuned!
These resources are part of our mission to create an open, collaborative space for network performance benchmarking and troubleshooting. The examples might be from Google Cloud, but the ideas apply everywhere – regardless of where your workloads may be running. You can find all our whitepapers (past, present, and future) on our webpage. Keep an eye out for more!
It’s been more than two and a half years since we introduced AlloyDB for PostgreSQL, our 100% PostgreSQL-compatible database that offers superior performance, availability, and scale. AlloyDB reimagines PostgreSQL with Google’s cutting-edge technology. It includes a scale-out architecture, built-in analytics, and AI/ML-powered management for a truly modern data experience, and is fully managed so you can focus on your application.
PostgreSQL has long been a favorite among developers for its flexibility, reliability, and robust feature set. AlloyDB brings PostgreSQL to the next level with faster performance, stronger functionality, better migration options, and smarter AI capabilities.
As 2024 comes to a close, it felt like a great time to celebrate with a snazzy AlloyDB overview video and summarize AlloyDB’s key benefits in an infographic. Whether you’re new to the product or have tried it already, take a look to make sure you’re taking advantage of every benefit. You can also download our in-depth AlloyDB e-book for a deeper dive.
Ready to dive deeper?
Want to learn more about how AlloyDB can revolutionize your PostgreSQL experience? Download our in-depth AlloyDB e-book and discover the transformative ways AlloyDB is redefining what’s possible. You’ll uncover:
How AlloyDB delivers superior transactional performance at half the cost
Why AlloyDB is the best database service for building gen AI apps
The flexibility of running AlloyDB anywhere, on any platform
How AI-driven development and operations can simplify your database journey
The power of real-time business insights with AlloyDB’s columnar engine
The future of PostgreSQL is here, and it’s built for you. Start building your next great app with a 30-day AlloyDB free trial.
Think about your favorite apps – the ones that deliver instant results from massive amounts of data. They’re likely powered by vector search, the same technology that fuels generative AI.
Vector search is crucial for developers who need to build applications that are lightning-fast, handle massive datasets, and remain cost-effective, even with huge spikes in traffic. But building and deploying this technology can be a real challenge, especially for gen AI applications that demand incredible flexibility, scale, and speed. In a previous blog post, we showed you how to create production-ready AI applications with features like easy filtering, automatic scaling, and seamless updates.
Today, we’ll share how Vertex AI’s vector search is tackling these challenges head-on. We’ll explore real-world performance benchmarks demonstrating incredible speed and scalability – all in a cost-effective way.
How does Vertex AI vector search work?
Imagine you own a popular online store: to keep shoppers happy, your search engine needs to instantly sift through millions of products and deliver relevant results, even during peak shopping seasons. Vector search is a technique for finding similar items within massive datasets. It works by converting data, like text or images, into numerical representations called embeddings. These embeddings capture the semantic meaning of the data, allowing for more accurate and relevant search results.
For example, imagine your customers are searching for a “navy blue dress shirt.” A keyword search might miss products labeled “midnight blue button-down,” even though they’re essentially the same. Vector search does a better job of surfacing the right products because it uses embeddings to understand the relationships between words and concepts.
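Here’s a toy illustration of the idea (the vectors are made up for the example; a real system would obtain them from an embedding model such as the Vertex AI text-embeddings API): similarity between embedding vectors captures the relationship that a keyword match would miss.

```python
import numpy as np

# Illustrative toy embeddings; real embeddings come from an embedding model.
query_vec = np.array([0.80, 0.10, 0.60])       # "navy blue dress shirt"
similar_vec = np.array([0.75, 0.15, 0.62])     # "midnight blue button-down"
unrelated_vec = np.array([-0.10, 0.90, 0.05])  # "red running shoes"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: values near 1.0 mean the items are semantically close."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(query_vec, similar_vec))    # high: essentially the same product
print(cosine_similarity(query_vec, unrelated_vec))  # low: unrelated product
```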
You can use it for a wide range of applications: the e-commerce example shared above, retrieval-augmented generation (RAG) systems that ground generative AI agents’ responses in your data, or recommendation systems that deliver personalized suggestions based on user preferences.
As Xun Wang, Chief Technology Officer of Bloomreach, recently said, “Bloomreach has made the strategic decision to replace OpenAI with Google Vertex AI Embeddings and Vertex AI vector search. Google’s platform delivers clear advantages in performance, scalability, reliability and cost optimization. We’re confident this move will drive significant benefits and we’re thrilled to embark on this new partnership.”
Real-world impact of Vertex AI’s vector search
Our customers are achieving remarkable results with vector search. Here are four standout ways this technology is helping them build high-performance gen AI apps.
#1: The fastest vector search for highly responsive applications
To meet customer expectations, fast response times are critical across search, recommendation systems, and gen AI applications. Studies have consistently found that faster response times directly contribute to an increase in revenue, conversion, and retention.
Vector search is engineered for incredibly low latency at high quality, while maintaining cost-effectiveness. In our testing, vector search was able to maintain ultra-low latency (9.6ms at P95) and high recall (0.99) while scaling up to 5K queries per second on a dataset of one billion vectors. By achieving such low latencies, Vertex AI vector search ensures that users receive fast, relevant responses, no matter how large the dataset or how many parallel requests hit the system.
As Yuri M. Brovman from eBay wrote in a recent blog post, “[eBay’s vector search] hit a real-time read latency of less than 4ms at 95%, as measured server-side on the Google Cloud dashboard for vector search”.
#2: Massively scalable for all application sizes
Another important consideration in production-ready applications is the ability of your application to support growing data sizes and user bases.
Vertex AI vector search is built for this: it can scale up to support billions of embeddings and hundreds of thousands of queries per second while maintaining ultra-low latency, and it easily accommodates sudden spurts in demand, making it massively scalable for applications of any size.
#3: Up to 4X more cost effective
Vertex AI vector search not only maintains performance at scale, but it is also 4x more cost effective than competing solutions, especially for high performance applications. With Vertex AI vector search’s ANN index, you will need significantly less compute for fast and relevant results at scale.
| Dataset | QPS | Recall | Latency (P95) |
| --- | --- | --- | --- |
| Glove 100 / 100 dim | 44,876 | 0.96 | 3 ms |
| OpenAI 5M / 1536 dim | 2,981 | 0.96 | 9 ms |
| Cohere 10M / 768 dim | 3,144 | 0.96 | 7 ms |
| LAION 100M / 768 dim | 2,997 | 0.96 | 9 ms |
| BigANN 10M / 128 dim | 33,921 | 0.97 | 3.5 ms |
| BigANN 100M / 128 dim | 9,871 | 0.97 | 7.2 ms |
| BigANN 1B / 128 dim | 4,967 | 0.99 | 9.6 ms |
Vertex AI vector search benchmarks on real-world public datasets, using two replicas of n2d machines. Latency was measured at the listed QPS; vector search can scale beyond this throughput by adding replicas.
#4: It’s highly configurable for all application types
In some scenarios, developers might be interested in trading off latency for higher recall (or vice versa). For example, an e-commerce website might prioritize speed for quick product suggestions, while a research database might prioritize comprehensive results even if it takes slightly longer. Vector search lets you tune these parameters to favor higher recall or lower latency, to match your business needs.
Additionally, vector search supports auto-scaling – and when load on the deployment increases, it scales to maintain performance. We measured auto-scaling and found that vector search was able to maintain consistent latency with high recall, as QPS increased from 1K to 5K.
Developers can also increase the number of replicas in order to handle higher throughput, as well as pick different machine types to balance cost and performance. This flexibility makes vector search suitable for a wide range of applications beyond semantic search, including recommendation systems, chatbots, multimodal search, anomaly detection, and image similarity matching.
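As a rough sketch of those knobs (the resource names, embedding dimensionality, and sizing choices below are placeholders; the calls follow the google-cloud-aiplatform SDK, but treat this as an outline rather than a drop-in deployment):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Attach to an existing index endpoint (resource names are placeholders).
endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/my-project/locations/us-central1/indexEndpoints/1234567890"
)

# Deploy the index with an explicit machine type and replica range: more or larger
# replicas buy additional throughput and latency headroom at higher cost.
endpoint.deploy_index(
    index=aiplatform.MatchingEngineIndex(
        index_name="projects/my-project/locations/us-central1/indexes/9876543210"
    ),
    deployed_index_id="products_v1",
    machine_type="n2d-standard-16",
    min_replica_count=2,   # baseline capacity
    max_replica_count=10,  # autoscaling ceiling for traffic spikes
)

# Query the deployed index; num_neighbors trades result breadth against work per query.
response = endpoint.find_neighbors(
    deployed_index_id="products_v1",
    queries=[[0.1] * 768],  # placeholder 768-dimensional query embedding
    num_neighbors=10,
)
print(response)
```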
Going further with hybrid search
Dense embedding-based semantic search, while excellent at understanding meaning and context, has a weak point: it cannot find items that the embedding model can’t make sense of. Items like product numbers, a company’s internal codenames, or newly coined terms aren’t found by semantic search because the embedding model doesn’t understand their meanings.
With Vertex AI vector search’s hybrid search, building this type of sophisticated search engine is no longer a daunting task. Developers can easily create a single index that incorporates both dense and sparse embeddings, representing semantic meaning and keyword relevance respectively. This streamlined approach allows for rapid development and deployment of high-performance search applications, fully customized to meet specific business needs.
As Nicolas Presta, Sr. Engineering Manager at Mercado Libre, wrote, “Most of our successful sales start with a search, so it is important that we give precise results that best match a user’s query. These complex searches are getting better with the addition of the items retrieved from vector search, which will ultimately increase our conversion rate. Hybrid search will unlock more opportunities to uplevel our search engine so that we can create the best customer experience while improving our bottom line.”
In today’s fast-paced digital world, businesses are constantly seeking innovative ways to leverage cutting-edge technologies to gain a competitive edge. AI has emerged as a transformative force, empowering organizations to automate complex processes, gain valuable insights from data, and deliver exceptional customer experiences.
However, with the rapid adoption of AI comes a significant challenge: managing the associated cloud costs. As AI — and really cloud workloads in general — grow and become increasingly sophisticated, so do their associated costs and potential for overruns if organizations don’t plan their spend carefully.
These unexpected charges can arise from a variety of factors:
Human error and mismanagement: Misconfigurations in cloud services (e.g., accidentally enabling a higher-tiered service or changing scaling settings) can inadvertently drive up costs.
Unexpected workload changes: Spikes in traffic or usage, or changes in application behavior (e.g., marketing campaign or sudden change in user activity) can lead to unforeseen service charges.
Lack of proactive governance and cost transparency: Without a robust cloud FinOps framework, it’s easy for cloud spending to spiral out of control, leading to significant financial overruns.
Organizations have an opportunity to proactively manage their cloud costs and avoid budget surprises. By implementing real-time cost monitoring and analysis, they can identify and address potential anomalies before they result in unexpected expenses. This approach empowers businesses to maintain financial control and support their growth objectives.
As one of the world’s leading cybersecurity organizations — serving more than 70,000 organizations in 150 countries — Palo Alto Networks must bring a level of vigilance and awareness to its digital business. Since it experiments often with new technologies and tools and deals with spikes in activity when threat actors mount an attack, the chances for anomalous spending run higher than most.
Recognizing that all of its customers need to manage their cloud spend effectively, Google Cloud launched Cost Anomaly Detection as part of its Cost Management toolkit. It requires no setup, automatically detects anomalies in your Google Cloud projects, and gives teams the details they need for alerting and root-cause analysis. While Palo Alto Networks used this feature for a while and found it useful, it eventually realized the need for a customized solution. Due to stringent custom requirements, it wanted a service that could identify anomalies based on labels, such as applications or products that span Google Cloud projects, and provide more control over which anomaly variables are detected and alerted on to its teams. Creating a consistent experience across its multicloud environments was also a priority.
Palo Alto Networks’ purpose-built solution tackles cloud management and AI costs head-on, helping the organization to be proactive at scale. It is designed to enhance cost transparency by providing real-time alerts to product owners, so they can make informed decisions and act quickly. The solution also delivers automated insights at scale, freeing up valuable time for the team to focus on innovation.
By removing the worry of unexpected costs, Palo Alto Networks can now confidently embrace new cloud and AI workloads, accelerating its digital transformation journey.
Lifecycle of an anomaly
For Palo Alto Networks, anomalies are unexpected events or patterns that deviate from the norm. In a cloud environment, anomalies can indicate anything from a simple misconfiguration to a full-blown security breach. That’s why it’s critical to have a system in place to detect, analyze, and mitigate anomalies before they can cause significant damage.
The typical lifecycle of an anomaly breaks down into three key stages: detection, notification and analysis, and mitigation.
The following sections will take a deeper dive into how Palo Alto Networks used Google Cloud to build its custom AI-powered anomaly solution to address each of these stages.
1. Detection
The first step is to identify potential anomalies. Palo Alto Networks partnered with Google Cloud Consulting to train the ARIMA+ model with billing data from its applications using BigQuery ML (BQML). The team chose this model for its strong results on time-series billing data, its ability to customize hyperparameters, and its overall effective cost of operation at scale.
The ARIMA+ model allowed Palo Alto Networks to generate a baseline spend with upper and lower bounds for its cost anomaly solution. The team also tuned the model using Palo Alto Networks’ historic billing data, enabling it to inherently understand factors like seasonality, common spikes and dips, migration patterns, and more. If the spend exceeds the upper bound created by the model, the team can then quantify the business cost impact (both percentage and dollar amount) to determine the severity of the alert to be investigated further.
Looker, Google Cloud’s business intelligence platform, serves as the foundation for custom data modeling and visualization, seamlessly integrating with Palo Alto Networks’ existing billing data infrastructure, which continuously streams into BigQuery multiple times a day. This eliminates the need for additional data pipelines, ensuring the team has the most up-to-date information for analysis.
BigQuery ML empowers Palo Alto Networks with robust capabilities for machine learning model training and inference. By leveraging BQML, the team can build and deploy sophisticated models directly within BigQuery, eliminating the complexities of managing separate machine learning environments. This streamlined approach accelerates the ability to detect and analyze cost anomalies in real time. In this case, Palo Alto Networks trained the ARIMA+ model on the last 13 months of billing data for specific applications, using the Net Spend field, to capture seasonality, spikes and dips, along with migration patterns and known spikes based on a custom calendar.
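A minimal sketch of what that setup could look like in BigQuery ML follows (the dataset, table, and column names are assumptions for illustration, not Palo Alto Networks’ actual schema):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train an ARIMA_PLUS model on roughly 13 months of daily net spend per application.
# `finops.daily_billing` and its columns are illustrative placeholders.
client.query("""
CREATE OR REPLACE MODEL `finops.spend_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'usage_date',
  time_series_data_col = 'net_spend',
  time_series_id_col = 'application'
) AS
SELECT usage_date, application, net_spend
FROM `finops.daily_billing`
WHERE usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 13 MONTH)
""").result()

# Flag spend that falls outside the model's expected upper and lower bounds.
anomalies = client.query("""
SELECT usage_date, application, net_spend, lower_bound, upper_bound
FROM ML.DETECT_ANOMALIES(
  MODEL `finops.spend_forecast`,
  STRUCT(0.99 AS anomaly_prob_threshold))
WHERE is_anomaly
""").result()

for row in anomalies:
    print(row.usage_date, row.application, row.net_spend, row.upper_bound)
```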
To enhance alerting and anomaly management processes, the team also utilizes Google Cloud Pub/Sub and Cloud Run functions. Pub/Sub facilitates the reliable and scalable delivery of anomaly notifications to relevant stakeholders. Cloud Run functions enable custom logic for processing these notifications, including intelligent grouping of similar anomalies to minimize alert fatigue and streamline investigations. This powerful combination allows Palo Alto Networks to respond swiftly and effectively to potential cost issues.
2. Notification and analysis
Once the anomaly is captured, the solution computes the business cost impact and routes alerts to the appropriate application teams through Slack for further investigation. To accelerate root-cause analysis, it synthesizes critical information through text and images to provide all the details about the anomaly, pinpointing exactly when it occurred and which SKUs or resources are involved. Application teams can then further analyze this information and, with their application context, quickly arrive at a decision.
Here is an example of a snapshot that captured an increased cost in BigQuery that started on July 30th:
The cost anomaly solution automatically gathered all the information related to the flagged anomalies, such as Google Cloud project ID, data, environment, service names, and SKUs, along with the cost impact. This data provided much of the necessary context for the application team to act quickly. Here is an example of the Slack alert:
3. Mitigation
Once the root cause is identified, it’s time to take action to mitigate the anomaly. This may involve anything from making a simple configuration change to deploying a hotfix. In some cases, it may be necessary to escalate the issue and involve cross-functional teams.
In the provided example, a cloud-hosted tenant encountered a substantial increase in data volume due to a configuration error. This misconfiguration led to unusually high BigQuery usage. As no default BigQuery reservation existed in the newly established region, the system defaulted to the on-demand pricing model, incurring higher costs.
To address this, the team procured 100 baseline slots with a 3-year commitment and implemented autoscaling to accommodate any future spikes without impacting performance. To prevent similar incidents, especially in new regions, a long-term cost governance policy was implemented at the organizational level.
Post-incident, the cost anomaly solution generates a blameless postmortem document containing the highlights of the actions taken, the impact of collaboration, and the cost savings achieved through timely detection and mitigation. This document focuses on:
A detailed timeline of events: This might include when the cost increase was captured, when the team was alerted, and the mitigation plan with short-term and long-term initiatives to prevent a recurrence in the future.
Actions taken: This description includes details about anomaly detection, the analysis conducted by the application team, and mitigative actions taken.
Preventative strategy: This describes the short-term and long-term plan to avoid similar future incidents.
Cost impact and cost avoidance: These calculations include the overall cost incurred from the anomaly and estimate the additional cost if the issue had not been detected in a timely manner.
A formal communication is then sent out to the Palo Alto Networks application team, including leadership, for further visibility.
From its experience working at scale, Palo Alto Networks has learned to embrace the fact that anomalies are unavoidable in cloud environments. To manage them effectively, a well-defined lifecycle encompassing detection, analysis, and mitigation is crucial. Automated monitoring tools play a key role in identifying potential anomalies, while collaboration across teams is also essential for successful resolution. In particular, the team places huge emphasis on the importance of continuous improvement for optimizing the anomaly management process. For example, they established the reporting dashboard below for long-term continuous governance.
By leveraging the power of AI and partnering with Google Cloud, Palo Alto Networks is enabling businesses to unlock the full potential of AI while ensuring responsible and sustainable cloud spending. With a proactive approach to cost anomaly management, organizations can confidently navigate the evolving landscape of AI, drive innovation, and achieve their strategic goals. Check out the public preview of Cost Anomaly Detection or reach out to Google Cloud Consulting for a customized solution.
Mapping the user experience is one of the most persistent challenges a business can face. Fullstory, a leading behavioral data analytics platform, helps organizations identify pain points and optimize digital experiences by reproducing user sessions and sharing strong analytics highlighting areas for improvement in the customer’s journey. This boosts conversion rates, reduces churn, and enhances customer satisfaction.
AI has made this even stronger. Fullstory’s comprehensive AI-powered autocapture technology, Fullcapture, removes the need for manual instrumentation and uncovers hidden patterns that might otherwise be missed.
Today, we’ll share how Fullstory leverages Vertex AI serving Gemini 1.5 Pro to strengthen their autocapture technology.
How Vertex AI and AI agents help Fullstory measure the user experience
Think of Fullcapture as a video recorder for your website or app, capturing every interaction in detail. Traditional autocapture methods are more like transcription services, logging only selected highlights and often missing the complete picture. With Fullcapture, no user action goes unrecorded, with minimal impact on device performance. Operating server-side, Fullcapture allows for revisiting any aspect of user behavior as needed. If a new signal is required, it can be easily retrieved from the recorded data without affecting client-side performance.
The table below breaks down how Fullcapture goes beyond traditional autocapture capabilities to give users a deeper understanding of their customer data.
By integrating its Fullcapture capabilities with Google’s Vertex AI serving Gemini 1.5 Pro, Fullstory empowers customers to effortlessly analyze this extensive data and focus on what truly matters. Driven by a proactive AI agent, Fullstory enables faster data discovery by highlighting important elements and automatically categorizing user interactions into semantic components, providing even deeper levels of analysis.
AI-powered data discovery
Data discovery is a 6-step process that involves exploring, classifying, and analyzing data from various sources to uncover patterns and extract actionable insights. This process allows users to visually navigate data relationships and apply advanced analytics to optimize business decisions and performance.
To effectively analyze user behavior, businesses need to identify and label key elements on their websites (e.g., buttons, forms). This process can be tedious and time-consuming. Fullstory’s AI agent, powered by Gemini 1.5 Pro, automates this critical task by scraping data from user interactions and making intelligent decisions at various stages—identifying key elements, determining their significance, and autonomously categorizing them. This multi-stage decision-making process not only streamlines workflows but also ensures businesses can focus on deriving actionable insights rather than manual labeling.
Within Fullstory, “elements” allow users to label UI components based on specific CSS selectors. A CSS selector is a pattern used to target elements in a webpage, such as classes, IDs, or attributes. For instance, a “Checkout Button” element might be created with the selector .checkout-page-container [data-testid="primary-button"]. These labels help categorize UI components and utilize them for product analytics. Broad semantic labeling is crucial for long-term success with Fullstory, and automating this process simplifies workflows for users.
Vertex AI with Gemini 1.5 Pro offers a unique opportunity to add a human touch at scale. It proactively identifies and describes web components, ultimately providing actionable insights that benefit Fullstory customers. Gemini 1.5 Pro is trained on extensive web expertise, including web implementation from CSS and web frameworks like React, along with a vast dataset of website images.
For example, the model can analyze a website screenshot and accurately describe its components, understanding the overall visual structure, the visible text, and the logical structure of the web page. This understanding can be further enhanced with web implementation details, such as CSS selectors, to gain a deeper understanding of specific components.
Optimizing for accurate element identification
Fullstory employs a meticulous approach to ensure the model provides high-quality element suggestions in four critical ways:
Strategic prompt engineering: Complex tasks are broken down into smaller, manageable steps, allowing the model to build a foundational understanding and deliver consistent and accurate results.
Pre-filtering with heuristics: Heuristics pre-filter potential elements before requests are sent to Vertex AI with Gemini, optimizing efficiency.
Validation with Vertex AI with Gemini: The model’s expertise validates potential elements, ensuring that only useful suggestions are presented.
Contextualized suggestions: Each suggestion includes a screenshot, CSS selector, and occurrence metrics, providing valuable context for informed decision-making.
This process ensures effective and efficient use of Gemini’s AI capabilities, resulting in accurate and valuable element suggestions.
Pinpointing and perfecting: How we identify and label key web elements
Optimizing the digital experience requires identifying and understanding key web elements. These elements need to be defined in a way that remains resilient to website changes. This presents a challenge, given the diverse nature of websites and user behaviors.
In the real world, an element selector can look something like:
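`#checkout-app > div.layout__main > div:nth-child(3) > form > div.checkout-page-container button[data-testid="primary-button"] > span.btn-label` (a hypothetical selector for illustration; real selectors are often even longer and more brittle)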
While metrics like “most clicked buttons” provide some insight, a more sophisticated approach is needed to uncover elements that drive engagement, signal errors, or reveal hidden opportunities. Effective management of potentially long and brittle element selector definitions is also crucial for maintaining data quality.
The search for meaningful elements
Fullstory captures every user interaction, generating a wealth of data. The platform continuously monitors unrecognized components, prioritizing:
New feature discovery: Identifying elements on newly launched feature pages.
Power user behavior: Understanding how experienced users interact with the website.
Error signals: Detecting elements with CSS that suggests potential errors.
Content analysis: Analyzing elements containing text that indicates user intent or potential issues.
These searches utilize CSS selectors to precisely target elements for granular analysis and efficient refinement.
Analyzing user behavior in Fullstory often involves crafting complex CSS selectors. With Vertex AI and Gemini 1.5 Pro, this process is simplified.
Deep indexing: Components of website CSS selectors and associated events are tokenized and indexed, enabling efficient searching through countless variations.
Semantic relevance: The model understands the meaning behind selectors. For example, when tracking an “Add to Cart” button, the model recognizes that the class .add-to-cart is more relevant than a generic class like .primary-button.
Powerful search: Combining semantic understanding with advanced search capabilities, the model identifies the best match for selectors.
This results in high-quality selectors without requiring in-depth CSS expertise, allowing users to focus on uncovering valuable insights from their Fullstory data.
Here’s an example of a Fullstory element being optimized:
The importance of accurate labeling
Once an element is identified through its CSS selector, accurate labeling becomes crucial. This involves:
Name: A clear and concise name reflecting the element’s function.
Description: A detailed explanation of the element’s purpose and behavior.
Role: Assigning a predefined role from Fullstory’s library (e.g., “Add to Cart Button,” “Validation Error”).
Adding context to the equation
To ensure high-quality and consistent element labeling, Vertex AI with Gemini 1.5 Pro leverages Fullstory’s rich data and advanced AI capabilities to provide comprehensive context:
Visual representation: Screenshots of the element in action, generated from session playbacks.
Textual analysis: Examining text occurrences associated with the element.
Location tracking: Identifying the URLs where the element appears.
This approach, combining Fullstory’s data capture with Vertex AI and Gemini 1.5 Pro, allows for AI-powered analysis that moves beyond basic metrics to truly understand user behavior. By identifying and labeling key web elements with precision and context, businesses can unlock valuable insights and create exceptional digital experiences.
Delivering real-world results with Vertex AI API for Gemini 1.5 Pro
The collaboration between Fullstory and Google Cloud has yielded tens of thousands of element suggestions generated for customers, with Gemini 1.5 Pro intelligently filtering out a significant portion of irrelevant suggestions. The model has also identified numerous error elements that were previously unrecognized.
Beyond element identification, the mapping between element configuration and screenshots has opened up new opportunities for improving site configuration and enhancing analytics. This ongoing collaboration between Fullstory and Google Cloud continues to drive significant value for customers, empowering them to gain a deeper understanding of user behavior and optimize their digital experiences.
Ready to unlock the power of behavioral data? Visit Fullstory on Google Cloud Marketplace today! With Fullstory, you can gain a deeper understanding of your customers by uncovering hidden insights into their behavior and identifying key opportunities to optimize their digital experience.
Want to learn more about leveraging AI to enhance your Fullstory experience? Explore Fullstory’s documentation or try out this collaboration to see how AI can accelerate your journey.
From helping your developers write better code faster with Code Assist, to helping cloud operators more efficiently manage usage with Cloud Assist, Gemini for Google Cloud is your personal AI-powered assistant.
However, understanding exactly how your internal users are using Gemini has been a challenge — until today.
Today, we are announcing Cloud Logging and Cloud Monitoring support for Gemini for Google Cloud. Currently in public preview, Cloud Logging records requests and responses between Gemini for Google Cloud and individual users, while Cloud Monitoring reports 1-day, 7-day, and 28-day Gemini for Google Cloud active users and response counts in aggregate.
Cloud Logging
In addition to offering customers general visibility into the impact of Gemini, there are a few scenarios where logs are useful:
to track the provenance of your AI-generated content
to record and review how your users use Gemini for Google Cloud
This feature is opt-in; when enabled, it logs your users’ Gemini for Google Cloud activity to Cloud Logging (Cloud Logging charges apply).
Once enabled, log entries are made for each request to and response from Gemini for Google Cloud. There are several things to note about a typical request entry in Logs Explorer:
The content inside jsonPayload contains information about the request. In this case, it was a request to complete Python code with def fibonacci as the input.
The labels tell you the method (CompleteCode), the product (code_assist), and the user who initiated the request (cal@google.com).
The resource labels tell you the instance, location, and resource container (typically project) where the request occurred.
In a typical response entry, you’ll see similar fields. Note that the request_id inside the labels is identical for a request and its corresponding response, enabling identification of request and response pairs.
In addition to Logs Explorer, Log Analytics supports queries to analyze your log data and help you answer questions like “How many requests did User XYZ make to Code Assist?”
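For example, here’s a minimal sketch of such a query run against a Log Analytics linked BigQuery dataset (the dataset name and the label field paths are assumptions based on the request fields described above, not an official schema):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Count Code Assist requests per user from Gemini for Google Cloud log entries.
# The view name and JSON label paths below are illustrative assumptions.
query = """
SELECT
  JSON_VALUE(labels, '$.user') AS gemini_user,
  COUNT(*) AS requests
FROM `my-project.my_log_dataset._AllLogs`
WHERE JSON_VALUE(labels, '$.product') = 'code_assist'
  AND JSON_VALUE(labels, '$.method') = 'CompleteCode'
GROUP BY gemini_user
ORDER BY requests DESC
"""

for row in client.query(query).result():
    print(row.gemini_user, row.requests)
```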
For more details, please see the Gemini for Google Cloud logging documentation.
Cloud Monitoring
Gemini for Google Cloud monitoring metrics help you answer questions like:
How many unique active users used Gemini for Google Cloud services over the past day or seven days?
How many total responses did my users receive from Gemini for Google Cloud services over the past six hours?
Cloud Monitoring support for Gemini for Google Cloud is available to anyone who uses a Gemini for Google Cloud product. It records responses and active users as Cloud Monitoring metrics, which you can use to configure dashboards and alerts.
Because these metrics are available with Cloud Monitoring, you can also use them as part of Cloud Monitoring dashboards. A “Gemini for Google Cloud” dashboard is automatically installed under “GCP Dashboards” when Gemini for Google Cloud usage is detected:
Metrics Explorer offers another avenue where metrics can be examined and filters applied to gain a more detailed view of your usage. This is done by selecting the “Cloud AI Companion Instance” active resource in the Metrics Explorer:
In the example above, response_count is the number of responses sent by Gemini for Google Cloud, and can be filtered for Gemini Code Assist or the Gemini for Google Cloud method (code completion/generation).
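If you prefer to pull these metrics programmatically, a sketch like the following can work (the metric type string is an assumption for illustration; check the Gemini for Google Cloud monitoring documentation for the exact name):

```python
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"  # placeholder project ID

# Look at the last six hours of response counts.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 6 * 3600}}
)

results = client.list_time_series(
    request={
        "name": project_name,
        # Assumed metric type, for illustration only.
        "filter": 'metric.type = "cloudaicompanion.googleapis.com/instance/response_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    total = sum(point.value.int64_value for point in series.points)
    print(dict(series.metric.labels), total)
```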
For more details, please see the Gemini for Google Cloud monitoring documentation.
What’s next
We’re continually working on additions to these new capabilities, and in particular are focused on Code Assist logging and metrics enhancements that will bring even further insight and observability into your use of Gemini Code Assist and its impact. To get started with Gemini Code Assist and learn more about Gemini Cloud Assist — as well as observability data about it from Cloud Logging and Monitoring — check out the following links:
The automotive industry is facing a profound transformation, driven by the rise of CASE: connected cars, autonomous and automated driving, shared mobility, and electrification. Simultaneously, manufacturers face the imperative to further increase efficiency, automate manufacturing, and improve quality. AI has emerged as a critical enabler of this evolution. In this dynamic landscape, Toyota turned to Google Cloud’s AI Infrastructure to build an innovative AI Platform that empowers factory workers to develop and deploy machine learning models across key use cases.
Toyota’s renowned Toyota Production System, rooted in the principles of “Jidoka” (automation with a human touch) and “Just-in-Time” inventory management, has long been the gold standard in manufacturing efficiency. However, certain parts of this system are resistant to conventional automation.
We started experimenting with using AI internally in 2018. However, a shortage of employees with the expertise required for AI development created a bottleneck in promoting its wider use. Seeking to overcome these limitations, Toyota’s Production Digital Transformation Office embarked on a mission to democratize AI development within its factories in 2022.
Our goal was to build an AI Platform that enabled factory floor employees, regardless of their AI expertise, to create machine learning models with ease. This would facilitate the automation of manual, labor-intensive tasks, freeing up human workers to focus on higher-value activities such as process optimization, AI implementation in other production areas, and data-driven decision-making.
AI Platform is the collective term for the AI technologies we have developed, including web applications that enable easy creation of learning models on the manufacturing floor, compatible equipment on the manufacturing line, and the systems that support these technologies.
By the time we completed implementing the AI Platform earlier this year, we found it would be able to save us as much as 10,000 hours of mundane work annually through manufacturing efficiency and process optimization.
For this company-wide initiative, we brought the development in-house to accumulate know-how. It was also important to stay up-to-date with the latest technologies so we could accelerate development and broaden opportunities to deploy AI. Finally, it was crucial to democratize our AI technology into a truly easy-to-use platform. We knew we needed to be led by those working on the manufacturing floor if we wanted them to use the AI more proactively; while at the same time, we wanted to improve the development experience for our software engineers.
Hybrid Architecture Brings Us Numerous Advantages
To power our AI Platform, we opted for a hybrid architecture that combines our on-premises infrastructure and cloud computing.
The first objective was to promote agile development. The hybrid cloud environment, coupled with a microservices-based architecture and agile development methodologies, allowed us to rapidly iterate and deploy new features while maintaining robust security. The move to a microservices architecture arose from the need to flexibly respond to changes in services and libraries, and as part of this shift, our team also adopted Scrum, a development method in which we release features incrementally in short cycles of a few weeks, ultimately resulting in streamlined workflows.
If we had developed machine learning systems solely on-premises with the aim of ensuring security, we would have needed to perform security checks on a large amount of middleware, including dependencies, whenever we added a new feature or library. With the hybrid cloud, by contrast, we can quickly build complex, high-volume container images while maintaining a high level of security.
The second objective is to use resources effectively. The manufacturing floor, where AI models are created, is now also facing strict cost efficiency requirements.
With a hybrid cloud approach, we can use on-premises resources during normal operations and scale to the cloud during peak demand, thus reducing GPU usage costs and optimizing performance. This allows us to flexibly adapt to an expected increase in the number of users of AI Platform in the future, as well.
Furthermore, adopting a hybrid cloud helps us achieve cost savings on facility investments. By leveraging the cloud for scaling capacity, we minimized the need for extensive on-premises hardware investments. In a traditional on-premises environment, we would need to set up high-performance servers with GPUs in every factory. With a hybrid cloud, we can reduce the number of on-premises servers to one and use the cloud to cover additional processing capacity whenever needed. The hybrid cloud’s concept of using resources only when and as needed aligns well with our “Just-in-Time” method.
The Reasons We Chose Google Cloud AI Hypercomputer
Several factors influenced our decision when choosing a cloud partner for the development of the Toyota AI Platform’s hybrid architecture and ultimately, we chose Google Cloud.
The first is the flexibility of using GPUs. In addition to being able to use high-performance GPUs starting from a single unit, we could use A2 VMs with Google Cloud features like multi-instance GPUs and GPU time-sharing. This flexibility reduces idle compute resources and optimizes costs, increasing the business value delivered in a given time by allowing scarce GPUs to perform more machine learning training runs. Plus, Dynamic Workload Scheduler helps us efficiently manage and schedule GPU resources to optimize running costs.
Next is ease of use. We anticipate that we will be required to secure more GPU resources across multiple regions in the future. With Google Cloud, we can manage GPU resources through a single VPC, avoiding network complexity. When considering the system to deploy, only Google Cloud had this capability.
The speed of builds and processing was also a big appeal for us. In particular, Google Kubernetes Engine (GKE), with Autopilot and Image Streaming, provides flexibility and speed, allowing us to improve cost-effectiveness in terms of operational burden. We measured the communication speed of containerization during the system evaluation process and found that Google Cloud was four times faster scaling from zero than other existing services. The speed of communication and processing is extremely important, as we use up to 10,000 images when creating a learning model. When we first started developing AI technology in-house, we struggled with flexible system scaling and operations. In this regard, too, Google Cloud was the ideal choice.
Completed Large-scale Development in 1.5 Years with 6 Members
With Google Cloud’s support, a small team of six developers achieved a remarkable feat by successfully building and deploying the AI Platform in about half the time it would take for a standard system development project at Toyota. This rapid development was facilitated by Google Cloud’s user-friendly tools, collaborative approach, and alignment with Toyota’s automation-focused culture.
After choosing Google Cloud, we began discussing the architecture with the Google Cloud team. We then worked on modifying the web app architecture for the cloud lift, building the hybrid cloud, and developing human resources within the company, while learning skills for the “internalization of technology” (the acquisition and accumulation of new know-how). During the implementation process, we divided the workloads into on-premises and cloud architectures, and implemented best practices to monitor communications and resources. This process also involved migrating CI/CD pipelines and image data to the cloud. By performing builds in the cloud and caching images on-premises, we ensured quick start-up and flexible operations.
In addition to the ease of development with Google Cloud products, cultural factors also contributed greatly to the success of this project. Our objective of automating the manufacturing process as much as possible is in line with Google’s concept of SRE (Site Reliability Engineering), so we shared the same sense of purpose.
Currently, in the hybrid cloud, we deploy a GKE Enterprise cluster on-premises and link it to the GKE cluster on Google Cloud. When we develop our AI Platform and web apps, we run Cloud Build with Git CI triggers, verify container image vulnerabilities with Artifact Registry and Container Analysis, and ensure a secure environment with Binary Authorization. On the manufacturing floor, structured data such as numerical data and unstructured data such as images are deployed on GKE via a web app, and learning models are created on N1 VMs with NVIDIA T4 GPUs and A2 VMs with NVIDIA A100 GPUs.
Remarkable Results Achieved Through Operating AI Platform
We have achieved remarkable results with this operational structure.
Enhanced Developer Experience: First, with regard to the development experience, waiting time for tasks has been reduced, and operational and security burdens have been lifted, allowing us to focus even more on development.
Increased User Adoption: Additionally, use of our AI Platform on the manufacturing floor is growing. Creating a learning model typically takes anywhere from 10 to 15 minutes at the shortest to around 10 hours at the longest. GKE’s Image Streaming streamlined pod initialization and accelerated learning, resulting in a 20% reduction in learning model creation time. This improvement has enhanced the user experience (UX) on the manufacturing floor, leading to a surge in the number of users. Consequently, the number of models created in manufacturing has steadily increased, rising from 8,000 in 2023 to 10,000 in 2024. The widespread adoption of this technology has allowed for a substantial reduction of over 10,000 man-hours per year in the actual manufacturing process, optimizing efficiency and productivity.
Expanding Impact: The AI Platform is already in use at all of our car and unit manufacturing factories (10 factories in total), and its range of applications is expanding. At the Takaoka factory, the platform is used not only to inspect finished parts but also within the manufacturing process itself: to inspect the application of adhesives used to attach glass to back doors, and to detect abnormalities in injection molding machines used for manufacturing bumpers and other parts. Meanwhile, the number of active users in the company has increased to nearly 1,200, and more than 400 employees participate in in-house training programs each year.
Recently, there have been cases where people who were developing in other departments became interested in Google Cloud and joined our development team. Furthermore, this project has sparked an unprecedented shift within the company: resistance to cloud technology itself is diminishing, and other departments are beginning to consider adopting it.
Utilizing Cloud Workstations for Further Productivity With an Eye on Generative AI
For the AI Platform, we plan to develop an AI model that can set more detailed criteria for detection, implement it in an automated picking process, and use it for maintenance and predictive management of the entire production line. We are also developing original infrastructure models based on the big data collected on the platform, and expect to use the AI Platform more proactively in the future.

Currently, the development team compiles work logs and feedback from the manufacturing floor, and we believe that the day will soon come when we will start utilizing generative AI. For example, the team is considering using AI to create images for testing machine learning during the production preparation stage, which has been challenging due to a lack of data. In addition, we are considering using Gemini Code Assist to improve the developer experience, or using Gemini to convert past knowledge into RAG and implement a recommendation feature.

In March 2024, we joined Google Cloud’s Tech Acceleration Program (TAP) and implemented Cloud Workstations. This also aims to achieve the goals we have been pursuing: to improve efficiency, reduce workload, and create a more comfortable work environment by using managed services.
Through this project, led by the manufacturing floor, we have established a “new way of manufacturing” where anyone can easily create and utilize AI learning models, significantly increasing the business impact for our company. This was enabled by the cutting-edge technology and services provided by Google Cloud.
Like “Jidoka (auto’no’mation)” of production lines and “Just-in-Time” method, the AI Platform has now become an indispensable part of our manufacturing operations. Leveraging Google Cloud, we will continue our efforts to make ever-better cars.
Generative AI is giving people new ways to experience audio content, from podcasts to audio summaries. For example, users are embracing NotebookLM’s recent Audio Overview feature, which turns documents into audio conversations. With one click, two AI hosts start up a lively “deep dive” discussion based on the sources you provide. They summarize your material, make connections between topics, and discuss back and forth.
While NotebookLM offers incredible benefits for making sense of complex information, some users want more control over generating unique audio experiences – for example, creating their own podcasts. Podcasts are an increasingly popular medium for creators, business leaders, and users to listen to what interests them. Today, we’ll share how Gemini 1.5 Pro and the Text-to-Speech API on Google Cloud can help you create conversations with diverse voices and generate podcast scripts with custom prompts.
The approach: Expand your reach with diverse audio formats
A great podcast starts with accessible audio content. Gemini’s multimodal capabilities, combined with our high-fidelity Text-to-Speech API, offer 380+ voices across 50+ languages and custom voice creation. This unlocks new ways for users to experience content and expand their reach through diverse audio formats.
This approach also helps content creators reach a wider audience and streamline the content creation process, including:
Expanded reach: Connect with an audience segment that prefers audio content.
Increased engagement: Foster deeper connections with listeners through personalized audio.
Content repurposing: Maximize the value of existing written content by transforming it into a new format, reaching a wider audience without starting from scratch.
Let’s take a look at how.
The architecture: Gemini 1.5 Pro and Text-to-Speech
Our audio overview creation architecture uses two powerful services from Google Cloud:
Gemini 1.5 Pro: This advanced generative AI model excels at understanding and generating human-like text. We’ll use Gemini 1.5 Pro to:
Generate engaging scripts: Feed your podcast content overview to Gemini 1.5 Pro, and it can generate compelling conversational scripts, complete with introductions, transitions, and calls to action.
Adapt content for audio: Gemini 1.5 Pro can optimize written content for the audio format, ensuring a natural flow and engaging listening experience. It can also adjust the tone and style to suit any format such as podcasts.
Text-to-Speech API: This API converts text into natural-sounding speech, giving a voice to your scripts. You can choose from various voices and languages to match your brand and target audience.
How to create an engaging podcast yourself, step-by-step
Content preparation: Prepare your source content (for example, a blog post). Ensure it’s well-structured and edited for clarity. Consider dividing longer posts into multiple episodes for optimal listening duration.
Gemini 1.5 Pro integration: Use Gemini 1.5 Pro to generate a conversational script from your content. Experiment with prompts to fine-tune the output, achieving the desired style and tone. Example prompt: “Generate an engaging audio overview script from this content, including an introduction, transitions, and a call to action. The target audience is technical developers, engineers, and cloud architects.”
Section extraction: For complex or lengthy source material, you might use Gemini 1.5 Pro to extract key sections and subsections as JSON, enabling a more structured approach to script generation.
A Python function that powers our podcast creation process can be as simple as the one below:
```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

# generation_config and safety_settings are assumed to be defined elsewhere.


def extract_sections_and_subsections(document1: Part, project="<your-project-id>", location="us-central1") -> str:
    """
    Extracts hierarchical sections and subsections from a Google Cloud blog post
    provided as a PDF document.

    This function uses the Gemini 1.5 Pro language model to analyze the structure
    of a blog post and identify its key sections and subsections. The extracted
    information is returned in JSON format for easy parsing and use in various
    applications.

    This is particularly useful for:

    * Large documents: breaking down content into manageable chunks for
      efficient processing and analysis.
    * Podcast creation: generating multi-episode series where each episode
      focuses on a specific section of the blog post.

    Args:
        document1 (Part): A Part object representing the PDF document, typically
            obtained using Part.from_uri(), for example:
            Part.from_uri(mime_type="application/pdf", uri="gs://your-bucket/your-pdf.pdf")
        project: The ID of your Google Cloud project. Defaults to "<your-project-id>".
        location: The region of your Google Cloud project. Defaults to "us-central1".

    Returns:
        str: A JSON string representing the extracted sections and subsections.
            Returns an empty string if there are issues with processing or the
            model output.
    """
    vertexai.init(project=project, location=location)  # Initialize Vertex AI
    model = GenerativeModel("gemini-1.5-pro-002")

    prompt = """Analyze the following blog post and extract its sections and subsections. Represent this information in JSON format using the following structure:
    [
      {
        "section": "Section Title",
        "subsections": [
          "Subsection 1",
          "Subsection 2"
        ]
      }
    ]"""

    try:
        responses = model.generate_content(
            ["""The pdf file contains a Google Cloud blog post required for podcast-style analysis:""", document1, prompt],
            generation_config=generation_config,
            safety_settings=safety_settings,
            stream=True,  # Stream results for better performance with large documents
        )

        response_text = ""
        for response in responses:
            response_text += response.text

        return response_text

    except Exception as e:
        print(f"Error during section extraction: {e}")
        return ""
```
Then, use Gemini 1.5 Pro to generate the podcast script for each section. Again, provide clear instructions in your prompts, specifying target audience, desired tone, and approximate episode length.
For each section and subsection, you can use a function like the one below to generate a script:
import vertexai
from vertexai.generative_models import GenerativeModel, Part

def generate_podcast_content(section, subsection, document1: Part, targetaudience, guestname, hostname, project="<your-project-id>", location="us-central1") -> str:
    """Generates a podcast dialogue in JSON format from a blog post subsection.

    This function uses the Gemini model in Vertex AI to create a conversation
    between a host and a guest, covering the specified subsection content. It uses
    a provided PDF as source material and outputs the dialogue in JSON.

    Args:
        section: The blog post's main section (e.g., "Introduction").
        subsection: The specific subsection (e.g., "Benefits of Gemini 1.5").
        document1: A `Part` object representing the source PDF (created using
            `Part.from_uri(mime_type="application/pdf", uri="gs://your-bucket/your-pdf.pdf")`).
        targetaudience: The intended audience for the podcast.
        guestname: The name of the podcast guest.
        hostname: The name of the podcast host.
        project: Your Google Cloud project ID.
        location: Your Google Cloud project location.

    Returns:
        A JSON string representing the generated podcast dialogue.
    """
    print(f"Processing section: {section} and subsection: {subsection}")

    prompt = f"""Create a podcast dialogue in JSON format based on a provided subsection of a Google Cloud blog post (found in the attached PDF).
    The dialogue should be a lively back-and-forth between a host (R) and a guest (S), presented as a series of turns.
    The host should guide the conversation by asking questions, while the guest provides informative and accessible answers.
    The script must fully cover all points within the given subsection.
    Use clear explanations and relatable analogies.
    Maintain a consistently positive and enthusiastic tone (e.g., "Movies, I love them. They're like time machines...").
    Include only one introductory host greeting (e.g., "Welcome to our next episode..."). No music, sound effects, or production directions.

    JSON structure:
    {{
      "multiSpeakerMarkup": {{
        "turns": [
          {{"text": "Podcast script content here...", "speaker": "R"}},  // R for host, S for guest
          // ... more turns
        ]
      }}
    }}

    Input Data:
    Section: "{section}"
    Subsections to cover in the podcast: "{subsection}"
    Target Audience: "{targetaudience}"
    Guest name: "{guestname}"
    Host name: "{hostname}"
    """

    vertexai.init(project=project, location=location)
    model = GenerativeModel("gemini-1.5-pro-002")

    responses = model.generate_content(
        ["The pdf file contains a Google Cloud blog post required for podcast-style analysis:", document1, prompt],
        generation_config=generation_config,  # Assuming these are defined already
        safety_settings=safety_settings,      # Assuming these are defined already
        stream=True,
    )

    response_text = ""
    for response in responses:
        response_text += response.text

    return response_text
Next, feed the script generated by Gemini to the Text-to-Speech API. Choose a voice and language appropriate for your target audience and content.
A function like the one below can generate human-quality audio from text, using the advanced Text-to-Speech API in Google Cloud.
from googleapiclient.discovery import build

def generate_audio_from_text(input_json):
    """Generates audio using the Google Text-to-Speech API.

    Args:
        input_json: A dictionary containing the 'multiSpeakerMarkup' for the TTS API.
            This is generated by the Gemini 1.5 Pro model in the
            generate_podcast_content() function.

    Returns:
        The audio content (base64-encoded MP3 string) if successful, None otherwise.
    """
    try:
        # Build the Text-to-Speech service
        service = build('texttospeech', 'v1beta1')

        # Prepare synthesis input
        synthesis_input = {
            'multiSpeakerMarkup': input_json['multiSpeakerMarkup']
        }

        # Configure voice and audio settings
        voice = {
            'languageCode': 'en-US',
            'name': 'en-US-Studio-MultiSpeaker'
        }

        audio_config = {
            'audioEncoding': 'MP3',
            'pitch': 0,
            'speakingRate': 0,
            'effectsProfileId': ['small-bluetooth-speaker-class-device']
        }

        # Make the API request
        response = service.text().synthesize(
            body={
                'input': synthesis_input,
                'voice': voice,
                'audioConfig': audio_config
            }
        ).execute()

        # Extract and return the audio content (base64-encoded MP3)
        audio_content = response['audioContent']
        return audio_content

    except Exception as e:
        print(f"Error: {e}")
        return None
Finally, to store audio content already encoded as base64 MP3 data in Google Cloud Storage, you can use the google-cloud-storage Python library. This allows you to decode the base64 string and upload the resulting bytes directly to a designated bucket, specifying the content type as ‘audio/mp3’.
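As an illustration of that final step, a minimal sketch might look like the following. The function, bucket, and object names are placeholders, and it assumes the base64 string returned by the Text-to-Speech call above.

import base64
from google.cloud import storage

def upload_podcast_audio(audio_base64: str, bucket_name: str, object_name: str) -> str:
    """Decodes base64-encoded MP3 audio and uploads it to Cloud Storage.

    Args:
        audio_base64: The base64 string returned by the Text-to-Speech API.
        bucket_name: Name of an existing Cloud Storage bucket (placeholder).
        object_name: Destination object name, e.g. "episodes/episode-01.mp3".

    Returns:
        The gs:// URI of the uploaded object.
    """
    audio_bytes = base64.b64decode(audio_base64)  # TTS returns base64-encoded MP3 data

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_string(audio_bytes, content_type="audio/mp3")

    return f"gs://{bucket_name}/{object_name}"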
Hear it for yourself
While the Text-to-Speech API produces high-quality audio, you can further enhance your audio conversation with background music, sound effects, and professional editing tools. Hear it for yourself: download the audio conversation I created from this blog using Gemini 1.5 Pro and the Text-to-Speech API.
To start creating for yourself, explore our full suite of audio generation features across Google Cloud services, such as the Text-to-Speech API and Gemini models, using the free tier. We recommend experimenting with different modalities, like text and image prompts, to experience Gemini’s potential for content creation.
For many businesses, the SAP HANA database is the heart of their SAP business applications, a repository of mission-critical data that drives operations. But what happens when disaster strikes?
Protecting an SAP HANA system involves choices. Common methods include HANA System Replication (HSR) for high availability and Backint for backups. Having a disaster recovery (DR) strategy is crucial, but it doesn’t need to be overly complex or expensive. While HSR offers rapid recovery, it requires a significant investment. For many SAP deployments, a cold DR strategy strikes the perfect balance between cost-effectiveness and recovery time objectives (RTOs).
What is cold DR? Think of it as your backup plan’s backup plan. It minimizes costs by maintaining a non-running environment that’s only activated when disaster strikes. This traditionally means longer RTOs than hot or warm DR, but at significantly lower cost. And while cold DR is often deemed sufficient, businesses are still looking for ways to improve RTO and drive costs down further.
Backint, when paired with storage (e.g., Persistent Disk and Cloud Storage), enables data transfer to a secondary location and can be an effective cold DR solution. However, using Backint for DR can mean longer restore times and high storage costs, especially for large databases. Google Cloud is delivering a solution that addresses both the cost-effectiveness of cold DR and the rapid recovery of a full DR solution: Backup and DR Service with Persistent Disk (PD) snapshot integration. This innovative approach leverages the power of incremental-forever backups and HANA Savepoints to protect your SAP HANA environment.
Rethinking SAP disaster recovery in Google Cloud
Backup and DR is an enterprise backup and recovery solution that integrates directly with cloud-based workloads that run in Google Compute Engine. Backup and DR provides backup and recovery capabilities for virtual machines (VMs), file systems, multiple SAP databases (HANA, ASE, MaxDB, IQ) as well as Oracle, Microsoft SQL Server, and Db2. You can elect to create backup plans to configure the time of backup, how long to retain backups, where to store the backups (regional/multi-regional) and in what tier of storage, along with specifying database log backup intervals to help ensure a low recovery point objective (RPO).
A recent Backup and DR feature offers Persistent Disk (PD) snapshot integration for SAP HANA databases. This is a significant advancement because these PD snapshots are integrated with SAP HANA Savepoints to help ensure database consistency. When the database is scheduled to be backed up, the Backup and DR agent running on the SAP HANA node instructs the database to trigger a Savepoint image, where all changed data is written to storage in the form of pages. Another benefit of this integration is that the data copy process occurs on the storage side: you no longer copy the backup data through the same network interfaces that the database or operating system is using. As a result, production workloads retain their compute and networking resources, even during an active backup.
Once completed, Backup and DR services trigger the PD snapshots from the Google Cloud storage APIs, so that the image is captured on disk, and logs can also be truncated if desired. All of these snapshots are “incremental forever” and database-consistent backups. Alternatively, you can use logs to recover to a point in time (from the HANA PD snapshot image).
Integration with SAP HANA Savepoints is critical to this process. Savepoints are SAP HANA API calls whose primary use is to help speed up recovery restart times, to provide a low RTO. They achieve this because when the system is starting up, logs don’t need to be processed from the beginning, but only from the last Savepoint position. Savepoints are coordinated across all processes (called SAP HANA services) and instances of the database to ensure transaction consistency.
The HANA Savepoint Backup sequence using PD snapshots can be summarized as follows (see the illustrative sketch after this list):
Tell agent to initiate HANA Savepoint
Initiate PD snapshot, wait for ‘Uploading’ state (seconds)
Tell agent to close HANA Savepoint
Wait for PD snapshot ‘Ready’ state (minutes)
Expire any logs on disk that have passed expiration time
Catalog backup for reporting, auditing
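Backup and DR’s agent orchestrates these steps for you. Purely to illustrate the underlying handshake, the hedged sketch below approximates it manually with SAP HANA’s snapshot SQL statements and a gcloud disk snapshot. The user-store key, SID admin user, disk, zone, and snapshot names are placeholders, and the exact SQL and catalog query should be validated against your HANA version before use.

import subprocess

def snapshot_hana_with_savepoint(sid_adm: str, hana_key: str, disk: str, zone: str, snapshot_name: str) -> None:
    """Illustrative only: open a HANA data snapshot, take a PD snapshot, then close it."""

    def hdbsql(statement: str) -> str:
        # Run a SQL statement via hdbsql using a secure user store key (placeholder values).
        return subprocess.run(
            ["sudo", "-u", sid_adm, "hdbsql", "-U", hana_key, "-a", "-x", statement],
            check=True, capture_output=True, text=True,
        ).stdout

    # 1. Ask HANA to prepare a database-consistent snapshot (changed pages are flushed to disk).
    hdbsql("BACKUP DATA FOR FULL SYSTEM CREATE SNAPSHOT COMMENT 'pd-snapshot';")

    # 2. Take the Persistent Disk snapshot while the HANA snapshot is open.
    subprocess.run(
        ["gcloud", "compute", "disks", "snapshot", disk,
         "--zone", zone, "--snapshot-names", snapshot_name],
        check=True,
    )

    # 3. Confirm the external snapshot so HANA can close its internal snapshot.
    backup_id = hdbsql(
        "SELECT BACKUP_ID FROM M_BACKUP_CATALOG "
        "WHERE ENTRY_TYPE_NAME = 'data snapshot' AND STATE_NAME = 'prepared';"
    ).strip()
    hdbsql(
        f"BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID {backup_id} "
        f"SUCCESSFUL '{snapshot_name}';"
    )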
In addition, you can configure log backups to occur regularly, independent of Savepoint snapshots. These logs are stored on a separate disk and also backed up via PD snapshots, allowing for point-in-time recovery.
Operating system backups
What about operating system backups? Good news: Backup and DR lets you take PD snapshots of the bootable OS disk and, selectively, any other disk attached directly to your Compute Engine VMs. These backup images can also be stored in the same regional or multi-regional location for cold DR purposes.
You can then restore HANA databases to a local VM or your disaster recovery (DR) region. This flexibility allows you to use your DR region for a variety of purposes, such as development and testing, or maintaining a true cold DR region for cost efficiency.
Backup and DR helps simplify DR setup by allowing you to pre-configure networks, firewall rules, and other dependencies. It can then quickly provision a backup appliance in your DR region and restore your entire environment, including VMs, databases, and logs.
This approach gives you the freedom to choose the best DR strategy for your needs: hot, warm, or cold, each with its own cost, RPO, and RTO implications.
One of the key advantages of using Backup and DR with PD snapshots is the significant cost savings it offers compared to traditional DR methods. By eliminating the need for full backups and leveraging incremental forever snapshots, customers can reduce their storage costs by up to 50%, in our testing. Additionally, we found that using a cold DR region with Backup and DR can reduce storage consumption by 30% or more compared to using a traditional backup to file methodology.
Why this matters
Using Google Cloud’s Backup and DR to protect your SAP HANA environment brings a lot of benefits:
Better backup performance (throughput) – the storage layer handles data transfer rather than an agent on the HANA server
Reduced TCO through elimination of regular full backups
Reduced I/O on the SAP HANA server by avoiding the database reads and writes of a backup window that, compared to a regular Backint full backup event, can be very long
Operational simplicity with an onboarding wizard, and no need to manage additional storage provisioning on the source host
Faster recovery times (local or DR) as PD Snapshots recover natively to the VM storage subsystem (not copied over customer networks). Recovery to a point-in-time is possible with logs from the HANA PD Snapshot. You can even take more frequent Savepoints by scheduling these every few hours, to further reduce the log recovery time for restores
Data resiliency – HANA PD Snapshots are stored in regional or multi-regional locations
Low Cost DR – Since Backup images for VMs and Databases are already replicated to your DR region (via regional or multi-regional PD snapshots), recovery is just a matter of bringing up your VM, then choosing your recovery point-in-time for the SAP HANA Database and waiting for a short period of time
When to choose Persistent Disk Asynchronous Replication
While Backup and DR offers a comprehensive solution for many, some customers may have specific needs or preferences that require a different approach. For example, if your SAP application lacks built-in replication, or you need to replicate your data at the disk level, Persistent Disk Asynchronous Replication is a valuable alternative. This approach allows you to spin up new VMs in your DR region using replicated disks, speeding up the recovery process.
PD Async’s infrastructure-level replication is application agnostic, making it ideal for applications without built-in replication. It’s also cost-effective, as you only pay for the storage used by the replicated data. Plus, it offers flexibility, allowing you to customize the replication frequency to balance cost and RPOs.
If you are interested in setting up PD Async and would like to configure it with Terraform, take a look at the Terraform example one of our colleagues created, which shows how to test a failover and failback scenario for a number of Compute Engine VMs.
Take control of your SAP disaster recovery
By leveraging Google Cloud’s Backup and DR and PD Async, you can build a robust and cost-effective cold DR solution for your SAP deployments on Google Cloud that minimizes costs without compromising on data protection, providing peace of mind in the face of unexpected disruptions.
HighLevel is an all-in-one sales and marketing platform built for agencies. We empower businesses to streamline their operations with tools like CRM, marketing automation, appointment scheduling, funnel building, membership management, and more. But what truly sets HighLevel apart is our commitment to AI-powered solutions, helping our customers automate their businesses and achieve remarkable results.
As a software as a service (SaaS) platform experiencing rapid growth, we faced a critical challenge: managing a database that could handle volatile write loads. Our business often sees database writes surge from a few hundred requests per second (RPS) to several thousand within minutes. These sudden spikes caused performance issues with our previous cloud-based document database.
This previous solution required us to provision dedicated resources, which created several bottlenecks:
Slow release cycles: Provisioning resources before every release impacted our agility and time-to-market.
Scaling limitations: We constantly battled DiskOps limitations due to high write throughput and numerous indexes. This forced us to shard larger collections across clusters, requiring complex coordination and consuming valuable engineering time.
Going serverless with Firestore
To overcome these challenges, we sought a database solution that could seamlessly scale and handle our demanding write requirements.
Firestore’s serverless architecture made it a strong contender from the start. But it was the arrival of point-in-time recovery and scheduled backups that truly solidified our decision. These features eliminated our initial concerns and gave us the confidence to migrate the majority of HighLevel’s workloads to Firestore.
Since migrating to Firestore, we have seen significant benefits, including:
Increased developer productivity: Firestore’s simplicity has boosted our developer productivity by 55%, allowing us to focus on product innovation.
Enhanced scalability: We’ve scaled to over 30 billion documents without any manual intervention, handling workloads with spikes of up to 250,000 RPS and five million real-time queries.
Improved reliability: Firestore has proven exceptionally reliable, ensuring consistent performance even under peak load.
Real-time capabilities: Firestore’s real-time sync capabilities power our real-time dashboards without the need for complex socket infrastructure.
Firestore powering HighLevel’s AI
Firestore also plays a crucial role in enabling our AI-powered services across Conversation AI, Content AI, Voice AI and more. All these services are designed to put our customers’ businesses on autopilot.
For Conversation AI, for example, we use a retrieval augmented generation (RAG) architecture. This involves crawling and indexing customer data sources, generating embeddings, and storing them in Firestore, which acts as our vector database (a minimal sketch of this pattern follows the list below). This approach allows us to:
Overcome context window limitations of generative AI models
Reduce latency and cost
Improve response accuracy and minimize hallucinations
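Here is a minimal, hypothetical sketch of that storage-and-retrieval pattern using Firestore’s vector search support in the google-cloud-firestore client. Collection, field, and project names are invented for illustration, and a matching vector index must already exist on the collection.

from google.cloud import firestore
from google.cloud.firestore_v1.vector import Vector
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure

db = firestore.Client(project="your-project-id")  # placeholder project
kb = db.collection("knowledge_base")              # placeholder collection

def store_chunk(doc_id: str, text: str, embedding: list[float]) -> None:
    """Stores a crawled text chunk together with its embedding vector."""
    kb.document(doc_id).set({"text": text, "embedding": Vector(embedding)})

def retrieve_context(query_embedding: list[float], k: int = 5) -> list[str]:
    """Returns the k most similar chunks, used to ground the model's answer."""
    results = kb.find_nearest(
        vector_field="embedding",
        query_vector=Vector(query_embedding),
        distance_measure=DistanceMeasure.COSINE,
        limit=k,
    ).get()
    return [snapshot.to_dict()["text"] for snapshot in results]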
Lessons learned and a path forward
Our journey with Firestore has been eye-opening, and we’ve learned valuable lessons along the way.
For example, in December 2023, we encountered intermittent failures in collections with high write queries per second (QPS). These collections were experiencing write latencies of up to 60 seconds, causing operations to fail as deadlines expired before completion. With support from the Firestore team, we conducted a root-cause analysis and discovered that the issue stemmed from default single-field indexes on constantly increasing fields. These indexes, while helpful for single-field queries, were generating excessive writes on a specific sector of the index.
Once we understood the root cause, our team identified and excluded these unused indexes. This optimization resulted in a dramatic improvement, reducing write-tail latency from 60 seconds to just 15 seconds.
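For reference, single-field index exemptions like the ones described here can be applied through the Firestore Admin API. The sketch below is a hypothetical example (the collection group and field names are invented); setting an empty index configuration for a field removes its automatic single-field indexes.

from google.cloud import firestore_admin_v1

def disable_single_field_index(project: str, collection_group: str, field: str) -> None:
    """Exempts a field from automatic single-field indexing (long-running operation)."""
    client = firestore_admin_v1.FirestoreAdminClient()
    field_path = (
        f"projects/{project}/databases/(default)/"
        f"collectionGroups/{collection_group}/fields/{field}"
    )
    # An empty index_config removes the default single-field indexes for this field.
    operation = client.update_field(
        field=firestore_admin_v1.Field(
            name=field_path,
            index_config=firestore_admin_v1.Field.IndexConfig(indexes=[]),
        )
    )
    operation.result()  # Wait for the index change to complete.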
Firestore has been instrumental in our ability to scale rapidly, enhance developer productivity, and deliver innovative AI-powered solutions. We are confident that Firestore will continue to be a cornerstone of our technology stack as we continue to grow and evolve. Moving forward, we are excited to continue leveraging Firestore and Google Cloud to power our AI initiatives and deliver exceptional value to our customers.
Get started
Are you curious to learn more about how to use Firestore in your organization?
Watch our Next 2024 breakout session to discover recent Firestore updates, learn more about how HighLevel is experiencing significant total cost of ownership savings, and more!
This project has been a team effort. Shout out to the Platform Data team — Pragnesh Bhavsar in particular who has done an amazing job leading the team to ensure our data infrastructure runs at such a massive scale without hiccups. We also want to thank Varun Vairavan and Kiran Raparti for their key insights and guidance. For more from Karan Agarwal, follow him on LinkedIn.
Financial institutions typically process many millions of transactions daily, and when they run on cloud technology, any security lapse in their cloud infrastructure can have catastrophic consequences. For serverless compute workloads, Cloud Run on Google Cloud is a common choice. That’s why we are happy to announce the general availability of Google Cloud’s custom org policies for Cloud Run, which help fortify Cloud Run environments and align them with everything from baseline requirements to the most stringent regulatory standards.
Financial service institutions operate under stringent global and local regulatory frameworks and bodies, such as regulations from the EU’s European Banking Authority, US Securities and Exchange Commission, or the Monetary Authority of Singapore. Also, the sensitive nature of financial data necessitates robust security measures. Hence, maintaining a comprehensive security posture is of major importance, encompassing both coarse-grained and fine-grained controls to address internal and external threats.
Tailored Security, Configurable to Customer’s Needs
Network Access: Reduce unauthorized access attempts by precisely defining VPC configurations and ingress settings.
Deployment Security: Mandatory binary authorization can prevent potentially harmful deployments.
Resource Efficiency: Constraints on memory and CPU usage ensure getting the most out of cloud resources.
Stability & Consistency: Limiting the use of Cloud Run features to those in general availability (GA) and enforcing standardized naming conventions enables a predictable, manageable environment.
This level of customization enables building a Cloud Run environment that’s not just secure, but also perfectly aligned with unique operational requirements.
Addressing the Complexities of Commerzbank’s Cloud Run Setup
Within Commerzbank’s Big Data & Advanced Analytics division, the company leverages cloud technology for its inherent benefits, particularly serverless services. Cloud Run is a crucial component of our serverless architecture and stretches across many applications due to its flexibility. While Cloud Run already offered security features such as VPC Service Controls, multi-regionality, and CMEK support, granular control over all Cloud Run’s capabilities was initially limited.
Diagram illustrating simplified policy management with Custom Org Policies
Better Together
The introduction of Custom Org Policies for Cloud Run now allows Commerzbank to directly map its rigorous security controls, ensuring compliant use of the service. This enhanced control enables the full-scale adoption and scalability of Cloud Run to support our business needs.
The granular control made possible by Custom Org Policies has been a game-changer. Commerzbank and customers like it can now tailor their security policies to their exact needs, preventing potential breaches and ensuring regulatory compliance.
A Secure Foundation for Innovation
Custom Org Policies have become an indispensable part of the cloud security toolkit. Their ability to enforce granular, tailored controls has boosted Commerzbank’s Cloud Run security and compliance. This newfound confidence allows them to innovate with agility, knowing their cloud infrastructure is locked down.
If you’re looking to enhance your Cloud Run security and compliance, we highly recommend exploring Custom Org Policies. They’ve been instrumental in Commerzbank’s journey, and we’re confident they can benefit your organization, too.
Looking Ahead: We’re also eager to explore how to leverage custom org policies for other Google Cloud services as Commerzbank continues to expand its cloud footprint. The bank’s commitment to security and compliance is unwavering, and custom org policies will remain a cornerstone of Commerzbank’s strategy.
We’re excited to share that Gartner has recognized Google as a Leader in the 2024 Gartner® Magic Quadrant™ for Data Integration Tools. As a Leader in this report, we believe Google’s position is a testament to delivering continuous customer innovation in areas such as unified data to AI governance, flexible and accessible data engineering experiences, and AI-powered data integration capabilities.
Today, most organizations operate with just 10% of the data they generate, which is often trapped in silos and disconnected legacy systems. The rise of AI unlocks the potential of the remaining 90%, enabling you to unify this data — regardless of format — within a single platform.
This convergence is driving a profound shift in how data teams approach data integration. Traditionally, data integration was seen as a separate IT process solely for enterprise business intelligence. But with the increased adoption of the cloud, we’re witnessing a move away from legacy on-premises technologies and towards a more unified approach that enables various users to access and work with a more robust set of data sources.
At the same time, organizations are no longer content with simply collecting data; they need to analyze it and activate it in real-time to gain a competitive edge. This is why leading enterprises are either migrating to or building their next-gen data platforms with BigQuery, converging the world of data lakes and warehouses. BigQuery’s unified data and AI capabilities combined with Google Cloud’s comprehensive suite of fully managed services, empower organizations to ingest, process, transform, orchestrate, analyze, and activate their data with unprecedented speed and efficiency. This end-to-end vision delivers on the promise of data transformation, so businesses can unlock the full value of their data and drive innovation.
Choice and flexibility to meet you where you are
Organizations thrive on data-driven decisions, but often struggle to wrangle information scattered across various sources. Google Cloud tools simplify data integration, by letting you:
Streamline data integration from third-party applications – With BigQuery Data Transfer Service, onboarding data from third-party applications like Salesforce or Marketo becomes dramatically simplified, eliminating complex coding and saving valuable time and data movement costs.
Create SQL-based pipelines – Dataform helps create robust, SQL-based pipelines, orchestrating the entire data integration flow easily and scalably. This flexibility empowers organizations to connect all their data dots, wherever they are, so they can unlock valuable insights faster.
Use gen-AI powered data preparation – BigQuery data preparation empowers analysts to clean and prepare data directly within BigQuery, using Gemini’s AI for intelligent transformations to streamline processes and help ensure data quality.
Bridging operational and analytical systems
Data teams know how frustrating it can be to have valuable analytical insights trapped in a data warehouse, disconnected from the operational systems where they could make a real impact. You don’t want to get bogged down in the complexities of ELT vs. ETL vs. ETL-T — you need solutions that prioritize SLAs to ensure on-time and consistent data delivery. This means having the right connectors to meet your needs, especially with the growing importance of real-time data. Google Cloud offers a powerful suite of integrated tools to bridge this gap, helping you easily connect your analytical insights with your operational systems to drive real-time action. With Google Cloud’s data tools, you can:
Perform advanced similarity searches and AI-powered analysis – Vector support across BigQuery and all Google databases lets you perform advanced similarity searches and AI-powered analysis directly on operational data.
Query operational data without moving it – Data Boost enables analysts to query data in place across sources like Bigtable and Vertex AI, while BigQuery’s continuous queries facilitate reverse ETL, pushing updated insights back into operational systems.
Implement real-time data integration and change data capture – Datastream captures changes and delivers them with low latency. Dataflow, Google Managed Service for Kafka, Pub/Sub, and new support for Apache Flink further enhance the reverse ETL process, fueling operational systems with fresh, actionable insights derived from analytics, all while using popular open-source software.
Governance at the heart of a unified data platform
Having strong data governance is critical, not just a checkbox item. It’s the foundation of ensuring your data is high-quality, secure, and compliant with regulations. Without it, you risk costly errors, security breaches, and a lack of trust in the insights you generate. BigQuery treats governance as a core component, not an afterthought, with a range of built-in features that simplify and automate the process, so you can focus on what matters most — extracting value from your data.
Easily search, curate and understand data with accelerated data exploration – With BigQuery data insights powered by Gemini, users can easily search, curate, and understand the data landscape, including the lineage and context of data assets. This intelligent discovery process helps remove the guesswork and accelerates data exploration.
Automatically capture and manage metadata – BigQuery’s automated data cataloging capabilities automatically capture and manage metadata, minimizing manual harvesting and helping to ensure consistency.
Google Cloud’s infrastructure is purpose-built with AI in mind, allowing users to easily leverage generative AI capabilities at scale. Users can train models, generate vector embeddings and indexes, and deploy data and AI use cases without leaving the platform. AI is infused throughout the user journey, with features like Gemini-assisted natural language processing, secure model integration, AI-augmented data exploration, and AI-assisted data migrations. This AI-centric approach delivers a strong user experience for data practitioners with varying skill sets and expertise.
2024 Gartner Magic Quadrant for Data Integration Tools, Thornton Craig et al., December 3, 2024. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Google. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved.
Editor’s note: In the heart of the fintech revolution, Current is on a mission to transform the financial landscape for millions of Americans living paycheck to paycheck. Founded on the belief that everyone deserves access to modern financial tools, Current is redefining what it means to be a financial institution in the digital age. Central to their success is a cloud-native infrastructure built on Google Cloud, with Spanner, Google’s globally distributed database with virtually unlimited scale, serving as the bedrock of their core platform.
More than 100 million Americans struggle to make ends meet, including the 23% of low-income Americans the Federal Reserve estimates do not have a bank account. Current was created to address their needs with a unique business model focused on payments, rather than the deposits and withdrawals of traditional financial institutions. We offer an easily accessible experience designed to make financial services available to all Americans, regardless of age or income.
Our innovative approach — built on proprietary banking core technology with minimal reliance on third-party providers — enables us to rapidly deploy financial solutions tailored to our members’ immediate needs. More importantly, these solutions are flexible enough to evolve alongside them in the future.
In our mission to deliver an exceptional experience, one of the biggest challenges we faced was creating a scalable and robust technological foundation for our financial services. To address this, we developed a modern core banking system to power our platform. Central to this core is our user graph service, which manages all member entities — such as users, products, wallets, and gateways.
Many unbanked and disadvantaged Americans lack bank accounts due to a lack of trust in institutions as much as because of any lack of funds. If we were going to win their trust and business, we knew we had to have a secure, seamless, and reliable service.
A cloud-native core with Spanner
Our previous self-hosted graph database solution lacked cloud-native capabilities and horizontal scalability. To address these limitations, we strategically transitioned to managed persistence layers, which significantly improves our risk posture. Features like point-in-time restore and multi-regional redundancy enhanced our resilience, reduced recovery time objectives (RTO) and improved recovery point objectives (RPO). Additionally, push-button scaling optimized our cloud budget and operational efficiency.
This cloud-native platform necessitated a database solution with consistent writes, horizontal scalability, low read latency under load, and multi-region failover. Given our extensive use of Google Cloud, we prioritized its database offerings. Spanner emerged as the ideal solution, fulfilling all our requirements. It offers consistent writes, horizontal scalability, and the ability to maintain low read latency even under heavy load. Its seamless scalability — particularly the decoupling of compute and storage resources — proved invaluable in adapting to our dynamic consumer environment.
This robust and scalable infrastructure empowers Current to deliver reliable and efficient financial services, critical for building and maintaining member trust. We are the primary financial relationship for millions of Americans who are trusting us with their money week after week. Our experience migrating from a third-party database to Spanner proved that transitioning to a globally scalable, highly available database can be easy and seamless. Spanner’s unique ability to scale compute and storage independently proved invaluable in managing our dynamic user base.
Our strategic migration to Spanner employed a write-ahead commit log to ensure a seamless transition. By prioritizing the migration of reads and verifying their accuracy before shifting writes, we minimized risk and maximized efficiency. This process resulted in a zero-downtime, zero-loss cutover, where we could first transition reads to Spanner on a service-by-service basis, confirm accuracy, and finally migrate writes.
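Current’s migration tooling isn’t public, but the read-first, verify-then-cut-over pattern described above can be pictured with a small, hypothetical wrapper like the one below, where legacy_store and spanner_store are stand-ins for the two data access layers.

import logging

class DualReadRepository:
    """Illustrative sketch of a read-migration wrapper (not Current's actual code).

    Reads are served from the legacy store while the same lookup is issued to
    Spanner and compared, so accuracy can be verified before writes move over.
    """

    def __init__(self, legacy_store, spanner_store, serve_from_spanner: bool = False):
        self.legacy = legacy_store
        self.spanner = spanner_store
        self.serve_from_spanner = serve_from_spanner  # flipped per service once verified

    def get_user(self, user_id: str) -> dict:
        legacy_row = self.legacy.get_user(user_id)
        spanner_row = self.spanner.get_user(user_id)
        if legacy_row != spanner_row:
            # Mismatches are logged and investigated before the cutover proceeds.
            logging.warning("read mismatch for user %s", user_id)
        return spanner_row if self.serve_from_spanner else legacy_row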
Ultimately, our Spanner-powered user graph service delivered the consistency, reliability, and scalability essential for our financial platform. We had renewed confidence in our ability to serve our millions of customers with reliable service and new abilities to scale our existing services and future offerings.
Unwavering Reliability and Enhanced Operational Efficiency
Spanner has dramatically improved our resilience, reducing RTO and RPO by more than 10x, cutting times to just one hour. With Spanner’s streamlined data restoration process, we can now recover data with a few simple clicks. Offloading operational management has also significantly decreased our team’s maintenance burden. With nearly 5,000 transactions per second, we continue to be impressed by Spanner’s performance and scalability.
Additionally, since migrating to Spanner, we have reduced our availability-related incidents to zero. Such incidents could disrupt essential banking functions like accessing funds or making payments, leading to customer dissatisfaction and potential churn, as well as increased operational costs for issue resolution. Elimination of these occurrences is critical for building and maintaining member trust, enhancing retention, and improving the developer experience.
Building Financial Resilience with Google Cloud
Looking ahead, we envision a future where our platform continues to evolve, delivering innovative financial solutions that meet the ever-changing needs of our members. With Spanner as the foundation of our core platform — you could call it the core of cores — we are confident in building a resilient and reliable platform that enables millions of more Americans to improve their financial outcomes.
In today’s congested digital landscape, businesses of all sizes face the challenge of optimizing their marketing budgets. They must find ways to stand out amid the bombardment of messages vying for potential customers’ attention. Moreover, they grapple with rising customer acquisition costs and dwindling retention rates, impeding their profitability.
Adding to this complexity is the abundance of consumer data, which businesses often struggle to harness effectively to target the right audience. To address these challenges, companies are seeking data-driven approaches to enhance their advertising effectiveness, to help ensure their continued relevance and profitability.
Moloco offers AI-powered advertising solutions that drive user acquisition, retention, and monetization efforts. Moloco Ads, its demand-side platform (DSP), utilizes its customers’ unique first-party data, helping them to target and acquire high-value users based on real-time consumer behavior — ultimately, delivering higher conversion rates and return on investment.
To meet this demand, Moloco leverages predictions from a dozen deep neural networks, while continuously designing and evaluating new models. The platform ingests 10 petabytes of data per day and processes bid requests at a peak rate of 10.5 million queries per second (QPS).
Moloco has seen tremendous growth over the last three years, with its business growing over 8X and multiple customers spending more than $50 million annually. Moloco’s rapid growth required an infrastructure that could handle massive data processing and real-time ML predictions while remaining cost effective. As Moloco’s models grew in complexity, training times increased, hindering productivity and innovation. Separately, the Moloco team realized that they also needed to optimize serving efficiency to scale low-latency ad experiences for users across the globe.
Training complex ML models with GKE
After evaluating multiple cloud providers and their solutions, Moloco opted for Google Cloud for its scalability, flexibility, and robust partner ecosystem. The infrastructure provided by Google Cloud aligned with Moloco’s requirements for handling its rapidly growing data and machine learning workloads that are instrumental to optimizing customers’ advertising performance.
Google Kubernetes Engine (GKE) was a primary reason for Moloco selecting Google Cloud over other cloud providers. As Moloco discovered, GKE is more than a container orchestration tool; it’s a gateway to harnessing the full potential of AI and ML. GKE provides scalability and performance optimization tools to meet diverse ML workloads, and supports a wide range of frameworks, allowing Moloco to customize the platform according to their specific needs.
GKE serves as a foundation for a unified AI/ML platform, integrating with other Google Cloud services, facilitating a robust environment for the data processing and distributed computing that underpin Moloco’s complex AI and ML tasks. GKE’s ML data layer offers the high-throughput storage solutions that are crucial for read-heavy workloads. Features like cluster autoscaler, node-auto provisioner, and pod autoscalers ensure efficient resource allocation.
“Scaling our infrastructure as Moloco’s Ads business grew exponentially was a huge challenge. GKE’s autoscaling capabilities enabled the engineering team to focus on development without spending a ton of effort on operations.” – Sechan Oh, Director of Machine Learning, Moloco
Shortly after migrating to Google Cloud, Moloco began using GKE for model training. However, Moloco quickly found that using traditional CPUs was not competitive at its scale, in terms of both cost and velocity. GKE’s ability to autoscale on multi-host Tensor Processing Units (TPUs), Google’s specialized processing units for machine learning workloads, was critical to Moloco’s success, allowing Moloco to harness TPUs at scale, resulting in significant enhancements in training speed and efficiency.
Moloco further leveraged GKE’s AI and ML capabilities to optimize the management of its compute resources, minimizing idle time and generating cost savings while improving performance. Notably, GKE empowered Moloco to scale its ML infrastructure to accommodate exponential business growth without straining its engineering team. This enabled Moloco’s engineers to concentrate on developing AI and ML software instead of managing infrastructure.
“The GKE team collaborated closely with us to enable auto scaling for multi host TPUs, which is a recently added feature. Their help has really enabled amazing performance on TPUs, reducing our cost per training job by 2-4 times.” – Kunal Kukreja, Senior Machine Learning Engineer, Moloco
In addition to training models on TPUs, Moloco also uses GPUs on GKE to deploy ML models into production. This lets the Moloco platform handle real-time inference requests effectively and benefit from GKE’s scalability and operational stability, enhancing performance and supporting more complex models.
Moloco collaborated closely with the Google Cloud team throughout the implementation process, leveraging their expertise and guidance. The Google Cloud team supported Moloco in implementing solutions that ensured a smooth transition and minimal disruption to operations. Specifically, Moloco worked with the Google Cloud team to migrate its ML workloads to GKE using the platform’s autoscaling and pod prioritization capabilities to optimize resource utilization and cost efficiency. Additionally, Moloco integrated Cloud TPUs into its training pipeline, resulting in significantly reduced training times for complex ML models. Furthermore, Moloco optimized its serving infrastructure with GPUs, ensuring low-latency ad experiences for its customers.
A powerful foundation for ML training and inference
Moloco’s collaboration with Google Cloud profoundly transformed its capacity for innovation.
“By harnessing Google Cloud’s solutions, such as GKE and Cloud TPU, Moloco dramatically reduced ML training times by up to tenfold.”–Sechan Oh, Director of Machine Learning, Moloco
This in turn facilitated swift model iteration and experimentation, empowering Moloco’s engineers to innovate with unprecedented speed and efficiency. Moreover, the scalability and performance of Google Cloud’s infrastructure enabled Moloco to manage increasingly intricate models and expansive datasets, to create and implement cutting-edge machine learning solutions. Notably, Moloco’s low-latency advertising experiences, bolstered by GPUs, fostered enhanced customer satisfaction and retention.
Moloco’s success demonstrates the power of Google Cloud’s solutions to help businesses achieve their full potential. By leveraging GKE, Cloud TPU, and GPUs, Moloco was able to scale its infrastructure, accelerate its ML training, and deliver exceptional ad experiences to its customers. As Moloco continues to grow and innovate, Google Cloud will remain a critical partner in its success.
Meanwhile, GKE is transforming the AI and ML landscape by offering a blend of scalability, flexibility, cost-efficiency, and performance. And Google Cloud continues to invest in GKE so it can handle even the most demanding AI training workloads. For example, GKE now supports 65,000-node clusters, offering unmatched scale for training or inference. For more, watch this demo of 65,000 nodes on a single GKE cluster.
Based on your feedback, Partner Summit 2025 will begin on Tuesday, April 8 – one day before Google Cloud Next kicks off – to offer a dedicated day of partner breakout sessions and learning opportunities before the main event begins. The Partner Summit Lounge, partner keynote, lightning talks, and more will all be available April 9–11, 2025.
Partner Summit is your exclusive opportunity to:
Accelerate your business by aligning on joint business goals, learning about new programmatic and incentive opportunities, and diving deep into cutting-edge insights in our Partner Summit breakout sessions and lightning talks.
Build new connections as you network with other partners and Googlers while you explore the activities and perks located in our exclusive Partner Summit Lounge.
Get a look at what’s next from Google Cloud leadership at the dedicated partner keynote to learn about where cloud is headed – and how our partners are central to our mission.
Make the most of our partnership with personalized advice from Google Cloud team members on incentives, certifications, co-marketing, and more at our Meet the Experts booths.
Get ready to learn, connect, and build the future of business with us. Early bird registration is now open for $999. This special rate is only available through February 14, 2025, or until tickets are sold out.
Google Cloud Next returns to Las Vegas, April 9–11, 2025* and I’m thrilled to share that registration is now live! We welcomed 30,000 attendees to our largest flagship conference in Google Cloud history this past April, and 2025 will be even bigger and better than ever.
Join us for an unforgettable week of hands-on experiences, inspiring content, and problem-solving with our top partners, and seize the opportunity to learn from top experts and peers tackling the same challenges you face day in and day out. Walk away with new ideas, breakthrough skills, and actionable knowledge only available at Google Cloud Next 2025.
Early bird registration is now available for just $999 for a limited time**.
Here’s why you need to be at Next:
Experience AI in Action: Immerse yourself in the latest technology; build your next agent; explore our demos, hackathons, and workshops; and learn how others are harnessing the power of AI to propel their businesses to new heights.
Forge Powerful Connections: Network with peers, industry experts, and the brightest minds in tech to exchange ideas, spark collaborations, and shape the future of your industry.
Build and Learn Live: With a wealth of demos and workshops, hackathons, keynotes, and deep dives, Next is the place to be for the builders, dreamers, and doers shaping the future of technology.
* Select programming to take place in the afternoon of April 8. ** Space is limited, and this offer is only valid through 11:59 PM PT on February 14, 2025, or until tickets are sold out.
Through our collaboration, the Air Force Research Laboratory (AFRL) is leveraging Google Cloud’s cutting-edge artificial intelligence (AI) and machine learning (ML) capabilities to tackle complex challenges across various domains, from materials science and bioinformatics to human performance optimization. AFRL, the center for scientific research and development for the U.S. Air Force and Space Force, is embracing the transformative power of AI and cloud computing to accelerate its mission of developing and transitioning advanced technologies to the air, space, and cyberspace forces.
This collaboration not only enhances AFRL’s research capabilities, but also aligns with broader Department of Defense (DoD) initiatives to integrate AI into critical operations, bolster national security, and maintain technological advantage by demonstrating game-changing technologies that enable technical superiority and help the Air Force adopt cutting-edge technologies as soon as they are released. By harnessing Google Cloud’s scalable infrastructure, comprehensive generative AI offerings, and collaborative environment, the AFRL is driving innovation and ensuring the U.S. Air Force and Space Force remain at the forefront of technological advancement.
Let’s delve into examples of how the AFRL and Google Cloud are collaborating to realize the benefits of AI and cloud services:
Bioinformatics breakthroughs: The AFRL’s bioinformatics research was once hindered by time-consuming manual processes and data bottlenecks, causing delays in moving and sharing data, getting access to US-based tools, using standard storage and hardware, and having the right system communications and integrations across third party infrastructure. Because of this, cross-team collaboration and experiment expansion was severely limited and inefficiently tracked. With very little cloud experience, the team was able to create a siloed environment where they used Google Cloud’s infrastructure, such as Google Compute Engine, Cloud Workstations, and Cloud Run to build analytic pipelines that helped them test, store, and analyze data in an automated and streamlined way. That data pipeline automation paved the way for further exploration and expansion on a use case that had never been done before.
Web app efficiency for lab management: The AFRL’s complex lab equipment scheduling process resulted in challenges in providing scalable, secure access to important content and information for users in different labs. To mitigate these challenges and ease maintenance for non-programmer researchers and lab staff, the team built a custom web application based on Google App Engine, integrated with Google Workspace and Apps Scripts, so that they could capture usage metrics for future hardware investment decisions and automate admin tasks that were taking time away from research. The result was significantly faster ability to make changes without administrator intervention, a variety of self-service options for users to schedule time on equipment and request training, and an enhanced, scalable design architecture with built-in SSO that helped streamline internal content for multiple labs.
Modeling insights into human performance: Understanding and optimizing human performance is critical for the AFRL’s mission. The FOCUS Mission Readiness App, built on Google Cloud, utilizes various infrastructure services, such as Cloud Run, Cloud SQL, and GKE, and integrates with the Garmin Connect APIs to collect and analyze real-time data from wearables.
By leveraging Google Cloud’s BigQuery and other analytics tools, this app provides personalized insights and recommendations for fatigue interventions and predictions that help capture valuable improvement mechanisms in cognitive effectiveness and overall well-being for Airmen.
Streamlined AI model development with Vertex AI:
The AFRL wanted to replicate the functionality of university HPC clusters, especially since there was a diversity of users that needed extra compute and not everyone was trained on how to use these tools. They wanted an easy GUI and to maintain active connections where they could develop AI models and test their research with confidence. They leveraged Google Cloud’s Vertex AI and Jupyter Notebooks through Workbench, Compute Engine, Cloud Shell, Cloud Build and much more to get a head start in creating a pipeline that could be used for sharing, ingesting, and cleaning their code. Having access to these resources helped create a flexible environment for researchers to do model development and testing in an accelerated manner.
Cloud capabilities and AI/ML tools provide a flexible and adaptable environment that empowers our researchers to rapidly prototype and deploy innovative solutions. It’s like having a toolbox filled with powerful AI building blocks that can be combined to tackle our unique research challenges.
Dr. Dan Berrigan
Air Force Research Laboratory
The AFRL’s collaboration with Google Cloud exemplifies how AI and cloud services can be a driving force behind innovation, efficiency, and problem-solving across agencies. As the government continues to invest in AI research and development, collaborations like this will be crucial for unlocking the full potential of AI and cloud computing, ensuring that agencies across the federal landscape can leverage these transformative technologies to create a more efficient, effective, and secure future for all.
Learn more about how we’ve helped government agencies accelerate their mission and impact with AI.
Watch the Google Public Sector Summit On Demand to gain crucial insights on the critical intersection of AI and Security in the public sector.
Written by: Ilyass El Hadi, Louis Dion-Marcil, Charles Prevost
Executive Summary
Whether through a comprehensive Red Team engagement or a targeted external assessment, incorporating application security (AppSec) expertise enables organizations to better simulate the tactics and techniques of modern adversaries. This includes:
Leveraging minimal access for maximum impact: There is no need for high privilege escalation. Red Team objectives can often be achieved with limited access, highlighting the importance of securing all internet-facing assets.
Recognizing the potential of low-impact vulnerabilities through vulnerability chaining: Low- and medium-impact vulnerabilities can be exploited in combination to achieve significant impact.
Developing your own exploits: Skilled adversaries or consultants will invest the time and resources to reverse-engineer and/or find zero-day vulnerabilities in the absence of public proof-of-concept exploits.
Employing diverse skill sets: Red Team members should include individuals with a wide range of expertise, including AppSec.
Fostering collaboration: Combining diverse skill sets can spark creativity and lead to more effective attack simulations.
Integrating AppSec throughout the engagement: Offensive application security contributions can benefit Red Teams at every stage of the project.
By embracing this approach, organizations can proactively defend against a constantly evolving threat landscape, ensuring a more robust and resilient security posture.
Introduction
In today’s rapidly evolving threat landscape, organizations find themselves engaged in an ongoing arms race against increasingly sophisticated cyber criminals and nation-state actors. To stay ahead of these adversaries, many organizations turn to Red Team assessments, simulating real-world attacks to expose vulnerabilities before they are exploited. However, many traditional Red Team assessments typically prioritize attacking network and infrastructure components, often overlooking a critical aspect of modern attack surfaces: web applications.
This gap hasn’t gone unnoticed by cyber criminals. In recent years, industry reports consistently highlight the evolving trend of attackers exploiting public-facing application vulnerabilities as a primary entry point into organizations. This aligns with Mandiant’s observations of common tactics used by threat actors, as observed in our 2024 M-Trends Report: “In intrusions where the initial intrusion vector was identified, 38% of intrusions started with an exploit. This is a six percentage point increase from 2022.”
The 2024 M-Trends Report also documents that 28.7% of Initial Compromise access is obtained through exploiting public-facing web applications (MITRE T1190).
At Mandiant, we recognize this gap and are committed to closing it by integrating AppSec expertise into our Red Team assessments. This optional approach is offered to customers who wish to increase the coverage of their external perimeters to gain a deeper understanding of their security posture. While most of the infrastructure typically receives a considerable amount of security scrutiny, web applications and edge devices often lack the same level of consideration, making them prime targets for attackers.
This integrated approach is not limited to full-scope Red Team engagements. Organizations with varying maturity levels can also leverage application security expertise within the context of focused external perimeter assessments. These assessments provide a valuable and cost-effective way to gain insights into the security of internet-facing applications and systems, without the need for a Red Team exercise.
The Role of Application Security in Red Team Assessments
The integration of AppSec specialists into Red Team assessments manifests in a unique staffing approach. The role of this specialist is to augment the Red Team’s capabilities with the ever-evolving exploitation techniques used by adversaries to breach organizations from the external perimeter.
The AppSec specialist will often get involved as early as possible on an engagement, even during the scoping and early planning stages. They perform a meticulous review of the target perimeter, mapping out the application inventory and identifying vulnerabilities within the various components of web applications and application programming interfaces (APIs) exposed to the internet.
While examination is underway, Red Team operators concurrently focus on other crucial aspects of the assessment, including infrastructure preparation, crafting convincing phishing campaigns, developing and refining tools, and creating effective payloads that will evade the target environment’s controls and defense mechanisms.
Once an AppSec vulnerability of critical impact is discovered, the team will generally proceed to its exploitation, notifying our primary point of contact of our preliminary findings and validating the potential impacts of our discovery. It is important to note that a successful finding doesn’t always result in a direct foothold in the target environment. The intelligence gathered through the extensive reconnaissance and perimeter review phase can be repurposed for various aspects of the Red Team mission. This could include:
Identifying valuable reconnaissance targets or technologies to fine-tune a social engineering campaign
Further tailoring an attack payload
Establishing a temporary foothold that might lead to further exploitation
Hosting malicious payloads for later stages of the attack simulation
Once the external perimeter examination phase is complete, our Red Team operators will begin carrying out the remaining mission objectives, empowered with the AppSec team’s insights and intelligence, including identified vulnerabilities and associated exploits. Even though the Red Team operators perform most of the remaining activities at this point, the AppSec consultants stay close to the engagement and often step in to further support internal exploitation efforts. For example, applications that are only accessible internally generally receive far less scrutiny and are consequently assessed much less frequently than externally accessible assets.
By incorporating AppSec expertise, we’ve seen a significant increase in engagements where our Red Team gained a decisive advantage during a customer’s external perimeter review, such as obtaining a foothold or gaining access to confidential information. This approach translates to a more realistic and valuable assessment for our customers, ensuring comprehensive coverage of both network and application security risks. By uncovering and addressing vulnerabilities across the entire attack surface, Mandiant empowers organizations to proactively defend against a wide array of threats, strengthening their overall security posture.
Case Studies: Demonstrating the Impact of Application Security Support
In this section, we focus on four of the multiple real-world scenarios where the support of Mandiant’s AppSec Team has significantly enhanced the effectiveness of Red Team assessments. Each case study highlights the attack vectors, the narrative behind the attack, key takeaways from the experience, and the associated assumptions and misconceptions.
These case studies highlight the value of incorporating application security support in Red Team engagements, while also offering valuable learning opportunities that promote collaboration and knowledge sharing.
Unlocking the Vault: Exposed API Key to Sensitive Internal Document Access
Context
A company in the energy sector engaged Mandiant to assess the efficiency of its cybersecurity team’s abilities in detection, prevention, and response. Because the organization had grown significantly in the past years following multiple acquisitions, Mandiant suggested an increased focus on their external perimeter. This would allow the organization to measure the subsidiaries’ external security posture, compared to the parent organization’s.
Target of Interest
Following a thorough reconnaissance phase, the AppSec Team began examination of a mobile application developed by the customer for its business partners. Once the mobile application was decompiled, a hardcoded API key granting unauthorized access to an external API service was discovered. Leveraging the API key, authenticated reconnaissance on the API service was conducted, which led to the discovery of a significant vulnerability within the application’s PDF generation feature: a full-read Server-Side Request Forgery (SSRF), enabled through HTML injection.
Vulnerability Identification
During the initial reconnaissance phase, the team observed that numerous internal systems’ hostnames were publicly accessible through certificate transparency logs. With that in mind, the objective was to exploit the SSRF vulnerability to determine if any of these internal systems were reachable via the external API service. Eventually, one such host was identified: a commercial ASP.NET document management solution. Once the solution’s name and version were identified, the AppSec Team searched for known vulnerabilities online. Among the findings was a recent CVE entry regarding insecure ViewState deserialization, which included details about the affected dynamic-link library (DLL) name.
Exploitation
With no public proof-of-concept exploits available, the team searched for the DLL without success until the file was found in VirusTotal’s public corpus. The DLL was then decompiled into C# code, revealing the vulnerable function and providing all the necessary components for successful exploitation. Next, the application security consultants leveraged the post-authentication SSRF vector to exploit the ViewState deserialization vulnerability affecting the internal application. This attack chain led to a reliable foothold in the parent organization’s internal network.
Takeaways
The organization’s demilitarized zone (DMZ) was now breached, and the remote access could be passed off to the Red Team operators. This enabled the operators to perform lateral movement into the network and achieve various predetermined objectives. However, the customer expressed high satisfaction with the demonstrated impact prior to lateral movement, especially since the application server housed numerous sensitive documents. This underscores a common misconception that exploiting the external perimeter must necessarily result in facilitating lateral movement within the internal network. Yet, the impact was evident even before lateral movement, simply by gaining access to the customer’s sensitive data.
Breaking Barriers: Blind XSS as a Gateway to Internal Networks
Context
A company operating in the technology industry engaged Mandiant for a Red Team assessment. This company, with a very mature security program, requested that no phishing be performed because they were already conducting numerous internal phishing and vishing exercises. They highlighted that all previous Red Team engagements had relied heavily on various social engineering methods, and the success rate was consistently low.
Target of Interest
During the external reconnaissance efforts, the AppSec Team identified multiple targets of interest, such as a custom-built customer relationship management (CRM) solution. Leveraging the Wayback Machine on the CRM hostname, a legacy endpoint was discovered, which appeared obsolete but still accessible without authentication.
Vulnerability Identification
Despite not being accessible through the CRM’s user interface, the endpoint contained a functional form to request support. The AppSec Team injected a blind cross-site scripting (XSS) payload into the form, which loaded an external JavaScript file containing post-exploitation code. When successful, this technique lets an adversary temporarily hijack the targeted user’s browser tab and perform actions on the user’s behalf. Moments later, the team received a notification that the payload had executed within the context of a user browsing an internal customer support administration panel.
The AppSec Team analyzed the exfiltrated Document Object Model (DOM) to further understand the payload’s execution context and assess the data accessible within this internal application. The analysis revealed references to Apache Tapestry framework version 3, a framework initially released in 2004. Shortly after identifying the internal application’s framework, Mandiant deployed a local Tapestry v3 instance to identify potential security pitfalls. Through code review, Mandiant discovered a zero-day deserialization vulnerability in the core framework, which led to remote code execution (RCE). The Apache Software Foundation assigned CVE-2022-46366 to this RCE.
Exploitation
The zero-day, which affected the internal customer support application, was exploited by submitting an additional blind XSS payload. Crafted to trigger upon form submission, the payload autonomously executed in an employee’s browser, exploiting the internal application’s deserialization flaw. This led to a crucial foothold within the client’s infrastructure, enabling the Red Team to progress with their lateral movement until all objectives were successfully accomplished.
Takeaways
This real-world scenario highlights a common misconception that cross-site scripting holds minimal relevance in Red Team assessments. The significance and impact of this particular attack vector in this case study were evident: it acted as a gateway, breaching the external network and leveraging an employee’s internal network position as a proxy to exploit the internal application. Mandiant had not previously identified XSS vulnerabilities on the external perimeter, which further highlights how the security posture of the external perimeter can be much more robust than that of the internal network.
Logger Danger: From Log Files to Unauthorized Cloud Access
Context
An organization in the transportation sector engaged Mandiant to perform a Red Team assessment, with the goal of emulating an initial access broker (IAB) threat group focused on breaching externally exposed systems and services. These groups, which typically resell illegitimate access to compromised victims’ environments, had previously been identified as a significant threat to the organization by the Google Threat Intelligence (GTI) team while building a threat profile to support assessment activities.
Target of Interest
Among hundreds of external applications identified during the reconnaissance phase, one stood out: a commercial Java-based supply chain management solution hosted in the cloud. This application drew additional attention after the discovery of an online forum post describing its installation procedures. Within the post, a link to an unlisted YouTube video was shared, offering detailed installation and administration guidance. Upon reviewing the video, the AppSec Team noted the URL for the application’s trial installer, still accessible online despite not being referenced or indexed anywhere else.
Following installation and local deployment, an administration manual was available within the installation folder. This manual contained a section for a web-based performance monitor plugin that was deployed by default with the application, along with its default credentials. The plugin’s functionality included logging performance metrics and stack traces locally in files upon encountering unhandled errors. Furthermore, the plugin’s endpoint name was uniquely distinct, making it highly unlikely to be discovered with conventional directory brute-forcing methods.
Vulnerability Identification
The AppSec Team successfully logged into the organization’s performance monitor plugin using the default credentials sourced from the administration manual and resumed local testing to identify post-authentication vulnerabilities. By conducting code review in parallel with manual testing, the team identified a log management feature that allowed authenticated users to manipulate log filenames and directories. The team also observed that it could induce errors through targeted, malformed HTTP requests. In conjunction with the log filename manipulation, this made it possible to force arbitrary data to be stored at an arbitrary location on the underlying server’s file system.
Exploitation
The strategy involved intentionally triggering exceptions, which the performance monitor would then log in an attacker-defined Jakarta Server Pages (JSP) file within the web application’s root directory. The AppSec Team crafted an exploit that injected arbitrary JSP code into an HTTP request’s parameter, forcing the performance monitor to log errors into the attacker-controlled JSP file. Upon accessing the JSP log file, the injected code executed, enabling Mandiant to breach the customer’s cloud environment and access thousands of sensitive logistics documents.
Takeaways
A common assumption that breaches should lead to internal on-premises network access or to Active Directory compromise was challenged in this case study. While lateral movement was constrained by time, the primary objective was achieved: emulating an initial access broker. This involved breaching the cloud environment, where the client lacked visibility compared to its internal Active Directory network, and gaining access to business-critical crown jewels.
Collaborative Intrusion: Webhooks to CI/CD Pipeline Access
Context
A company in the automotive sector engaged Mandiant to perform a Red Team assessment, with the goal of obtaining access to their continuous integration and continuous delivery/deployment (CI/CD) pipeline. Due to the sheer number of externally exposed systems, the AppSec Team was staffed to support the Red Team’s reconnaissance and breaching efforts.
Target of Interest
Most of the interesting applications redirected to the customer’s single sign-on (SSO) provider. However, one application behaved differently. By querying the Wayback Machine, the team uncovered an endpoint that did not redirect to the SSO. Instead, it presented a blank page with a unique favicon. With the goal of identifying the application’s underlying technology, the favicon’s hash was calculated and queried using Shodan. The results returned many other live applications sharing the same favicon. Interestingly, some of these applications operated independently of SSO, helping the team identify the application’s name and vendor.
Vulnerability Identification
Once the application’s name was identified, the team visited the vendor’s website and accessed their public API documentation. Among the API endpoints, one stood out—it could be directly accessed on the customer’s application without redirection to the SSO. This API endpoint did not require authentication and only took an incremental numerical ID as its parameter’s value. Upon querying, the response contained sensitive employee information, including email addresses and phone numbers. The team systematically iterated through the API endpoint, incrementing the ID parameter to compile a comprehensive list of employee email addresses and phone numbers. However, the Red Team refrained from leveraging this data, as another intriguing application was discovered. This application exposed a feature that could be manipulated into sending fully user-controlled emails from the company’s no-reply@ email address.
Capitalizing on these vulnerabilities, the Red Team initiated a phishing campaign, successfully gaining a foothold in the customer’s network before the AppSec Team could identify an external breach vector. As efforts continued on the internal post-exploitation, the application security consultants shifted their focus to support the Red Team’s efforts within the internal network.
Exploitation
Digging into network shares, the Red Team found a developer’s credentials for an enterprise source control application. The AppSec Team sifted through reconnaissance data and flagged that the same source control server was exposed externally. The credentials were successfully used to log in, as multi-factor authentication was absent for this user. Within the GitHub interface, the team uncovered a pre-defined webhook linked to the company’s internal Jenkins, an integration commonly employed to facilitate communication between source control systems and CI/CD pipelines. Leveraging this discovery, the team created a new webhook that, when manually triggered, would perform an SSRF against internal URLs. This eventually led to the exploitation of an unauthenticated Jenkins sandbox bypass vulnerability (CVE-2019-1003030), and ultimately to remote code execution, effectively compromising the organization’s CI/CD pipeline.
Takeaways
In this case study, the efficacy of collaboration between the Red Team and the AppSec Team was demonstrated. Leveraging insights gathered collectively, the teams devised a strategic plan to achieve the main objective set by the customer: accessing its CI/CD pipelines. Moreover, we challenged the misconception that singular critical vulnerabilities are indispensable for reaching objectives. Instead, we revealed the reality where achieving goals often requires innovative detours. In fact, a combination of vulnerabilities or misconfigurations, whether they are discovered by the AppSec Team or the Red Team, can be strategically chained together to accomplish the mission.
Conclusion
As this blog post demonstrated, the integration of application security expertise into Red Team assessments yields significant benefits for organizations seeking to understand and strengthen their security posture. By proactively identifying and addressing vulnerabilities across the entire attack surface, including those commonly overlooked by traditional approaches, businesses can minimize the risk of breaches, protect critical assets, and hopefully avoid the financial and reputational damage associated with successful attacks.
This integrated approach is not limited to Red Team engagements. Organizations with varying maturity levels can also leverage application security expertise within the context of focused external perimeter assessments. These assessments provide a valuable and cost-effective way to gain insights into the security of internet-facing applications and systems, without the need for a Red Team exercise.
Whether through a comprehensive Red Team engagement or a targeted external assessment, incorporating application security expertise enables organizations to better simulate the tactics and techniques of modern adversaries.
Google Cloud is delighted to announce the opening of our 41st cloud region in Querétaro, Mexico. This marks our third cloud region in Latin America, joining Santiago, Chile, and São Paulo, Brazil. From Querétaro, we’ll provide fast, reliable cloud services to businesses and public sector organizations throughout Mexico and beyond. This new region offers low latency, high performance, and local data residency, empowering organizations to innovate and accelerate digital transformation initiatives.
Helping organizations in Mexico thrive in the cloud
Google Cloud regions are major investments to bring best-in-class infrastructure, cloud and AI technologies closer to customers. Enterprises, startups, and public sector organizations can leverage Google Cloud’s infrastructure economy of scale and global network to deliver applications and digital services to their end users.
With this new region in Querétaro, Mexico, Google Cloud customers enjoy:
Speed: Serve your end users with fast, low-latency experiences, and transfer large amounts of data between networks easily across Google’s global network.
Security: Keep your organizations’ and customers’ data secure and compliant, including meeting the requirements of CNBV contractual frameworks, and maintain local data residency.
Capacity: Scale to meet growing user and business needs.
Sustainability: Reduce the carbon footprint of your IT environment and help meet sustainability targets.
Google Cloud customers are eager to benefit from the new possibilities that this cloud region offers:
“At Prosa, we have been undergoing a transformation process for the past three years that involves adopting technology and developing digital skills within our teams. The partnership with Google has been key to carrying out projects, evolving towards digital business models, enabling the ecosystem, promoting the API-ification of services, and improving data analysis. This alliance is only deepened with the launch of the new Google Cloud region, which will facilitate the integration of participants into the payment ecosystem in a secure and highly available manner, improving the customer experience and delivering value more quickly and agilely,” said Salvador Espinosa, CEO of Prosa, a payment technology company that processed more than 10 million transactions in 2023.
The new Google Cloud region in Querétaro, Mexico is also welcomed by the Mexican public sector.
“The new Google cloud region in Mexico will be key to build a digital government accountable to citizens, deepening our path to digital transformation. Since 2018, the Auditoria Superior de la Federación (ASF) has pioneered digital transformation in Mexico, promoting innovation and the responsible use of technology, while using advanced technologies like Google Cloud’s Vertex AI, among other proprietary tools, to enhance data analysis, automate processes, and improve collaboration. This enables more accurate decision-making, optimized oversight of public spending, increased inspection coverage, and transparent use of resources. Thanks to the cloud, we see a future where technology is a strategic ally to execute efficient, agile and exhaustive digital audits, detect irregularities early, and strengthen accountability. ASF’s focus on transparency and efficiency aligns with President Claudia Sheinbaum’s public innovation policy.” – Emilio Barriga Delgado, Special Auditor of Federalized Expenditure, Auditoria Superior de la Federación
The new cloud region also opens new opportunities for our global ecosystem of over 100,000 incredibly diverse partners.
“For Amarello and our customers, the availability of a new region in Mexico demonstrates the great growth of Google Cloud and its commitment to Mexico. It’s also a great milestone for the country, putting us on par with other economies. This will create jobs that will speed up our clients’ adoption of strategic projects and latency-sensitive technological services such as financial services or mission-critical operations. At the same time, the new region will enable projects that require information to be maintained within the national territory, now on the most innovative and secure public cloud.” – Mauricio Sánchez Valderrama, managing partner, Amarello Tecnologías de Información
And for global companies looking to tap into the Mexican market:
“As networks shift to a cloud-first approach, and hybrid work enables work from anywhere, businesses in the Mexico region can now securely accelerate innovation, boost efficiency, and enhance customer experiences with Palo Alto Networks AI-powered solutions, like Prisma SASE, built in the cloud to secure the cloud at scale. The powerful collaboration between Google Cloud and Palo Alto Networks reinforces our commitment to security and innovation so organizations can confidently embrace the AI-driven future, knowing their users, data, and applications are protected from evolving threats.” – Anupam Upadhyaya, Vice President, Product Management, Palo Alto Networks
Delivering on our commitment to Latin America
In 2022, we announced a five-year, $1.2 billion commitment to Latin America, focusing on four key areas: digital infrastructure, digital skills, entrepreneurship, and inclusive, sustainable communities.
We’re equally committed to creating new career opportunities for people in Mexico and Latin America: We’re working with over 550 universities across Latin America to offer a robust and continuously updated portfolio of learning resources so students can seize the opportunities created by new digital technologies like AI and the cloud. As a result, we’ve already granted more than 14,000 digital skill badges to students and individual developers in Mexico over the last 24 months.
Another example of our commitment is the “Súbete a la nube” program that we created in partnership with the Inter-American Development Bank (IDB), with a focus on women and the southern region of the country. To date, 12,500 people have registered for essential digital skills training in cloud computing through the program.
Today, we’re also announcing a commitment to train 1 million Mexicans in AI and cloud technologies over the coming years. Google Cloud will continue to skill Mexico’s local talent with a variety of no-cost training programs for students, developers and customers. Some of the ongoing training programs will include no-cost, localized courses available through YouTube, credentials through the Google Cloud Skills Boost platform, community support by Google Developer Groups, and scholarships for the Google Career Certificates that help prepare learners for high-growth, in-demand jobs in fields like cybersecurity and data analytics, so the cloud can truly democratize innovation and technology.
This new Google Cloud region is also a step towards providing generative AI products and services to Latin American customers. Cloud computing will increasingly be a key gateway towards the development and usage of AI, helping organizations compete and innovate at global scale.
Google Cloud is dedicated to being the partner of choice for customers undergoing digital transformation. We’re focused on providing sustainable, low-carbon options for running applications and infrastructure. Since 2017, we’ve matched 100% of our global annual electricity use with renewable energy. We’re aiming even higher with our 2030 goal: operating on 24/7 carbon-free energy across every electricity grid where we operate, including Mexico.
We’re incredibly excited to open the Querétaro, Mexico region, bringing low-latency, reliable cloud services to Mexico and Latin America, so organizations can take advantage of all that the cloud has to offer. Stay tuned for even more Google Cloud regions coming in 2025 (and beyond), and click here to learn more about Google Cloud’s global infrastructure.
AI agents are revolutionizing the landscape of gen AI application development. Retrieval augmented generation (RAG) has significantly enhanced the capabilities of large language models (LLMs), enabling them to access and leverage external data sources such as databases. This empowers LLMs to generate more informed and contextually relevant responses. Agentic RAG represents a significant leap forward, combining the power of information retrieval with advanced action-planning capabilities. AI agents can execute complex, multi-step tasks: they reason, plan, and make decisions, then take actions to pursue goals over multiple iterations. This opens up new possibilities for automating intricate workflows and processes, leading to increased efficiency and productivity.
LlamaIndex has emerged as a leading framework for building knowledge-driven and agentic systems. It offers a comprehensive suite of tools and functionality that facilitate the development of sophisticated AI agents. Notably, LlamaIndex provides both pre-built agent architectures that can be readily deployed for common use cases, as well as customizable workflows, which enable developers to tailor the behavior of AI agents to their specific requirements.
Today, we’re excited to announce a collaboration with LlamaIndex on open-source integrations for Google Cloud databases including AlloyDB for PostgreSQL and Cloud SQL for PostgreSQL.
These LlamaIndex integrations, available to download via PyPI as llama-index-alloydb-pg and llama-index-cloud-sql-pg, empower developers to build agentic applications that connect with Google databases. The integrations include Vector Store, Document Store, and Index Store support.
In addition, developers can also access previously published LlamaIndex integrations for Firestore, including for Vector Store and Index Store.
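As a quick illustration (the module and class names below are assumptions that mirror the packages’ naming conventions; verify them against each package’s documentation), the integrations can be imported once the packages are installed from PyPI:
code_block
# Install first, for example: pip install llama-index llama-index-alloydb-pg llama-index-cloud-sql-pg
# The class names below are assumptions based on the packages' naming conventions.
from llama_index_alloydb_pg import AlloyDBEngine, AlloyDBVectorStore, AlloyDBDocumentStore, AlloyDBIndexStore
from llama_index_cloud_sql_pg import PostgresEngine, PostgresVectorStore, PostgresDocumentStore, PostgresIndexStore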
Integration benefits
LlamaIndex supports a broad spectrum of different industry use cases, including agentic RAG, report generation, customer support, SQL agents, and productivity assistants. LlamaIndex’s multi-modal functionality extends to applications like retrieval-augmented image captioning, showcasing its versatility in integrating diverse data types. Through these use cases, joint customers of LlamaIndex and Google Cloud databases can expect to see an enhanced developer experience, complete with:
Streamlined knowledge retrieval: Using these packages makes it easier for developers to build knowledge-retrieval applications with Google databases. Developers can leverage AlloyDB and Cloud SQL vector stores to store and semantically search unstructured data to provide models with richer context. The LlamaIndex vector store integrations let you filter metadata effectively, select from vector similarity strategies, and help improve performance with custom vector indexes.
Complex document parsing: LlamaIndex’s first-class document parser, LlamaParse, converts complex document formats with images, charts and rich tables into a form more easily understood by LLMs; this produces demonstrably better results for LLMs attempting to understand the content of these documents.
Secure authentication and authorization: LlamaIndex integrations to Google databases utilize the principle of least privilege, a best practice, when creating database connection pools, authenticating, and authorizing access to database instances.
Fast prototyping: Developers can quickly build and set up agentic systems with readily available pre-built agent and tool architectures on LlamaHub.
Flow control: For production use cases, LlamaIndex Workflows provide the flexibility to build and deploy complex agentic systems with granular control of conditional execution, as well as powerful state management.
A report generation use case
Agentic RAG workflows are moving beyond simple question-and-answer chatbots. Agents can synthesize information from across sources and knowledge bases to generate in-depth reports. Report generation spans many industries, from legal, where agents can do prework such as research, to financial services, where agents can analyze earnings call reports. Agents mimic experts that sift through information to generate insights. And even if agent reasoning and retrieval take several minutes, automating these reports can save teams several hours.
LlamaIndex provides all the key components to generate reports:
Structured output definitions with the ability to organize outputs into Report templates
Intelligent document parsing to easily extract and chunk text and other media
Knowledge base storage and integration across the customer’s ecosystem
Agentic workflows to define tasks and guide agent reasoning
Now let’s see how these concepts work, and consider how to build a report generation agent that provides daily updates on new research papers about LLMs and RAG.
1. Prepare data: Load and parse documents
The key to any RAG workflow is ensuring a well-created knowledge base. Before you store the data, you need to ensure it is clean and useful. Data for the knowledge bases can come from your enterprise data or other sources. To generate reports for top research articles, developers can use the Arxiv SDK to pull free, open-access publications.
Rather than using the ArxivReader to load and convert articles to plain text, you can use LlamaParse, which supports varying paper formats, tables, and multimodal media, leading to improved document-parsing accuracy.
To improve the knowledge base’s effectiveness, we recommend adding metadata to documents. This allows for advanced filtering or support for additional tooling. Learn more about metadata extraction.
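Here is a minimal sketch of this step. It assumes the open-source arxiv Python package for fetching papers and LlamaParse for parsing (which requires a LLAMA_CLOUD_API_KEY); the search query and metadata fields are illustrative examples, not prescribed values.
code_block
import arxiv
from llama_parse import LlamaParse

# Pull recent publications about RAG from arXiv.
client = arxiv.Client()
search = arxiv.Search(
    query="retrieval augmented generation",
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

# LlamaParse handles tables, figures, and rich layouts better than plain-text extraction.
parser = LlamaParse(result_type="markdown")

documents = []
for result in client.results(search):
    pdf_path = result.download_pdf()      # saves the PDF locally and returns its path
    docs = parser.load_data(pdf_path)     # parse into LlamaIndex Document objects
    for doc in docs:
        # Attach metadata so the vector store can filter on it later.
        doc.metadata["publication_date"] = result.published.date().isoformat()
        doc.metadata["title"] = result.title
    documents.extend(docs)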
2. Create a knowledge base: store data for retrieval
Now, the data needs to be saved for long-term use. The LlamaIndex Google Cloud database integrations support storage and retrieval of a growing knowledge base.
2.1. Create a secure connection to the AlloyDB or Cloud SQL database
Utilize the AlloyDBEngine class to easily create a shareable connection pool that securely connects to your PostgreSQL instance.
Create only the tables needed for your knowledge base. Creating separate tables reduces the level of access permissions that your agent needs. You can also specify a special “publication_date” metadata column that you can filter on later.
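A minimal sketch of this step follows. The class and method names (AlloyDBEngine.from_instance, init_vector_store_table, the Column helper) and all connection values are assumptions modeled on the package’s documented pattern; check the llama-index-alloydb-pg documentation for exact signatures.
code_block
from llama_index_alloydb_pg import AlloyDBEngine, Column

# Build a shareable connection pool to the AlloyDB instance (hypothetical values).
engine = AlloyDBEngine.from_instance(
    project_id="my-project",
    region="us-central1",
    cluster="my-cluster",
    instance="my-instance",
    database="papers_db",
)

# Create only the table the agent needs, with a filterable metadata column.
engine.init_vector_store_table(
    table_name="llm_papers",
    vector_size=768,  # must match the embedding model's output dimension
    metadata_columns=[Column("publication_date", "DATE")],
)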
2.2. Customize the underlying storage with the Document Store, Index Store, and Vector Store. For the vector store, specify the metadata field “publication_date” that you created previously.
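Continuing the sketch (the create_sync constructors and parameter names are assumptions that follow the package’s documented pattern, and an embedding model is assumed to be configured), you can wire the stores into a StorageContext and build the indexes that the next step turns into tools:
code_block
from llama_index.core import StorageContext, SummaryIndex, VectorStoreIndex
from llama_index_alloydb_pg import AlloyDBDocumentStore, AlloyDBIndexStore, AlloyDBVectorStore

# Back the document store, index store, and vector store with AlloyDB tables.
doc_store = AlloyDBDocumentStore.create_sync(engine=engine, table_name="doc_store")
index_store = AlloyDBIndexStore.create_sync(engine=engine, table_name="index_store")
vector_store = AlloyDBVectorStore.create_sync(
    engine=engine,
    table_name="llm_papers",
    metadata_columns=["publication_date"],  # the metadata field created earlier
)

storage_context = StorageContext.from_defaults(
    docstore=doc_store,
    index_store=index_store,
    vector_store=vector_store,
)

# "documents" comes from the parsing step above; an embedding model is assumed
# to be configured via Settings.embed_model.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
summary_index = SummaryIndex.from_documents(documents, storage_context=storage_context)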
2.4. Create tools from indexes to be used by the agent.
code_block
from llama_index.core.tools import QueryEngineTool

search_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    description="Useful for retrieving specific snippets from research publications.",
)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),  # summary index built in step 2.2
    description="Useful for answering questions about research publications.",
)
3. Prompt: create an outline for the report
Reports may have requirements on sections and formatting. The agent needs instructions for formatting. Here is an example outline of a report format:
code_block
outline = """
# DATE Daily report: TOPIC

## Executive Summary

## Top Challenges / Description of problems

## Summary of papers

| Title | Authors | Summary | Links |
| ----- | ------- | ------- | ----- |
| LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data | Liana Patel, Siddharth Jha, Carlos Guestrin, Matei Zaharia | … | https://arxiv.org/abs/2407.11418v1 |
"""
4. Define the workflow: outline agentic steps
Next, you define the workflow that guides the agent’s actions. In this example workflow, the agent reasons about which tool to call: the summary tool or the vector search tool. Once the agent decides it doesn’t need additional data, it exits the research loop and generates the report.
LlamaIndex Workflows provide an easy-to-use SDK to build any type of workflow:
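Here is a minimal, hypothetical sketch of such a workflow. It reuses the search_tool, summary_tool, and outline defined above, assumes any LlamaIndex-compatible LLM is available as llm, and compresses the research loop into a single step for brevity; a production agent would iterate until it has enough context.
code_block
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step


class ResearchDone(Event):
    notes: str


class ReportWorkflow(Workflow):
    """Illustrative two-step research-then-write workflow (a sketch, not the full agent)."""

    def __init__(self, llm, search_tool, summary_tool, outline, **kwargs):
        super().__init__(**kwargs)
        self.llm = llm
        self.search_tool = search_tool
        self.summary_tool = summary_tool
        self.outline = outline

    @step
    async def research(self, ev: StartEvent) -> ResearchDone:
        # Gather material from both tools; a fuller agent would loop here,
        # deciding after each call whether it needs more data.
        snippets = await self.search_tool.acall(ev.query)
        summaries = await self.summary_tool.acall(ev.query)
        return ResearchDone(notes=f"{snippets}\n\n{summaries}")

    @step
    async def write_report(self, ev: ResearchDone) -> StopEvent:
        prompt = (
            f"Write a daily report that follows this outline:\n{self.outline}\n\n"
            f"Base it only on these research notes:\n{ev.notes}"
        )
        response = await self.llm.acomplete(prompt)
        return StopEvent(result={"response": str(response)})


# llm is any LlamaIndex LLM instance (assumption); timeout is in seconds.
agent = ReportWorkflow(
    llm=llm,
    search_tool=search_tool,
    summary_tool=summary_tool,
    outline=outline,
    timeout=300,
)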
Now that you’ve set up a knowledge base and defined an agent, you can set up automation to generate a report!
code_block
query = "What are the recently published RAG techniques"
report = await agent.run(query=query)

# Save the report
with open("report.md", "w") as f:
    f.write(report['response'])
There you have it! A complete report that summarizes recent research in LLM and RAG techniques. How easy was that?
Get started today
In short, these LlamaIndex integrations with Google Cloud databases enable application developers to leverage the data in their operational databases to easily build complex agentic RAG workflows. This collaboration supports Google Cloud’s long-term commitment to being an open, integrated, and innovative database platform. With LlamaIndex’s extensive user base, this integration further expands the possibilities for developers to create cutting-edge, knowledge-driven AI agents.
Ready to get started? Take a look at the following Notebook-based tutorials: