GCP – How to use gen AI for better data schema handling, data quality, and data generation
In the realm of data engineering, generative AI models are quietly revolutionizing how we handle, process, and ultimately utilize data. For example, large language models (LLMs) can help with data schema handling, data quality, and even data generation.
Building upon the recently released Gemini in BigQuery data preparation capabilities, this blog showcases areas where gen AI models are making a significant impact in data engineering: automated schema management, data quality automation, and the generation of synthetic and structured data from diverse sources, with practical examples and code snippets.
1. Data schema handling: Integrating new datasets
Data movement and maintenance are ongoing challenges across data engineering teams. Whether it’s moving data between systems with different schemas or integrating new datasets into existing data products, the process can be complex and error-prone. This is often exacerbated when dealing with legacy systems; in fact, 32% of organizations cite migrating data and applications as their biggest challenge, according to Flexera’s 2024 State of the Cloud Report.
Gen AI models offer a powerful solution by assisting in automating schema mapping and transformation on an ongoing basis. Imagine migrating customer data from a legacy CRM system to a new platform, and combining it with additional external datasets in BigQuery. The schemas likely differ significantly, requiring intricate mapping of fields and data types. Gemini, our most capable AI model family to date, can analyze both schemas and generate the necessary transformation logic, significantly reducing manual effort and potential errors.
A common approach to data schema handling that we’ve seen from data engineering teams involves creating a lightweight application that receives messages from Pub/Sub, retrieves relevant dataset information from BigQuery and Cloud Storage, and uses the Vertex AI Gemini API to map source fields to target fields and assign a confidence score. Here is example code showing a FunctionDeclaration to perform the mapping-confidence task:
```python
set_source_field_mapping_confidence_levels = generative_models.FunctionDeclaration(
    name="set_source_field_mapping_confidence_levels",
    description="""Sets the mapping confidence values for each source field for a given target field.

Here is a general example to help you understand how to use the set_source_field_mapping_confidences_tool correctly. This is only an example to show the source and target field structures.

Assuming you had previously decided on the following mapping confidence levels (but it is important that you come up with your own values for mapping confidence level rather than specifically using these values):
a mapping confidence level of 2 for the field with source_field_unique_ref=158
a mapping confidence level of 1 for the field with source_field_unique_ref=159
a mapping confidence level of 1 for the field with source_field_unique_ref=1290
a mapping confidence level of 1 for the field with source_field_unique_ref=579
a mapping confidence level of 1 for the field with source_field_unique_ref=638
a mapping confidence level of 1 for the field with source_field_unique_ref=970
a mapping confidence level of 1 for the field with source_field_unique_ref=3317
a mapping confidence level of 3 for the field with source_field_unique_ref=160
a mapping confidence level of 1 for the field with source_field_unique_ref=1910
a mapping confidence level of 5 for the field with source_field_unique_ref=2280

Then this function would be used to set the mapping confidence levels for each of the source fields, where your input parameter source_field_mapping_confidences would be:
source_field_mapping_confidences = [
    {'source_field_unique_ref': 158, 'mapping_confidence_level': '2'},
    {'source_field_unique_ref': 159, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 1290, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 579, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 638, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 970, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 3317, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 160, 'mapping_confidence_level': '3'},
    {'source_field_unique_ref': 1910, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 2280, 'mapping_confidence_level': '5'}
]""",
    parameters={
        "type": "object",
        "properties": {
            "source_field_mapping_confidences": {
                "type": "array",
                "description": (
                    "A list of objects where each object contains the source field's "
                    "source_field_unique_ref, the mapping_confidence_level for that "
                    "source field, and the reason for applying that mapping_confidence_level."
                ),
                "items": {
                    "type": "object",
                    "properties": {
                        "source_field_unique_ref": {
                            "type": "integer",
                            "description": "The reference ID for the source field.",
                        },
                        "mapping_confidence_level": {
                            "type": "string",
                            "enum": ["1", "2", "3", "4", "5"],
                            "description": "The confidence level for the mapping (an integer between 1 and 5).",
                        },
                        "mapping_confidence_level_reason": {
                            "type": "string",
                            "description": "The reason why the source field should have this mapping confidence level value.",
                        },
                    },
                    "required": [
                        "source_field_unique_ref",
                        "mapping_confidence_level",
                        "mapping_confidence_level_reason",
                    ],
                },
            },
        },
        "required": ["source_field_mapping_confidences"],
    },
)
```
As seen in the function declaration above, Gemini assigns confidence levels to each mapping, which are then stored in BigQuery. Once there, the data engineering team can validate high-confidence mappings (and eventually choose to fully automate these if they feel comfortable) and investigate low-confidence mappings. This pipeline of gen AI tasks could be deployed in an event-driven architecture or run on a batch basis. However, a final step is usually required, where a human approves the final output (this too could become fully automated over time, given the rapid release cadence of improvements in gen AI models). Here is an example architecture / workflow:
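To make the human-in-the-loop step concrete, here is a minimal, hypothetical sketch of how the confidence levels Gemini returns could be triaged into an auto-approve set and a manual-review queue. The function name and threshold are illustrative, not part of the pipeline described above; only the 1-5 confidence scale comes from the function declaration.

```python
# Hypothetical triage step: split Gemini's mapping suggestions into
# auto-approved mappings and mappings needing human review.
# Confidence levels follow the 1-5 scale from the function declaration.

def triage_mappings(mapping_confidences, auto_approve_level=4):
    """Partition mapping suggestions by confidence level.

    mapping_confidences: list of dicts like
        {"source_field_unique_ref": 158, "mapping_confidence_level": "5"}
    Returns (auto_approved, needs_review).
    """
    auto_approved, needs_review = [], []
    for mapping in mapping_confidences:
        # Levels are returned as strings ("1".."5"), so cast before comparing.
        if int(mapping["mapping_confidence_level"]) >= auto_approve_level:
            auto_approved.append(mapping)
        else:
            needs_review.append(mapping)
    return auto_approved, needs_review


suggestions = [
    {"source_field_unique_ref": 158, "mapping_confidence_level": "5"},
    {"source_field_unique_ref": 159, "mapping_confidence_level": "2"},
    {"source_field_unique_ref": 160, "mapping_confidence_level": "4"},
]
auto, review = triage_mappings(suggestions)
print(len(auto), len(review))  # 2 mappings auto-approved, 1 flagged for review
```

In practice, both lists would be written back to BigQuery so reviewers only ever see the low-confidence mappings.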
2. Data quality: Enhancing accuracy and consistency
In today’s data-driven world, poor data quality can cost businesses millions. From inaccurate customer insights leading to misguided marketing campaigns, to flawed financial reporting that impacts investment decisions, the consequences of bad data are significant. Gen AI models offer a new approach to data quality, going beyond traditional rule-based systems to identify subtle inconsistencies that can wreak havoc on your data pipelines. For example, imagine a system that can automatically detect and correct errors that would typically require hours of manual review or elaborate regular expressions.
Gemini can augment your existing data quality checks in several ways:
- Deduplication: Consider a scenario where you need to deduplicate customer profiles. Gemini can analyze various fields, such as names, addresses, and phone numbers, to identify potential duplicates, even when there are minor variations in spelling or formatting. For example, Gemini can recognize that “Robert Smith” and “Bob Smith” likely refer to the same individual, or that “123 Main St.” and “123 Main Street” represent the same address. In contrast to traditional methods like fuzzy matching, which are cumbersome to code and don’t always produce ideal results, using an LLM can provide a simpler and more effective solution.
- Standardization: Gemini excels at standardizing data formats. Instead of relying on intricate regular expressions to validate data formats, Gemini can be used with prompt engineering, RAG, or fine-tuning to understand and enforce data quality rules in a more human-readable and maintainable way. This is particularly useful for fields like dates, times, and addresses, where variations in format can hinder analysis.
- Subtle error detection: Gemini can identify subtle inconsistencies that might be missed by traditional methods. These include:
  - Variations in abbreviations (e.g., “St.” vs. “Street”)
  - Different spellings of the same name (e.g., “Catherine” vs. “Katherine”)
  - Use of nicknames (e.g., “Bob” vs. “Robert”)
  - Incorrectly formatted phone numbers (e.g., missing area codes)
  - Inconsistent use of capitalization and punctuation
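As a sketch of the deduplication idea above, the snippet below builds a prompt asking Gemini to judge whether two customer records refer to the same person. The field names, prompt wording, and JSON response shape are all illustrative assumptions, not an official API; the actual model call is omitted since it depends on your Vertex AI setup.

```python
import json


def build_dedup_prompt(record_a, record_b):
    """Build a prompt asking the model whether two customer records
    refer to the same person. Field names are illustrative."""
    return (
        "Compare the two customer records below. Decide whether they refer to "
        "the same individual, treating nicknames (Bob/Robert), abbreviations "
        "(St./Street), and minor spelling variations as equivalent.\n"
        'Respond with JSON: {"is_duplicate": true|false, "reason": "..."}.\n'
        f"Record A: {json.dumps(record_a)}\n"
        f"Record B: {json.dumps(record_b)}"
    )


prompt = build_dedup_prompt(
    {"name": "Robert Smith", "address": "123 Main St.", "phone": "555-0100"},
    {"name": "Bob Smith", "address": "123 Main Street", "phone": "5550100"},
)
# `prompt` would then be sent to Gemini via the Vertex AI API, and the
# returned JSON parsed to decide whether to merge the two profiles.
```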
Let’s illustrate this with a common example of address validation. We have a table named customer_addresses, and we want to check if the address_state column is a valid US state and convert it into the standard two-letter abbreviation.
Looking at the input data, you can easily identify some issues with the address_state column. For example, ‘Pennsylvaniaa’ is misspelled, and ‘Texas’ is written out instead of using the standard two-letter abbreviation. While these errors are obvious to a human, they can be challenging for traditional data quality tools to catch because those tools rely on exact matches or rigid rules, missing these subtle variations.
However, Gemini excels at understanding and interpreting human language, making it well suited for this task. With a simple prompt, Gemini can accurately identify these inconsistencies and standardize the state names into the correct format, going beyond rigid rules and adapting to nuances of the human language.
Here’s how you can use Gemini in BigQuery to perform this task, using the BigQuery ML function ML.GENERATE_TEXT, which lets you perform gen AI tasks on data stored in BigQuery through a remote connection to Gemini hosted in Vertex AI:
```sql
SELECT
  prompt,
  REPLACE(REPLACE(REPLACE(ml_generate_text_llm_result, 'json', ''), '\n', ''), '```', '') AS ml_generate_text_llm_result,
  address_id,
  address_line1,
  address_line2,
  address_city,
  address_state,
  address_zipcode,
  address_country
FROM
  ML.GENERATE_TEXT(
    MODEL `bigquery_demo.gemini-pro`,
    (
      SELECT
        CONCAT('Check if the given address_state field is as per ANSI 2-letter standard. If not, convert it into the recommended format. Also check if the address_state is a valid US state. Return only the output with input, output and is_valid_us_state fields. address_state: ', address_state) AS prompt,
        *
      FROM
        `bigquery_demo.customer_addresses`
    ),
    STRUCT(TRUE AS flatten_json_output));
```
This code sends each address_state value to Gemini with a prompt asking it to validate and standardize the input. Gemini then returns a JSON response with the original input, the standardized output, and a boolean indicating whether the state is valid.
In this instance, Gemini has automated and streamlined our data quality process and reduced the complexity of the code. The ml_generate_text_llm_result column contains the validation output: with a simple prompt, we are able to correctly identify the rows that have an invalid state value and convert the state columns to a standard format. With a more traditional approach, this would have required multiple SQL expressions, external APIs, or a join against a lookup table.
The above example is just a glimpse into how Gemini can improve data quality. Beyond basic validation and standardization, gen AI models also excel at more nuanced tasks. For instance, they can classify data errors by severity (low, medium, high) for prioritized action and effectively handle mixed-language text fields by detecting language discrepancies. For more detailed examples, check out this code repo, which shows how to leverage gen AI models for semantic search in BigQuery that you could use to identify duplicate records.
Important considerations for large datasets:
When working with large datasets, sending individual requests to an LLM like Gemini can become inefficient and may exceed usage quotas. To optimize performance and manage costs, consider batching requests, and make sure your Google Cloud project has sufficient API quota.
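One simple batching approach, sketched below under the assumption that many values can share one prompt (the helper names and batch size are illustrative): group the values into chunks and send each chunk as a single request, rather than one request per row.

```python
def chunk_values(values, batch_size=50):
    """Yield successive batches of values so many rows share one LLM request."""
    for i in range(0, len(values), batch_size):
        yield values[i:i + batch_size]


def build_batch_prompt(states):
    """Build one prompt covering a whole batch of address_state values."""
    listed = "\n".join(f"- {s}" for s in states)
    return (
        "For each address_state value below, return a JSON array of objects "
        "with input, output (ANSI 2-letter code) and is_valid_us_state fields.\n"
        + listed
    )


states = ["Texas", "Pennsylvaniaa", "CA", "NY", "Florida"]
# With batch_size=2 this produces 3 requests instead of 5.
prompts = [build_batch_prompt(batch) for batch in chunk_values(states, batch_size=2)]
```

Because the model is asked for a JSON array, each batched response can still be flattened back to one row per input value downstream.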
3. Data generation: Unlocking insights from unstructured data
Unstructured data like images, videos, and PDFs holds valuable information that has historically been difficult to translate into structured data use cases. Gemini’s industry-leading multimodal context window of up to 2 million tokens allows us to extract structured data from these sources for downstream usage.
However, some gen AI models can be unreliable and prone to hallucinations, posing challenges for consistent data processing. To address this in practice, you can use Gemini’s system instructions, controlled generation, grounding, and Vertex AI evaluation services. System instructions guide the model’s behavior, while controlled generation instructs the model to output in a specific format such as JSON, enforcing structured outputs that adhere to a predefined schema. Evaluation lets you automate selecting the best response and provides associated quality metrics and explanations. Finally, grounding tethers the output to private or up-to-date public data, reducing the likelihood of the model inventing content. The model’s structured output can then be integrated with BigQuery for downstream analysis and used in data pipelines and ML workflows, helping to ensure consistency and reliability in business applications.
Let’s take a look at an example inspired by the YouTube ABCDs, where we use one of the latest Gemini models, Gemini 2.0 Flash, to analyze an ad video on YouTube and check whether it follows YouTube best practices, using the following prompt:
```python
from google import genai
from google.genai import types
import base64


def generate():
    client = genai.Client(
        vertexai=True,
        project="YOUR_PROJECT_ID",
        location="us-central1",
    )

    text1 = types.Part.from_text("""You are a creative expert who analyzes and labels video ads to answer
specific questions about the content in the video and how it adheres to a set of features.
Answer the following questions with either \\"True\\" or \\"False\\" and provide a detailed explanation to
support your answer. The explanation should be thorough and logically sound, incorporating relevant
facts and reasoning. Only base your answers strictly on what information is available in the video
attached. Do not make up any information that is not part of the video.

These are the questions that you have to answer for each feature:
1. does the brand show in the first 5 seconds?
2. is there consistent brand presence throughout the ad video?
3. is there a clear call to action in the ad?""")
    video1 = types.Part.from_uri(
        file_uri="https://www.youtube.com/watch?v=OMVpP-Zam1A",
        mime_type="video/*",
    )

    model = "gemini-2.0-flash-exp"
    contents = [
        types.Content(
            role="user",
            parts=[text1, video1],
        )
    ]
    generate_content_config = types.GenerateContentConfig(
        temperature=1,
        top_p=0.95,
        max_output_tokens=8192,
        response_modalities=["TEXT"],
        safety_settings=[
            types.SafetySetting(category="HARM_CATEGORY_HATE_SPEECH", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_HARASSMENT", threshold="OFF"),
        ],
        response_mime_type="application/json",
        response_schema={
            "type": "ARRAY",
            "items": {
                "type": "OBJECT",
                "properties": {
                    "id": {"type": "STRING"},
                    "name": {"type": "STRING"},
                    "category": {"type": "STRING"},
                    "criteria": {"type": "STRING"},
                    "detected": {"type": "BOOLEAN"},
                    "llm_explanation": {"type": "STRING"},
                },
                "required": ["id", "name", "category", "criteria", "detected", "llm_explanation"],
            },
        },
    )

    for chunk in client.models.generate_content_stream(
        model=model,
        contents=contents,
        config=generate_content_config,
    ):
        print(chunk, end="")


generate()
```
The resulting output can easily be ingested into BigQuery as structured data for further analytical and reporting uses:
```json
[
  {
    "category": "Brand Presence",
    "criteria": "Does the brand show in the first 5 seconds?",
    "detected": true,
    "id": "brand_first_5_seconds",
    "llm_explanation": "The brand name Gemini shows up within the first 5 seconds of the video ad, clearly visible on the screen along with the text prompt that is shown.",
    "name": "Brand Visibility"
  },
  {
    "category": "Brand Presence",
    "criteria": "Is there consistent brand presence throughout the ad video?",
    "detected": true,
    "id": "consistent_brand_presence",
    "llm_explanation": "The brand name Gemini remains consistently visible in the upper left corner of the screen throughout the duration of the video ad, ensuring brand awareness.",
    "name": "Consistent Branding"
  },
  {
    "category": "Call to Action",
    "criteria": "Is there a clear call to action in the ad?",
    "detected": true,
    "id": "clear_call_to_action",
    "llm_explanation": "The video ad concludes by displaying a clear call to action directing viewers to GoogleStore.com to learn more, providing a direct path for engagement with the brand and product.",
    "name": "Call To Action"
  }
]
```
There are also considerations for choosing the right model for the task at hand. For example, larger videos or other large unstructured content may require the 2M-token context window available with Gemini Pro, whereas other tasks may be fine with the 1M-token context window of Gemini Flash.
You can also use Gemini to generate synthetic data that mimics real-world scenarios, augmenting your datasets and improving model performance. Synthetic data is artificially generated data that statistically mirrors real-world data while preserving privacy by excluding personally identifiable information (PII). This approach enables organizations to develop robust machine learning models and data-driven insights without the limitations and risks associated with using real-world data. The growing interest in synthetic data stems from its ability to address privacy concerns, overcome data scarcity, and facilitate test data generation across various industries. To learn more about synthetic data generation using gen AI, check out our in-depth blog about Generating synthetic data in BigQuery with Gretel.
Going to production: DataOps and the LLM pipeline
Once you’ve successfully implemented LLM-powered data engineering solutions, you’re ready to integrate them into your production environment. Here are a few things you’ll need to address:
- Scheduling and automation: Leverage tools like Cloud Composer or Vertex AI Pipelines to schedule and automate gen AI tasks, helping ensure continuous data processing and analysis.
- Model monitoring and evaluation: Implement an evaluation pipeline to monitor the performance of your gen AI models, allowing you to track accuracy, identify potential biases, and trigger retraining when necessary.
- Version control: Treat Gemini prompts and configurations as code, using version control systems to track changes and ensure reproducibility.
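As one hedged illustration of treating prompts as code (the registry layout and helper below are hypothetical, not a Google Cloud API): keep prompt text in version-controlled files and tag every model call with a content hash, so each output row in BigQuery can be traced back to the exact prompt revision that produced it.

```python
import hashlib

# Hypothetical prompt "registry": in practice these strings would live in
# version-controlled files alongside the pipeline code.
PROMPTS = {
    "address_state_validation": (
        "Check if the given address_state field is as per ANSI 2-letter standard. "
        "If not, convert it into the recommended format. "
        "Also check if the address_state is a valid US state."
    ),
}


def prompt_version(name):
    """Return a short, deterministic hash identifying the prompt revision,
    suitable for logging next to model outputs in BigQuery."""
    text = PROMPTS[name]
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


version = prompt_version("address_state_validation")
# Log `version` alongside every model call; whenever the prompt text changes,
# the hash changes, making output provenance reproducible.
```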
These practices help you integrate gen AI models into your data engineering production pipelines and deliver robust, scalable, and reliable solutions.
Transform your data engineering processes with gen AI
Gen AI is transforming the data engineering landscape, offering powerful capabilities for schema handling, data quality improvement, synthetic data generation, and data generation from unstructured sources. By embracing these advancements and adopting DataOps principles, you can unlock new levels of efficiency, accuracy, and insight from your data. Start experimenting with Gemini in your own data pipelines to unlock greater consistency in data processing, insights from new data sources, and ultimately, better business outcomes.