Editor’s note: Today’s post is by Alyson Butler, Director of Team Member Experiences at TELUS, a communications technology company with more than 20 million customer connections. TELUS has integrated ChromeOS, Google Workspace, Chrome Enterprise Premium, and Cameyo to support its transformation into a globally recognized technology leader.
Thirty-five years ago, TELUS started out by providing telecommunication services to Canadians. More than three decades later, we’re still connecting people and the data they need, but we’ve grown into a large, diversified company offering a full range of innovative digital solutions for IT, healthcare, customer experience and agriculture.
As Director of Team Member Experiences at TELUS, I lead the teams responsible for managing the global digital workplace experience for nearly 60,000 team members worldwide. Our mission is to unlock their productivity in a secure way, ensuring they feel like a connected part of the team while protecting our internal and customer data. This can be especially challenging with a distributed workforce. What may have sounded impossible before was achievable with Google. We created an enterprise computing stack with ChromeOS, Google Workspace, Cameyo, and Chrome Enterprise Premium. This new solution made our login speed three times faster than our previous setup — a crucial improvement for our customer agents. The tech stack simplified our security and management while unlocking business benefits like higher productivity, cost-per-call savings, and improved customer satisfaction.
Workspace established a strong foundation
Our digital journey began in 2018 with a mission to transform our distributed workforce. We knew that cloud apps with built-in security were the way forward. By migrating to Google Workspace, we immediately improved productivity, enhanced the end-user experience, and took advantage of its robust, natively built-in security and compliance features.
Workspace quickly became an end-to-end solution for us. Our teams create files in Google Docs, Sheets, and Slides, and store them in Google Drive, where the automatic version control ensures seamless collaboration.
We connect on Chat, manage approvals and workflows in Gmail, and use Google Meet to connect virtually. Google Calendar is our workhorse for planning and scheduling, allowing us to view other team members’ schedules, set up events, and send email invites — all from one tab. The benefits of Google Workspace allowed us to fundamentally rethink our end-user computing strategy.
With growth from acquisitions, we built a cloud platform
To address our technical debt and remove past complexities, we needed a platform that could support our growth from acquisitions while improving our security posture and team member experience. Seeing the positive results with Google Workspace, we were inspired to test other Google solutions to support our growth. We anticipated that a combination of ChromeOS, Cameyo, and Chrome Enterprise Premium would help us reap the benefits of browser-based computing, including simplified management and improved security.
We deployed ChromeOS in our call centers, allowing us to move away from our costly legacy Virtual Desktop Infrastructure (VDI) solution. However, we still needed to provide access to important contact center software.
After evaluating several virtualization products, Cameyo clearly outperformed the others and offered the significant cost savings we needed. By choosing ChromeOS with Cameyo, we achieved a major digital transformation, cutting operational expenses and avoiding a $15 million infrastructure refresh. The transition also accelerated performance and simplified our IT management. What’s more, our call center team members have found the Cameyo platform incredibly easy to use. Today, more than 100 applications flow through it. Looking ahead, we expect this solution to yield a significantly lower total cost of ownership compared to our previous solution.
This journey led us to what we now call the TELUS Desktop Stream: a zero-trust, browser-based app streaming solution built with ChromeOS, Chrome Enterprise Premium, and Cameyo. Cameyo is crucial because it allows all our applications, whether cloud-based or not, to run directly through a browser. This gives us confidence that we can continue to provide our teams with the necessary tools while reaping the benefits of cloud-based end-user computing.
Unlocking business-wide benefits
At every stage of our digital transformation, we’ve seen benefits beyond cost savings. For example, my IT team appreciates Chrome Enterprise Premium for its ability to streamline our Chrome browser management across our entire device fleet. They also love how Cameyo enables TELUS Desktop Stream to scale application access instantly, supporting the flexible workforce we need to serve customers worldwide.
Chrome Enterprise Premium gave us the security protection we needed for a distributed workforce and a growing business. With extra security features such as context-aware access controls and data loss protection, we are more secure as an organization, even while eliminating the need for VPNs for team members who rely on virtual access.
ChromeOS helped us deliver the speed that team members requested for their productivity solutions. Login times are three times faster in ChromeOS compared with our old Windows operating system. And today, team members talk about how much faster it is to work within ChromeOS, integrated with Chrome Enterprise and Cameyo, compared with the previous slow VDI solution with annoying lags and inconsistent latency.
In the TELUS call center, the combined solutions have directly delivered greater efficiency by allowing us to handle more calls per hour. The integration has been powerful, with a real impact on productivity and, by extension, on the levels of service and engagement we can bring to our customers.
When employees are satisfied, customers are satisfied
Our ongoing Google deployment is helping us achieve our goal to thrive. We believe that when employees are empowered with intuitive, frictionless experiences, they become not just more productive but more satisfied as well. With Google, we are now set up for the future with a web-based tech stack that is agile enough to securely support our growth, while keeping our teams productive and engaged.
As companies move past legacy infrastructure and prepare for AI, multi-cloud, and platform engineering requirements, many are looking to Google Cloud for its reliability, performance, and cost benefits.
To achieve successful migrations, especially at the enterprise level, organizations need a seasoned partner with a deep understanding of platform modernization, automation, and business outcomes. Searce is one of the Google Cloud partners helping several organizations migrate to Google Cloud and achieve striking business value.
Searce has led over 1,000 migrations
Searce is an engineering-led, AI-powered consultancy and Google Cloud Premier partner that has delivered over 1,000 successful migrations to date. Their focus: helping businesses across industries scale faster, modernize legacy platforms, and unlock new efficiencies with Google Cloud.
Through hands-on experience, Searce has identified three consistent, measurable benefits that enterprises gain when migrating to Google Cloud:
Improved reliability and integration with cloud-native services (e.g., 25% improved reliability)
Increased productivity with reduced costs (e.g., 50% reduction in TCO)
Scalability and performance (e.g., Petabytes of data migrated with 75% reduced downtime and 30% performance improvement)
Together, Google Cloud’s Global VPC, leadership in AI/ML, and powerful managed container services like Google Kubernetes Engine and GKE Autopilot enable these transformations. Searce combines these capabilities with its engineering-led, solution-oriented approach to deliver impact fast — especially for organizations modernizing legacy platforms or scaling AI workloads. Let’s take a look at a few examples of migrations to Google Cloud that Searce has performed.
Boosting reliability in healthcare with GKE Gateway Controller
A major digital healthcare provider that treats more than 50 million patients per year required near 100% platform availability. Their legacy Kubernetes infrastructure on a different cloud provider would regularly have unplanned outages during microservices upgrades, which postponed care, interrupted pharmacies, and eroded patient trust.
Searce migrated their platform to GKE, replacing legacy Ingress resources with the GKE Gateway controller to decouple routing and enable safer, quicker rollbacks. The outcome? Searce said the healthcare company saw:
25% increase in platform reliability
30% reduction in infrastructure expense
Streamlined log aggregation with Cloud Logging and single resource management with Global VPC architecture
With GKE, they now roll out updates securely, minimizing downtime and speeding up development without sacrificing availability.
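To illustrate the pattern (not Searce's actual configuration), the Gateway API separates routing rules from the load balancer itself, so traffic can be shifted gradually between service versions; a minimal HTTPRoute sketch with a weighted canary split might look like this, with all names illustrative:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: patient-api-route          # illustrative name
spec:
  parentRefs:
  - name: external-gateway         # a Gateway managed by the GKE Gateway controller
  rules:
  - backendRefs:
    - name: patient-api-stable     # keep most traffic on the stable release
      port: 8080
      weight: 95
    - name: patient-api-canary     # send a small slice to the new version
      port: 8080
      weight: 5                    # set to 0 to roll back instantly

Adjusting the weights (or zeroing out the canary) is a routing-only change, which is what makes rollouts and rollbacks faster and less disruptive than redeploying the workload.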
50% lower TCO for a global fintech leader with GKE Autopilot
With operations in over 20 countries, an international fintech company used several Kubernetes clusters to achieve performance at scale, but managing this sort of global deployment had a price. Operational overhead was sucking engineering time and momentum out of innovation.
By moving them to GKE Autopilot, Searce assisted in unifying service management and transferring operational complexity to Google Cloud. With global HTTP load balancing, self-managing Infrastructure as Code (IaC) modules, and team-specific resource management, Searce said its customer’s business experienced:
50% reduction in total cost of ownership
40% improvement in engineering productivity
Reduced time-to-market for customer-facing features
This transformation enabled teams to concentrate on what is most important: creating value for users, not dealing with infrastructure.
Scaling performance for a global telecom giant
A leading global telecommunication provider with petabytes of data and over 150 years of history needed to be able to scale without compromises. Their traditional systems constrained performance and required unacceptably high amounts of downtime for maintenance.
They joined forces with Searce to plan a smooth transition to Google Cloud, optimizing storage and compute for AI workloads, and combining high-throughput data pipelines.
Searce and the customer were able to realize:
30% boost in system performance
75% decrease in downtime during migration
Increased customer innovation and data-driven services for a future-proof foundation
How Searce delivers migration success
With over 1,000 successful migrations under its belt, Searce brings deep engineering expertise and a proven framework to help enterprises move to Google Cloud with speed, confidence, and long-term value.
At the core is Searce’s ‘evlos’ (Solve differently) approach to simplify, optimize, and future-proof enterprise workloads across infrastructure, data, applications, and AI.
A proven five-step migration framework
Searce’s structured approach ensures a smooth transition from discovery through Day-2 operations:
Discovery and assessment – Identify workloads and dependencies
Planning – Define migration, DR, and rollback strategies
Design, migrate and test – Wave-based automation with minimal downtime
Execution – Cutover and system stabilization
HyperCare – 24×7 post-migration support via Searce’s in-house SOC team
To help enterprises through their journey to Google Cloud and further streamline the customer journey, Searce offers three key accelerators:
Automated Migration to Google Cloud – Script-based automation for faster cutovers
Ready to accelerate your migration journey?
Enterprises worldwide are choosing Google Cloud to build scalable, reliable, and AI-ready platforms. With proven migration experience, domain expertise, and a unique approach to transformation, Searce is the trusted partner to get you there faster. Backed by Google Cloud’s capabilities, Searce turns migration into a strategic enabler for innovation, not just a technical shift.
Explore what’s possible when you migrate with confidence. Access the whitepaper to get more details.
Innovating with AI requires accelerators such as GPUs that can be hard to come by in times of extreme demand. To address this challenge, we offer Dynamic Workload Scheduler (DWS), a service that optimizes access to compute resources when and where you need them. In July, we announced Calendar mode in DWS to provide short-term ML capacity without long-term commitments, and today, we are taking the next step: the general availability (GA) of Flex-start VMs.
Available through the Compute Engine instance API, gcloud CLI, and the Google Cloud console, Flex-start VMs provide a simple and direct way to create single VM instances that can wait for in-demand GPUs. This makes it easy to integrate this flexible consumption option into your existing workflows and schedulers.
What are Flex-start VMs?
Flex-start VMs, powered by Dynamic Workload Scheduler, introduce a highly differentiated consumption model that’s a first among major cloud providers, letting you create single VM instances that provide fair and improved access to GPUs. Flex-start VMs are ideal for defined-duration tasks such as AI model fine-tuning, batch inference, HPC, and research experiments that don’t need to start immediately. In exchange for being flexible with start time, you get two major benefits:
Dramatically improved resource obtainability: By allowing your capacity requests to persist in a queue for up to two hours, you increase the likelihood of securing resources, without needing to build your own retry logic.
Cost-effective pricing: Flex-start VM SKUs offer significant discounts compared to standard on-demand pricing, making cutting-edge accelerators more accessible.
Flex-start VMs can run uninterrupted for a maximum of seven days and consume preemptible quota.
A new way to request capacity
With Flex-start VMs, you can now choose how your request is handled if capacity isn’t immediately available using a single parameter: request-valid-for-duration.
Without this parameter, when creating a VM, Compute Engine makes a short, best-effort attempt (about 90 seconds) to secure your resources. If capacity is available, your VM is provisioned. If not, the request fails quickly with a stockout error. This “fail-fast” behavior is good for workflows where you need an answer immediately so you can make scheduling decisions such as trying another zone or falling back to a different machine type.
However, for workloads that can wait, you can now make a persistent capacity request by setting the request-valid-for-duration flag. Select a period between 90 seconds and 2 hours to instruct Compute Engine to hold your request in a queue. Your VM enters a PENDING state, and the system works to provision your resources as they become available within your specified timeframe. This “get-in-line” approach provides a fair and managed way to access hardware, transforming the user experience from one of repeated manual retries to a simple, one-time request.
Key features of Flex-start VMs
Flex-start VMs offer several critical features for flexibility and ease of use:
Direct instance API access: Integration with instances.insert, or via a single CLI command, lets you create single Flex-start VMs simply and directly, making it easy to integrate them into custom schedulers and workflows.
Stop and start capabilities: You have full control over your Flex-start VMs. For instance, you can stop an instance to pause billing and release the underlying resources. Then, when you’re ready to resume it, simply issue a start command to place a new capacity request. Once the capacity is successfully provisioned, the seven-day maximum run duration clock resets.
Configurable termination action: For many advanced use cases, you can set instanceTerminationAction = STOP so that when your VM’s seven-day runtime expires, the instance is stopped rather than deleted. This preserves your VM’s configuration, including its IP address and boot disk, saving on setup time for subsequent runs.
What customers have to say
Customers across research and industry are using Flex-start VMs to improve their access to scarce accelerators.
“Our custom scheduling environment demands precise control and direct API access. The GA of Flex-start in the Instance API, particularly with its stop/start capabilities and configurable termination, is a game-changer. It allows us to seamlessly integrate this new, highly-efficient consumption model into our complex workflows, maximizing both our resource utilization and performance.”– Ragnar Kjørstad, Systems Engineer, Hudson River Trading (HRT)
“For our critical anti-fraud model training, Flex-start VMs are a game-changer. The queuing mechanism gives us reliable access to powerful A100 GPUs, which enhances our development cycles and security offerings at a significant performance-to-cost advantage.” – Bakai Zhamgyrchiev, Head of ML, Oz Forensics
Get started today
Getting started with a queued Flex-start VM is straightforward. You can create one using a gcloud command or directly through the API.
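For illustration, a queued request might look like the following sketch; the zone and machine type are placeholders, and the flag spellings mirror the parameter names described above, so verify them against the Flex-start documentation before use.

# Sketch only: the VM request waits in queue for up to 2 hours for capacity
gcloud compute instances create my-flex-start-vm \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --provisioning-model=FLEX_START \
    --request-valid-for-duration=2h \
    --instance-termination-action=STOP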
Flex-start VMs in the Instance API are a direct response to the need for more efficient, reliable, and fair access to high-demand AI accelerators. By introducing a novel queuing mechanism, you can integrate the new Flex-start consumption model into your existing workflows easily, so you can spend less time architecting retry loops for on-demand access. To learn more and try Flex-start VMs today, see the documentation and pricing information.
A year ago today, Google Cloud filed a formal complaint with the European Commission about Microsoft’s anti-competitive cloud licensing practices — specifically those that impose financial penalties on businesses that use Windows Server software on Azure’s biggest competitors.
Despite regulatory scrutiny, it’s clear that Microsoft intends to keep its restrictive licensing policies in place for most cloud customers. In fact, it’s getting worse.
As part of a recent earnings call, Microsoft disclosed that its efforts to force software customers to use Azure are “not anywhere close to the finish line,” and represented one of three pillars “driving [its] growth.” As we approach the end of September, Microsoft is imposing another wave of licensing changes to force more customers to Azure by preventing managed service providers from hosting certain workloads on Azure’s competitors.
Regulators have taken notice. As part of a comprehensive investigation, the U.K.’s Competition and Markets Authority (CMA) recently found that restrictive licensing harms cloud customers, competition, economic growth, and innovation. At the same time, a growing number of regulators around the world are also scrutinizing Microsoft’s anti-competitive conduct — proving that fair competition is an issue that transcends politics and borders.
While some progress has been made, restrictive licensing continues to be a global problem, locking in cloud customers, harming economic growth, and stifling innovation.
Economic, security, and innovation harms
Restrictive cloud licensing has caused an enormous amount of harm to the global economy over the last year. This includes direct penalties that Microsoft forces businesses to pay, and downstream harms to economic growth, cybersecurity, and innovation. Ending restrictive licensing could help supercharge economies around the world.
Microsoft still imposes a 400% price markup on customers who choose to move legacy workloads to competitors’ clouds. This penalty forces customers onto Azure by making it more expensive to use a competitor. A mere 5% increase in cloud pricing due to lack of competition costs U.K. cloud customers £500 million annually, according to the CMA. A separate study in the EU found restrictive licensing amounted to a billion-Euro tax on businesses.
With AI technologies disrupting the business market in dramatic ways, ending Microsoft’s anti-competitive licensing is more important than ever as customers move to the cloud to access AI at scale. Customers, not Microsoft, should decide what cloud — and therefore what AI tools — work best for their business.
The ongoing risk of inaction
Perhaps most telling of all, the CMA found that since some of the most restrictive licensing terms went into place over the last few years, Microsoft Azure has gained customers at two or even three times the rate of competitors. Less choice and weaker competition is exactly the type of “existential challenge” to Europe’s competitiveness that the Draghi report warned of.
Ending restrictive licensing could help governments “unlock up to €1.2 trillion in additional EU GDP by 2030” and “generate up to €450 billion per year in fiscal savings and productivity gains,” according to a recent study by the European Centre for International Political Economy. Now is the time for regulators and policymakers globally to act to drive forward digital transformation and innovation.
In the year since our complaint to the European Commission, our message is as clear as ever: Restrictive cloud licensing practices harm businesses and undermine European competitiveness. To drive the next century of technology innovation and growth, regulators must act now to end these anti-competitive licensing practices that harm businesses.
Autopilot is an operational mode for Google Kubernetes Engine (GKE) that provides a fully managed environment and takes care of operational details, like provisioning compute capacity for your workloads. Autopilot allows you to spend more time on developing your own applications and less time on managing node-level details. This year, we upgraded Autopilot’s autoscaling stack to a fully dynamic container-optimized compute platform that rapidly scales horizontally and vertically to support your workloads. Simply attach a horizontal pod autoscaler (HPA) or vertical pod autoscaler (VPA) to your environment, and experience a fully dynamic platform that can scale rapidly to serve your users.
More and more customers, including Hotspring and Contextual AI, understand that Autopilot can dramatically simplify Kubernetes cluster operations and enhance resource efficiency for their critical workloads. In fact, in 2024, 30% of active GKE clusters were created in Autopilot mode. The new container-optimized compute platform has also proved popular with customers, who report rapid performance improvements in provisioning time. The faster GKE provisions capacity, the more responsive your workloads become, improving your customers’ experience and optimizing costs.
Today, we are pleased to announce that the best of Autopilot is now available in all qualified GKE clusters, not just dedicated Autopilot ones. Now, you can utilize Autopilot’s container-optimized compute platform and ease of operation from existing GKE clusters. It’s generally available, starting with clusters enrolled in the Rapid release channel and running GKE version 1.33.1-gke.1107000 or later. Most clusters will qualify and be able to access these new features as they roll out to the other release channels, except clusters enrolled in the Extended channel and those that use the older routes-based networking. To access these new features, enroll in the Rapid channel and upgrade your cluster version, or wait to be auto-upgraded.
Autopilot features are offered in Standard clusters via compute classes, which are a modern way to group and specify compute requirements for workloads in GKE. GKE now has two built-in compute classes, autopilot and autopilot-spot, that are pre-installed on all qualified clusters running on GKE 1.33.1-gke.1107000 or later and enrolled in the Rapid release channel. Running your workload on Autopilot’s container-optimized compute platform is as easy as specifying the autopilot (or autopilot-spot) compute class, like so:
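For example, a workload opts in with a single nodeSelector (a minimal sketch; the Deployment name, image, and resource requests are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-autopilot
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-autopilot
  template:
    metadata:
      labels:
        app: hello-autopilot
    spec:
      nodeSelector:
        cloud.google.com/compute-class: autopilot   # or autopilot-spot
      containers:
      - name: web
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi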
Better still, you can make the Autopilot container-optimized compute platform the default for a namespace, a great way to save both time and money. You get efficient bin-packing, where the workload is charged for resource requests (and can even still burst!), rapid scaling, and you don’t have to plan your node shapes and sizes.
Here’s how to set Autopilot as your default for a namespace:
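A minimal sketch, assuming the namespace-label mechanism described in the GKE compute class documentation; the label key below is an assumption, so confirm it in the docs for default compute classes before relying on it:

# Assumed label key; verify against the GKE documentation for default compute classes
kubectl label namespace my-team cloud.google.com/default-compute-class=autopilot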
Pod sizes for the container-optimized compute platform start at 50 milli-CPU (just 5% of one CPU core!) and can scale up to 28 vCPU. With the container-optimized compute platform you only pay for the resources your Pod requests, so you don’t have to worry about system overhead or empty nodes. Pods that are larger than 28 vCPU or that have specific hardware requirements can also run in Autopilot mode on specialized compute with node-based pricing via customized compute classes.
Run AI workloads on GPUs and TPUs with Autopilot
It’s easy to pair Autopilot’s container-optimized compute platform with specific hardware such as GPUs, TPUs, and high-performance CPUs to run your AI workloads. You can run those workloads in the same cluster, side by side with Pods on the container-optimized compute platform. By choosing Autopilot mode for these AI workloads, you benefit from Autopilot’s managed node properties, where we take a more active role in management. Furthermore, you also get our enterprise-grade privileged admission controls that require workloads to run in user space, for better supportability, reliability, and an improved security posture.
Here’s how to define your own customized compute class that runs in Autopilot mode with specific hardware, in this example a G2 machine type with NVIDIA L4s with two priority rules:
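A sketch of what such a class could look like, following the custom compute class API; the priority-rule fields (machine family, Spot, GPU type and count) are the documented ones, but treat the manifest as illustrative and take the exact schema, including how the class is marked to run in Autopilot mode, from the GKE documentation:

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: l4-inference                # illustrative name
spec:
  priorities:
  - machineFamily: g2               # rule 1: Spot G2 VMs with one NVIDIA L4
    spot: true
    gpu:
      type: nvidia-l4
      count: 1
  - machineFamily: g2               # rule 2: fall back to on-demand G2 + L4
    spot: false
    gpu:
      type: nvidia-l4
      count: 1
  nodePoolAutoCreation:
    enabled: true                   # let GKE create the node pools automatically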
We’re also making compute classes work better with a new provisioning mode that automatically provisions resources for compute classes, without changing how other workloads are scheduled on existing node pools. This means you can now adopt the new deployment paradigm of compute class (including the new Autopilot-enabled compute classes) at your own pace, without affecting existing workloads and deployment strategies.
Until now, to use compute class in Standard clusters with automatic node provisioning, you needed to enable node auto-provisioning for the entire cluster. Node auto-provisioning has been part of GKE for many years, but it was previously an all-or-nothing decision — you couldn’t easily combine a manual node pool with a compute class provisioned by node auto-provisioning without potentially changing how workloads outside of the compute class were scheduled. Now you can, with our new automatically provisioned compute classes. All Autopilot compute classes use this system, so it’s easy to run workloads in Autopilot mode side-by-side with your existing deployments (e.g., on manual node pools). You can also enable this feature on any compute class starting with clusters in the Rapid channel running GKE version 1.33.3-gke.1136000 or later.
With the Autopilot mode for compute classes in Standard clusters, and the new automatic provisioning mode for all compute classes, you can now introduce compute class as an option to more clusters without impacting how any of your existing workloads are scheduled. Customers we’ve spoken to like this, as they can adopt these new patterns gradually for new workloads and by migrating existing ones, without needing to plan a disruptive switch-over.
Autopilot for all
At Google Cloud, we believe in the power of GKE’s Autopilot mode to simplify operations for your GKE clusters and make them more efficient. Now, those benefits are available to all GKE customers! To learn more about GKE Autopilot and how to enable it for your clusters, check out these resources.
The role of the data scientist is rapidly transforming. For the past decade, their mission has centered on analyzing the past to run predictive models that informed business decisions. Today, that is no longer enough. The market now demands that data scientists build the future by designing and deploying intelligent, autonomous agents that can reason, act, and learn on behalf of the enterprise.
This transition moves the data scientist from an analyst to an agentic architect. But the tools of the past — fragmented notebooks, siloed data systems, and complex paths to production — create friction that breaks the creative flow.
At Big Data London, we are announcing the next wave of data innovations built on an AI-native stack, designed to address these challenges. These capabilities help data scientists move beyond analysis to action by enabling them to:
Stop wasting time context-switching. We’re delivering a single, intelligent notebook environment where you can instantly use SQL, Python, and Spark together, letting you build and iterate in one place instead of fighting your tools.
Build agents that understand the real world. We’re giving you native, SQL-based access to the messy, real-time data — like live event streams and unstructured data — that your agents need to make smart, context-aware decisions.
Go from prototype to production in minutes, not weeks. We’re providing a complete ‘Build-Deploy-Connect’ toolkit to move your logic from a single notebook into a secure, production-grade fleet of autonomous agents.
Unifying the environment for data science
The greatest challenge of data science productivity is friction. Data scientists live in a state of constant, forced context-switching: writing SQL in one client, exporting data, loading it into a Python notebook, configuring a separate Spark cluster for heavy lifting, and then switching to a BI tool just to visualize results. Every switch breaks the creative “flow state” where real discovery happens. Our priority is to eliminate this friction by creating the single, intelligent environment an architect needs to engineer, build, and deploy — not just run predictive models.
Today, we are launching fundamental enhancements to Colab Enterprise notebooks in BigQuery and Vertex AI. We’ve added native SQL cells (preview), so you can now iterate on SQL queries and Python code in the same place. This lets you use SQL for data exploration and immediately pipe the results into a BigQuery DataFrame to build models in Python. Furthermore, rich interactive visualization cells (preview) automatically generate editable charts from your data to quickly assess the analysis. This integration breaks the barrier between SQL, Python, and visualization, transforming the notebook into an integrated development environment for data science tasks.
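As a rough illustration of that handoff (a sketch, with placeholder project, dataset, and column names), the query you iterate on in a SQL cell can be pulled straight into a BigQuery DataFrame and worked on in Python:

import bigframes.pandas as bpd

# Run SQL in BigQuery and get the result back as a DataFrame-like object;
# the data and the heavy lifting stay in BigQuery.
df = bpd.read_gbq(
    """
    SELECT user_id, SUM(amount) AS total_spend
    FROM `my-project.my_dataset.orders`   -- placeholder table
    GROUP BY user_id
    """
)

# Continue in Python: explore, engineer features, or feed a model.
top_spenders = df.sort_values("total_spend", ascending=False).head(100)
print(top_spenders.to_pandas())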
But an integrated environment is only half the solution; it must also be intelligent. This is the power of our Data Science Agent, which acts as an “interactive partner” inside Colab. Recent enhancements to this agent mean it can now incorporate sophisticated tool usage (preview) within its detailed plans, including the use of BigQuery ML for training and inference, BigQuery DataFrames for analysis using Python, or large-scale Spark transformations. This means your analysis gets more advanced, your demanding workloads are more cost-effective to run, and your models get into production more quickly.
In addition, we are making our Lightning Engine generally available. The Lightning Engine accelerates Spark performance more than 4x compared to open-source Spark. And Lightning Engine is ML- and AI-ready by default, seamlessly integrating with BigQuery Notebooks, Vertex AI, and VS Code. This means you can use the same accelerated Spark runtime across your entire workflow in any tool of choice — from initial exploration in a notebook to distributed training on Vertex AI. We’re also announcing advanced support for Spark 4.0 (preview), bringing its latest innovations directly to you.
Building agents that understand the real world
Agentic architects build systems that will sense and respond to the world in real time. This requires access to data that has historically been siloed in separate, specialized systems such as live event streams and unstructured data. To address this challenge we are making real-time streams and unstructured data more accessible for data science teams.
First, to process real-time data using SQL we are announcing stateful processing for BigQuery continuous queries (preview). In the past, it was difficult to ask questions about patterns over time using just SQL on live data. This new capability changes that. It gives your SQL queries a “memory,” allowing you to ask complex, state-aware questions. For example, instead of just seeing a single transaction, you can ask, “Has this credit card’s average transaction value over the last 5 minutes suddenly spiked by 300%?” An agent can now detect this suspicious velocity pattern — which a human analyst reviewing individual alerts would miss — and proactively trigger a temporary block on the card before a major fraudulent charge goes through. This unlocks powerful new use cases, from real-time fraud detection to adaptive security agents that learn and identify new attack patterns as they happen.
Second, we are removing the friction to build AI applications using a vector database, by helping data teams with autonomous embedding generation in BigQuery (preview) over multimodal data. Building on our BigQuery Vector Search capabilities, you no longer have to build, manage, or maintain a separate, complex data pipeline just to create and update your vector embeddings. BigQuery now takes care of this automatically as data arrives and as users search for new terms in natural language. This capability enables agents to connect user intent to enterprise data, and it’s already powering systems like the in-store product finder at Morrisons, which handles 50,000 customer searches on a busy day. Customers can use the product finder on their phones as they walk around the supermarket. By typing in the name of a product, they can immediately find which aisle a product is on and in which part of that aisle. The system uses semantic search to identify the specific product SKU, querying real-time store layout and product catalog data.
Trusted, production-ready multi-agent development
When an analyst delivers a report, their job is done. When an architect deploys an autonomous application or agent, their job has just begun. This shift from notebook-as-prototype to agent-as-product introduces a critical new set of challenges: How do you move your notebook logic into a scalable, secure, and production-ready fleet of agents?
To solve this, we are providing a complete “Build-Deploy-Connect” toolkit for the agent architect. First, the Agent Development Kit (ADK) provides the framework to build, test, and orchestrate your logic into a fleet of specialized, production-grade agents. This is how you move from a single-file prototype to a robust, multi-agent system. And this agentic fleet doesn’t just find problems — it acts on them. ADK allows agents to ‘close the loop’ by taking intelligent, autonomous actions, from triggering alerts to creating and populating detailed case files directly in operational systems like ServiceNow or Salesforce.
A huge challenge until now was securely connecting these agents to your enterprise data, forcing developers to build and maintain their own custom integrations. To solve this, we launched first-party BigQuery tools directly integrated within ADK or via MCP. These are Google-maintained, secure tools that allow your agent to intelligently discover datasets, get table info, and execute SQL queries, freeing your team to focus on agent logic, not foundational plumbing. In addition, your agentic fleet can now easily connect to any data platform in Google Cloud using our MCP Toolbox. Available across BigQuery, AlloyDB, Cloud SQL, and Spanner, MCP Toolbox provides a secure, universal ‘plug’ for your agent fleet, connecting them to both the data sources and the tools they need to function.
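As a rough sketch of how a small data-aware agent comes together with ADK, here is a hypothetical example; the function tool is a stand-in, and in practice you would add the first-party BigQuery tools or an MCP Toolbox connection to the same tools list (see the ADK documentation for their exact imports):

from google.adk.agents import Agent

def get_error_rate(service: str) -> dict:
    """Hypothetical tool: return the recent error rate for a service.

    A real agent would fetch this from BigQuery via the first-party tools
    or MCP Toolbox rather than returning a canned value.
    """
    return {"service": service, "error_rate_pct": 1.7}

root_agent = Agent(
    name="ops_analyst",
    model="gemini-2.0-flash",
    description="Answers questions about service health from enterprise data.",
    instruction="Use the available tools to ground every answer in real data.",
    tools=[get_error_rate],
)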
This “Build-Deploy-Connect” toolkit also extends to the architect’s own workflow. While ADK helps agents connect to data, the architect (the human developer) needs to manage this system using a new primary interface: the command line (CLI). To eliminate the friction of switching to a UI for data tasks, we are integrating data tasks directly into the terminal with our new Gemini CLI extensions for Data Cloud (preview). Through the agentic Gemini CLI, developers can now use natural language to find datasets, analyze data, or generate forecasts — for example, you can simply state gemini bq “analyze error rates for ‘checkout-service'” — and even pipe results to local tools like Matplotlib, all without leaving your terminal.
Architecting the future
These innovations transform the impact data scientists can have within the organization. Using an AI-native stack we are now unifying the development environment in new ways, expanding data boundaries, and enabling trusted production ready development.
You can now automate tasks and use agents to become an agentic architect helping your organization to sense, reason, and act with intelligence. Ready to experience this transformation? Check out our new Data Science eBook with eight practical use cases and notebooks to get you started building today.
In June, Google introduced Gemini CLI, an open-source AI agent that brings the power of Gemini directly into your terminal. And today, we’re excited to announce open-source Gemini CLI extensions for Google Data Cloud services.
Building applications and analyzing trends with services like Cloud SQL, AlloyDB and BigQuery has never been easier — all from your local development environment! Whether you’re just getting started or a seasoned developer, these extensions make common data interactions such as app development, deployment, operations, and data analytics more productive and easier. So, let’s jump right in!
Using a Data Cloud Gemini CLI extension
Before you get started, make sure you have enabled the APIs and configured the IAM permissions required to access specific services.
To retrieve the newest functionality, install the latest release of the Gemini CLI (v0.6.0):
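(A sketch of the commands: the npm package is the standard Gemini CLI distribution, while the extension source URL is an assumption, so use the location given in the extensions documentation.)

npm install -g @google/gemini-cli@latest

gemini extensions install https://github.com/gemini-cli-extensions/<EXTENSION>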
Replace <EXTENSION> with the name of the service you want to use. For example, alloydb, cloud-sql-postgresql or bigquery-data-analytics.
Before starting the Gemini CLI, you’ll need to configure the extension to connect with your Google Cloud project by adding the required environment variables. The table below provides more information on the configuration required.
Extension Name: alloydb
Description: Create resources and interact with AlloyDB for PostgreSQL databases and data
Configuration: the environment variables for your Google Cloud project and AlloyDB resources (see the extension documentation for the exact variable names)
Now, you can start the Gemini CLI with the gemini command. You can view the installed extensions with the /extensions command.
You can list the MCP servers and tools included in the extension with the /mcp list command.
Using the Gemini CLI for Cloud SQL for PostgreSQL extension
The Cloud SQL for PostgreSQL extension lets you perform a number of actions. Some of the main ones are included below:
Create instance: Creates a new Cloud SQL instance for PostgreSQL (and also MySQL, or SQL Server)
List instances: Lists all Cloud SQL instances in a given project
Get instance: Retrieves information about a specific Cloud SQL instance
Create user: Creates a new user account within a specified Cloud SQL instance, supporting both standard and Cloud IAM users
Curious about how to put it in action? Like any good project, start with a solid written plan of what you are trying to do. Then, you can provide that project plan to the CLI as a series of prompts, and the agent will start provisioning the database and other resources:
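For example, with hypothetical project details, the prompts might look like this:

> Create a Cloud SQL for PostgreSQL instance named todo-app-db in project my-demo-project, region us-central1.
> Create a database called todos and an application user named todo_api.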
After configuring the extension to connect to the new database, the agent can generate the required tables based on the approved plan. For easy testing, you can prompt the agent to add test data.
Now the agent can use the context it has to generate an API to make the data accessible.
As you can see, these extensions make it incredibly easy to start building with Google Cloud databases!
Using the BigQuery Analytics extensions
For your analytical needs, we are thrilled to give you a first look at the Gemini CLI extension for BigQuery Data Analytics. We are also excited to give access to the Conversational Analytics API through the BigQuery Conversational Analytics extension. This is the first step in our journey to bring the full power of BigQuery directly into your local coding environment, creating an integrated and unified workflow.
With this extension, you can:
Explore data: Use natural language to search for your tables.
Analyze: Ask business questions on the data and generate intelligent insights.
Dive deeper: Use conversational analytics APIs to dive deeper into the insights.
And extend: Use other tools or extensions to extend into advanced workflows like charting, reporting, code management, etc.
This initial release provides a comprehensive suite of tools to Gemini CLI:
Metadata tools: Discover and understand the BigQuery data landscape.
Query execution tool: Run any BigQuery query and get the results back, summarized to your console.
AI-powered forecasting: Leverage BigQuery’s built-in AI.FORECAST function for powerful time-series predictions directly from the command line.
Deeper data insights: The “ask_data_insights” tool provides access to a server-side BigQuery agent for richer data insights.
And more …
[Note: To use the conversational analytics extension, you need to enable additional APIs. Refer to the documentation for more information.]
Here is an example journey with analytics extensions:
Explore and analyze your data. For example:
> find tables related to PyPi downloads

✦ I found the following tables related to PyPi downloads:

 * file_downloads: projects/bigquery-public-data/datasets/pypi/tables/file_downloads
 * distribution_metadata: projects/bigquery-public-data/datasets/pypi/tables/distribution_metadata

> Using bigquery-public-data.pypi.file_downloads show me top 10 downloaded pypi packages this month

✦ Here are the top 10 most downloaded PyPI packages this month:

 1. boto3: 685,007,866 downloads
 2. botocore: 531,034,851 downloads
 3. urllib3: 512,611,825 downloads
 4. requests: 464,595,806 downloads
 5. typing-extensions: 459,505,780 downloads
 6. certifi: 451,929,759 downloads
 7. charset-normalizer: 428,716,731 downloads
 8. idna: 409,262,986 downloads
 9. grpcio-status: 402,535,938 downloads
 10. aiobotocore: 399,650,559 downloads
Run deeper insights
Use “ask_data_insights” to trigger an agent on the BigQuery side (the Conversational Analytics API) to answer your questions. The server-side agent is smart enough to gather additional context about your data and offer deeper insights into your questions.
You can go further and generate charts and reports by mixing BigQuery data with your local tools. Here’s a prompt to try:
“using bigquery-public-data.pypi.file_downloads can you forecast downloads for the last four months of 2025 for package urllib3? Please plot a chart that includes actual downloads for the first 8 months, followed by the forecast for the last four months”
Get started today!
Ready to level up with Gemini CLI extensions for our Data Cloud services? Read more in the extensions documentation. Check out our templates and start building your own extensions to share with the community!
Public sector agencies are under increasing pressure to operate with greater speed and agility, yet are often hampered by decades of legacy data. Critical information, essential for meeting tight deadlines and fulfilling mandates, frequently lies buried within vast collections of unstructured documents. This challenge of transforming institutional knowledge into actionable insight is a common hurdle on the path to modernization.
The Indiana Department of Transportation (INDOT) recently faced this exact scenario. To comply with Governor Mike Braun’s Executive Order 25-13, all state agencies were given 30 days to complete a government efficiency report, mapping all statutory responsibilities to their core purpose. For INDOT, the critical information needed to complete this report was buried in a mix of editable and static documents – decades of policies, procedures, and manuals scattered across internal sites. A manual review was projected to take hundreds of hours, making the deadline nearly impossible. This tight deadline necessitated an innovative approach to data processing and report generation.
Recognizing a complex challenge as an opportunity for transformation, INDOT’s leadership envisioned an AI-powered solution. The agency chose to build its pilot program on its existing Google Cloud environment, which allowed it to deploy Gemini’s capabilities immediately. By taking this strategic approach, the team was able to turn a difficult compliance requirement into a powerful demonstration of government efficiency.
From manual analysis to an AI-powered pilot in one week
Operating in an agile week-long sprint, INDOT’s team built an innovative workflow centered on Retrieval-Augmented Generation (RAG). This technique enhances generative AI models by grounding them in specific, private data, allowing them to provide accurate, context-aware answers.
The technical workflow began with data ingestion and pre-processing. The team quickly developed Python scripts to perform “Extract, Transform, Load” (ETL) on the fly, scraping internal websites for statutes and parsing text from numerous internal files. This crucial step cleaned and structured the data for the next stage: indexing. Using Vertex AI Search, they created a robust, searchable vector index of the curated documents, which formed the definitive knowledge base for the generative model.
With the data indexed, the RAG engine in Vertex AI could efficiently retrieve the most relevant document snippets in response to a query. This contextual information was then passed to Gemini via Vertex AI. This two-step process was critical, as it ensured the model’s responses were based solely on INDOT’s official documents, not on public internet data.
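Conceptually, the retrieval-then-generation step looks like the Python sketch below; it is illustrative only, with retrieve_snippets standing in for the Vertex AI Search query, and the project, location, and model name used as placeholders:

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholders

def retrieve_snippets(question: str) -> list[str]:
    # Hypothetical stand-in for querying the Vertex AI Search index
    # built from the agency's curated documents.
    return ["<relevant policy excerpt>", "<relevant manual excerpt>"]

def answer_from_documents(question: str) -> str:
    snippets = retrieve_snippets(question)
    prompt = (
        "Answer using ONLY the following excerpts from official agency documents.\n\n"
        + "\n\n".join(snippets)
        + f"\n\nQuestion: {question}"
    )
    model = GenerativeModel("gemini-1.5-pro")  # placeholder model name
    return model.generate_content(prompt).text

print(answer_from_documents("Which statutes assign responsibility for bridge inspections?"))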
Setting a new standard for government efficiency
Within an intensive, week-long effort, the team delivered a functioning pilot that generated draft reports across nine INDOT divisions with an impressive 98% fidelity – a measure of how accurately the new reports reflected the information in the original source documents. This innovative approach saved an estimated 360 hours of manual effort, freeing agency staff from tedious data collection to focus on the high-value work of refining and validating the reports. The solution enabled INDOT to become the largest Indiana state agency to submit its government efficiency report on time.
The government efficiency report was a novel experience for many on our executive team, demonstrating firsthand the transformative potential of large language models like Gemini. This project didn’t just help us meet a critical deadline; it paved the way for broader executive support of AI initiatives that will ultimately enhance our ability to serve Indiana’s transportation needs.
Alison Grand
Deputy Commissioner and Chief Legal Counsel, Indiana Department of Transportation
The AI-generated report framework was so effective that it became the official template for 60 other state agencies, powerfully demonstrating a responsible use of AI and building significant trust in INDOT as a leader in statewide policy. By building a scalable, secure RAG system on Google Cloud, INDOT not only met its tight deadline but also created a reusable model for future innovation, accelerating its mission to better serve the people of Indiana.
Join us at Google Public Sector Summit
To see Google’s latest AI innovations in action, and learn more about how Google Cloud technology is empowering state and local government agencies, register to attend the Google Public Sector Summit taking place on October 29 in Washington, D.C.
Editor’s note: Today’s post is by Syed Mohammad Mujeeb, CIO, and Arsalan Mazhar, Head of Infrastructure, for JS Bank, a prominent and rapidly growing midsize commercial bank in Pakistan with a strong national presence of over 293 branches. JS Bank, always at the forefront of technology, deployed a Google stack to modernize operations while maintaining security and compliance.
Snapshot:
JS Bank’s IT department, strained across 293 branches, was hindered by endpoint instability, a complex security stack, and a lack of device standardization. This reactive environment limited their capacity for innovation.
Through a strategic migration to a unified Google ecosystem—including ChromeOS, Google Workspace, and Google Cloud—the bank transformed its operations. The deployment of 1,500 Chromebooks resulted in a more reliable, secure, and manageable IT infrastructure. This shift cut device management time by 40% and halved daily support tickets, empowering the IT team to pivot from routine maintenance to strategic initiatives like digitization and AI integration.
Reduced IT burden: device management time cut by 40%
Daily support tickets were halved, freeing up IT time for strategic, value-added projects
Nearly 90% endpoint standardization, creating a manageable and efficient IT architecture
A simplified, powerful security posture with the built-in protection of ChromeOS and Google Workspace
At JS Bank, we pride ourselves on being technology pioneers, always bringing new technology into banking. Our slogan, “Barhna Hai Aagey,” means we are always moving onward and upward. But a few years ago, our internal IT infrastructure was holding us back. We researched and evaluated different solutions and found the combination of ChromeOS and Google Workspace a perfect fit for today’s technology landscape, which is surrounded by cyber threats. When we shifted to a unified Google stack, we paved the way for our future, driven by AI, innovation, and operational excellence.
Before our transformation, our legacy solution was functional, but it was a constant struggle. Our IT team was spread thin across our 293 branches, dealing with a cumbersome setup that required numerous security tools, including antivirus and anti-malware, all layered on top of each other. Endpoints crashed frequently, and with a mixture of older devices and some devices running Ubuntu, we lacked the standardization needed for true efficiency and security. It was a reactive environment, and our team was spending too much time on basic fixes rather than driving innovation.
We decided to make a strategic change to align with our bank’s core mission of digitization, and that meant finding a partner with an end-to-end solution. We chose Google because we saw the value in their integrated ecosystem and anticipated the future convergence of public and private clouds. We deployed 1,500 Chromeboxes across branches and fully transitioned to Google Workspace.
Today, we have achieved nearly 90% standardization across our endpoints with Chromebooks and Chromeboxes, all deeply integrated with Google Workspace. This shift has led to significant improvements in security, IT management, and employee productivity. The built-in security features of the Google ecosystem provide peace of mind, especially during periods of heightened cybersecurity threats, as we feel that Google will inherently protect us from cyberattacks. This has simplified security protocols in branches, eliminating the need for multiple antivirus and anti-malware tools, giving our security team incredible peace of mind. Moreover, the lightweight nature of the Google solutions ensures applications are available from anywhere, anytime, and deployments in branches are simplified.
To strengthen security across all corporate devices, we made Chrome our required browser. This provides foundational protections like Safe Browsing to block malicious sites, browser reporting, and password reuse alerts. For 1,500 users, we adopted Chrome Enterprise Premium. This adds features like zero-trust enterprise security, centralized management, data loss prevention (DLP) to protect against accidental data loss, secure access to applications with context-aware access restrictions, and scanning of high-risk files.
With Google, our IT architecture is now manageable. The team’s focus has fundamentally shifted from putting out fires to supporting our customers and building value. We’ve seen a change in our own employees, too; the teams who once managed our legacy systems are now eager to work within the Google ecosystem. From an IT perspective, the results are remarkable: the team required to manage the ChromeOS environment has shrunk to 40%. Daily support tickets have been halved, freeing IT staff from hardware troubleshooting to focus on more strategic application support, enhancing their job satisfaction and career development. Our IT staff now enjoy less taxing weekends due to reduced work hours and a lighter operational burden.
Our “One Platform” vision comes to life
We are simplifying our IT architecture using Google’s ecosystem to achieve our “One Platform” vision. As a Google shop, we’ve deployed Chromebooks enterprise-wide and unified user access with a “One Window” application and single sign-on. Our “One Data” platform uses an Elastic Search data lake on Google Cloud, now being connected to Google’s LLMs. This integrated platform provides our complete AI toolkit—from Gemini and NotebookLM to upcoming Document and Vision AI. By exploring Vertex AI, we are on track to become the region’s most technologically advanced bank by 2026.
Our journey involved significant internal change, but by trusting the process and our partners, we have built a foundation that is not only simpler and more secure but is also ready for the next wave of innovation. We are truly living our mission of moving onward and upward.
As a Python library for accelerator-oriented array computation and program transformation, JAX is widely recognized for its power in training large-scale AI models. But its core design as a system for composable function transformations unlocks its potential in a much broader scientific landscape. Following our recent post on solving high-order partial differential equations, or PDEs, we’re excited to highlight another frontier where JAX is making a significant impact: AI-driven protein engineering.
I recently spoke with April Schleck and Nick Boyd, two co-founders of Escalante, a startup using AI to train models that predict the impact of drugs on cellular protein expression levels. Their story is a powerful illustration of how JAX’s fundamental design choices — especially its functional and composable nature — are enabling researchers to tackle multi-faceted scientific challenges in ways that are difficult to achieve with other frameworks.
A new approach to protein design
April and Nick explained that Escalante’s long-term vision is to train machine learning (ML) models that can design drugs from the ground up. Unlike fields like natural language processing, which benefit from vast amounts of public data, biology currently lacks the specific datasets needed to train models that truly understand cellular systems. Thus, their immediate focus is to solve this data problem by using current AI tools to build new kinds of lab assays that can generate these massive, relevant biological datasets.
This short-term mission puts them squarely in the field of protein engineering, which they described as a complex, multi-objective optimization problem. When designing a new protein, they aren’t just optimizing one property: the protein needs to bind to a specific target while also being soluble, thermostable, and expressible in bacteria. Each of these properties is predicted by a different ML model (see figure below), ranging from complex architectures like AlphaFold 2 (implemented in JAX) to simpler, custom-trained models. Their core challenge is to combine all these different objectives into a single optimization loop.
This is where, as April put it, “JAX became a game-changer for us.” She noted that while combining many AI models might be theoretically possible in other frameworks, JAX’s functional nature makes it incredibly natural to integrate a dozen different ones into a single loss function (see figure below).
Easily combine multiple objectives represented by different loss terms and models
In the above code, Nick explained, models are combined in at least two different ways: some loss terms are linearly combined (e.g., the AF loss plus the ESM pseudo-log-likelihood loss), while in other terms models are composed serially (e.g., in the first Boltz-1 term, they first fold the sequence with Boltz-1 and then compute the sequence likelihood after inverse folding with another model, ProteinMPNN).
To make this work, they embraced the JAX ecosystem, even translating models from PyTorch themselves — a prime example being their JAX translation of the Boltz-2 structure prediction model.
This approach gives what April called an “expressive language for protein design,” where models can be composed, added, and transformed to define a final objective. April said that the most incredible part is that this entire, complex graph of models “can be wrapped in a single jax.jit call that gives great performance” — something they found very difficult to do in other frameworks.
Instead of a typical training run that optimizes a model’s weights, their workflow inverts the process to optimize the input itself, using a collection of fixed, pre-trained neural networks as a complex, multi-objective loss function. The approach is mechanically analogous to Google’s DeepDream. Just as DeepDream takes a fixed, pre-trained image classifier and uses gradient ascent to iteratively modify an input image’s pixels to maximize a chosen layer’s activation, Escalante’s method starts with a random protein sequence. This sequence is fed through a committee of “expert” models — each one a pre-trained scorer for a different desirable property, like binding affinity or stability. The outputs from all the models are combined into a single, differentiable objective function. They then calculate the gradient of this final score with respect to the input sequence via backpropagation. An optimizer uses this gradient to update the sequence, nudging it in a direction that better satisfies the collective requirements of all the models. This cycle repeats, evolving the random initial input into a novel, optimized protein sequence that the entire ensemble of models “believes” is ideal.
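To make this loop concrete, here is a minimal sketch in JAX and Optax. It reuses the placeholder total_loss from the earlier sketch and optimizes a continuous logits representation of the sequence; a real pipeline would also handle the discreteness of amino acids (for example via a softmax relaxation) and call actual pre-trained scorers.

```python
import jax
import jax.numpy as jnp
import optax

key = jax.random.PRNGKey(0)
seq_logits = jax.random.normal(key, (100, 20))  # random start: 100 residues x 20 amino acids

optimizer = optax.adam(learning_rate=1e-2)
opt_state = optimizer.init(seq_logits)

@jax.jit
def design_step(seq_logits, opt_state):
    # Gradients are taken w.r.t. the *input* sequence, not any model weights.
    loss, grads = jax.value_and_grad(total_loss)(seq_logits)
    updates, opt_state = optimizer.update(grads, opt_state)
    seq_logits = optax.apply_updates(seq_logits, updates)
    return seq_logits, opt_state, loss

for _ in range(500):
    seq_logits, opt_state, loss = design_step(seq_logits, opt_state)

candidate = jnp.argmax(seq_logits, axis=-1)  # discretize into an amino-acid sequence
```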
Nick said that the choice of JAX was critical for this process. Its ability to compile and automatically differentiate complex code makes it ideal for optimizing the sophisticated loss functions at the heart of Mosaic, Escalante’s library of tools for protein design. Furthermore, the framework’s native integration with TPU hardware via the XLA compiler allowed them to easily scale these workloads.
For each design problem, Escalante samples many candidate protein designs by optimizing the loss function. Each sampling job might generate 1K – 50K potential designs, which are then ranked and filtered. By the end of the process, they test only about 10 designs in the wet lab. This has led them to adopt a unique infrastructure pattern. Using Google Kubernetes Engine (GKE), they instantly spin up 2,000 to 4,000 spot TPUs, run their optimization jobs for about half an hour, and then shut them all down.
Nick also shared the compelling economics driving this choice. Given current spot pricing, adopting Cloud TPU v6e (Trillium) over an H100 GPU translated to a gain of 3.65x in performance per dollar for their large-scale jobs. He stressed that this cost-effectiveness is critical for their long-term goal of designing protein binders against the entire human proteome, a task that requires immense computational scale.
To build their system, they rely on key libraries within the JAX ecosystem like Equinox and Optax. Nick prefers Equinox because it feels like “vanilla JAX,” calling its concept of representing a model as a simple PyTree “beautiful and easy to reason about.” Optax, meanwhile, gives them the flexibility to easily swap in different optimization algorithms for their design loops.
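As a rough illustration of why that PyTree view is convenient (this is not Escalante’s code), an Equinox module is just a tree of arrays, so it passes straight through jit, grad, and any Optax optimizer, and swapping optimizers is a one-line change:

```python
import equinox as eqx
import jax
import optax

class TinyScorer(eqx.Module):
    # An Equinox model is a plain PyTree of arrays: no special wrappers are needed
    # for it to work with jax.jit, jax.grad, or optax transformations.
    layers: list

    def __init__(self, key):
        k1, k2 = jax.random.split(key)
        self.layers = [eqx.nn.Linear(20, 64, key=k1),
                       eqx.nn.Linear(64, 1, key=k2)]

    def __call__(self, x):
        x = jax.nn.relu(self.layers[0](x))
        return self.layers[1](x)

model = TinyScorer(jax.random.PRNGKey(0))

# Swapping optimization algorithms is a one-line change with Optax.
optimizer = optax.adamw(1e-3)        # or optax.sgd(1e-2), optax.lion(1e-4), ...
opt_state = optimizer.init(eqx.filter(model, eqx.is_array))
```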
They emphasized that this entire stack — JAX’s functional core, its powerful ecosystem libraries, and the scalable TPU hardware — is what makes their research possible.
We are excited to see community contributions like Escalante’s Mosaic library, which contains the tools for their protein design work and is now available on GitHub. It’s a fantastic addition to the landscape of JAX-native scientific tools.
Stories like this highlight a growing trend: JAX is much more than a framework for deep learning. Its powerful system of program transformations, like grad and jit, makes it a foundational library for the paradigm of differentiable programming, empowering a new generation of scientific discovery. The JAX team at Google is committed to supporting and growing this vibrant ecosystem, and that starts with hearing directly from you.
Share your story: Are you using JAX to tackle a challenging problem?
Help guide our roadmap: Are there new features or capabilities that would unlock your next breakthrough?
Your feature requests are essential for guiding the evolution of JAX. Please reach out to the team to share your work or discuss what you need from JAX via GitHub.
Our sincere thanks to April and Nick for sharing their insightful journey with us. We’re excited to see how they and other researchers continue to leverage JAX to solve the world’s most complex scientific problems.
At Deutsche Bank Research, the core mission of our analysts is delivering original, independent economic and financial analysis. However, creating research reports and notes relies heavily on a foundation of painstaking manual work. Or at least that was the case until generative AI came along.
Historically, analysts would sift through and gather data from financial statements, regulatory filings, and industry reports. Then the true challenge began — synthesizing this vast amount of information to uncover insights and findings. To do this, they had to build financial models, identify patterns and trends, and draw connections between diverse sources, past research, and the broader global context.
As analysts need to work as quickly as possible to bring valuable insights to market, this time-consuming process can limit the depth of analysis and the range of topics they can cover.
Our goal was to enhance the research analyst experience and reduce the reliance on manual processes and outsourcing. We created DB Lumina — an AI-powered research agent that helps automate data analysis, streamline workflows, and deliver more accurate and timely insights – all while maintaining the stringent data privacy requirements for the highly regulated financial sector.
“The adoption of the DB Lumina digital assistant by hundreds of research analysts is the culmination of more than 12 months of intense collaboration between dbResearch, our internal development team, and many others. This is just the start of our journey, and we are looking forward to building on this foundation as we continue to push the boundaries of how we responsibly use AI in research production to unlock exciting new innovations across our expansive coverage areas.” – Pam Finelli, Global COO for Investment Research at Deutsche Bank
DB Lumina has three key features that transform the research experience for analysts and enhance productivity through advanced technologies.
1. Gen AI-powered chat
DB Lumina’s core conversational interface enables analysts to interact with Google’s state-of-the-art AI foundation models, including the multimodal Gemini models. They can ask questions, brainstorm ideas, refine writing, and even generate content in real time. Additionally, the chat capability supports uploading and querying documents conversationally, leveraging prior chat history to revisit and continue previous sessions. DB Lumina can help with tasks like summarization, proofreading, translation, and content drafting with precision and speed. In addition, we implemented guardrailing techniques to ensure the generation of compliant and reliable outputs.
2. Prompt templates
Prompt Templates offer pre-configured instructions tailored for document processing with consistent, high-quality outcomes. These templates help analysts summarize large documents, extract key data points, and create reusable workflows for repetitive tasks. They can be customized for specific roles or business needs, and standardized across teams. Analysts can also save and share templates, ensuring more streamlined operations and enhanced collaboration. This functionality is made possible by the long context window of Google’s models combined with advanced prompting techniques, which also provide citations for verification.
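As a purely illustrative sketch (the wording and fields are hypothetical, not DB Lumina’s actual templates), a reusable prompt template can be as simple as a parameterized string that is filled in before being sent to the model:

```python
from string import Template

# Hypothetical template for summarizing an earnings document.
EARNINGS_SUMMARY = Template(
    "You are a research assistant. Summarize the attached $doc_type for $company.\n"
    "Audience: $audience. Target length: $length.\n"
    "List key figures, guidance changes, and risks, citing the source passage for each claim."
)

prompt = EARNINGS_SUMMARY.substitute(
    doc_type="earnings call transcript",
    company="ACME Corp",
    audience="equity research analysts",
    length="one page",
)
```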
3. Knowledge
DB Lumina integrates a Retrieval-Augmented Generation (RAG) architecture that grounds responses in enterprise knowledge sources, such as internal research, external unstructured data (such as SEC filings), and other document repositories. The agent enhances transparency and accuracy by providing inline citations and source viewers for fact-checking. It also implements controlled access to confidential data with audit logging and explainability features, ensuring secure and trustworthy operations. Using advanced RAG architecture, supported by Google Cloud technologies, enables us to bring generative capabilities to enterprise knowledge resources to give analysts access to the latest, most relevant information when creating research reports and notes.
DB Lumina architecture
DB Lumina was designed to enhance Deutsche Bank Research’s productivity by enabling document ingestion, content summarization, Q&A, and editing.
The architecture is built on Google Cloud and leverages a range of its managed services.
All of DB Lumina’s AI capabilities are implemented with guardrails to ensure safe and compliant interactions. We also handle logging and monitoring with Google Cloud’s Observability suite, with prompt interactions stored in Cloud Storage and queried through BigQuery. To manage authentication, we use Identity as a Service integrated with Azure AD, and centralize authorization through dbEntitlements.
RAG and document ingestion
When DB Lumina processes and indexes documents, it splits them into chunks and creates embeddings using APIs like the Gemini Embeddings API. It then stores these embeddings in a vector database such as Vertex AI Vector Search or the pgvector extension on Cloud SQL. Raw text chunks are stored separately, for example, in Datastore or Cloud Storage.
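A simplified sketch of that ingestion flow is shown below, assuming the Vertex AI text-embedding API and a Cloud SQL for PostgreSQL instance with pgvector. The model name, table schema, and chunking strategy are illustrative, not DB Lumina’s actual configuration.

```python
import psycopg2
from vertexai.language_models import TextEmbeddingModel

embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")  # illustrative model

def chunk(text, size=1000, overlap=200):
    # Naive fixed-size chunking with overlap; production pipelines typically
    # split along document structure (sections, paragraphs, tables) instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def ingest(doc_id, text, conn):
    chunks = chunk(text)
    embeddings = embedding_model.get_embeddings(chunks)  # batch limits apply in practice
    with conn.cursor() as cur:
        for i, (content, emb) in enumerate(zip(chunks, embeddings)):
            vector_literal = "[" + ",".join(str(v) for v in emb.values) + "]"
            cur.execute(
                "INSERT INTO doc_chunks (doc_id, chunk_no, content, embedding) "
                "VALUES (%s, %s, %s, %s::vector)",
                (doc_id, i, content, vector_literal),
            )
    conn.commit()
```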
These diagrams below show the typical RAG and ingestion patterns:
Overview of the agent.
When an analyst submits a query, the system routes it through a query engine. A Python application calls an LLM API (Gemini 2.0 and 2.5) and retrieves relevant document snippets based on the query, providing context that the model then uses to generate a relevant response. We experimented with different retrievers, including one using the pgvector extension on Cloud SQL for PostgreSQL and one based on Vertex AI Search.
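The query path can be sketched in the same spirit: embed the question, pull the nearest chunks from the vector store, and ask the model to answer only from those chunks with inline citations. The model ID, table, and prompt wording below are assumptions for illustration.

```python
from vertexai.generative_models import GenerativeModel
from vertexai.language_models import TextEmbeddingModel

llm = GenerativeModel("gemini-2.0-flash")  # illustrative model ID
embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")

def answer(question, conn, top_k=5):
    # Embed the query and retrieve the nearest chunks via pgvector's distance operator.
    q_vec = embedding_model.get_embeddings([question])[0].values
    vector_literal = "[" + ",".join(str(v) for v in q_vec) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT doc_id, chunk_no, content FROM doc_chunks "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (vector_literal, top_k),
        )
        rows = cur.fetchall()

    context = "\n\n".join(f"[{doc}#{n}] {content}" for doc, n, content in rows)
    prompt = (
        "Answer the question using only the sources below, and cite them inline as [doc#chunk].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate_content(prompt).text
```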
User interface
Using sliders in DB Lumina’s interface, users can easily adjust various parameters for summarization, including verbosity, data density, factuality, structure, reader perspective, flow, and individuality. The interface also includes functionality for providing feedback on summaries.
An evaluation framework for gen AI
Evaluating gen AI applications and agents like DB Lumina requires a custom framework due to the complexity and variability of model outputs. Traditional metrics and generic benchmarks often fail to capture the needs for gen AI features, the nuanced expectations of domain-specific users, and the operational constraints of enterprise environments. This necessitates a new set of gen AI metrics to accurately measure performance.
The DB Lumina evaluation framework employs a rich and extensible set of both industry-standard and custom-developed metrics, which are mapped to defined categories and documented in a central metric dictionary to ensure consistency across teams and features. Standard metrics like accuracy, completeness, and latency are foundational, but they are augmented with custom metrics, such as citation precision and recall, false rejection rates, and verbosity control — each tailored to the specific demands and regulatory requirements of financial research and document-grounded generation. Popular frameworks like Ragas also provide a solid foundation for assessing how well our RAG system grounds its responses in retrieved documents and avoids hallucinations.
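As a rough illustration of that last point (the sample row, metric choice, and judge configuration are ours, not DB Lumina’s), a Ragas evaluation over a small, version-controlled test set looks roughly like this:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# A tiny test set: each row pairs a question with the generated answer,
# the retrieved context it should be grounded in, and a reference answer.
eval_set = Dataset.from_dict({
    "question": ["What did the issuer guide for FY25 revenue?"],
    "answer": ["FY25 revenue guidance was raised to about $2.1B [10-K#12]."],
    "contexts": [["...raised full-year 2025 revenue guidance to approximately $2.1 billion..."]],
    "ground_truth": ["Guidance was raised to roughly $2.1 billion."],
})

# Ragas uses an LLM as a judge; it defaults to OpenAI, and a different judge
# (e.g. a Vertex AI model) can be supplied via the llm= argument.
report = evaluate(eval_set, metrics=[faithfulness, answer_relevancy, context_precision])
print(report)
```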
In addition, test datasets are carefully curated to reflect a wide range of real-world scenarios, edge cases, and potential biases across DB Lumina’s core features like chat, document Q&A, templates, and RAG-based knowledge retrieval. These datasets are version-controlled and regularly updated to maintain relevance as the tool evolves. Their purpose is to provide a stable benchmark for evaluating model behavior under controlled conditions, enabling consistent comparisons across optimization cycles.
Evaluation is both quantitative and qualitative, combining automated scoring with human review for aspects like tone, structure, and content fidelity. Importantly, the framework ensures each feature is assessed for correctness, usability, efficiency, and compliance while enabling the rapid feedback and robust risk management needed to support iterative optimization and ongoing performance monitoring. We compare current metric outputs against historical baselines, leveraging stable test sets, Git hash tracking, and automated metric pipelines to support proactive interventions to ensure that performance deviations are caught early and addressed before they impact users or compliance standards.
This layered approach ensures that DB Lumina is not only accurate and efficient but also aligned with Deutsche Bank’s internal standards, achieving a balanced and rigorous evaluation strategy that supports both innovation and accountability.
Bringing new benefits to the business
We developed an initial pilot for DB Lumina with Google Cloud Consulting, creating a simple prototype early in the use case development that used only embeddings without prompts. Though it was surpassed by later versions, this pilot informed the subsequent development of DB Lumina’s RAG architecture.
The project then transitioned through our development and application testing environments to our production deployment, going live in September 2024. DB Lumina is already in the hands of around 5,000 users across Deutsche Bank Research, specifically in divisions like Investment Bank Origination & Advisory and Fixed Income & Currencies. We plan to roll it out to more than 10,000 users across corporate banking and other functions by the end of the year.
DB Lumina is expected to deliver significant business benefits for Deutsche Bank:
Time savings: Analysts report saving 30 to 45 minutes when preparing earnings note templates and up to two hours when writing research reports and roadshow updates.
Increased analysis depth: One analyst increased the analysis in an earnings report by 50%, adding additional sections by region and activity, as well as a summary section for forecast changes. This was achieved by summarizing earnings releases and investor transcripts, then analyzing them through conversational prompts.
New analysis opportunities: DB Lumina has created new opportunities for teams to analyze new topics. For example, the U.S. and European Economics teams use DB Lumina to score central bank communications to assess hawkishness and dovishness over time. Another analyst was able to analyze and compare budget speeches from eight different ministries, tallying up keywords related to capacity constraints and growth orientation to identify shifts in priorities.
Increased accuracy: Analysts have also started using DB Lumina as part of their editing process. One supervisory analyst noted that since the rollout, there has been a marked improvement in editorial and grammatical accuracy across analyst notes, especially from non-native English speakers.
Building the future of gen AI and RAG in finance
We’ve seen the power of RAG transform how financial institutions interact with their data. DB Lumina has proved the value of combining retrieval, gen AI, and conversational AI, but this is just the start of our journey. We believe the future lies in embracing and refining the “agentic” capabilities that are inherent in our architecture. We envision building and orchestrating a system where various components act as agents — all working together to provide intelligent and informed responses to complex financial inquiries.
To support our vision moving forward, we plan to deepen agent specialization within our RAG framework, building agents designed to handle specific types of queries or tasks across compliance, investment strategies, and risk assessment. We also want to incorporate the ReAct (Reasoning and Acting) paradigm into our agents’ decision-making process to enable them to not only retrieve information but also actively reason, plan actions, and refine their searches to provide more accurate and nuanced answers.
In addition, we’ll be actively exploring and implementing more of the tools and services available within Vertex AI to further enhance our AI capabilities. This includes exploring other models for specific tasks or to achieve different performance characteristics, optimizing our vector search infrastructure, and utilizing AI pipelines for greater efficiency and scalability across our RAG system.
The ultimate goal is to empower DB Lumina to handle increasingly complex and multi-faceted queries through improved context understanding, ensuring it can accurately interpret context like previous interactions and underlying financial concepts. This includes moving beyond simple question answering to providing analysis and recommendations based on retrieved information. To enhance DB Lumina’s ability to provide real-time information and address queries requiring up-to-date external data, we are planning to integrate a feature for grounding responses with internet-based information.
By focusing on these areas, we aim to transform DB Lumina from a helpful information retriever into a powerful AI agent capable of tackling even the most challenging financial inquiries. This will unlock new opportunities for improved customer service, enhanced decision-making, and greater operational efficiency for financial institutions. The future of RAG and gen AI in finance is bright, and we’re excited to be at the forefront of this transformative technology.
Today, we are excited to announce the 2025 DORA Report: State of AI-assisted Software Development. The report draws on insights from over 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals around the world.
The report reveals a key insight: AI doesn’t fix a team; it amplifies what’s already there. Strong teams use AI to become even better and more efficient. Struggling teams will find that AI only highlights and intensifies their existing problems. The greatest return comes not from the AI tools themselves, but from a strategic focus on the quality of internal platforms, the clarity of workflows, and the alignment of teams.
AI, the great amplifier
As we established in the 2024 report, as well as in this year’s special report, “Impact of Generative AI in Software Development,” organizations are continuing to heavily adopt AI and receive substantial benefits across important outcomes. There is also evidence that people are learning to better integrate these tools into their workflows. Unlike last year, we observe a positive relationship between AI adoption and both software delivery throughput and product performance. It appears that people, teams, and tools are learning where, when, and how AI is most useful. However, AI adoption does continue to have a negative relationship with software delivery stability.
This confirms our central theory – AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, like strong automated testing, mature version control practices, and fast feedback loops, an increase in change volume leads to instability. Teams working in loosely coupled architectures with fast feedback loops see gains, while those constrained by tightly coupled systems and slow processes see little or no benefit.
Key findings from the 2025 report
Beyond this central theme, this year’s research highlighted the following about modern software development:
AI adoption is near-universal: 90% of survey respondents report using AI at work. More than 80% believe it has increased their productivity. However, skepticism remains as 30% report little or no trust in the code generated by AI, a slightly lower percentage than last year but a key trend to note.
User-centricity is a prerequisite for AI success: AI becomes most useful when it’s pointed at a clear problem, and a user-centric focus provides that essential direction. Our data shows this focus amplifies AI’s positive influence on team performance.
Platform engineering is the foundation: Our data shows that 90% of organizations have adopted at least one platform and there is a direct correlation between a high quality internal platform and an organization’s ability to unlock the value of AI, making it an essential foundation for success.
The seven team archetypes
Simple software delivery metrics alone aren’t sufficient. They tell you what is happening but not why it’s happening. To connect performance data to experience, we conducted a cluster analysis that reveals seven common team profiles or archetypes, each with a unique interplay of performance, stability, and well-being. This model provides leaders with a way to diagnose team health and apply the right interventions.
The ‘Foundational challenges’ group are trapped in survival mode and face significant gaps in their processes and environment, leading to low performance, high system instability, and high levels of burnout and friction. The ‘Harmonious high achievers,’ by contrast, excel across multiple areas, showing positive metrics for team well-being, product outcomes, and software delivery.
Read more details of each archetype in the “Understanding your software delivery performance: A look at seven team profiles” chapter of the report.
Unlocking the value of AI with the ‘DORA AI Capabilities Model’
This year, we went beyond identifying AI’s impact to investigating the conditions in which AI-assisted technology professionals realize the best outcomes. The value of AI is unlocked not by the tools themselves, but by the surrounding technical practices and cultural environment.
Our research identified seven capabilities that are shown to magnify the positive impact of AI in organizations.
Where leaders should get started
One of the key insights derived from the research this year is that the value of AI will be unlocked by reimagining the system of work it inhabits. Technology leaders should treat AI adoption as an organizational transformation.
Here’s where we suggest you begin:
Clarify and socialize your AI policies
Connect AI to your internal context
Prioritize foundational practices
Fortify your safety nets
Invest in your internal platform
Focus on your end-users
The DORA research program is committed to serving as a compass for teams and organizations as we navigate this important and transformative period with AI. We hope the new team profiles and the DORA AI Capabilities Model provide a clear roadmap for you to move beyond simply adopting AI to unlocking its value by investing in teams and people. We look forward to learning how you put these insights into practice.
Artificial intelligence is rapidly transforming software development. But simply adopting AI tools isn’t a guarantee of success. Across the industry, tech leaders and developers are asking the same critical questions: How do we move from just using AI to truly succeeding with it? How do we ensure our investment in AI delivers better, faster, and more reliable software?
The DORA research team has developed the inaugural DORA AI Capabilities Model to provide data-backed guidance for organizations grappling with these questions. This is not just another report on AI adoption trends; it is a guide to the specific technical and cultural practices that amplify the benefits of AI.
The DORA AI Capabilities Model: 7 levers of success
We developed the DORA AI Capabilities Model through a three-phase process. First, we identified and prioritized a wide range of candidate capabilities based on 78 in-depth interviews, existing literature, and perspectives from leading subject-matter experts. Second, we developed and validated survey questions to ensure they were clear, reliable, and measured each capability accurately. Lastly, we evaluated the impact of a subset of these candidates using the rigorous methodology of designing and analyzing our annual survey, which reached almost 5,000 respondents. The analysis identified seven capabilities that substantially amplify or unlock the benefits of AI:
Clear and communicated AI stance: Your organization’s position on AI-assisted tools must be clear and well-communicated. This includes clarity on expectations for AI use, support for experimentation, and which tools are permitted. Our research indicates that a clear AI stance amplifies AI’s positive impact on individual effectiveness and organizational performance, and can reduce friction for employees. Importantly, this capability does not measure the specific content of AI use policies, meaning organizations can achieve this capability regardless of their unique stance—as long as that stance is clear and communicated.
Healthy data ecosystems: The quality of your internal data is critical to AI success. A healthy data ecosystem, characterized by high-quality, easily accessible, and unified internal data, substantially amplifies the positive influence of AI adoption on organizational performance.
AI-accessible internal data: Connecting AI tools to internal data sources boosts their impact on individual effectiveness and code quality. Providing AI with company-specific context allows it to move beyond a general-purpose assistant into a highly specialized and valuable tool for your developers.
Strong version control practices: With the increased volume and velocity of code generation from AI, strong version control practices are more crucial than ever. Our research shows a powerful connection between mature version control habits and AI adoption. Specifically, frequent commits amplify AI’s positive influence on individual effectiveness, while the frequent use of rollback features boosts the performance of AI-assisted teams.
Working in small batches: Working in small batches, a long-standing DORA principle, is especially powerful in an AI-assisted environment. This practice amplifies the positive influence of AI on product performance and reduces friction for development teams.
User-centric focus: A deep focus on the end-user’s experience is paramount for teams utilizing AI. Our findings show that a user-centric focus amplifies the positive influence of AI on team performance. Importantly, we also found that in the absence of a user-centric focus, AI adoption can have a negative impact on team performance. When users are at the center of strategy, AI can help propel teams in the right direction. But, when users aren’t the focus, AI-assisted development teams may just be moving quickly in the wrong direction.
Quality internal platforms: Quality internal platforms provide the shared capabilities needed to scale the benefits of AI across an organization. In organizations with quality internal platforms, AI’s positive influence on organizational performance is amplified.
Putting the DORA AI Capabilities Model into practice
To successfully leverage AI in software development, it’s not enough to simply adopt new tools. Organizations must foster the right technical and cultural environment for AI-assisted developers to thrive. Based on our seven inaugural DORA AI Capabilities, we recommend that organizations seeking to maximize the benefits of their AI adoption:
Clarify and socialize your AI policies: Ambiguity about what is acceptable stifles adoption and creates risk. Establish and clearly communicate your policy on permitted AI tools and usage to build developer trust and provide the psychological safety needed for effective experimentation.
Treat your data as a strategic asset: The benefits of AI on organizational performance are significantly amplified by a healthy data ecosystem. Invest in the quality, accessibility, and unification of your internal data sources.
Connect AI to your internal context: Move beyond generic AI assistance by investing the engineering effort to give your AI tools secure access to internal documentation, codebases, and other data sources. This provides the necessary company-specific context for maximal effectiveness.
Double-down on known best practices, like working in manageable increments: Enforce the discipline of working in small batches to improve product performance and reduce friction for AI-assisted teams.
Prioritize user-centricity: AI-assisted development tools can help developers produce, debug, and review code more quickly. But, if the core product strategy doesn’t center the needs of the end-user, then more code won’t mean more value to the organization. Explicitly centering user needs is a North Star for orienting AI-assisted teams toward the realization of a shared goal.
Embrace and fortify your safety nets: As AI increases the velocity of changes, your version control system becomes a critical safety net. Encourage teams to become highly proficient in using rollback and revert features.
Invest in your internal platform: A quality internal platform provides the necessary guardrails and shared capabilities that allow the benefits of AI to scale effectively and securely across your organization.
DORA’s research has long held that even the best tools and teams can’t succeed without the right organizational conditions. The findings of our inaugural DORA AI Capabilities Model are a reminder of this fact and suggest that successful AI-assisted development isn’t just a purchasing decision; it’s a decision to cultivate the conditions where AI-assisted developers thrive. Investing in these seven capabilities is an important step toward creating an environment where AI-assisted software development succeeds, leading to enhanced outcomes for your developers, your products, and your entire organization.
To explore the DORA AI Capabilities Model in more detail and to access our full 2025 DORA State of AI-Assisted Software Development, please visit the DORA website.
Getting ahead — and staying ahead — of the demand for AI skills isn’t just key for those looking for a new role. Research shows proving your skills through credentials drives promotions, salary increases, leadership opportunities and more. And 8 in 10 Google Cloud learners feel our training helps them stay ahead in the age of AI.1 This is why we are so focused on providing new AI training content, ensuring you have the tools to keep up in this ever-evolving space.
That’s why I’m thrilled to announce a new suite of Google Cloud AI training courses. These courses are designed with intermediate and advanced technical learners in mind for roles such as Cloud Infrastructure Engineers, Cloud Architects, AI Engineers and MLOps Engineers, AI Developers and Data Scientists. Whether you’re looking to build and manage powerful AI infrastructure, master the art of fine-tuning generative AI models, leverage serverless AI inference, or secure your AI deployments, we’ve got you covered.
For cloud infrastructure engineers, cloud architects, AI engineers and MLOps engineers:
AI infrastructure mini courses are your guide to designing, deploying, and managing the high-performance infrastructure that powers modern AI. You’ll gain a deep understanding of Google’s TPU and GPU platforms, and learn to use Google Compute Engine (GCE) and Google Kubernetes Engine (GKE) as a robust foundation for any AI workload you can imagine.
For machine learning engineers, data scientists and AI developers:
Build AI Agents with Databases on Google Cloud teaches you how to securely connect AI agents to your existing enterprise databases. You’ll learn to craft agents that perform intelligent querying and semantic search, design and implement advanced multi-step workflows, and deploy and operationalize these powerful AI applications. This course is essential for building robust and reliable AI agents that can leverage your most critical data.
Supervised fine-tuning for Gemini educates you on how to take Google’s powerful models and make them your own by customizing them for your specific tasks, enhancing their quality and efficiency so they deliver precisely what you and your users need.
Cloud Run for AI Inference teaches you how to deploy your innovations with the incredible speed and scale of serverless AI workloads. You’ll learn how to handle demanding AI workloads, including lightweight LLMs, and leverage GPU acceleration, ensuring your creations reach your audience efficiently and reliably.
For security engineers and security analysts:
Model Armor: Securing AI Deployments equips you with the knowledge to protect your generative AI applications from critical risks like data leakage and prompt injection. It’s the essential step to ensuring your innovations can be leveraged with confidence.
For individual developers, business analysts, and other non-technical users:
Develop AI-Powered Prototypes in Google AI Studio shows you how to use Google AI Studio, our developer playground for the Gemini API, to quickly sketch and test your ideas. Through hands-on labs and tutorials, you’ll learn how to prototype apps with little upfront setup and create custom models without needing extensive coding expertise. It’s the perfect way to turn a concept into a working model, ensuring your final structure is built on a tested and innovative design.
Start learning
Building a career in AI is about creating a future where you feel empowered and prepared, no matter how the landscape changes. We believe these courses provide the tools and the confidence to do just that.
Google Kubernetes Engine (GKE) is a powerful platform for orchestrating scalable AI and high-performance computing (HPC) workloads. But as clusters grow and jobs become more data-intensive, storage I/O can become a bottleneck. Your powerful GPUs and TPUs can end up idle, while waiting for data, driving up costs and slowing down innovation.
Google Cloud Managed Lustre is designed to solve this problem. Many on-premises HPC environments already use parallel file systems, and Managed Lustre makes it easier to bring those workloads to the cloud. With its managed Container Storage Interface (CSI) driver, Managed Lustre and GKE operations are fully integrated.
Optimizing your move to a high-performance parallel file system can help you get the most out of your investment from day one.
Before deploying, it’s helpful to know when to use Managed Lustre versus other options like Google Cloud Storage. For most AI and ML workloads, Managed Lustre is the recommended solution. It excels in training and checkpointing scenarios that require very low latency (less than a millisecond) and high throughput for small files, which keeps your expensive accelerators fully utilized. For data archiving or workloads with large files (over 50 MB) that can tolerate higher latency, Cloud Storage FUSE with Anywhere Cache can be another choice.
Based on our work with early customers and the learnings from our teams, here are five best practices to ensure you get the most out of Managed Lustre on GKE.
1. Design for data locality
For performance-sensitive applications, you want your compute resources and storage to be as close as possible, ideally within the same zone in a given region. When provisioning volumes dynamically, the volumeBindingMode parameter in your StorageClass is your most important tool. We strongly recommend setting it to WaitForFirstConsumer. GKE provides a built-in StorageClass for Managed Lustre that uses WaitForFirstConsumer binding mode by default.
Why it’s a best practice: Using WaitForFirstConsumer instructs GKE to delay the provisioning of the Lustre instance until a pod that needs it is scheduled. The scheduler then uses the pod’s topology constraints (i.e., the zone it’s scheduled in) to create the Lustre instance in that exact same zone. This guarantees co-location of your storage and compute, minimizing network latency.
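If you do need a custom StorageClass rather than the built-in one, the key setting is the binding mode. Here is a sketch using the Kubernetes Python client; the provisioner name and tier parameter shown are assumptions, so check the Managed Lustre CSI driver documentation for the exact values in your cluster.

```python
from kubernetes import client, config

config.load_kube_config()

# Delay volume binding until a pod is scheduled, so the Lustre instance is
# provisioned in the same zone as the pod that consumes it.
storage_class = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="lustre-wait-for-consumer"),
    provisioner="lustre.csi.storage.gke.io",          # assumed driver name; verify for your cluster
    volume_binding_mode="WaitForFirstConsumer",
    parameters={"perUnitStorageThroughput": "500"},   # illustrative performance-tier parameter
)

client.StorageV1Api().create_storage_class(body=storage_class)
```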
2. Right-size your performance with tiers
Not all high-performance workloads are the same. Managed Lustre offers multiple performance tiers (read and write throughput in MB/s per TiB of storage) so you can align cost directly with your performance requirements.
1000 & 500 MB/s/TiB: Ideal for throughput-critical workloads like foundation model training or large-scale physics simulations where I/O bandwidth is the primary bottleneck.
250 MB/s/TiB: A balanced, cost-effective tier that is great for many general HPC workloads, AI inference serving, and data-heavy analytics pipelines.
125 MB/s/TiB: Best for large-capacity use cases where having a massive, POSIX-compliant file system is more important than achieving peak throughput. This tier also makes it easier to migrate on-premises containerized applications and their storage to the cloud without modification.
Why it’s a best practice: Defaulting to the highest tier isn’t always the most cost-effective strategy. By analyzing your workload’s I/O profile, you can significantly optimize your total cost of ownership.
3. Master your networking foundation
A parallel file system is a network-attached resource. Getting the networking right up front will save you days of troubleshooting. Before provisioning, ensure your VPC is correctly configured. This involves three key steps, detailed in our documentation:
Enable Service Networking.
Create an IP range for VPC peering.
Create a firewall rule to allow traffic from that range on the Lustre network port (TCP 988 or 6988).
Why it’s a best practice: This is a one-time setup per VPC that establishes the secure peering connection that allows your GKE nodes to communicate with the Managed Lustre service.
4. Use dynamic provisioning for simplicity, static for long-lived shared data
The Managed Lustre CSI driver supports two modes for connecting storage to your GKE workloads.
Dynamic provisioning: Use when your storage is tightly coupled to the lifecycle of a specific workload or application. By defining a StorageClass and PersistentVolumeClaim (PVC), GKE will automatically manage the Lustre instance lifecycle for you. This is the simplest, most automated approach.
Static provisioning: Use when you have a long-lived Lustre instance that needs to be shared across multiple GKE clusters and jobs. You create the Lustre instance once, then create a PersistentVolume (PV) and PVC in your cluster to mount it. This decouples the storage lifecycle from any single workload.
Why it’s a best practice: Thinking about your data’s lifecycle helps you choose the right pattern. Use dynamic provisioning as your default for its simplicity, and opt for static provisioning when you need to treat your file system as a persistent, shared resource across your organization.
5. Architect for parallelism with Kubernetes Jobs
Many AI and HPC tasks, like data preprocessing or batch inference, are suited for parallel execution. Instead of running a single, large pod, use the Kubernetes Job resource to divide the work across many smaller pods.
Consider this pattern:
Create a single PersistentVolumeClaim for your Managed Lustre instance, making it available to your cluster.
Define a Kubernetes job with parallelism set to a high number (e.g., 100).
Each pod created by the Job mounts the same Lustre PVC.
Design your application so that each pod works on a different subset of the data (e.g., processing a different range of files or data chunks).
Why it’s a best practice: This pattern turns your GKE cluster into a powerful, distributed data processing engine. The GKE Job controller acts as the parallel task orchestrator, while Managed Lustre serves as the high-speed data backbone, allowing you to achieve massive aggregate throughput. A minimal sketch of this pattern using the Kubernetes Python client is shown below.
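In this sketch, the image, PVC name, and namespace are placeholders; the important pieces are the shared PVC volume, the parallelism and completions settings, and Indexed completion mode, which gives each pod a JOB_COMPLETION_INDEX it can use to pick its shard of the data.

```python
from kubernetes import client, config

config.load_kube_config()

# One shared Lustre-backed PVC, mounted by every worker pod.
lustre_volume = client.V1Volume(
    name="lustre-data",
    persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
        claim_name="lustre-pvc"),                      # placeholder PVC name
)

worker = client.V1Container(
    name="worker",
    image="us-docker.pkg.dev/my-project/batch/preprocess:latest",  # placeholder image
    command=["python", "preprocess.py"],               # reads JOB_COMPLETION_INDEX to pick its shard
    volume_mounts=[client.V1VolumeMount(name="lustre-data", mount_path="/data")],
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="preprocess"),
    spec=client.V1JobSpec(
        parallelism=100,
        completions=100,
        completion_mode="Indexed",                     # injects JOB_COMPLETION_INDEX into each pod
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[worker],
                volumes=[lustre_volume],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```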
Get started today
By combining the orchestration power of GKE with the performance of Managed Lustre, you can build a truly scalable and efficient platform for AI and HPC. Following these best practices will help you create a solution that is not only powerful, but also efficient, cost-effective, and easy to manage.
As cloud infrastructure evolves, so should how you safeguard that technology. As part of our efforts to help you maintain a strong security posture, we’ve introduced powerful capabilities that can address some of the thorniest challenges faced by IT teams who work with Google Compute Engine (GCE) virtual machines and Google Kubernetes Engine (GKE) containers.
Infrastructure administrators face critical security challenges such as publicly accessible storage, software flaws, excessive permissions, and malware. That’s why we’ve introduced new, integrated security dashboards in GCE and GKE consoles, powered by Security Command Center (SCC). Available now, these dashboards can provide critical security insights and proactively highlight potential vulnerabilities, misconfiguration risks, and active threats relevant to your compute engine instances and Kubernetes clusters.
Embedding crucial security insights directly in GCE and GKE environments can empower you to address relevant security issues faster, and play a key role in maintaining a more secure environment over time.
Gain better visibility, directly where you work
The GCE Security Risk Overview page now shows top security findings, vulnerability findings over time, and common vulnerabilities and exploits (CVEs) on your virtual machines. These security insights, powered by Google Threat Intelligence, provide dynamic analysis based on the latest threats uncovered by Mandiant expert analysts. With these insights, you can make better decisions such as which virtual machine to patch first, how to better manage public access, and which CVEs to prioritize for your engineering team.
The top security findings can help prioritize the biggest risks in your environment such as misconfigurations that lead to overly accessible resources, critical software vulnerabilities, and potential moderate risks that may pose a combined critical risk.
Vulnerability findings over time can help assess how well your software engineering team is addressing known software vulnerabilities. CVE details are presented in two widgets: a heatmap distribution on the exploitability and potential impact of the vulnerabilities in your environment, and a list of the top five CVEs found in your virtual machines.
New GCE Security Risk Dashboard highlights top security insights.
The updated GKE console is similar, designed to help teams make better remediation decisions and catch threats before they escalate. A dedicated GKE security page displays streamlined findings on misconfigurations, top threats, and vulnerabilities:
The Workloads configuration widget highlights potential misconfigurations, such as over-permissive containers and pod and namespace risks.
Top threats highlight Kubernetes and container threats, such as cryptomining, privilege escalation, and malicious code execution.
Top software vulnerabilities highlight top CVEs and prioritize them based on their prevalence in your environment and the severity impact.
New GKE Security Posture Dashboard highlights key security insights.
Fully activate dashboards by upgrading to Security Command Center Premium
The GCE and GKE security dashboards, powered by Security Command Center, include the security findings widget (in the GCE dashboard) and the workload configurations widget (in the GKE dashboard).
To access the vulnerabilities and threats widgets, we recommend upgrading to Security Command Center Premium directly from the dashboards, available as a 30-day free trial. You can review the GCE documentation and GKE documentation to learn more about the security dashboards. To learn more about Security Command Center Premium and our different service tiers, review the service tier documentation.
In the latest episode of the Agent Factory podcast, Amit Miraj and I took a deep dive into the Gemini CLI. We were joined by the creator of the Gemini CLI, Taylor Mullen, who shared the origin story, design philosophy, and future roadmap.
This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.
What is the Gemini CLI?
The Gemini CLI is a powerful, conversational AI agent that lives directly in your command line. It’s designed to be a versatile assistant that can help you with your everyday workflows. Unlike a simple chatbot, the Gemini CLI is agentic. This means it can reason, choose tools, and execute multi-step plans to accomplish a goal, all while keeping you informed. It’s open-source, extensible, and as we learned from its creator, Taylor Mullen, it’s built with a deep understanding of the developer workflow.
The Factory Floor
The Factory Floor is our segment for getting hands-on. This week, we put the Gemini CLI to the test with two real-world demos designed to tackle everyday challenges.
I kicked off the demos by tackling a problem I think every developer has faced: getting up to speed with a new codebase, and showed how the Gemini CLI can help.
For the next demo, Amit tackled a problem close to his heart: keeping up with the flood of new AI research papers. He showed how he built a personal research assistant using the Gemini CLI to complete the following tasks:
Process a directory of research papers and generate an interactive webpage explainer for each one
Iterate on a simple prompt, creating a detailed, multi-part prompt to generate a better output
LangChain 1.0 Alpha: The popular library is refocusing around a new unified agent abstraction built on LangGraph, bringing production-grade features like state management and human-in-the-loop to the forefront.
Embedding Gemma: Google’s new family of open, lightweight embedding models that allow developers to build on-device, privacy-centric applications.
Gemma 3 270M: A tiny 270 million parameter model from Google, perfect for creating small, efficient sub-agents for simple tasks.
Gemini CLI in Zed Code Editor: The Gemini CLI is now integrated directly into the Zed code editor, allowing developers to explain code and generate snippets without switching contexts.
500 AI Agents Projects: A GitHub repository with a categorized list of open-source agent projects.
Transformers & LLMs cheatsheet: A resource from a team at Stanford that provides a great starting point or refresher on the fundamentals of LLMs.
Taylor Mullen on the Gemini CLI
The highlight of the episode for me was our in-depth conversation with Taylor Mullen. He gave us a fascinating look behind the curtain at the philosophy and future of the Gemini CLI. Here are some of the key questions we covered:
Taylor explained that the project started about a year and a half ago as an experiment with multi-agent systems. While the CLI version was the most compelling, the technology at the time made it too slow and expensive. He said it was “one of those things… that was almost a little bit too early.” Later, seeing the developer community embrace other AI-powered CLIs proved the demand was there. This inspired him to revisit the idea, leading to a week-long sprint where he built the first prototype.
For Taylor, the number one reason for making the Gemini CLI open source was trust and security. He emphasized, “We want people to see exactly how it operates… so they can have trust.” He also spoke passionately about the open-source community, calling it the “number one thing that’s on my mind.” He sees the community as an essential partner that helps keep the project grounded, secure, and building the right things for users.
When I asked Taylor how his team manages to ship an incredible 100 to 150 features, bug fixes, and enhancements every single week, his answer was simple: they use the Gemini CLI to build itself.
Taylor shared a story about the CLI’s first self-built feature: its own Markdown renderer. He explained that while using AI to 10x productivity is becoming easier, the real challenge is achieving 100x. For his team, this means using the agent to parallelize workflows and optimize human time. It’s not about the AI getting everything right on the first try, but about creating a tight feedback loop for human-AI collaboration at scale.
Gemini CLI under the hood: “Do what a person would do”
The guiding principle, Taylor said, is to “do what a person would do and don’t take shortcuts.” He revealed that, surprisingly, the Gemini CLI doesn’t use embeddings for code search. Instead, it performs an agentic search, using tools like grep, reading files, and finding references. This mimics the exact process a human developer would use to understand a codebase. The goal is to ground the AI in the most relevant, real-time context possible to produce the best results.
We also discussed the agent’s ability to “self-heal.” When the CLI hits a wall, it doesn’t just fail; it proposes a new plan. Taylor gave an example where the agent, after being asked for a shareable link, created a GitHub repo and used GitHub Pages to deploy the content.
The team is doubling down on extensibility. The vision is to create a rich ecosystem where anyone can build, share, and install extensions. These are not just new tools, but curated bundles of commands, instructions, and MCP servers tailored for specific workflows. He’s excited to see what the community will build and how users will customize the Gemini CLI for their unique needs.
Your turn to build
The best way to understand the power of the Gemini CLI is to try it yourself.
AI is transforming how people work and how businesses operate. But with these powerful tools comes a critical question: how do we empower our teams with AI, while ensuring corporate data remains protected?
A key answer lies in the browser, an app most employees use every day, for most of their day. Today, we announced several new AI advancements coming to Chrome, which redefine how browsers can help people with daily tasks, and work is no exception. Powerful AI capabilities right in the browser will help business users be more productive than ever, and we’re giving IT and security teams the enterprise-grade controls they need to keep company data safe.
Gemini in Chrome, with enterprise protections
Our work days can be full of distractions — endless context switching between projects, and repetitive tasks that slow people down. That’s why we’re bringing a new level of assistance directly into the browser, where many of these workflows are already taking place.
Gemini in Chrome1 is an AI browsing assistant that helps people at work. It can cut through the complexity of finding and making use of information across tabs and help people get work done faster. Employees can now easily summarize long and complex reports or documents, grab key insights from a video, or even brainstorm ideas for a new project with help from Gemini in Chrome. Gemini in Chrome can understand the context of a user’s tabs, and soon it will even help recall recent tabs they had open.
Gemini in Chrome will be able to recall your past tabs for you
We’re bringing these capabilities to Google Workspace business and education customers with enterprise-grade data protections, ensuring IT teams stay in control of their company’s data.
Gemini in Chrome doesn’t just help you find the information you need for your workday; you can also take action through integrations with the Google apps people use every day, like Google Calendar, Docs, and Drive. So employees can schedule a meeting right in their current workflows.
Gemini in Chrome is now integrated with your favorite Google apps
Gemini in Chrome is becoming available for Mac and Windows users in the U.S., and we’re also bringing Gemini in Chrome to mobile in the U.S. Users can also activate Gemini when using Chrome on Android, and other apps, by holding the power button. And soon, Gemini in Chrome will be built into the iOS app.
IT teams can configure Gemini in Chrome through policies in Chrome Enterprise Core, and enterprise data protections automatically extend to customers with qualifying editions of Google Workspace.
AI Mode from Google Search in Chrome
In addition to Gemini in Chrome, the Chrome omnibox—the address bar people use to navigate the web—is also getting an upgrade. With AI Mode, people can ask complex, multi-part questions specific to their needs in the same place where they already search. You’ll get an AI-generated response, and can keep exploring with follow-up questions and helpful web links. IT teams can manage this feature through the generative AI policies in Chrome Enterprise Core.
Proactive AI Protection
We know that a browser’s greatest value is its ability to keep users safe. As the security threats from AI-generated scams and phishing attacks become more sophisticated, our defenses must evolve just as quickly. That’s why security is one of the core pillars of Chrome’s AI strategy.
Safe Browsing’s Enhanced Protection mode is now even more secure with the help of AI. We’re using it to proactively block increasingly convincing threats such as tech support scams, and will soon expand to fake antivirus and brand-impersonation websites. We’ve also added AI to help detect and block scammy and spammy site notifications, which has already led to billions fewer notifications being sent to Chrome users on Android every day.
AI with enterprise controls
Organizations want to empower their workforce with AI for greater productivity, but never at the expense of security. Chrome Enterprise gives IT teams the tools they need to manage these new capabilities effectively: our comprehensive policies allow IT and security teams to decide exactly which AI features in Chrome are enabled for which users, and how that data is treated.
Chrome Enterprise Premium gives organizations even more safeguards. For example, they can use URL filtering to block unapproved AI tools and point employees back to corporate-supported AI services. Within AI tools, security teams can apply data masking or other upload and copy/paste restrictions for sensitive data. These advanced capabilities further prevent sensitive information from being accidentally or maliciously shared via AI tools or any other websites.
With Chrome Enterprise, AI in the browser offers businesses the best of both worlds: a highly productive, AI-enhanced user experience and the enterprise-grade security enterprises depend on to protect their data. To learn more about these new features, view our recent Behind the Browser AI Edition video.
1 Check responses for accuracy. Available on select devices and in select countries, languages, and to users 18+
Enterprises need to move from experimenting with AI agents to achieving real productivity, but many struggle to scale their agents from prototypes to secure, production-ready systems.
The question is no longer if agents deliver value, but how to deploy them with enterprise confidence. And there’s immense potential for those who solve the scaling challenge. Our 2025 ROI of AI Report reveals that 88% of agentic AI early adopters are already seeing a positive return on investment (ROI) on generative AI.
Vertex AI Agent Builder is the unified platform that helps you close this gap. It’s where you can build the smartest agents, and deploy and scale them with enterprise-grade confidence.
Today, we’ll walk you through agent development on Vertex AI Agent Builder, and highlight a couple of key updates to fuel your next wave of agent-driven productivity and growth.
The five pillars of enterprise agent development on Vertex AI Agent Builder
Moving an agent from prototype to production requires a cohesive suite of tools. Vertex AI Agent Builder simplifies this complexity by providing an integrated workflow across five essential pillars, supporting your agent through every step of its lifecycle.
1. Agent frameworks
Your agent development journey begins here. You configure and orchestrate your agents using your preferred open source framework. The Agent Development Kit (ADK) – what we use internally at Google – is one of the many options available, and it has already seen over 4.7 million downloads since April.
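To make the starting point concrete, here is a minimal sketch of an agent defined with the open source ADK; the agent name, model string, and instruction are illustrative placeholders rather than anything prescribed in this post.

# Minimal ADK agent sketch. The name, model string, and instruction are
# illustrative placeholders, not values from this post.
from google.adk.agents import Agent

root_agent = Agent(
    name="support_triage_agent",   # hypothetical agent name
    model="gemini-2.5-flash",      # any supported model identifier can go here
    description="Triages incoming support questions.",
    instruction=(
        "You are a support triage assistant. Classify each question and "
        "suggest the next step in one short paragraph."
    ),
)

From here, the ADK’s local tooling can serve the agent for interactive testing before you think about deployment.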
2. Model choice
Models are the intelligent core of your agent. Our platform is provider-agnostic, supporting every leading model – including the Gemini 2.5 model family – alongside hundreds of third-party and open source models from Vertex AI Model Garden. With Provisioned Throughput, you can secure dedicated capacity for consistent, low-latency performance at scale.
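As a rough illustration of how model choice surfaces in code, the sketch below calls a Gemini model on Vertex AI through the google-genai SDK; the project ID, region, and model name are placeholders, and the same kind of model identifier is what you hand to an agent framework.

# Hedged sketch: calling a model on Vertex AI with the google-genai SDK.
# Project, location, and model name are placeholders; swapping the model
# string is how you exercise model choice.
from google import genai

client = genai.Client(
    vertexai=True,
    project="your-project-id",   # placeholder
    location="us-central1",      # placeholder
)

response = client.models.generate_content(
    model="gemini-2.5-flash",    # placeholder; any model you enable can go here
    contents="In one sentence, why does dedicated capacity matter for latency-sensitive agents?",
)
print(response.text)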
3. Tools for taking actions
Once built, your agent needs tools to take action and interact with the real world. Grounding is a critical step that connects your AI to verifiable, real-time data – dramatically reducing hallucinations and building user trust. On Vertex AI, you can connect your agent to trusted, real-time data sources you already rely on. For example, Grounding with Google Maps is now available to everyone in production. Your agents gain accuracy and reduce hallucinations by drawing on fresh Google Maps data, which includes factual information on 250 million places, for location-aware recommendations and actions.
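To show how grounding attaches to an agent in practice, here is a hedged sketch that wires ADK’s built-in Google Search grounding tool into an agent; it stands in for grounding generally, since the exact tool surface for Grounding with Google Maps isn’t covered here.

# Hedged sketch: attaching a built-in grounding tool to an ADK agent.
# google_search stands in for grounding generally; Grounding with Google Maps
# is wired up through its own tooling, which this sketch does not show.
from google.adk.agents import Agent
from google.adk.tools import google_search

grounded_agent = Agent(
    name="store_locator_agent",    # hypothetical agent name
    model="gemini-2.5-flash",      # placeholder model identifier
    instruction="Answer location questions using grounded, up-to-date results.",
    tools=[google_search],
)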
4. Scalability and performance
Deploy and manage at scale using Vertex AI Agent Engine. We built this suite of modular, managed services to instantly move your prototypes into production. The platform provides everything needed for operation and scaling, including a fully managed runtime, integrated Sessions and Memory Bank to personalize context across user interactions, and integrated evaluation and observability services.
Since launch, hundreds of thousands of agents have been deployed to Vertex AI Agent Engine. Here are some recent updates we’re most excited about:
Secure code execution: We now provide a managed, sandboxed environment to run agent-generated code. This is vital for mitigating risks while unlocking advanced capabilities for tasks like financial calculations or data science modeling.
Agent-to-Agent collaboration: Build sophisticated, reliable multi-agent systems with native support for the Agent-to-Agent (A2A) protocol when you deploy to the Agent Engine runtime. This allows your agents to securely discover, collaborate, and delegate tasks to other agents, breaking down operational silos.
Real-time interactive agents: Unlock a new class of interactive experiences with Bidirectional Streaming. This provides a persistent, two-way communication channel ideal for real-time conversational AI, live customer support, and interactive applications that process audio or video inputs.
Simplified path to production: We have streamlined the journey from a local ADK prototype to a live service, with a one-line deployment in the ADK CLI to Agent Engine; a sketch of the equivalent Python SDK flow follows this list.
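For readers who prefer the Python SDK over the CLI, the deployment flow looks roughly like the sketch below; the AdkApp wrapper and agent_engines.create call should be verified against the current Agent Engine documentation, and the project, region, bucket, and requirements are placeholders.

# Hedged sketch of deploying an ADK agent to Vertex AI Agent Engine with the
# Python SDK. Verify the AdkApp wrapper and agent_engines.create against the
# current docs; project, region, bucket, and requirements are placeholders.
import vertexai
from vertexai import agent_engines
from vertexai.preview import reasoning_engines

vertexai.init(
    project="your-project-id",                  # placeholder
    location="us-central1",                     # placeholder
    staging_bucket="gs://your-staging-bucket",  # placeholder
)

app = reasoning_engines.AdkApp(agent=root_agent)  # root_agent from the earlier sketch

remote_app = agent_engines.create(
    agent_engine=app,
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
)
print(remote_app.resource_name)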
5. Built-in trust and security
Security and compliance are built into every layer of the Vertex AI architecture, so you stay in control. This includes preventing data exfiltration with VPC Service Controls (VPC-SC) and using your own encryption keys with Customer-Managed Encryption Keys (CMEK). We also meet strict compliance requirements such as HIPAA and Data Residency (DRZ). Your agents can handle sensitive workloads in highly regulated industries with full confidence.
Get started today
It’s time to move your AI strategy from experimentation to exponential growth. Bridge the production gap and deploy your first enterprise agent with Vertex AI Agent Builder, the secure, scalable, and intelligent advantage you need to succeed.
We’re happy to share the third installment of our Network Performance Decoded whitepaper series, where we dive into network performance topics and benchmarking best practices that often come up as you troubleshoot, deploy, scale, or architect your cloud-based workloads. We started this series last year to provide helpful tips that not only make the most of your network but also help you avoid costly mistakes that can drastically impact your application performance. Check out our last two installments — tuning TCP and UDP bulk flows performance, and network performance limiters.
In this installment, we provide an overview of three recent whitepapers: one on TCP retransmissions, another on the impact of headers and MTUs on data transfer performance, and a third on using netperf to measure packets-per-second performance.
1. Make it snappy: Tuning TCP retransmission behaviour
The A Brief Look at Tuning TCP Retransmission Behaviour whitepaper is all about making your online applications feel snappier by tweaking two Linux TCP settings: net.ipv4.tcp_thin_linear_timeouts and net.ipv4.tcp_rto_min_us (or rto_min). Think of it as fine-tuning your application’s response times and how quickly your application recovers when there’s a hiccup in the network. A short sketch for inspecting and adjusting these settings follows the list below.
For all the gory details, you’ll need to read the paper, but here’s the lowdown on what you’ll learn:
Faster recovery is possible: By playing with these settings, especially making rto_min smaller, you can drastically cut down on how long your TCP connections just sit there doing nothing after a brief network interruption. This means your apps respond faster, and users have a smoother experience.
Newer kernels are your friend: If you’re running a newer Linux kernel (like 6.11 or later), you can go even lower with rto_min (down to 5 milliseconds!). This is because these newer kernels have smarter ways of handling things, leading to even quicker recovery.
Protective ReRoute takes resiliency to the next level: For those on Google Cloud, tuning net.ipv4.tcp_rto_min_us can actually help Google Cloud’s Protective ReRoute (PRR) mechanism kick in sooner, making your applications more resilient to network issues.
Not just for occasional outages: Even for random, isolated packet loss, these tweaks can make a difference. If you have a target for how quickly your app should respond, you can use these settings to ensure TCP retransmits data well before that deadline.
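As a quick way to see where a given machine stands, here is a small Python sketch that reads the two settings discussed above and prints suggested values; the targets are illustrative, net.ipv4.tcp_rto_min_us only exists on newer kernels, and the whitepaper remains the authority on what values make sense for your workload.

# Hedged sketch: inspect (and, if run as root with the last line uncommented,
# apply) the two TCP settings discussed above. Target values are illustrative;
# net.ipv4.tcp_rto_min_us is only present on newer kernels (roughly 6.11+).
from pathlib import Path

SETTINGS = {
    "net.ipv4.tcp_thin_linear_timeouts": "1",  # linear, not exponential, backoff for thin streams
    "net.ipv4.tcp_rto_min_us": "5000",         # 5 ms minimum RTO, expressed in microseconds
}

def proc_path(sysctl_name: str) -> Path:
    # sysctl names map onto /proc/sys with dots replaced by slashes
    return Path("/proc/sys") / sysctl_name.replace(".", "/")

for name, target in SETTINGS.items():
    path = proc_path(name)
    if not path.exists():
        print(f"{name}: not available on this kernel")
        continue
    current = path.read_text().strip()
    print(f"{name}: current={current}, suggested={target}")
    # Equivalent to `sysctl -w <name>=<target>`; requires root.
    # path.write_text(target)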
2. Beyond network link-rate
Consider more than just “link-rate” when thinking about network performance! In our Headers and Data and Bitrates whitepaper, we discuss how the true speed of data transfer is shaped by:
Headers: Think of these as necessary packaging that reduces the actual data sent per packet.
Maximum Transmission Units (MTUs): These dictate maximum packet size. Larger MTUs mean more data per packet, making your data transfers more efficient.
In cloud environments, a VM’s outbound data limit (egress cap) isn’t always the same as the physical network’s speed. Even when the two are close, extra cloud-specific headers can still reduce your final throughput. Optimize your MTU settings to get the most out of your cloud network. In a nutshell, it’s not just about the advertised speed, but how effectively your data travels!
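To put rough numbers on the header overhead, here is a small sketch that computes how much of each packet is application payload at a few example MTU sizes, assuming plain IPv4 and TCP with the timestamps option and no additional encapsulation; the MTU values are examples only, not any provider’s defaults.

# Hedged sketch: payload efficiency per packet, assuming IPv4 (20 bytes) +
# TCP (20 bytes) + TCP timestamps option (12 bytes) and no other encapsulation.
# The MTU values are examples, not any particular provider's defaults.
HEADERS = 20 + 20 + 12  # IPv4 + TCP + timestamps = 52 bytes per packet

for mtu in (1460, 1500, 8896):
    payload = mtu - HEADERS
    print(f"MTU {mtu:5d}: {payload:5d} payload bytes per packet "
          f"({payload / mtu:.1%} of the MTU)")

Bigger MTUs spread the fixed header cost over more payload bytes, which is exactly the efficiency gain the whitepaper describes.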
3. How many transactions can you handle?
In Measuring Aggregate Packets Per Second with netperf, you’ll learn how to use netperf to figure out how many transactions (and thus packets) per second your network can handle, which is super useful for systems that aren’t just pushing huge files around. Go beyond measuring bulk transfers and learn a way to measure the packets-per-second rates that can gate the performance of your request/response applications; a rough aggregation sketch follows the list below.
Here’s what you’ll learn:
Beating skew error: Ever noticed weird results when running a bunch of netperf tests at once? That’s “skew error,” and this whitepaper describes using “demo mode” to fix it, giving you way more accurate overall performance numbers.
Sizing up your test: Get practical tips on how many “load generators” (the machines sending the traffic) and how many concurrent streams you need to get reliable results. Basically, you want enough power to truly challenge your system.
Why UDP burst mode is your friend: It explains why using “burst-mode UDP/RR” is the secret sauce for measuring packets per second. TCP, as smart as it is, can sometimes hide the true packet rate because it tries to be too efficient.
Full-spectrum testing and analysis: The whitepaper walks you through different test types you can run with the runemomniaggdemo.sh script, giving you an effective means to measure how many network transactions per second the instance under test can achieve. This might help you infer aspects of the rest of your network that influence this benchmark. Plus, it shows you how to crunch the numbers and even get some sweet graphs to visualize your findings.
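The whitepaper’s runemomniaggdemo.sh script is the real tool for this, but purely to illustrate the idea of summing several concurrent burst-mode UDP_RR streams, here is a rough Python sketch. The netperf flags and output parsing are assumptions (burst mode needs a netperf built with --enable-burst), and the sketch does not correct for the skew error the paper addresses, so defer to the whitepaper for the real procedure.

# Rough illustration only: launch several concurrent burst-mode UDP_RR tests
# against one target and sum their transaction rates. Flags and output parsing
# are assumptions (burst mode requires netperf built with --enable-burst), and
# no skew-error correction is attempted here.
import subprocess

TARGET = "10.0.0.2"   # placeholder address of the netserver under test
STREAMS = 4           # concurrent streams
BURST = 32            # transactions kept in flight per stream (test-specific -b)
DURATION = 30         # seconds per test

procs = [
    subprocess.Popen(
        ["netperf", "-H", TARGET, "-t", "UDP_RR", "-l", str(DURATION),
         "--", "-b", str(BURST)],
        stdout=subprocess.PIPE, text=True)
    for _ in range(STREAMS)
]

total_tps = 0.0
for p in procs:
    out, _ = p.communicate()
    rate = None
    for line in out.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            # Classic RR output: the data line's last field is transactions/sec.
            rate = float(fields[-1])
        except ValueError:
            continue
        break
    if rate is None:
        raise RuntimeError(f"could not parse netperf output:\n{out}")
    total_tps += rate

print(f"Aggregate transactions/sec across {STREAMS} streams: {total_tps:,.0f}")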
Stay tuned
With these resources, our goal is to foster an open, collaborative community for network benchmarking and troubleshooting. While our examples may be drawn from Google Cloud, the underlying principles apply no matter where your workloads run. You can access all our whitepapers — past, present, and future — on our webpage. Be sure to check back for more!