AI is rapidly reshaping the public sector, ushering in a new era of intelligent and AI-powered service delivery and mission impact. Chief AI Officers (CAIOs) and other agency leaders play a critical role as AI becomes more pervasive. At Google, we’ve long believed that AI is a foundational and transformational technology, with the potential to benefit people and society. Realizing its full potential to improve government services, enhance decision-making, and ultimately create a more efficient and effective public sector requires leadership and a clear commitment.
Google recently commissioned IDC to conduct a study that surveyed 161 federal CAIOs, government AI leaders and other decision makers to understand how agency leaders are leading in this new AI era – and the value they are already bringing when it comes to AI governance, collaboration, and building public trust and citizen engagement¹. I recently sat down with Ruthbea Yesner, Vice-President of IDC Government Insights to explore the key findings of this research and what it means for the public sector – see excerpts of our discussion and key insights below.
Key Finding #1: 62% of those surveyed say strengthening cybersecurity is a top motivator for AI investments
Agencies are embracing AI to enhance cybersecurity and protect critical infrastructure – with 60% of respondents indicating that internal cybersecurity protection is their top AI/ML use case. Over 40% of federal agencies surveyed state that protecting critical infrastructure is a key driver for their AI investments going forward. Additionally, respondents believe that applying AI to strengthen cybersecurity and protect critical infrastructure will deliver positive outcomes in just 9 months, the second-fastest time to value of any expected outcome of AI.
CAIOs and other agency leaders play a crucial role in driving AI adoption and ensuring that agencies are able to leverage this powerful technology. While 50% of federal agencies have already appointed a CAIO, the rest are expected to follow soon. As adoption accelerates and AI maturity grows, CAIOs need to prioritize robust cybersecurity measures and risk mitigation strategies in all AI initiatives, ensuring the protection of sensitive data and systems.
Key Finding #2: Higher AI maturity increases likelihood to explore other Gen AI use cases by 4x
IDC created a 5-phase approach to assessing AI maturity and the findings are remarkable – 50% of agencies surveyed reported high levels of AI maturity, which corresponds to mature behaviors like piloting and implementing generative AI use cases to drive innovation and mission impact. Mature AI agencies are embracing an innovation culture and are focused on AI use cases and projects with high potential for impact.
We’re seeing some agencies solving for one specific problem or use case and creating quick wins and the appetite to do more, and in other cases, they are tackling big, complex challenges head-on. By adopting an AI-first mindset, incorporating AI into their workflows and scaling their use of AI, they are creating the groundswell to do more. This has a compounding effect as AI becomes more pervasive across the agency, and individuals increasingly feel part of its positive cultural change and impact.
This has a catalyst effect: it just takes one person doing something amazing with AI to motivate others to learn and apply AI
Ruthbea Yesner
Vice-President of IDC Government Insights
Generative AI is the future – attracting 42% of AI investments. Agencies are eager to explore its potential – and innovation will be a key motivator for continued AI investment going forward. As organizations prioritize AI, the CAIO role becomes even more multifaceted, demanding not just technical expertise but also visionary leadership to drive organizational culture change and develop a truly AI-enabled workforce.
Key Finding #3: An AI-ready workforce is the key to unlocking AI’s potential
The rapid pace of AI adoption has highlighted a significant challenge: a shortage of AI expertise. 39% of survey respondents report that their biggest challenge is a lack of in-house AI skills and expertise, and 68% are focused on training and retaining their workforce.
Google is tackling this skills challenge head-on. We recently announced our Google Cloud Launchpad for Veterans – a no-cost training and certification journey to equip veterans in all roles and at all levels with the cloud knowledge and skills needed to drive innovation and contribute to their employer’s digital transformation strategy. And we also announced a new AI training initiative through Google.org’s AI Opportunity Fund – $15 million in AI skills training for US government workers, delivered through the Partnership for Public Service and InnovateUS. This also includes a grant to the Partnership for Public Service to establish the new Center for Federal AI to provide AI skills and literacy to federal leaders and workers, including 2,000 senior government officials.
One thing is clear – AI requires leadership, and the CAIO is an important new C-suite role signaling the government’s commitment to harness AI and reach its full potential. CAIOs and other agency leaders are critical to charting this new AI era and providing the expertise and leadership necessary to leverage AI for the public good.
To learn more about how CAIOs are leading in this new AI era, download The Chief Artificial Intelligence Officer (CAIO) Playbook: A Practical Guide for Advancing AI Innovation in Government. By embracing its recommendations, agencies can create their own roadmap to drive AI adoption to accelerate mission outcomes and impact. To hear the full interview with Ruthbea Yesner, Vice-President of IDC Government Insights, please register to join the Google Public Sector Summit On-Demand on December 3rd.
¹ IDC Signature White Paper, The Chief Artificial Intelligence Officer (CAIO) Playbook: A Practical Guide for Advancing AI Innovation in Government, sponsored by Google Public Sector, Doc# US52616824, October 2024.
Protecting sensitive company data is no longer just a best practice—it’s business critical. In today’s world, data breaches can have serious consequences, from financial losses and reputational damage to legal repercussions and operational disruptions. That’s why Chrome Enterprise Premium, our advanced secure enterprise browser offering, includes a growing suite of Data Loss Prevention (DLP) capabilities to help organizations safeguard their sensitive information and maintain compliance.
We recently launched a number of enhancements to our DLP capabilities, giving you even more granular control over your company’s data. This blog post will explore how these new capabilities support your organization’s comprehensive DLP journey—from discovering potential risks and user behavior, to controlling data flow with robust security measures, to investigating potential incidents with detailed reporting and analysis, and finally, to expanding protection beyond desktops.
Discover and understand user behavior
Understanding how your users interact with data is the first step in preventing data leaks. Chrome Enterprise provides powerful tools to gain visibility into user activity and to identify potential risks:
1. Chrome Security Insights
Chrome Security Insights empowers Chrome Enterprise customers to proactively identify potential threats with simplified security monitoring. With just a few clicks, this feature enables monitoring of key security configurations, security event logging, and 50 common DLP detectors. Administrators gain valuable insights into high-risk activities through detailed reports on users, domains, and sensitive data transfers, enabling swift identification and remediation of security concerns. Start your 30-day Chrome Enterprise Premium trial and enable Chrome Security Insights here.
2. URL Filtering Audit Mode [Currently in Public Preview (beta), general availability coming soon]
Chrome Enterprise Premium’s URL Filtering Audit Mode offers a valuable tool for organizations seeking to refine their web access policies. It allows administrators to selectively activate monitoring of employee browsing activity without enforcing restrictions, providing insights into user behavior and potential security risks. By analyzing this data, IT and security teams can make informed decisions regarding URL filtering rules, striking an effective balance between security and user productivity. See here to learn how to configure URL Filtering Audit Mode.
Enforce DLP controls
Once you understand your users’ behavior, it’s time to put the right controls in place to prevent data leaks. Chrome Enterprise offers a robust set of in-browser protections.
1. Copy and paste protections
Controlling how users interact with sensitive data is crucial. Chrome Enterprise Premium’s copy and paste protections allow you to restrict or block users from copying sensitive information from web pages or pasting it into unauthorized applications or websites. This granular control helps prevent data exfiltration and ensures that sensitive information stays within designated boundaries, reducing the risk of data breaches and helping with compliance with data protection regulations. The copy and paste protections include:
Preventing data leakage to Incognito mode: Concerned about sensitive data being copied into incognito mode, where it can potentially bypass security measures? Chrome Enterprise Premium now allows you to block or warn users when they attempt to copy data between regular browsing sessions and incognito windows.
Controlling data sharing between applications: For organizations looking to prevent data leakage to external applications, Chrome Enterprise Premium now allows you to block or warn users when they attempt to copy data from your web applications into external programs like Notepad, Microsoft Word, or other apps.
Isolating data between Chrome profiles: Shared devices or users with multiple Chrome profiles can introduce risks of data cross-contamination. Chrome Enterprise Premium’s new copy-paste controls now allow you to block or warn users when they attempt to move data between different profiles.
Securing sensitive emails: Emails often contain highly confidential information requiring stringent protection. With Chrome Enterprise Premium, you can implement specific rules, such as blocking any copying from Gmail unless it’s being pasted back into Gmail.
See more details about setting up copy and paste protections here.
2. Watermarking
Watermarking acts as a deterrent to unauthorized data sharing. Chrome Enterprise Premium allows you to apply visible watermarks to sensitive company documents viewed in the browser, displaying information like the user’s email address, date, or a custom message. This helps discourage data exfiltration and makes it easier to trace the source of any leaked information. See here to learn how to set up watermarking with Chrome Enterprise Premium.
3. Screenshot protections
Screenshots can be a convenient way to capture information, but they also pose a data leak risk. Chrome Enterprise Premium’s screenshot protection allows you to prevent users from taking screenshots of sensitive content within the browser. This adds another layer of protection to your DLP strategy, limiting the potential for unauthorized data capture. Learn how to set up screenshot protection rules here.
These controls work together to create a comprehensive security strategy, limiting the ways in which data can be exfiltrated from your organization.
Investigate potential data leaks
Even with the best preventative measures in place, it’s crucial to be prepared to investigate potential security incidents. Chrome Enterprise provides tools to help you quickly identify and address threats:
1. Evidence Locker [Currently in Private Preview, general availability coming soon]
The evidence locker provides a secure repository for storing files and data that require further investigation by security teams. For instance, if an employee attempts to download a non-public financial report, Chrome Enterprise Premium can block the action and retain a copy of the file in the evidence locker. This triggers a detailed report for IT and security teams, enabling them to take appropriate investigation and remediation steps. Stay tuned for more information on the upcoming release of Evidence Locker.
2. Chrome Extension Telemetry in Google Security Operations
Chrome Enterprise Core integrates with Google Security Operations, our cloud-native security analytics platform, to provide deeper visibility into browser activity. Previously, detection and response teams were limited to analyzing static extension attributes. Now, you can set dynamic rules that continuously monitor extension behavior in your production environment, enabling proactive identification and remediation of risks before they escalate into threats. For example, you can monitor if extensions are unexpectedly contacting remote hosts or accessing cookies. This enhanced visibility empowers your security team to detect and mitigate data theft and infrastructure attacks in near real-time, significantly reducing your organization’s vulnerability to malicious extensions. See how to set this up here.
Expand protection to other platforms
Chrome Enterprise is committed to extending its threat protection capabilities beyond the desktop.
1. Mobile threat protections
With the growing use of mobile devices for work, securing the browser on these devices is essential. Chrome Enterprise Core is extending its threat protection capabilities to Android devices with download blocking. This feature will allow organizations to set policies to prevent users from downloading malicious files flagged by Google Safe Browsing from the web onto their mobile devices, bringing threat protections beyond desktops. Organizations can also choose to block all downloads on Android on managed Chrome. Get started with Chrome Enterprise Core today at no additional cost.
Chrome Enterprise Premium: Your partner in DLP
These features are just a glimpse into the comprehensive DLP capabilities offered by Chrome Enterprise. We are consistently enhancing our security capabilities to help organizations like yours take a proactive approach to data loss prevention, safeguarding sensitive information at the critical browser layer and ensuring compliance in today’s increasingly complex digital landscape.
Start using Chrome Enterprise Core today at no additional cost to gain foundational security capabilities. Or, experience Chrome Enterprise Premium’s advanced security and DLP features with a free 60-day trial and enable Chrome Security Insights here.
One of Google Cloud’s major missions is to arm security professionals with modern tools to help them defend against the latest threats. Part of that mission involves moving closer to a more autonomous, adaptive approach in threat intelligence automation.
In our latest advancements in malware analysis, we’re equipping Gemini with new capabilities to address obfuscation techniques and obtain real-time insights on indicators of compromise (IOCs). By integrating the Code Interpreter extension, Gemini can now dynamically create and execute code to help deobfuscate specific strings or code sections, while Google Threat Intelligence (GTI) function calling enables it to query GTI for additional context on URLs, IPs, and domains found within malware samples. These tools are a step toward transforming Gemini into a more adaptive agent for malware analysis, enhancing its ability to interpret obfuscated elements and gather contextual information based on the unique characteristics of each sample.
Building on this foundation, we previously explored critical preparatory steps with Gemini 1.5 Pro, leveraging its expansive 2-million-token input window to process substantial sections of decompiled code in a single pass. To further enhance scalability, we introduced Gemini 1.5 Flash, incorporating automated binary unpacking through Mandiant Backscatter before the decompilation phase to tackle certain obfuscation techniques. Yet, as any seasoned malware analyst knows, the true challenge often begins once the code is exposed. Malware developers frequently employ obfuscation tactics to conceal critical IOCs and underlying logic. Malware may also download additional malicious code, making it challenging to fully understand the behavior of a given sample.
For large language models (LLMs), obfuscation techniques and additional payloads create unique challenges. When dealing with obfuscated strings such as URLs, IPs, domains, or file names, LLMs often “hallucinate” without explicit decoding methods. Additionally, LLMs cannot access, for example, URLs that host additional payloads, often resulting in speculative interpretations about the sample’s behavior.
To help with these challenges, Code Interpreter and GTI function calling tools provide targeted solutions. Code Interpreter enables Gemini to autonomously create and execute custom scripts, as needed, using its own judgment to decode obfuscated elements within a sample, such as strings encoded with XOR-based algorithms. This capability minimizes interpretation errors and enhances Gemini’s ability to reveal hidden logic without requiring manual intervention.
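To make the idea concrete, here is a minimal sketch of the kind of throwaway decoder Code Interpreter might generate for a string hidden with a repeating-key XOR; the string and key below are purely illustrative and are not taken from a real sample.

def xor_decode(data: bytes, key: bytes) -> bytes:
    # XOR is symmetric: applying the same repeating key twice restores the input.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"k3y"                                                     # hypothetical key recovered from a sample
obfuscated = xor_decode(b"hxxp://example[.]test/payload", key)   # simulate the obfuscation step
print(xor_decode(obfuscated, key).decode())                      # prints the original (defanged) string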
Meanwhile, GTI function calling expands Gemini’s reach by retrieving contextualized information from Google Threat Intelligence on suspicious external resources such as URLs, IPs, or domains, providing verified insights without speculative guesses. Together, these tools equip Gemini to better handle obfuscated or externally hosted data, bringing it closer to the goal of functioning as an autonomous agent for malware analysis.
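As a rough illustration of the function-calling pattern (not the actual GTI tool definition, which is not published here), a URL-enrichment function might be declared to the model and dispatched along these lines; the function name, schema fields, and canned response are assumptions made for the sketch.

# Hypothetical function declaration passed to Gemini as a tool (JSON-schema style).
gti_url_lookup = {
    "name": "gti_url_lookup",
    "description": "Return Google Threat Intelligence context (actors, campaigns, "
                   "malware families) for a URL found in a sample.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Defanged or plain URL to enrich."},
        },
        "required": ["url"],
    },
}

def dispatch_function_call(name: str, args: dict) -> dict:
    # In a real integration this would query the Google Threat Intelligence API;
    # here it returns a canned placeholder so the sketch stays self-contained.
    if name == "gti_url_lookup":
        return {"url": args["url"], "verdict": "unknown", "related_actors": []}
    raise ValueError(f"Unhandled function: {name}")

# Example of handling a function call the model might emit during analysis.
print(dispatch_function_call("gti_url_lookup", {"url": "hxxps://example[.]test/stage2"}))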
To illustrate how these enhancements boost Gemini’s capabilities, let’s look at a practical example. In this case, we’re analyzing a PowerShell script that contains an obfuscated URL that hosts a second-stage payload. This particular sample was previously analyzed with some of the most advanced publicly available LLM models, which incorporate code generation and execution as part of their reasoning process. Despite these capabilities, each model “hallucinated,” generating completely fabricated URLs instead of accurately revealing the correct one.
Obfuscated PowerShell code sample to be analyzed by Gemini
Utilizing Code Interpreter and GTI function calling as part of its reasoning process, Gemini autonomously generated the following report without any human intervention. When deemed necessary, it applies these tools to process and extract additional information from the sample.
Final Report: PowerShell Malware with URL Deobfuscation
This report analyzes a PowerShell script designed to download and execute a file from an obfuscated URL. The script employs a custom obfuscation routine resembling the RC4 stream cipher to conceal the malicious URL.
Functionality:
Obfuscation Routine: The script defines a function F that performs byte-level XOR encryption using a key derived from the string “tox2”. This function is used to deobfuscate the malicious URL.
URL Decryption: The obfuscated URL, stored as a byte array, is decrypted using the F function with “tox2” as the key.
File Download: The script downloads a file from the decrypted URL (hxxps://filedn[.]eu/lODWTgN8sswHA6Pn8HXWe1J/tox2/Scan_docs%2398097960[.]msi) and saves it to the system’s temporary directory under a randomly generated filename.
Execution: Finally, the script executes the downloaded file.
def deobfuscate(x, y):
    # RC4-style key-scheduling algorithm (KSA) driven by the key string y.
    s = list(range(256))
    j = 0
    k = [ord(c) for c in y]
    for i in range(256):
        j = (j + s[i] + k[i % len(k)]) % 256
        s[i], s[j] = s[j], s[i]

    # RC4-style pseudo-random generation (PRGA): XOR each input byte
    # with the next keystream byte.
    i = 0
    j = 0
    output = []
    for m in range(len(x)):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        output.append(x[m] ^ s[(s[i] + s[j]) % 256])
    return output

# Obfuscated URL bytes extracted from the PowerShell sample.
a = [0x25, 0x7B, 0x5A, 0xD8, 0x9C, 0xFF, 0xC3, 0xA4, 0x95, 0xB7,
     0x9D, 0x6E, 0x5F, 0x4D, 0x49, 0xFB, 0x78, 0x32, 0xB8, 0x17, 0x72,
     0x01, 0x31, 0xFB, 0xC0, 0x58, 0x45, 0xCB, 0x5C, 0x3C, 0xDD, 0x00,
     0x60, 0x87, 0x25, 0xB2, 0x44, 0xA6, 0xC5, 0xEA, 0x71, 0xAA, 0x9D,
     0xA1, 0x05, 0x5F, 0xBD, 0x41, 0x28, 0x70, 0x30, 0x6B, 0x96, 0x24,
     0x92, 0xA0, 0x9D, 0x92, 0x23, 0x79, 0x69, 0x2C, 0x1B, 0x1E, 0xEB,
     0x69, 0x8F, 0x04, 0x10, 0x74, 0xBC]

# Decryption key identified in the script.
b = "tox2"

c = deobfuscate(a, b)
decoded_url = "".join(chr(i) for i in c)
print(decoded_url)  # Prints the decrypted download URL reported above.
Google Threat Intelligence (GTI) Lookup:
The GTI lookup provides valuable context regarding the identified IOC:
Threat Actor: UNC5687, known for using MESHAGENT in phishing campaigns impersonating the Security Service of Ukraine.
Campaign: This specific URL is linked to a campaign spreading MESHAGENT via phishing emails with malicious links. The campaign involved a captcha page leading to the download of either a malicious MSI or EXE file.
Malware: The campaign delivers MESHAGENT, an open-source remote access framework, configured to communicate with C2 domains linked to a service called “AnonVNC.”
Conclusion:
The analyzed PowerShell script is confirmed to be malicious, associated with a phishing campaign by UNC5687 distributing the MESHAGENT remote access tool.
Gemini identified that the script employs an XOR-based obfuscation algorithm resembling RC4 to conceal the download URL. Recognizing this pattern, Gemini autonomously generated and executed a Python deobfuscation script within the Code Interpreter sandbox, successfully revealing the external resource.
With the URL in hand, Gemini then used GTI function calling to query Google Threat Intelligence for further context. This analysis linked the URL to UNC5687, a threat cluster known for using a remote access tool in phishing campaigns impersonating the Security Service of Ukraine.
As we’ve seen, the integration of these tools has strengthened Gemini’s ability to function as a malware analyst capable of adapting its approach to address obfuscation and gathering vital context on IOCs. By incorporating the Code Interpreter and GTI function calling, Gemini is better equipped to navigate complex samples by autonomously interpreting hidden elements and contextualizing external references.
While these are significant advancements, many challenges remain, especially given the vast diversity of malware and scenarios that exist in the threat landscape. We’re committed to making steady progress, and future updates will continue to enhance Gemini’s capabilities, moving us closer to a more autonomous, adaptive approach in threat intelligence automation.
Cassandra, a key-value NoSQL database, is prized for its speed and scalability, and used broadly for applications that require rapid data retrieval and storage such as caching, session management, and real-time analytics. Its simple key-value pair structure helps ensure high performance and easy management, especially for large datasets.
But this simplicity also leads to limitations like poor support for complex queries, potential data redundancy, and difficulty in modeling intricate relationships. Spanner, Google Cloud’s always-on, globally consistent, and virtually unlimited-scale database, combines the scalability and availability of NoSQL with the strong consistency and relational model of traditional databases, positioning it for traditional Cassandra workloads. And today, it’s easier than ever to switch from Cassandra to Spanner, with the introduction of the Cassandra to Spanner Proxy Adapter, an open-source tool for plug-and-play migrations of Cassandra workloads to Spanner, without any changes to the application logic.
Spanner for NoSQL workloads
Spanner provides strong consistency, high availability, virtually unlimited scalability, and a familiar relational data model with support for SQL and ACID transactions for data integrity. As a fully managed service, it helps simplify operations, allowing teams to focus on application development rather than database administration. Furthermore, Spanner’s high availability, even at a massive global scale, supports business continuity by minimizing database downtime.
We’re constantly evolving Spanner to meet the needs of modern businesses. Some of the latest Spanner capabilities include enhanced multi-model capabilities such as graph, full-text search, vector search, improved performance for analytical queries with Spanner Data Boost, and unique enterprise features such as geo-partitioning and dual-region configurations. For Cassandra users, these powerful features, along with Spanner’s compelling price-performance, unlock a world of new, exciting possibilities.
The Cassandra to Spanner adapter — battle-tested by Yahoo!
If you’re wondering, “Spanner sounds like a leap forward from Cassandra. How do I get started?” the proxy adapter provides a plug-and-play way to forward your client applications’ Cassandra Query Language (CQL) traffic to Spanner. Under the hood, the adapter functions as a Cassandra client for the application but operates internally by interacting with Spanner for all data manipulation tasks. With the Cassandra to Spanner proxy adapter, there is no application code migration needed — it just works!
Yahoo successfully migrated from Cassandra to Spanner, reaping the benefits of improved performance, scalability, consistency, and operational efficiency. And the proxy adapter made it easy to migrate.
“The Cassandra Adapter has provided a foundation for migrating the Yahoo Contacts workload from Cassandra to Spanner without changing any of our CQL queries. Our migration strategy has more flexibility, and we can focus on other engineering activities while utilizing the scale, redundancy, and support of Spanner without updating the codebase. Spanner is cost-effective for our specific needs, delivering the performance required for a business of our scale. This transition enables us to maintain operational continuity while optimizing cost and performance.” – Patrick JD Newnan, Principal Product Manager, Core Mail and Analytics, Yahoo
Another Google Cloud customer that successfully migrated from Cassandra to Spanner recently is Reltio. Reltio benefited from an effortless migration process to minimize downtime and disruption to their services while reaping the benefits of a fully managed, globally distributed, and strongly consistent database.
These success stories demonstrate that migrating from Cassandra to Spanner can be a transformative step for businesses seeking to modernize their data infrastructure, unlock new capabilities, and accelerate innovation.
How does the new proxy adapter simplify your migration? A typical database migration involves a number of steps, and two of them — migrating the application and migrating the data — tend to be more complex than the rest. The proxy adapter vastly simplifies migrating a Cassandra-backed application to point to Spanner. Here’s a high-level overview of the steps involved when using the new proxy adapter:
1. Assessment: Evaluate your Cassandra schema, data model, and query patterns, and identify which ones you can simplify after moving to Spanner.
2. Schema design: Spanner’s table declaration syntax and data types are similar to Cassandra’s; the documentation covers these similarities and differences in depth. With Spanner, you can also take advantage of relational capabilities and features like interleaved tables for optimal performance.
3. Data migration: There are several steps to migrate your data:
Replicate incoming data: Replicate incoming updates to your Cassandra cluster to Spanner in real-time using Cassandra’s Change Data Capture (CDC).
Another possibility is to update your application logic to perform dual-writes to Cassandra and Spanner. We don’t recommend this approach if you’re trying to minimize changes to your application code.
4. Set up the proxy adapter and update your Cassandra configuration: Download and run the Cassandra to Spanner Proxy Adapter, which runs as a sidecar next to your application. By default, the proxy adapter runs on port 9042. In case you decide to use a different port, don’t forget to update your application code to point to the proxy adapter.
5. Testing: Thoroughly test your migrated application and data in a non-production environment to ensure everything works as expected.
6. Cutover: Once you’re confident in the migration, switch your application traffic to Spanner. Monitor closely for any issues and fine-tune performance as needed.
What’s under the hood of the new proxy adapter?
The new proxy adapter presents itself as a Cassandra client to the application. From the application’s perspective, the only noticeable change is the IP address or hostname of the Cassandra endpoint, which now points to the proxy adapter. This streamlines the Spanner migration, without requiring extensive modifications to application code.
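For example, an application using the Python Cassandra driver only needs its contact point changed to the proxy’s address; the keyspace, table, and query below are illustrative, and assume the proxy sidecar is listening on its default port 9042 on the same host.

from cassandra.cluster import Cluster

# Point the existing driver at the proxy adapter instead of the Cassandra cluster.
cluster = Cluster(contact_points=["127.0.0.1"], port=9042)  # proxy sidecar address
session = cluster.connect("my_keyspace")                    # illustrative keyspace

# The application keeps issuing the same CQL; the proxy translates it into Spanner operations.
rows = session.execute("SELECT id, name FROM users WHERE id = %s", ("42",))
for row in rows:
    print(row.id, row.name)

cluster.shutdown()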
We designed the proxy adapter to establish a one-to-one mapping between each Cassandra cluster and a corresponding Spanner database. The proxy instance employs a multi-listener architecture, with each listener bound to a distinct port. This facilitates concurrent handling of multiple client connections, where each listener manages a distinct connection with the specified Spanner database.
The proxy’s translation layer handles the intricacies of the Cassandra protocol. This layer performs message decoding and encoding, manages buffers and caches, and crucially, parses incoming CQL queries and translates them into Spanner-compatible equivalents.
For more details about different ways of setting up the adapter, limitations, mapping of CQL data types to Spanner, and more, refer to the proxy adapter documentation.
Addressing common concerns and challenges
Let’s address a few concerns you may have with your migrations:
Cost: Have a look at Accenture’s benchmark result that demonstrates that Spanner ensures not only consistent latency and throughput but also cost efficiency. Furthermore, Spanner now offers a new tiered pricing model (Spanner editions) that delivers better cost transparency and cost savings opportunities to help you take advantage of all of Spanner’s capabilities.
Latency increases: To minimize an increase in query latencies, we recommend running the proxy adapter on the same host as the client application (as a side-car proxy) or running on the same Docker network when running the proxy adapter in a Docker container. We also recommend keeping the CPU utilization of the proxy adapter host to under 80%.
Schema flexibility: While Cassandra offers schema flexibility, Spanner’s stricter relational schema provides advantages in terms of data integrity, query power, and consistency.
Learning curve: Spanner’s data types have some differences with Cassandra’s. Have a look at this comprehensive documentation that can ease the transition.
Get started today
The benefits of strong consistency, simplified operations, enhanced data integrity, and global scalability make Spanner a compelling option for businesses looking to leverage the cloud’s full potential for NoSQL workloads. With the new Cassandra to Spanner proxy adapter, we are making it easier to plan and execute on your migration strategy, so you can unlock a new era of data-driven innovation for your organization.
AlloyDB Omni is back with a new release, version 15.7.0, and it’s bringing serious enhancements to your PostgreSQL workflows, including:
Faster performance
A new ultra-fast disk cache
An enhanced columnar engine
The general availability of ScaNN vector indexing
A new release of the AlloyDB Omni Kubernetes operator
From transactional and analytical workloads to cutting-edge vector search, this update delivers across the board – in your data center, at the edge, on your laptop, and in any cloud and with 100% PostgreSQL compatibility.
Let’s jump in.
Better performance
Many workloads already get a boost compared to standard PostgreSQL. In our performance tests, AlloyDB Omni is more than 2x faster than standard PostgreSQL for transactional workloads, with most of the tuning being done for you automatically, without special configurations. One of the key advantages is the memory agent that optimizes shared buffers while at the same time avoiding out-of-memory errors. In general, the more memory you configure for AlloyDB Omni, the better it performs, serving more queries from the shared buffers and reducing the need to make calls to disk, which can be orders of magnitude slower than memory, particularly when using durable network storage.
An ultra-fast disk cache
This trade-off between memory and disk storage also just got more flexible, with the introduction of an ultra-fast disk cache. It allows you to configure a fast, local, and not necessarily durable storage device as an extension of Postgres’ buffer cache. Instead of aging data out of memory to make space for new data, AlloyDB Omni can keep a copy of not-quite-hot data in the disk cache, where it can be accessed faster than from persistent disk.
Enhanced columnar engine
AlloyDB Omni’s analytics accelerator is changing the game for mixed workloads. Developers are finding it invaluable for gaining real-time analytical insights from their transactional data, all without the overhead of managing extra data pipelines or separate databases. You can instead enable the columnar engine, assign a portion of your memory to it, and let AlloyDB Omni decide which columns or tables to populate in the columnar engine to speed up queries. In our benchmarks, the columnar engine speeds up analytical queries up to 100x compared to standard PostgreSQL.
The practical size limit of the analytics accelerator used to be determined by the amount of memory you are able to assign to the columnar engine. What’s new is a feature that allows you to configure a fast local storage device for the columnar engine to spill to. This increases the volume of data that you can run analytical queries on.
ScaNN goes GA
Lastly, for vector database use cases, AlloyDB Omni already offers great performance with pgvector using either the ivf or hnsw indexes. But while vector indexes are a great way to accelerate queries, they can be slow to build and rebuild. At Google Cloud Next 2024 we introduced the ScaNN index as another available index type. AlloyDB AI’s ScaNN index surpasses standard PostgreSQL’s HNSW index by offering up to 4x faster vector queries. Beyond pure speed, ScaNN delivers significant advantages for real-world applications:
Rapid indexing: Accelerate development and eliminate bottlenecks in large-scale deployments with significantly faster index build times.
Optimized memory utilization: Reduce memory consumption by 3-4x compared to PostgreSQL’s HNSW index. This allows larger workloads to run on smaller hardware and boosts performance for diverse, hybrid workloads.
As of AlloyDB Omni version 15.7.0, AlloyDB AI ScaNN indexing is generally available.
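As a rough sketch of what adopting the index can look like from a PostgreSQL client, the snippet below creates a ScaNN index on a pgvector column and runs an approximate nearest-neighbor query; the extension name, index method, tuning parameter, and schema are assumptions to verify against the AlloyDB AI documentation.

import psycopg2

# Connection string, table, and column are illustrative.
conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=secret")
conn.autocommit = True
cur = conn.cursor()

# Extension and index method names are assumptions; check the AlloyDB AI docs.
cur.execute("CREATE EXTENSION IF NOT EXISTS alloydb_scann;")
cur.execute("""
    CREATE INDEX IF NOT EXISTS product_embedding_scann_idx
    ON products USING scann (embedding cosine)
    WITH (num_leaves = 1000);
""")

# Approximate nearest-neighbor search served by the index (pgvector cosine distance).
cur.execute(
    "SELECT id FROM products ORDER BY embedding <=> %s::vector LIMIT 10;",
    ("[0.1, 0.2, 0.3]",),
)
print(cur.fetchall())

cur.close()
conn.close()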
A new Kubernetes operator
In addition to the new version of AlloyDB Omni, we have also released version 1.2.0 of the AlloyDB Omni Kubernetes operator. This release adds support for more health-check configuration options when high availability is enabled, support for enabling high availability when a disaster recovery secondary cluster is promoted to primary, and support for log rotation to help manage the storage space used by PostgreSQL log files.
At Google Cloud, we’re rapidly advancing our high-performance computing (HPC) capabilities, providing researchers and engineers with powerful tools and infrastructure to tackle the most demanding computational challenges. Here’s a look at some of the key developments driving HPC innovation on Google Cloud, as well as our presence at Supercomputing 2024.
We began our H-series with H3 VMs, specifically designed to meet the needs of demanding HPC workloads. Now, we’re excited to share some key features of the next generation of the H family, bringing even more innovation and performance to the table. The upcoming VMs will feature:
Improved workload scalability via RDMA-enabled 200 Gbps networking
Native support to directly provision full, tightly-coupled HPC clusters on demand
Titanium technology that delivers superior performance, reliability, and security
We provide system blueprints for setting up turnkey, pre-configured HPC clusters on our H series VMs.
The next generation of H series is coming in early 2025.
Parallelstore: World’s first fully-managed DAOS offering
Parallelstore is a fully managed, scalable, high-performance storage solution based on next-generation DAOS technology, designed for demanding HPC and AI workloads. It is now generally available and provides:
Up to 6x greater read throughput performance compared to competitive Lustre scratch offerings
Low latency (<0.5ms at p50) and high throughput (>1GiB/s per TiB) to access data with minimal delays, even at massive scale
High IOPS (30K IOPS per TiB) for metadata operations (see the quick sizing sketch after this list)
Simplified management that reduces operational overhead with a fully managed service
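Because the throughput and IOPS figures above scale with provisioned capacity, a quick back-of-the-envelope calculation gives a feel for aggregate performance; the 100 TiB capacity below is just an example.

capacity_tib = 100                 # example provisioned capacity
read_gib_per_s_per_tib = 1         # >1 GiB/s per TiB (figure above)
iops_per_tib = 30_000              # 30K IOPS per TiB (figure above)

print(f"Aggregate read throughput: >{capacity_tib * read_gib_per_s_per_tib} GiB/s")
print(f"Aggregate metadata IOPS:   {capacity_tib * iops_per_tib:,}")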
Parallelstore is great for applications requiring fast access to large datasets, such as:
Analyzing massive genomic datasets for personalized medicine
Training large language models (LLMs) and other AI applications efficiently
Running complex HPC simulations with rapid data access
A3 Ultra VMs with NVIDIA H200 Tensor Core GPUs
For GPU-based HPC workloads, we recently announced A3 Ultra VMs, which feature NVIDIA H200 Tensor Core GPUs. A3 Ultra VMs offer a significant leap in performance over previous generations. They are built on servers with our new Titanium ML network adapter, optimized to deliver a secure, high-performance cloud experience for AI workloads, and powered by NVIDIA ConnectX-7 networking. Combined with our datacenter-wide 4-way rail-aligned network, A3 Ultra VMs deliver non-blocking 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE).
Compared with A3 Mega, A3 Ultra offers:
2x the GPU-to-GPU networking bandwidth, powered by Google Cloud’s Titanium ML network adapter and backed by our Jupiter data center network
Up to 2x higher LLM inferencing performance with nearly double the memory capacity and 1.4x more memory bandwidth
Ability to scale to tens of thousands of GPUs in a dense, performance-optimized cluster for large AI and HPC workloads
With system blueprints, available through Cluster Toolkit, customers can quickly and easily create turnkey, pre-configured HPC clusters with Slurm support on A3 VMs.
A3 Ultra VMs will also be available through Google Kubernetes Engine (GKE), which provides an open, portable, extensible, and highly-scalable platform for large-scale training and serving of AI workloads.
Trillium: Ushering in a new era of TPU performance for AI
Tensor Processing Units, or TPUs, power our most advanced AI models such as Gemini, popular Google services like Search, Photos, and Maps, as well as scientific breakthroughs like AlphaFold 2 — which led to a Nobel Prize this year! Our sixth-generation TPU, Trillium, delivers:
4.7x increase in peak compute performance per chip
Double the high bandwidth memory capacity
Double the interchip interconnect bandwidth
Cluster Toolkit: Streamlining HPC deployments
We continue to improve Cluster Toolkit, providing open-source tools for deploying and managing HPC environments on Google Cloud. Recent updates include:
Slurm-gcp V6 is now generally available, providing faster deployments and robust reconfiguration among other benefits.
Google Cloud Customer Care is now available for Toolkit. You can find more information here on how to get support via the Cloud Customer Care console.
GKE: Container orchestration with scale and performance
GKE continues to lead the way for containerized workloads with support for the largest Kubernetes clusters in the industry. With support for up to 65,000 nodes, we believe GKE offers more than 10x the scale of the other two largest public cloud providers.
At the same time, we continue to invest in automating and simplifying the building of HPC and AI platforms, with:
Secondary boot disk, which provides faster workload startups through container image caching
Custom compute classes, offering greater control over compute resource allocation and scaling
Extensive innovations in Kueue.sh, which is becoming the de facto standard for job queueing on Kubernetes with topology-aware scheduling, priority and fairness in queueing, multi-cluster support (see demo by Google and CERN engineers), and more
Customer success stories: Atommap and beyond
Atommap, a company specializing in atomic-scale materials design, is using Google Cloud HPC to accelerate its research and development efforts. With H3 VMs and Parallelstore, Atommap has achieved:
Significant speedup in simulations: Reduced time-to-results by more than half, enabling faster innovation
Improved scalability: Easily scaled resources for 1,000s to 10,000s of molecular simulations, to meet growing computational demands
Better cost-effectiveness: Optimized infrastructure costs, with savings of up to 80%, while achieving high performance
Atommap’s success story highlights the transformative potential of Google Cloud HPC for organizations pushing the boundaries of scientific discovery and technological advancement.
Looking ahead
Google Cloud is committed to continuous innovation for HPC. Expect further enhancements to HPC VMs, Parallelstore, Cluster Toolkit, Slurm-gcp, and other HPC products and solutions. With a focus on performance, scalability, compatibility, and ease of use, we’re empowering researchers and engineers to tackle the world’s most complex computational challenges.
Google Cloud Advanced Computing Community
We’re excited to announce the launch of the Google Cloud Advanced Computing Community, a new kind of community of practice for sharing and growing HPC, AI, and quantum computing expertise, innovation, and impact.
This community of practice will bring together thought leaders and experts from Google, its partners, and HPC, AI, and quantum computing organizations around the world for engaging presentations and panels on innovative technologies and their applications. The Community will also leverage Google’s powerful, comprehensive, and cloud-native tools to create an interactive, dynamic, and engaging forum for discussion and collaboration.
The Community launches now, with meetings starting in December 2024 and a full rollout of learning and collaboration resources in early 2025. To learn more, register here.
Google Cloud at Supercomputing 2024
The annual Supercomputing Conference series brings together the global HPC community to showcase the latest advancements in HPC, networking, storage and data analysis. Google Cloud is excited to return to Supercomputing 2024 in Atlanta with our largest presence ever.
Visit Google Cloud at booth #1730 to jump in and learn about our HPC, AI infrastructure, and quantum solutions. The booth will feature a Trillium TPU board, NVIDIA H200 GPU and ConnectX-7 NIC, hands-on labs, a full schedule of talks, a comfortable lounge space, and plenty of great swag!
The booth theater will include talks from ARM, Altair, Ansys, Intel, NAG, SchedMD, Siemens, Sycomp, Weka, and more. Booth labs will get you deploying Slurm clusters to fine-tune the Llama 2 model or run GROMACS, using Cloud Batch to run microbenchmarks or quantum simulations, and more.
We’re also involved in several parts of SC24’s technical program, including BoFs, User Groups, and Workshops, with Googlers participating in a number of technical sessions.
Finally, we’ll be holding private meetings and roadmap briefings with our HPC leadership throughout the conference. To schedule a meeting, please contact hpc-sales@google.com.
Cloud compliance can present significant regulatory and technical challenges for organizations. These complexities often include delineating compliance responsibilities and accountabilities between the customer and cloud provider.
At Google Cloud, we understand these challenges faced by our customers’ cloud engineering, compliance, and audit teams, and want to help make them easier to manage. That’s why we’re pleased to announce that our Audit Manager service, which can digitize and help streamline the compliance auditing process, is now generally available.
Understanding compliance across layers in Google Cloud.
Traditional compliance methodologies, reliant on manual processes for evidence collection, are inefficient, prone to errors, and resource-intensive. According to the Gartner® Audit Survey, “When surveyed on their key priorities for 2024, 75% of chief audit executives (CAEs) cited audit’s ability to keep up with the fast-evolving cybersecurity landscape as their top priority — making it the most commonly cited priority.”
Introducing Audit Manager
Audit Manager can help organizations accelerate compliance efforts by providing:
Clear shared responsibility outlines: A matrix of shared responsibilities that delineates compliance duties between the cloud provider and customers, offering actionable recommendations tailored to your workloads.
Automated compliance assessments: Evaluation of your workloads against industry-standard technical control requirements in a simple and automated manner. Audit Manager already supports popular industry and regulatory frameworks including NIST 800-53, ISO, SOC, and CSA-CCM. You can see the full list of supported frameworks here.
Audit-ready evidence: Automated generation of comprehensive, verifiable evidence reports to support your compliance claims and overarching governance activity. Audit Manager provides you with a quick execution summary of compliance at a framework level and the ability to deep-dive using control-level reports.
Actionable remediation guidance: Insights to swiftly address each compliance gap that is identified.
The compliance audit journey with Audit Manager
The cloud compliance audit process involves defining responsibilities, identifying and mitigating risks, collecting supporting data, and generating a final report. This process requires collaboration between Governance, Risk, and Compliance analysts, compliance managers, developers, and auditors, each with their own specific tasks. Audit Manager streamlines this process for all involved roles, which can help simplify their work and improve efficiency.
Shift left your compliance audit process with Audit Manager.
Customer case study: Deutsche Börse Group
Deutsche Börse Group, an international stock exchange organization and innovative market infrastructure provider, began their strategic partnership with Google Cloud in 2022. Their cloud transformation journey is well under way, which brings with it the challenge of achieving and documenting compliance in their environment.
Florian Rodeit, head of cloud governance for Google Cloud at Deutsche Börse Group, first heard about Audit Manager during a session at Google Cloud Next 2024 in Las Vegas.
“The Audit Manager product promises a level of automation and audit control that has a lot of potential. At Deutsche Börse Group, we were excited to access the preview, explore the functionality further and build out a joint solution,” he said.
Following the European preview launch of Audit Manager, Deutsche Börse Group and Google Cloud set up a collaborative project to explore automating cloud controls via Audit Manager. Deutsche Börse Group had already created a comprehensive control catalog to manage their cloud control requirements across the organization. They analyzed the Cloud Security Alliance’s Cloud Controls Matrix against their written rules framework to create inputs for Audit Manager, and set out ownership and implementation guidelines for cloud-specific controls.
Now, Deutsche Börse Group can use Audit Manager to check if there are resources configured that deviate from the control framework, such as any resources that have been set up outside of approved regions. This provides automated, auditable evidence to support their specific requirements for compliant usage of Google Cloud resources.
Benjamin Möller, expert cloud governance, vice-president, Deutsche Börse Group, has been leading the collaborative project. “Moving forward, we hope that Audit Manager will allow us to automate many of our technical controls — giving us robust assurance that we are compliant, enabling us to quickly identify and rectify non-compliance, and minimizing the manual overhead of audit evidence. We are excited to continue making progress on our joint venture,” he said.
Take the next step
To use Audit Manager, access the tool directly from your Google Cloud console. Navigate to the Compliance tab in your Google Cloud console, and select Audit Manager. For a comprehensive guide on using Audit Manager, please refer to our detailed product documentation. We encourage you to share your feedback on this service to help us improve Audit Manager’s user experience.
We are thrilled to announce new capabilities that make running Dataproc Serverless even faster, easier, and more intelligent.
Elevate your Spark experience with:
Native query execution: Experience significant performance gains with the new Native query execution in the Premium tier.
Seamless monitoring with Spark UI: Track job progress in real time with a built-in Spark UI available by default for all Spark batches and sessions.
Streamlined investigation: Troubleshoot batch jobs from a central “Investigate” tab displaying all the essential metrics highlights and logs filtered by errors automatically.
Proactive autotuning and assisted troubleshooting with Gemini: Let Gemini minimize failures and autotune performance based on historical patterns. Quickly resolve issues using Gemini-powered insights and recommendations.
Accelerate your Spark jobs with native query execution
You can unlock considerable speed improvements for your Spark batch jobs in the Premium tier on Dataproc Serverless Runtimes 2.2.26+ or 1.2.26+ by enabling native query execution — no application changes required.
This new feature in the Dataproc Serverless Premium tier improved query performance by ~47% in our tests on queries derived from TPC-DS and TPC-H benchmarks.
Start now by running the native query execution qualification tool that can help you easily identify eligible jobs and estimate potential performance gains. Once you have the list of batch jobs identified for native query execution, you can enable it and have the jobs run faster and potentially save costs.
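As a rough sketch of what enabling it at submission time can look like with the Dataproc Python client, the snippet below creates a Serverless batch with a runtime property for the native engine; the project, bucket, batch ID, and especially the property key and tier settings are assumptions to confirm against the Dataproc Serverless documentation.

from google.cloud import dataproc_v1

project, region = "my-project", "us-central1"   # illustrative values

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/etl_job.py"   # illustrative job
    ),
    runtime_config=dataproc_v1.RuntimeConfig(
        version="2.2",
        properties={
            # Assumed property key for native query execution -- verify the exact
            # key and the Premium-tier settings in the Dataproc Serverless docs.
            "spark.dataproc.runtimeEngine": "native",
        },
    ),
)

operation = client.create_batch(
    parent=f"projects/{project}/locations/{region}",
    batch=batch,
    batch_id="nqe-example-batch",                          # illustrative ID
)
print(operation.result().state)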
Seamless monitoring with Spark UI
Tired of wrestling with setting up and maintaining persistent history server (PHS) clusters just to debug your Spark batches? Wouldn’t it be easier if you could avoid the ongoing costs of the history server and yet see the Spark UI in real time?
Until now, monitoring and troubleshooting Spark jobs in Dataproc Serverless required setting up and managing a separate Spark persistent history server. Crucially, each batch job had to be configured to use the history server. Otherwise, the open-source UI would be unavailable for analysis for the batch job. Additionally, the open-source UI suffered from slow navigation between applications.
We’ve heard you, loud and clear. We’re excited to announce a fully managed Spark UI in Dataproc Serverless that makes monitoring and troubleshooting a breeze.
The new Spark UI is built-in and automatically available for every batch job and session in both Standard and Premium tiers of Dataproc Serverless at no additional cost. Simply submit your job and start analyzing performance in real time with the Spark UI right away.
Here’s why you’ll love the Serverless Spark UI:
| | Traditional approach | The new Dataproc Serverless Spark UI |
| --- | --- | --- |
| Effort | Create and manage a Spark history server cluster. Configure each batch job to use the cluster. | No cluster setup or management required. The Spark UI is available by default for all your batches without any extra configuration, and can be accessed directly from the Batch / Session details page in the Google Cloud console. |
| Latency | UI performance can degrade with increased load. Requires active resource management. | Enjoy a responsive UI that automatically scales to handle even the most demanding workloads. |
| Availability | The UI is only available as long as the history server cluster is running. | Access your Spark UI for 90 days after your batch job is submitted. |
| Data freshness | Wait for a stage to complete to see its events in the UI. | View regularly updated data without waiting for the stage to complete. |
| Functionality | Basic UI based on open-source Spark. | Enhanced UI with ongoing improvements based on user feedback. |
| Cost | Ongoing cost for the PHS cluster. | No additional charge. |
Accessing the Spark UI
To gain deeper insights into your Spark batches and sessions — whether they’re still running or completed — simply navigate to the Batch Details or Session Details page in the Google Cloud console. You’ll find a “VIEW SPARK UI” link in the top right corner.
The new Spark UI provides the same powerful features as the open-source Spark History Server, giving you deep insights into your Spark job performance. Easily browse both running and completed applications, explore jobs, stages, and tasks, and analyze SQL queries for a comprehensive understanding of the execution of your application. Quickly identify bottlenecks and troubleshoot issues with detailed execution information. For even deeper analysis, the ‘Executors’ tab provides direct links to the relevant logs in Cloud Logging, allowing you to quickly investigate issues related to specific executors.
You can still use the “VIEW SPARK HISTORY SERVER” link to view the Persistent Spark History Server if you have already configured one.
Streamlined investigation with the new “Investigate” tab
A new “Investigate” tab in the Batch details screen gives you instant diagnostic highlights collected in a single place.
In the “Metrics highlights” section, the essential metrics are automatically displayed, giving you a clear picture of your batch job’s health. You can further create a custom dashboard if you need more metrics.
Below the metrics highlights, a widget “Job Logs” shows the logs filtered by errors, so you can instantly spot and address problems. If you would like to dig further into the logs, you can go to the Logs Explorer.
Proactive autotuning and assisted troubleshooting with Gemini (Preview)
Last but not least, Gemini in BigQuery can help reduce the complexity of optimizing hundreds of Spark properties in your batch job configurations while submitting the job. If the job fails or runs slow, Gemini can save the effort of wading through several GBs of logs to troubleshoot the job.
Optimize performance: Gemini can automatically fine-tune the Spark configurations of your Dataproc Serverless batch jobs for optimal performance and reliability.
Simplify troubleshooting: You can quickly diagnose and resolve issues with slow or failed jobs by clicking “Ask Gemini” for AI-powered analysis and guidance.
Sign up here for a free preview of the Gemini features and “Investigate” tab for Dataproc Serverless.
Climate change is the biggest challenge our society faces. As scientists, governments, and industry leaders gather in Baku, Azerbaijan for the 2024 United Nations Climate Change Conference, a.k.a. COP29, it’s incumbent upon all of us to find innovative solutions that can drive impact at a global scale.
The gravity of climate change requires solutions that go beyond incremental change. To find those solutions, we need the ability to make better decisions about how to approach climate mitigation and adaptation across every human activity — from transport, industry, and agriculture to communications, finance, and housing. This requires processing vast volumes of data generated by these industries. The combination of AI and cloud technologies offer the potential to unlock climate change solutions that can be both transformational and global in scale.
We already have a lot of examples that we can draw from.
Today, for example, Google Earth Engine is being used by the Forest Data Partnership, a collaboration for global monitoring of commodity-driven deforestation, to monitor every oil palm plantation around the globe, providing participating companies live early-warning signals for deforestation risks, and dramatically reducing the costs involved in forest monitoring. Similarly, NGIS is using Google Earth Engine to power TraceMark, helping businesses deliver traceability and transparency across global supply chains.
Another example is Global Fishing Watch, an international nonprofit co-founded by Google that is using geospatial analytics and AI to understand how human activity impacts the seas, global industries, climate, biodiversity and more. The datasets map global ocean infrastructure and vessels that don’t publicly broadcast their positions. This helps to advance policy conversations about offshore renewables development, provides insight into carbon dioxide emissions from maritime vessels, and enables marine protection.
It’s clear that AI can process large volumes of data, optimize complex systems, and drive the development of new business models. We see businesses harnessing the technology in the fight against climate change in four ways:
1. Measuring business performance
Businesses are using AI-powered insights to help monitor their advance towards sustainability targets, which ultimately contributes to building business resilience.
In today’s business landscape, this is of paramount importance as companies face growing demands for transparency and accountability regarding their environmental and social impact.
We are seeing cloud and AI being used to de-risk investments, improve transparency, and increase profitability through the use of large-scale datasets, machine learning, and generative AI. These technologies allow companies to analyze their ESG performance, gain insights into climate risks, and monitor supplier behaviors.
For example, Palo Alto Networks partnered with Watershed, a Google Cloud Ready – Sustainability Partner, to measure and track their carbon emissions across their entire business using Google Cloud. This partnership enabled them to gain a comprehensive understanding of their environmental impact and set actionable targets for reducing emissions.
Another example is HSBC, which developed a new credit ranking tool on Google Cloud that allows them to run multiple climate risk scenarios simultaneously. This tool empowers HSBC to make more informed investment decisions while considering the potential impact of climate change on their portfolio.
2. Optimizing operations and supply chains
Businesses are also using AI to optimize their operations and supply chains for energy and resource efficiency, as well as to cut costs.
This is crucial for companies seeking to enhance their sustainability performance while simultaneously improving their bottom line. Through the use of AI and machine learning, cloud technologies empower organizations to optimize their existing operations, improve cost efficiency, and minimize waste.
For example, Geotab, another Google Cloud Ready – Sustainability partner, is managing 75 billion data records in BigQuery for 4 million commercial fleet vehicles every day to optimize vehicle routes, increase driver safety behaviors and accelerate the path to fleet electrification.
3. Identifying cleaner business models
As the world shifts towards more sustainable practices, businesses must adapt and identify new avenues for growth. Cloud and AI are helping businesses do just that, allowing organizations to reimagine their business models, explore new markets, and create innovative products and services that align with their sustainability goals.
Recykal, for instance, has partnered with Google Cloud to build Asia’s largest circular economy marketplace. By leveraging Google Cloud’s AI and machine learning capabilities, Recykal is revolutionizing waste management and promoting sustainable practices in Asia.
Another example is Einride, a company that is reimagining freight transport by using electric, self-driving vehicles and an AI-powered platform. Their innovative approach to logistics is disrupting the transportation industry and contributing to a more sustainable future.
More recently, Climate Engine and Robeco have started combining AI and geospatial technologies with their scientific expertise and investment knowledge to assess how publicly traded companies’ actions impact biodiversity. You can read their joint thought leadership paper here.
4. Building more sustainably
Finally, and very importantly, businesses want to ensure that the actual use of cloud and AI technologies doesn’t lead to increased climate impacts. From the get-go, developers need to take concrete steps towards reducing the carbon footprint and cost of their applications in the cloud.
This is why, through our Carbon Sense suite, we provide developers with the tools and resources they need to build and deploy applications in a way that minimizes their environmental impact, all while maintaining cost efficiency.
L’Oréal, for example, leverages Google Cloud’s Carbon Footprint tool to track the gross carbon emissions associated with their cloud usage. This allows L’Oréal to understand the environmental impact of their technology decisions and implement strategies to reduce their footprint.
Google also takes its own carbon footprint very seriously, and is pursuing an ambitious goal to achieve net-zero emissions across all of its operations and value chain, supported by a goal to run on 24/7 carbon-free energy on every grid where it operates by 2030.
Google Cloud is committed to helping organizations of all sizes achieve their sustainability goals. With cloud, data analytics, and AI, we’re delivering new ways to build resilience, reduce costs, and unlock sustainable growth, while also accelerating the impact of organizations’ sustainability initiatives through the smarter use of data. This is an opportunity to drive tangible business results and create a more sustainable future for all.
Crafting the perfect prompt for generative AI models can be an art in itself. The difference between a useful and a generic AI response can sometimes be a well-crafted prompt. But, getting there often requires time-consuming tweaking, iteration, and a learning curve. That’s why we’re thrilled to announce new updates to the AI-powered prompt writing tools in Vertex AI, designed to make prompting easier and more accessible for all developers.
We’re introducing two powerful features designed to streamline your prompt engineering workflow: Generate prompt and Refine prompt.
Imagine you need a prompt to summarize customer reviews about your latest product. Instead of crafting the prompt yourself, you can simply tell the Generate prompt feature your goal. It will then create a comprehensive prompt, including placeholders for the reviews, which you can easily populate with your own data later. Generate prompt takes the guesswork out of prompt engineering by:
Turning simple objectives into tailor-made, effective prompts. This way, you don’t need to agonize over phrasing and keywords.
Generating placeholders for context, like customer reviews, news articles, or code snippets. This allows you to quickly add your specific data and get immediate results.
Speeding up the prompt writing process. Focus on your core tasks, not on perfecting prompt syntax.
Refine prompt: Iterate and improve with AI-powered suggestions
Once you have a prompt, either crafted by Generate prompt or one you’ve written yourself, Refine prompt helps you modify it for optimal performance. Here’s how it works:
Provide feedback: After running your prompt, simply provide feedback on the response, the same way you would critique a writer.
Instant suggestions: Vertex AI generates a new, suggested prompt in one step, taking your feedback into account.
Iterate and improve: You can accept or reject the suggestion and continue iterating by running the refined prompt and providing further feedback.
Prompt refinement boosts the quality of the prompt while also saving significant time during prompt design. The quality typically improves because the prompt instructions are augmented in a way that Gemini can better understand.
Below are some sample prompts that were revised with Refine prompt:
Original prompt: Suggest engaging lesson plan ideas for art class
After using Prompt Refinement: Suggest 3 engaging lesson plan ideas for a high school art class, each focusing on a different art form. Be concise and only include the most relevant information, such as the art form, target age group, and key activity.

Original prompt: Plan a schedule for a week with focus time and meeting time. Take in account that there are 2 teams with 6 hour delay
After using Prompt Refinement: Create a detailed weekly schedule for a team with a 6-hour time difference. The schedule should include:
Specific time blocks for focus time and meetings.
Consideration of overlapping work hours to ensure effective communication and collaboration.
A balance of individual work and team interactions.
Suggestions for time zone conversion tools or strategies to facilitate scheduling.
A powerful duo: Generate prompt meets Refine prompt
These two features work in tandem to help you craft the most effective prompt for your objective – irrespective of your skill level. Generate prompt gets you started quickly, while Refine prompt allows for iterative improvement in five steps:
Define your objective: Tell Generate prompt what you want to achieve.
Generate a prompt: Generate prompt creates a ready-to-use prompt, often with helpful placeholders for context.
Run the prompt and review the output: Execute the prompt with your chosen LLM in Vertex AI.
Refine with feedback: Use Refine prompt to provide feedback on the output and receive AI-powered suggestions for prompt improvement.
Iterate until ideal performance: Continue refining and rerunning your prompt until you achieve your desired results.
How to get started
Go ahead and try out AI-assisted prompt writing through our interactive critiquing workflow. Vertex AI’s easy-to-use UI for refining prompts can be tested without setting up a Google Cloud account through this link (to demo without a Google Cloud account, be sure you are logged out of your Google account in your web browser or use incognito mode). For those with an account, you’ll have the ability to save, manage, and fine-tune your prompts.
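If you would rather run a refined prompt programmatically, here is a minimal sketch using the Vertex AI SDK for Python. The project ID, region, and model name are placeholders you would swap for your own:

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholders: use your own project and preferred region/model.
vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")

refined_prompt = (
    "Suggest 3 engaging lesson plan ideas for a high school art class, "
    "each focusing on a different art form."
)
response = model.generate_content(refined_prompt)
print(response.text)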
Generative AI presents both immense opportunities and challenges for the Department of Defense (DoD). The potential to enhance situational awareness, streamline tasks, and improve decision-making is significant. However, the DoD’s unique requirements, especially their stringent security standards for cloud services (IL5), necessitate carefully crafted AI solutions that balance innovation with security.
The DoD’s 2023 Data, Analytics, and Artificial Intelligence Adoption Strategy report emphasizes the need to “strengthen the organizational environment” for AI deployment. This underscores the importance of solutions that seamlessly integrate into existing infrastructure, prioritize data security, and enable responsible and intelligent use of AI.
Google Public Sector’s 4 AI pillars: A framework for DoD AI adoption
To meet the DoD’s unique challenges, Google AI for Public Sector has focused on 4 areas when designing solutions to help empower the DoD:
Adaptive: AI solutions must seamlessly integrate into the DoD’s existing complex and evolving technology ecosystem. Google prioritizes adaptable solutions that minimize disruption and enable rapid adoption, aligning with the DoD’s focus on agile innovation.
Secure: Protecting sensitive DoD data is paramount. Google’s AI solutions are engineered with robust security measures, including Zero Trust architecture and adherence to IL5 requirements, ensuring the confidentiality and integrity of critical information.
Intelligent: Google’s AI capabilities are designed to deliver actionable insights from vast and diverse datasets. By harnessing the power of machine learning and natural language processing, our solutions enable the DoD to make data-driven decisions with greater speed and accuracy.
Responsible: Google is committed to developing and deploying AI in a responsible and ethical manner. Our AI Principles guide our research, product development, and deployment decisions, ensuring that AI is used for good and avoids harmful applications.
Breaking down data silos and delivering insights with enterprise search
Google Cloud’s solution for enterprise search is a powerful tool designed to help organizations overcome the challenges of data fragmentation. It acts as a central hub, seamlessly connecting to diverse data sources across the department, including structured and unstructured data.
Intelligent Information Retrieval: Leveraging advanced AI and natural language processing, enterprise search delivers precise and contextually relevant answers to queries, even when dealing with unstructured data like documents, images, and reports.
Seamless Integration: Federated search combined with Retrieval Augmented Generation (RAG) provides relevant query responses without the need to move data or train a custom Large Language Model (LLM).
Enhanced Transparency and Trust: The solution provides links to source documents alongside AI-generated responses, allowing users to verify information and build confidence in the system.
Robust Security: With all services used in the solution submitted for IL5 accreditation, enterprise search incorporates industry-leading security measures, including Role-Based Access Control (RBAC) and Common Access Card (CAC) compatibility, to safeguard sensitive DoD data.
Future-Proof Flexibility: The solution supports a wide range of Large Language Models (LLMs), including Google’s Gemini family of models and Gemma, our family of lightweight, state-of-the-art open models. Google offers choice and adaptability while avoiding vendor lock-in, allowing the DoD to leverage the latest AI advancements without extensive redevelopment.
Google Cloud’s generative AI infused solution directly supports the DoD’s mission by consolidating data access, enhancing discoverability, and providing rapid, accurate insights, leading to improved decision-making and a strategic advantage.
Google Cloud is committed to supporting the DoD’s AI journey by providing solutions that are not only powerful and innovative, but also secure, responsible, and adaptable. By empowering the DoD to harness the full potential of its data, we are helping to enable more agile, informed, and effective service members. Learn more about how Google Public Sector’s AI solutions can empower your agency and visit Google AI for Public Sector for examples of how we are helping accelerate mission impact with AI.
Welcome to the first Cloud CISO Perspectives for November 2024. Today I’m joined by Andy Wen, Google Cloud’s senior director of product management for Google Workspace, to discuss a new Google survey into the high security costs of legacy tech.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
–Phil Venables, VP, TI Security & CISO, Google Cloud
Confronting the high security cost of legacy tech
By Phil Venables, VP, TI Security & CISO, Google Cloud, and Andy Wen, senior director, product management, Google Workspace
From a business perspective, it’s easy to understand why many organizations continue to rely on outdated technology. Replacing older systems can be expensive, but relying on them comes with hidden costs that can far outstrip the benefits.
Legacy technology can greatly increase the business and security risks that an organization will face, a serious concern given that the global average total cost of a security breach in 2024 was $4.88 million. Despite the availability of a plethora of more modern solutions, we’re still seeing too many organizations rely on defenses that were designed for the desktop era, according to a new Google Workspace global cyber security survey of more than 2,000 security and IT decision-makers.
Phil Venables, VP, TI Security & CISO, Google Cloud
The numbers paint a dire picture of the security impact of operating legacy systems:
71% said that legacy technology has left organizations less prepared for the future.
63% believe that their organization’s technology landscape is less secure than it was in the past.
More than 66% told us that their organizations are investing more time and money than ever in securing their environments — but still experience costly security incidents.
81% of organizations experience at least one security incident per year.
Organizations experience eight security incidents on average per year.
We know many security leaders have convinced the business to invest in more security tools, because the survey also found that 61% of organizations are using more security tools than they did two years ago. Yet while more than two-thirds of organizations are investing more time and money in securing their environments, many are still experiencing expensive security incidents.
Andy Wen, senior director, product management, Google Workspace
Environments with more security tools often attempt to compensate for legacy platforms that continue to be vulnerable to security incidents. Meanwhile, 81% of security leaders believe cloud-first platforms are safer than legacy platforms.
Organizations with 10 or more security tools reported an average of 14 security incidents per year, with 34% of them spending more than $250,000 on incidents per year.
Organizations with fewer than 10 tools reported an average of six incidents per year, with 19% of them spending more than $250,000 on incidents per year.
“The solution is not more security tools, but more secure tools,” said CISA Director Jen Easterly at her mWISE Conference keynote in September.
We have also made this point often. To be truly resilient in today’s security landscape, organizations must consider an IT overhaul and rethink their strategy toward solutions with modern, secure-by-design architectures that nullify classes of vulnerabilities and attack vectors.
It may be daunting to take on an overhaul, especially for large organizations, but security leaders need to look at investing in a cloud-first solution to be resilient. The change can be made in small steps to minimize disruption and evaluate return on investment, such as using Chrome Enterprise for secure browsing and providing Google Workspace to specific teams.
The bottom line is that adopting modern technology can help eliminate entire classes of threats, as well as improve business outcomes.
We’d like to highlight three customer interactions that underscore organizational value gained by modernizing. Organizations need a centralized solution that can evolve, especially as attacks continue to increase in quantity and sophistication. We recently did some work with the cybersecurity company Trellix, which did a complete overhaul of its security infrastructure.
Trellix was running into issues where its old software stack felt stagnant and didn’t connect into new things they were doing or building. These older solutions made it hard to control where data was sitting and who was accessing it. They’ve since fully migrated to Google Workspace, adopted the Zero Trust capabilities we’ve built in, and augmented them with their own security solutions, including a security operations console, email security, and endpoint protection.
Employees can now chat, email, view files, edit documents, and join meetings from their device of choice without worrying about security and access permissions. All these capabilities live within the same platform, making it easier and simpler for security admins to oversee data safety with features like endpoint management and Zero Trust access controls in Workspace — without slowing down employee collaboration.
Similarly, the city of Dearborn, Mich., replaced its legacy email solution. After making the switch to Gmail, users noticed a meaningful decrease in spam, phishing, and malware, which helped reduce their cybersecurity risks.
Humana’s dilemma was driven by a legacy suite of desktop-based office applications that its IT team needed to spend 70% of its time maintaining. Humana’s IT team rolled out Google Workspace to 13,000 Humana employees in the field and in the office in four months, migrating 22 terabytes of data. Workspace’s built-in security features and browser-based apps saved the team time and reduced costs, and also led to a steady reduction in help desk tickets during and after rollout.
For more leadership guidance from Google Cloud experts, please see our CISO Insights hub.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
Join our upcoming Security Talks to unlock the Defender’s Advantage: Our next Security Talks is coming on Nov. 19, and will focus on the Defender’s Advantage. This free, day-long virtual event is packed with insights and strategies to help you proactively secure your cloud environment. Register today.
Cyber risk top 5: What every board should know: Boards should learn about security and digital transformation to better manage their organizations. Here are five top risks they need to know — and prepare for. Read more.
Mandatory MFA is coming to Google Cloud. Here’s what you need to know: To help keep our customers secure, starting in 2025 we will require them to use MFA when accessing Google Cloud. Read more.
Google Cloud expands CVE program: As part of our commitment to security and transparency on vulnerabilities found in our products and services, we now will issue CVEs for critical Google Cloud vulnerabilities. Read more.
Our 2025 Forecast report: Get ready for the next year in cybersecurity with our 2025 Forecast report, now available. Read more.
From AI to Zero Trust, Google Cloud Security delivers comprehensive public sector solutions: Google Cloud Security is committed to helping government agencies and organizations strengthen their defenses, and we recently made several announcements at the Google Public Sector Summit. Read more.
FedRAMP High development in the cloud: Code with Cloud Workstations: A Forrester Total Economic Impact™ (TEI) study found that Google Cloud Workstations enhance consistency, agility, and security while reducing costs and risks. Read more.
Please visit the Google Cloud blog for more security stories published this month.
(In)tuned to take-overs: Abusing Intune permissions for lateral movement and privilege escalation: Learn how the Mandiant Red Team was able to move laterally from a customer’s on-premises environment to their Microsoft Entra ID tenant, and obtained privileges to compromise existing Entra ID service principals installed in the tenant. Also learn how to defend against it. Read more.
Flare-On 11 Challenge solutions: The latest Flare-On challenge is over, and it proved a doozy: Only 275 players out of 5,300 completed all 10 stages. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Google Cloud Security and Mandiant podcasts
Gen AI security: Unseen attack surfaces and pentesting lessons: What’s the current state of gen AI security? From common mistakes to novel attack surfaces to unique challenges, podcast hosts Anton Chuvakin and Tim Peacock discuss with Ante Gojsalic, co-founder and CTO, SplxAI, today’s gen AI security concerns and their potential impact on tomorrow’s tech. Listen here.
Get the Google Security Operations perspective on SIEM and security data lakes: What’s a disassembled SIEM, and why you should care: Travis Lanham, uber tech lead for Security Operations Engineering, Google Cloud, goes SIEM-deep with Anton and Tim. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in two weeks with more security-related updates from Google Cloud.
It’s an exciting time in the world of data and analytics, with more organizations harnessing the power of data and AI to help transform and grow their businesses. But in a threat landscape with increasingly sophisticated attacks around every corner, ensuring the security and integrity of that data is critical.
Google Cloud offers a comprehensive suite of tools to help protect your data while unlocking its potential. In our new ebook, Building a Secure Data Platform with Google Cloud, we dig into the many data security capabilities within Google Cloud and share how they can help support data-based innovation strategies.
Take a peek inside the ebook, then download the full version here.
Unlock data platform-level security with BigQuery
BigQuery, Google Cloud’s unified data platform, offers a robust set of integrated security features to help you safeguard your data. The platform automatically encrypts all data at rest, which provides a foundational layer of defense against unauthorized access. For data sharing, BigQuery Analytics Hub and data clean rooms allow you to efficiently, securely, and easily share data across organizational boundaries. The platform also includes Dataplex, which enables you to implement comprehensive policies to govern how data is accessed, used, and shared within your organization.
Shield assets with granular access controls and guardrails
With Cloud Identity and Access Management (IAM), you can manage access to critical data across BigQuery, Cloud Run, Cloud Run functions, and Google Kubernetes Engine (GKE) resources. Organization restrictions place further limits on which users can access resources in your organization. Combined with Cloud IAM, this feature supports your organization policies and helps you maintain a secure perimeter around your Google Cloud environment.
Create security boundaries with perimeter and data protection
Google Cloud offers several ways to reinforce your perimeter. VPC Service Controls help prevent data exfiltration from cloud resources, providing precise control over access and movement by external entities or by insiders.
Meanwhile, with Sensitive Data Protection, you can identify and classify your sensitive data within BigQuery, which can help you implement targeted protection measures such as masking, tokenization, and redaction. You can also gain even more granular control over your encryption keys with Customer-managed encryption keys for BigQuery.
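As one illustration, here is a minimal sketch of scanning text for sensitive values with the Sensitive Data Protection (Cloud DLP) client library for Python. The project ID, info types, and sample text are placeholders:

from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/your-project-id/locations/global"  # placeholder project

response = client.inspect_content(
    request={
        "parent": parent,
        "inspect_config": {
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "CREDIT_CARD_NUMBER"}],
            "include_quote": True,
        },
        "item": {"value": "Contact jane@example.com, card 4111 1111 1111 1111"},
    }
)

for finding in response.result.findings:
    print(finding.info_type.name, finding.quote)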
Strengthen data security posture with automated monitoring and compliance
Establishing robust security controls for your data is essential for improving your security posture, but it’s just as important to monitor your environment for threats and maintain compliance with industry standards. Security Command Center gives you a comprehensive view of your security posture with direct visibility into your BigQuery datasets. With Cloud Logging, you can collect, store, and analyze logs to gain insights into system activities, detect anomalies, and respond to security incidents. Assured Workloads further simplifies compliance, providing peace of mind that you’ve established strong baseline controls and compliant configurations.
All-in-one data security with integrated solutions from Google Cloud
Building a secure data ecosystem requires a multi-layered approach. With comprehensive security features from Google Cloud, you can safeguard your sensitive data, comply with industry regulations, and discover the full potential of your data. Dive deeper into these tools, solutions, and strategies in the full ebook — Building a Secure Data Platform with Google Cloud — to ensure the safety and integrity of your organization’s most valuable asset. Download the full version here.
As open-source large language models (LLMs) become increasingly popular, developers are looking for better ways to access new models and deploy them with GPU acceleration. That’s why Cloud Run now offers fully managed NVIDIA GPUs, which removes the complexity of driver installations and library configurations. This means you’ll benefit from the same on-demand availability and effortless scalability that you love with Cloud Run’s CPU and memory, with the added power of NVIDIA GPUs. When your application is idle, your GPU-equipped instances automatically scale down to zero, optimizing your costs.
In this blog post, we’ll guide you through deploying the Meta Llama 3.2 1B Instruction model on Cloud Run. We’ll also share best practices to streamline your development process using local model testing with Text Generation Inference (TGI) Docker image, making troubleshooting easy and boosting your productivity.
Why Cloud Run with GPU?
There are four critical reasons developers benefit from deploying open models on Cloud Run with GPU:
Fully managed: No need to worry about drivers, libraries, or infrastructure.
On-demand scaling: Scale up or down automatically based on demand.
Cost effective: Only pay for what you use, with automatic scaling down to zero when idle.
Performance: NVIDIA GPU-optimized for Meta Llama 3.2.
Initial Setup
First, create a Hugging Face token.
Second, check that your Hugging Face token has permission to access and download the Llama 3.2 model weights here. Keep your token handy for the next step.
Third, use Google Cloud’s Secret Manager to store your Hugging Face token securely. In this example, we will be using Google user credentials. You may need to authenticate the gcloud CLI, set a default project ID, enable the necessary APIs, and grant access to Secret Manager and Cloud Storage.
# Authenticate CLI
gcloud auth login

# Set default project
gcloud config set project <your_project_id>

# Create new secret key, remember to update <your_huggingface_token>
gcloud secrets create HF_TOKEN --replication-policy="automatic"
echo -n <your_huggingface_token> | gcloud secrets versions add HF_TOKEN --data-file=-

# Retrieve the key
HF_TOKEN=$(gcloud secrets versions access latest --secret="HF_TOKEN")
Local debugging
Install the Hugging Face CLI (the huggingface_hub Python package) in your virtual environment.
Run huggingface-cli login to set up a Hugging Face credential.
Use the TGI Docker image to test your model locally. This allows you to iterate and debug your model locally before deploying it to Cloud Run.
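Once the TGI container is running locally, you can sanity-check it with a quick request. This is a minimal sketch that assumes the container is listening on localhost:8080 and exposes TGI’s OpenAI-compatible Messages API:

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed local TGI endpoint
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])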
Now, we will create a new Cloud Run service using the deployment script as follows. (Remember to update BUCKET_NAME.) You may also need to update the network and subnet names.
New solutions, old problems. Artificial intelligence (AI) and large language models (LLMs) are here to signal a new day in the cybersecurity world, but what does that mean for us—the attackers and defenders—and our battle to improve security through all the noise?
Data is everywhere. For most organizations, the access to security data is no longer the primary issue. Rather, it is the vast quantities of it, the noise in it, and the disjointed and spread-out nature of it. Understanding and making sense of it—THAT is the real challenge.
When we conduct adversarial emulation (red team) engagements, making sense of all the network, user, and domain data available to us is how we find the path forward. From a defensive perspective, efficiently finding the sharpest and most dangerous needles in the haystack—for example, easily accessible credentials on fileshares—is how we prioritize, improve, and defend.
How do you make sense of this vast amount of structured and unstructured data, and give yourself the advantage?
Data permeates the modern organization. This data can be challenging to parse, process, and understand from a security implication perspective, but AI might just change all that.
This blog post will focus on a number of case studies where we obtained data during our complex adversarial emulation engagements with our global clients, and how we innovated using AI and LLM systems to process this into structured data that could be used to better defend organizations. We will showcase the lessons learned and key takeaways for all organizations and highlight other problems that can be solved with this approach for both red and blue teams.
Approach
Data parsing and understanding is one of the biggest early benefits of AI. We have seen many situations where AI can help process data at a fast rate. Throughout this post, we use an LLM to process unstructured data, meaning that the data did not have a structure or format that we knew about before parsing the data.
If you want to try these examples out yourself, please make sure you use either a local model, or you have permission to send the data to an external service.
Getting Structured Data Out of an LLM
Step one is to get the data into a format we can use. If you have ever used an LLM, you will have noticed that it outputs prose, especially if you use chat-based versions. For a lot of use cases this is fine; however, we want to analyze the data and get structured output. Thus, the first problem we have to solve is getting the LLM to output data in a format we can specify. The simple method is to ask the LLM to output the data in a machine-readable format like JSON, XML, or CSV. However, you will quickly notice that you have to be quite specific with the data format, and the LLM can easily output data in another format, ignoring your instructions.
Luckily for us, other people have encountered this problem and have solved it with something called Guardrails. One of the projects we have found is called guardrails-ai. It is a Python library that allows you to create guardrails—specific requirements—for a model based on Pydantic.
To illustrate, take a simple Python class from the documentation to validate a pet from the output of the LLM:
from pydantic import BaseModel, Field

class Pet(BaseModel):
    pet_type: str = Field(description="Species of pet")
    name: str = Field(description="a unique pet name")
You can use the next code from the Guardrails documentation to process the output of the LLM into a structured object:
from guardrails import Guard
import openai

prompt = """
What kind of pet should I get and what should I name it?
${gr.complete_json_suffix_v2}
"""

guard = Guard.from_pydantic(output_class=Pet, prompt=prompt)

raw_output, validated_output, *rest = guard(
    llm_api=openai.completions.create,
    engine="gpt-3.5-turbo-instruct"
)

print(validated_output)
If we look at what this library generates under the hood for this prompt, we see that it adds a structured-output section with instructions for the LLM to output data in a specific way. This streamlines the way you can get structured data from an LLM.
Figure 1: The generated prompt from the Pydantic model
For the next use case, we will show the Pydantic models we’ve created to process the output.
Red Team Use Cases
The next sections contain some use cases where we can use an LLM to get structured data out of data obtained. The use cases are divided into three categories of the attack lifecycle:
Initial Reconnaissance
Escalate Privileges
Internal Reconnaissance
Figure 2: Attack lifecycle
Initial Reconnaissance
Open Source Intelligence (OSINT) is an important part of red teaming. It includes gathering data about the target organization from news articles, social media, and corporate reports.
This information can then be used in other red team phases such as during phishing. For defenders, it helps them understand which parts of their organization are exposed to the internet, anticipating a possible future attack. In the next use case, we talk about processing social media information to process roles and extract useful information.
Use Case 1: Social Media Job Functions Information
During OSINT, we often try to get information from employees about their function in their company. This helps with performing phishing attacks, as we do not want to target IT professionals, especially those that work in cybersecurity.
Social media sites allow their users to write about their job titles in a free format. This means that the information is unstructured and can be written in any language and any format.
We can try to extract the information from the title with simple matches; however, because the users can fill in anything and in any language, this problem can be better solved with an LLM.
Data Model
First, we create a Pydantic model for the Guardrail:
class RoleOutput(BaseModel):
    role: str = Field(description="Role being analyzed")
    it: bool = Field(description="The role is related to IT")
    cybersecurity: bool = Field(description="The role is related to CyberSecurity")
    experience_level: str = Field(
        description="Experience level of the role.",
    )
This model has two Boolean fields that indicate whether the role is related to IT or cybersecurity. Additionally, we would like to know the experience level of the role.
Prompt
Next, let’s create a prompt to instruct the LLM to extract the requested information from the role. This prompt is quite simple and just asks the LLM to fill in the data.
Given the following role, answer the following questions.
If the answer doesn't exist in the role, enter ``.
${role}
${gr.complete_xml_suffix_v2}
The two last lines are placeholders used by guardrails-ai.
Results
To test the models, we scraped the titles that employees use on social media. This dataset contained 235 entries. For testing, we used the gemini-1.0-pro model.
Gemini managed to parse 232 entries. The results are shown in Table 1.
| | Not IT | IT | Cybersecurity |
| --- | --- | --- | --- |
| Gemini | 183 | 49 | 5 |
| Manual evaluation by a red team operator | 185 | 47 | 5 |
| False positive | 1 | 3 | 0 |

Table 1: Results of Gemini parsing 232 job title entries
In the end, Gemini processed the roles quite on par with a human. Most of the false positives were questionable because it is not very clear if the role was actually IT related. The experience level did not perform well, as the model deemed the experience level as “unknown” or “none” for most of the entries. To resolve this issue, the field was changed so that the experience level should be a number from 1 to 10. After running the analysis again, this yielded better results for the experience level. The lowest experience levels (1–4) contained function titles like “intern,” “specialist,” or “assistant.” This usually indicated that the person had been employed at that role for a shorter period of time. The updated data model is shown as follows:
class RoleOutput(BaseModel):
    role: str = Field(description="Role being analyzed")
    it: bool = Field(description="The role is related to IT")
    cybersecurity: bool = Field(description="The role is related to CyberSecurity")
    experience_level: int = Field(
        description="Estimate of the experience level of the role on a scale of 1-10. Where 1 is low experience and 10 is high.",
    )
This approach helped us to sort through a large dataset of phishing targets by identifying employees that did not have IT or cybersecurity roles, and sorting them by experience level. This can speed up target selection for large organizations and may allow us to better emulate attackers by changing the prompts or selection criteria. Defending against this kind of data analysis is more difficult. In theory, you can instruct all your employees to include “Cybersecurity” in their role, but that does not scale well or solve the underlying phishing problem. The best approach with regards to phishing is, in our experience, to invest in phishing-resistant multifactor authentication (MFA) and application allowlisting. If applied well, these solutions can mitigate phishing attacks as an initial access vector.
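Returning to that selection step, here is a minimal sketch that filters and ranks the parsed roles; it assumes `results` is a list of the RoleOutput objects produced above:

# Keep only non-IT, non-cybersecurity roles and rank them by estimated experience.
targets = [r for r in results if not r.it and not r.cybersecurity]
targets.sort(key=lambda r: r.experience_level)  # least experienced first

for role in targets[:20]:
    print(role.experience_level, role.role)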
Escalate Privileges
Once attackers establish a foothold into an organization, one of their first acts is often to improve their level of access or control through privilege escalation. There are quite a few methods that can be used for this. It comes in a local system-based variety as well as wider domain-wide types, with some based on exploits or misconfigurations, and others based on finding sensitive information when searching through files.
Our focus will be on the final aspect, which aligns with our challenge of identifying the desired information within the vast amount of data, like finding a needle in a haystack.
Use Case 2: Credentials in Files
After gaining initial access to the target network, one of the more common enumeration methods employed by attackers is to perform share enumeration and try to locate interesting files. There are quite a few tools that can do this, such as Snaffler.
After you identify files that potentially contain credentials, you can go through them manually to find useful ones. However, if you do this in a large organization, there is a chance that you will have hundreds to thousands of hits. In that case, there are some tools that can help with finding and classifying credentials like TruffleHog and Nosey Parker. Additionally, the Python library detect-secrets can help with this task.
Most of these tools look for common patterns or file types that they understand. To cover unknown file types or credentials in emails or other formats, it might instead be valuable to use an LLM to analyze the files to find any unknown or unrecognized formats.
Technically, we can just run all tools and use a linear regression model to combine the results into one. An anonymized example of a file with a password that we encountered during our tests is shown as follows:
@Echo Off
Net Use /Del * /Yes
Set /p Path=<"path.txt"
Net Use %Path% Welcome01@ /User:CHAOS.LOCAL\WorkstationAdmin
If Not Exist "C:\Data" MKDIR "C:\Data"
Copy %Path%\. C:\Data
Timeout 02
Data Model
We used the following Python classes to instruct Gemini to retrieve credentials with an optional domain. One file can contain multiple credentials, so we use a list, which lets Gemini return multiple credentials from a single file.
from typing import Optional

class Credential(BaseModel):
    password: str = Field(description="Potential password of an account")
    username: str = Field(description="Potential username of an account")
    domain: Optional[str] = Field(
        description="Optional domain of an account", default=""
    )

class ListOfCredentials(BaseModel):
    credentials: list[Credential] = []
Prompt
In the prompt, we describe what we are looking for and again ask for structured output:
Given the following file, check if there are credentials in the file.
Only include results if there is at least one username and password.
If the domain doesn't exist in the file, enter `` as a default value.
${file}
${gr.complete_xml_suffix_v2}
Results
We tested on 600 files, where 304 contain credentials and 296 do not. Testing occurred with the gemini-1.5 model. Each file took about five seconds to process.
To compare results with other tools, we also tested Nosey Parker and TruffleHog. Both tools are built to find credentials in files, including repositories, in a structured way; their use case is usually known file formats and randomly structured files.
The results are summarized in Table 2.
| Tool | True Negative | False Positive | False Negative | True Positive |
| --- | --- | --- | --- | --- |
| Nosey Parker | 284 (47%) | 12 (2%) | 136 (23%) | 168 (28%) |
| TruffleHog | 294 (49%) | 2 (<1%) | 180 (30%) | 124 (21%) |
| Gemini | 278 (46%) | 18 (3%) | 23 (4%) | 281 (47%) |

Table 2: Results of testing for credentials in files, where 304 contain them and 296 do not
In this context, the definitions of true negative, false positive, false negative, and true positive are as follows:
True Negative: A file does not contain any credentials, and the tool correctly indicates that there are no credentials.
False Positive: The tool incorrectly indicates that a file contains credentials when it does not.
False Negative: The tool incorrectly indicates that a file does not contain any credentials when it does.
True Positive: The tool correctly indicates that a file contains credentials.
In conclusion, Gemini finds the most files with credentials, at the cost of a slightly higher false positive rate. TruffleHog has the lowest false positive rate, but also finds the fewest true positives. This is to be expected, as a higher true positive rate is usually accompanied by a higher false positive rate. The current dataset has almost an equal number of files with and without credentials—in real-world scenarios this ratio can differ wildly, which means that the false positive rate is still important even though the percentages are quite close.
To optimize this approach, you can run all three tools, combine their output signals into a single score, and then sort the potential files by that combined score.
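A minimal sketch of that combination step, assuming you have already collected per-file boolean verdicts from each scanner; the weights here are arbitrary placeholders you would tune:

# findings: file path -> verdict per tool (collected separately from each scanner).
findings = {
    "share/setup.bat": {"gemini": True, "noseyparker": False, "trufflehog": True},
    "share/readme.txt": {"gemini": False, "noseyparker": False, "trufflehog": False},
}

WEIGHTS = {"gemini": 0.5, "noseyparker": 0.3, "trufflehog": 0.2}  # placeholder weights

def score(verdicts: dict[str, bool]) -> float:
    return sum(WEIGHTS[tool] for tool, hit in verdicts.items() if hit)

for path in sorted(findings, key=lambda p: score(findings[p]), reverse=True):
    print(f"{score(findings[path]):.2f} {path}")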
Defenders can, and should, use the same techniques previously described to enumerate the internal file shares and remove or limit access to files that contain credentials. Make sure to check what file shares each server and workstation exposes to the network, because in some cases file shares are exposed accidentally or were forgotten about.
Internal Reconnaissance
When attackers have gained a better position in the network, the next step in their playbooks is understanding the domain in which they have landed so they can construct a path to their ultimate goal. This could be full domain control or access to specific systems or users, depending on the threat actor’s mission. From a red team perspective, we need to be able to emulate this. From a defender’s perspective, we need to find these paths before the attackers exploit them.
The main tool that red teamers use to analyze Active Directory is BloodHound, which uses a graph database to find paths in the Active Directory. BloodHound is executed in two steps. First, an ingester retrieves the data from the target Active Directory. Second, this data is ingested and analyzed by BloodHound to find attack paths.
Some tools that can gather data to be used in BloodHound are:
Sharphound
Bloodhound.py
Rusthound
Adexplorer
Bofhound
Soaphound
These tools gather data from the Active Directory and other systems and output it in a format that BloodHound can read. In theory, if we have all the information about the network in the graph, then we can just query the graph to figure out how to achieve our objective.
To improve the data in BloodHound, we have thought of additional use cases. Use Case 3 is about finding high-value systems. Discovering more hidden edges in BloodHound is part of Use Case 4 and Use Case 5.
Use Case 3: High-Value Target Detection in Active Directory
By default, BloodHound deems some groups and computers as high value. One of the main activities in internal reconnaissance is figuring out which systems in the client’s network are high-value targets. Some examples of systems that we are interested in, and that can lead to domain compromise, are:
Backup systems
SCCM
Certificate services
Exchange
WSUS systems
There are many ways to indicate which servers are used for a certain function, and it depends on how the IT administrators have configured it in their domain. There are some fields that may contain data in various forms to indicate what the system is used for. This is a prime example of unstructured data that might be analyzable with an LLM.
The following fields in the Active Directory might contain the relevant information (a short sketch of extracting them from a BloodHound export follows this list):
Name
Samaccountname
Description
Distinguishedname
SPNs
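As referenced above, here is a minimal sketch of pulling those fields out of a SharpHound computers.json export. The file name and property keys follow a typical export and may differ in your environment:

import json

# Assumed property keys from a SharpHound computers.json export.
FIELDS = ["name", "samaccountname", "description", "distinguishedname", "serviceprincipalnames"]

with open("computers.json") as f:
    computers = json.load(f)["data"]

rows = []
for computer in computers:
    props = computer.get("Properties", {})
    rows.append({field: props.get(field, "") for field in FIELDS})

print(rows[:3])  # these rows become the input for the prompt described below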
Data Model
In the end, we would like to have a list of names of the systems the LLM has deemed high value. During development, we noticed that LLM results improved dramatically if you asked it to specify a reason. Thus, our Pydantic model looks like this:
class HighValueSystem(BaseModel):
    name: str = Field(description="Name of this system")
    reason: str = Field(
        description="Reason why this system is high value", default=""
    )

class HighValueResults(BaseModel):
    systems: list[HighValueSystem] = Field(
        description="high value systems", default=[]
    )
Prompt
In the prompt, we give some examples of what kind of systems we are looking for:
Given the data, identify which systems are high value targets,
look for: sccm servers, jump systems, certificate systems, backup
systems and other valuable systems. Use the first (name) field to
identify the systems.
Results
We tested this prompt on a dataset of 400 systems and executed it five times. All systems were sent in one query to the model. To accommodate this, we used the gemini-1.5 model because it has a huge context window. Here are some examples of reasons Gemini provided, and what we think each reason was based on:
Domain controller: Looks like this was based on the “OU=Domain Controllers” distinguishedname field of BloodHound
Jumpbox: Based on the “OU=Jumpboxes,OU=Bastion Servers” distinguishedname
Lansweeper: Based on the description field of the computer
Backup Server: Based on “OU=Backup Servers” distinguishedname
Some of the high-value targets are valid yet already known, like domain controllers. Others are good finds, like the jumpbox and backup servers. This method can process system names in other languages and more verbose descriptions of systems to determine systems that may be high value. Additionally, this method can be adapted to allow for a more specific query—for example, that might suit a different client environment:
Given the data, identify which systems are related to
SWIFT. Use the first (name) field to identify the systems.
In this case, the LLM will look for SWIFT servers and may save you some time searching for it manually. This approach can potentially be even better when you combine this data with internal documentation to give you results, even if the Active Directory information is lacking any information about the usage of the system.
For defenders, there are some ways to deal with this situation:
Limit the amount of information in the Active Directory and put the system descriptions in your documentation instead of within the Active Directory
Limit the amount of information a regular user can retrieve from the Active Directory
Monitor LDAP queries to see if a large amount of data is being retrieved from LDAP
Use Case 4: User Clustering
After gaining an initial strong position and understanding the systems in the network, attackers will often need to find the right users to compromise to gain further privileges in the domain. For defenders, legacy user accounts or administrators with too many rights are a common security issue.
Administrators often have multiple user accounts: one for normal operations like reading email and using it on their workstations, and one or multiple administrator accounts. This separation is done to make it harder for attackers to compromise the administrator account.
There are some common flaws in the implementations that sometimes make it possible to bypass these separations. Most of the methods require the attacker to cluster the users together to see which accounts belong to the same employee. In many cases, this can be done by inspecting the Active Directory objects and searching for patterns in the display name, description, or other fields. To automate this, we tried to find these patterns with Gemini.
Data Model
For this use case, we would like to have the names of the accounts that Gemini clusters together. During initial testing, the results were quite random. However, after adding a “reason” field, the results improved dramatically. So we used the following Pydantic model:
class User(BaseModel):
    # Account is a separate Pydantic model holding a single account (definition not shown here).
    accounts: list[Account] = Field(
        description="accounts that probably belong to this user", default=[]
    )
    reason: str = Field(
        description="Reason why these accounts belong to this user", default=""
    )

class UserAccountResults(BaseModel):
    users: list[User] = Field(
        description="users with multiple accounts", default=[]
    )
Prompt
In the prompt, we give some examples of what kind of systems we are looking for:
Given the data, cluster the accounts that belong to a single person
by checking for similarities in the name, displayname and sam.
Only include results that are likely to be the same user. Only include
results when there is a user with multiple accounts. It is possible
that a user has more than two accounts. Please specify a reason
why those accounts belong to the same user. Use the first (name)
field to identify the accounts.
Results
The test dataset had about 900 users. We manually determined that some users have two to four accounts with various permissions. Some of these accounts had the same pattern like “user@test.local” and “adm-user@test.local.” However, other accounts had patterns where the admin account was based on the first couple of letters. For example, their main account had the pattern matthijs.gielen@test.local, and the admin account was named: adm-magi@test.local. To keep track of those accounts, the description of the admin account contained some text similar to “admin account of Matthijs Gielen.”
With this prompt, Gemini managed to cluster 50 groups of accounts in our dataset. After manual verification, some of the results were discarded because they only contained one account in the cluster. This resulted in 43 correct clusters of accounts. A manual review found the same correlations; however, where Gemini produced this information in a couple of minutes, analyzing and correlating all the accounts by hand took quite a bit longer. This information was used in preparation for further attacks, as shown in the next use case.
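A minimal sketch of that post-processing step, assuming `results` is the UserAccountResults object returned by the guard:

# Discard clusters that only contain a single account before reporting.
clusters = [user for user in results.users if len(user.accounts) > 1]
print(f"Kept {len(clusters)} clusters with more than one account")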
Use Case 5: Correlation Between Users and Their Machines
Knowing which users to target or defend is often not enough. We also need to find them within the network in order to compromise them. Domain administrators are (usually) physical people; they need somewhere to type in their commands and perform administrative actions. This means that we need to correlate which domain administrator is working from which workstation. This is called session information, and BloodHound uses this information in an edge called “HasSession.”
In the past, it was possible to get all session information with a regular user during red teaming.
Using the technique in Use Case 4, we can correlate the different user accounts that one employee may have. The next step is to figure out which workstation belongs to that employee. Then we can target that workstation, and from there, hopefully recover the passwords of their administrator accounts.
In this case, employees have corporate laptops, and the company needs to keep track of which laptop belongs to which employee. Often this information is stored in one of the fields of the computer object in the Active Directory. However, there are many ways to do this, and using Gemini to parse the unstructured data is one such example.
Data Model
This model is quite simple: we just want to correlate machines to their users, and have Gemini give us a reason why in order to improve the output of the model. Because we will send all users and all computers at once, we will need a list of results.
class UserComputerCorrelation(BaseModel):
    user: str = Field(description="name of the user")
    computer: str = Field(description="name of the computer")
    reason: str = Field(
        description="Reason why these accounts belong to this user",
        default=""
    )

class CorrelationResults(BaseModel):
    results: list[UserComputerCorrelation] = Field(
        description="users and computers that correlate", default=[]
    )
Prompt
In the prompt, we give some examples of what kind of systems we are looking for:
Given the two data sets, find the computer that correlates
to a user by checking for similarities in the name, displayname
and sam. Only include results that are likely to correspond.
Please specify a reason why that user and computer correlates.
Use the first (name) field to identify the users and computers.
Results
The dataset used contains around 900 users and 400 computers. During the assignment, we determined that the administrators had recorded the link between users and their machines in the description field of the computer object, which roughly matched the user's display name. Gemini picked up on this connection, correctly correlating around 120 users to their respective laptops (Figure 3).
Figure 3: Connections between user and laptop as correlated by Gemini
Gemini helped us select an appropriate workstation, which enabled us to move laterally to it and obtain the password of an administrator, getting us closer to our goal.
To defend against these threats, it can be valuable to run tools like BloodHound in the network. As discussed, BloodHound might not find all the “hidden” edges in your network, but you can add these yourself to the graph. This will allow you to find more Active Directory-based attack paths that are possible in your network and mitigate these before an attacker has an opportunity to exploit those attack paths.
Conclusion
In this blog post, we looked at processing red team data using LLMs to aid in adversarial emulation or improving defenses. These use cases were related to processing human-generated, unstructured data. Table 3 summarizes the results.
Roles
Accuracy of the Results: High. There were a few false positives that fell into a gray area.
Usefulness: High. Especially when going through a large list of user roles, this approach provides results quickly.

Credentials in files
Accuracy of the Results: High. Found more credentials than comparable tools. Further testing should examine the false-positive rate in real scenarios.
Usefulness: Medium. This approach finds far more results; however, processing with Gemini is much slower (five seconds per file) than many other alternatives.

High-value targets
Accuracy of the Results: Medium. Not all results were new, nor were all of them high-value targets.
Usefulness: Medium. Some of the results were useful; however, all of them still require manual verification.

Account clustering
Accuracy of the Results: High. After discarding the clusters containing only one account, the remaining clusters were correct.
Usefulness: High. Clustering users manually is usually a tedious process; this approach gives fairly reliable results once single-account clusters are filtered out.

Computer correlation
Accuracy of the Results: High. All results correctly correlated users to their computers.
Usefulness: High. This approach produces accurate results, potentially providing insight into additional possible attack paths.

Table 3: The results of our data-processing experiments with Gemini
As the results show, using an LLM like Gemini can help in converting this type of data into structured data to aid attackers and defenders. However, keep in mind that LLMs are not a silver bullet and have limitations. For example, they can sometimes produce false positives or be slow to process large amounts of data.
There are quite a few use cases we have not covered in this blog post. Some other examples where you can use this approach are:
Correlating user groups to administrator privileges on workstations and servers
Summarizing internal website content or documentation to search for target systems
Ingesting documentation to generate password candidates for cracking passwords
The Future
This is just an initial step that we, the Advanced Capabilities team on the Mandiant Red Team, have taken in using LLMs for adversarial emulation and defense. As next steps, we know the models and prompts can be improved by testing prompt variations, and other data sources can be investigated to see whether Gemini can help analyze them. We are also looking at using linear regression models, as well as clustering and pathfinding algorithms, to enable cybersecurity practitioners to quickly evaluate the attack paths that may exist in a network.
Managing applications across multiple Kubernetes clusters is complex, especially when those clusters span different environments or even cloud providers. One powerful and secure solution combines Google Kubernetes Engine (GKE) fleets and Argo CD, a declarative, GitOps continuous delivery tool for Kubernetes. The solution is further enhanced with Connect Gateway and Workload Identity.
This blog post guides you in setting up a robust, team-centric multi-cluster infrastructure with these offerings. We use a sample GKE fleet with application clusters for your workloads and a control cluster to host Argo CD. To streamline authentication and enhance security, we leverage Connect Gateway and Workload Identity, enabling Argo CD to securely manage clusters without having to manage cumbersome Kubernetes service accounts.
On top of this, we incorporate GKE Enterprise Teams to manage access and resources, helping to ensure that each team has the right permissions and namespaces within this secure framework.
Finally, we introduce the fleet-argocd-plugin, a custom Argo CD generator designed to simplify cluster management within this sophisticated setup. This plugin automatically imports your GKE Fleet cluster list into Argo CD and maintains synchronized cluster information, making it easier for platform admins to manage resources and for application teams to focus on deployments.
Follow along as we:
Create a GKE fleet with application and control clusters
Deploy Argo CD on the control cluster, configured to use Connect Gateway and Workload Identity
Configure GKE Enterprise Teams for granular access control
Install and leverage the fleet-argocd-plugin to manage your secure, multi-cluster fleet with team awareness
By the end, you’ll have a powerful and automated multi-cluster system using GKE Fleets, Argo CD, Connect Gateway, Workload Identity, and Teams, ready to support your organization’s diverse needs and security requirements. Let’s dive in!
Set up multi-cluster infrastructure with GKE fleet and Argo CD
Setting up a sample GKE fleet is a straightforward process:
1. Enable the required APIs in the desired Google Cloud Project. We use this project as the fleet host project.
a. gcloud SDK must be installed, and you must be authenticated via gcloud auth login.
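As a sketch of what this step might look like (the exact API list depends on your environment; the service names below are our assumption based on the products used in this post), enabling the relevant APIs could be done as follows:

# Hypothetical example: enable the APIs used by GKE fleets and Connect Gateway
# in the fleet host project.
gcloud config set project $FLEET_PROJECT_ID
gcloud services enable \
    container.googleapis.com \
    gkehub.googleapis.com \
    connectgateway.googleapis.com \
    cloudresourcemanager.googleapis.com

The commands that follow then create a team (scope) for the frontend team, bind the application clusters to it, and create a fleet namespace for the webserver workload.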
# Create a frontend team.
gcloud container fleet scopes create frontend

# Add your application clusters to the frontend team.
gcloud container fleet memberships bindings create app-cluster-1-b \
    --membership app-cluster-1 \
    --scope frontend \
    --location us-central1

gcloud container fleet memberships bindings create app-cluster-2-b \
    --membership app-cluster-2 \
    --scope frontend \
    --location us-central1

# Create a fleet namespace for webserver.
gcloud container fleet scopes namespaces create webserver --scope=frontend

# [Optional] Verify your fleet team setup.
# Check member clusters in your fleet.
gcloud container fleet memberships list
# Verify member clusters have been added to the right team (`scope`).
gcloud container fleet memberships bindings list --membership=app-cluster-1 --location=us-central1
gcloud container fleet memberships bindings list --membership=app-cluster-2 --location=us-central1
4. Now, set up Argo CD and deploy it to the control cluster. Create a new GKE cluster to act as your control cluster and enable Workload Identity on it.
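As a minimal sketch (the cluster name and region below are placeholders, not from the original setup), creating a Workload Identity-enabled control cluster might look like this:

# Hypothetical example: create the Argo CD control cluster with Workload Identity enabled.
gcloud container clusters create argocd-control-cluster \
    --region us-central1 \
    --workload-pool=$FLEET_PROJECT_ID.svc.id.goog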
5. Install the Argo CD CLI to interact with the Argo CD API server. Version 2.8.0 or higher is required. Detailed installation instructions can be found via the CLI installation documentation.
Now you’ve got your GKE fleet up and running, and you’ve installed Argo CD on the control cluster. In Argo CD, application clusters are registered with the control cluster by storing their credentials (like API server address and authentication details) as Kubernetes Secrets within the Argo CD namespace. We’ve got a way to make this whole process a lot easier!
8. To make sure the fleet-argocd-plugin works as it should, give it the right permissions for fleet management.
a. Create an IAM service account in your Argo CD control cluster and grant it the appropriate permissions. The setup follows the official onboarding guide of GKE Workload Identity Federation.
gcloud iam service-accounts create argocd-fleet-admin \
    --project=$FLEET_PROJECT_ID

gcloud projects add-iam-policy-binding $FLEET_PROJECT_ID \
    --member "serviceAccount:argocd-fleet-admin@$FLEET_PROJECT_ID.iam.gserviceaccount.com" \
    --role "roles/container.developer"

gcloud projects add-iam-policy-binding $FLEET_PROJECT_ID \
    --member "serviceAccount:argocd-fleet-admin@$FLEET_PROJECT_ID.iam.gserviceaccount.com" \
    --role "roles/gkehub.gatewayEditor"

gcloud projects add-iam-policy-binding $FLEET_PROJECT_ID \
    --member "serviceAccount:argocd-fleet-admin@$FLEET_PROJECT_ID.iam.gserviceaccount.com" \
    --role "roles/gkehub.viewer"

# Allow ArgoCD application controller and fleet-argocd-plugin to impersonate this IAM service account.
gcloud iam service-accounts add-iam-policy-binding argocd-fleet-admin@$FLEET_PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$FLEET_PROJECT_ID.svc.id.goog[argocd/argocd-application-controller]"
gcloud iam service-accounts add-iam-policy-binding argocd-fleet-admin@$FLEET_PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$FLEET_PROJECT_ID.svc.id.goog[argocd/argocd-fleet-sync]"

# Annotate the Kubernetes ServiceAccount so that GKE sees the link between the service accounts.
kubectl annotate serviceaccount argocd-application-controller \
    --namespace argocd \
    iam.gke.io/gcp-service-account=argocd-fleet-admin@$FLEET_PROJECT_ID.iam.gserviceaccount.com
b. You also need to allow the Google Compute Engine service account to access images from your artifacts repository.
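A minimal sketch of that grant, assuming the default Compute Engine service account and an Artifact Registry repository in the fleet host project, might look like the following:

# Hypothetical example: let the default Compute Engine service account pull images
# from Artifact Registry. PROJECT_NUMBER is the fleet host project's project number.
gcloud projects add-iam-policy-binding $FLEET_PROJECT_ID \
    --member "serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role "roles/artifactregistry.reader"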
Let’s do a quick check to make sure the GKE fleet and Argo CD are playing nicely together. You should see that the secrets for your application clusters have been automatically generated.
kubectl get secret -n argocd

# Example Output:                          TYPE    DATA  AGE
# app-cluster-1.us-central1.141594892609   Opaque  3     64m
# app-cluster-2.us-central1.141594892609   Opaque  3     64m
Demo 1: Automatic fleet management in Argo CD
Okay, let’s see how this works! We’ll use the guestbook example app. First, we deploy it to the clusters that the frontend team uses. You should then see the guestbook app running on your application clusters, and you won’t have to deal with any cluster secrets manually!
export TEAM_ID=frontend
envsubst '$FLEET_PROJECT_NUMBER $TEAM_ID' < applicationset-demo.yaml | kubectl apply -f - -n argocd

kubectl config set-context --current --namespace=argocd
argocd app list -o name
# Example Output:
# argocd/app-cluster-1.us-central1.141594892609-webserver
# argocd/app-cluster-2.us-central1.141594892609-webserver
Demo 2: Evolving your fleet is easy with fleet-argocd-plugin
Suppose you decide to add another cluster to the frontend team. Create a new GKE cluster and assign it to the frontend team. Then, check to see if your guestbook app has been deployed on the new cluster.
gcloud container clusters create app-cluster-3 --enable-fleet --region=us-central1
gcloud container fleet memberships bindings create app-cluster-3-b \
    --membership app-cluster-3 \
    --scope frontend \
    --location us-central1

argocd app list -o name
# Example Output: a new app shows up!
# argocd/app-cluster-1.us-central1.141594892609-webserver
# argocd/app-cluster-2.us-central1.141594892609-webserver
# argocd/app-cluster-3.us-central1.141594892609-webserver
Closing thoughts
In this blog post, we’ve shown you how to combine the power of GKE fleets, Argo CD, Connect Gateway, Workload Identity, and GKE Enterprise Teams to create a robust and automated multi-cluster platform. By leveraging these tools, you can streamline your Kubernetes operations, enhance security, and empower your teams to efficiently manage and deploy applications across your fleet.
However, this is just the beginning! There’s much more to explore in the world of multi-cluster Kubernetes. Here are some next steps to further enhance your setup:
Deep dive into GKE Enterprise Teams: Explore the advanced features of GKE Enterprise Teams to fine-tune access control, resource allocation, and namespace management for your teams. Learn more in the official documentation.
Secure your clusters with Connect Gateway: Delve deeper into Connect Gateway and Workload Identity to understand how they simplify and secure authentication to your clusters, eliminating the need for VPNs or complex network configurations. Check out this blog post for a detailed guide.
Master advanced deployment strategies: Explore advanced deployment strategies with Argo CD, such as blue/green deployments, canary releases, and automated rollouts, to achieve zero-downtime deployments and minimize risk during updates. This blog post provides a great starting point.
As you continue your journey with multi-cluster Kubernetes, remember that GKE fleets and Argo CD provide a solid foundation for building a scalable, secure, and efficient platform. Embrace the power of automation, GitOps principles, and team-based management to unlock the full potential of your Kubernetes infrastructure.
As AI models increase in sophistication, the amount of model data needed to serve them grows as well. Loading the models and weights, along with the necessary frameworks to serve them for inference, can add seconds or even minutes of scaling delay, impacting both costs and the end-user's experience.
For example, inference servers such as Triton, Text Generation Inference (TGI), or vLLM are packaged as containers that are often over 10GB in size; this can make them slow to download, and extend pod startup times in Kubernetes. Then, once the inference pod starts, it needs to load model weights, which can be hundreds of GBs in size, further adding to the data loading problem.
This blog explores techniques to accelerate data loading for both inference serving containers and model weights, so you can reduce the overall time it takes to load your AI/ML inference workload on Google Kubernetes Engine (GKE).
1. Accelerating container load times using secondary boot disks to cache container images, with your inference engine and applicable libraries, directly on the GKE node.
2. Accelerating model weight load times using Cloud Storage Fuse to connect pods directly to model and weight data stored in Cloud Storage.
3. Accelerating model weight load times using Hyperdisk ML to serve model and weight data from a high-throughput network attached disk.
The image above shows a secondary boot disk (1) that stores the container image ahead of time, avoiding the image download process during pod/container startup. And for AI/ML inference workloads with demanding speed and scale requirements, Cloud Storage Fuse (2) and Hyperdisk ML (3) are options to connect the pod to model and weight data stored in Cloud Storage or on a network attached disk. Let's look at each of these approaches in more detail below.
Accelerating container load times with secondary boot disks
GKE lets you pre-cache your container image into a secondary boot disk that is attached to your node at creation time. The benefit of loading your containers this way is that you skip the image download step and can begin launching your containers immediately, which drastically improves startup time. The diagram below shows container image download times grow linearly with container image size. Those times are then compared with using a cached version of the container image that is pre-loaded on the node.
Caching a 16GB container image ahead of time on a secondary boot disk has shown reductions in load time of up to 29x when compared with downloading the container image from a container registry. Additionally, this approach lets you benefit from the acceleration independent of container size, allowing for large container images to be loaded predictably fast!
To use secondary boot disks, first create the disk with all your images, create an image out of the disk, and specify the disk image while creating your GKE node pools as a secondary boot disk. For more, see the documentation.
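As a rough sketch of that last step (cluster, node pool, and disk image names here are placeholders), attaching a prepared disk image as a secondary boot disk when creating a node pool might look like this:

# Hypothetical example: create a node pool whose nodes attach a pre-built disk image
# as a secondary boot disk that acts as a container image cache.
gcloud container node-pools create inference-pool \
    --cluster=inference-cluster \
    --region=us-central1 \
    --secondary-boot-disk=disk-image=projects/$PROJECT_ID/global/images/inference-image-cache,mode=CONTAINER_IMAGE_CACHE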
Accelerating model weights load times
Many ML frameworks output their checkpoints (snapshots of model weights) to object storage such as Google Cloud Storage, a common choice for long-term storage. Using Cloud Storage as the source of truth, there are two main products to retrieve your data at the GKE-pod level: Cloud Storage Fuse and Hyperdisk ML (HdML).
When selecting one product or the other, there are two main considerations:
Performance – how quickly can the data be loaded by the GKE node
Operational simplicity – how easy is it to update this data
Cloud Storage Fuse provides a direct link to Cloud Storage for model weights that reside in object storage buckets. Additionally, there is a caching mechanism for files that need to be read multiple times, preventing additional downloads from the source bucket (which would add latency). Cloud Storage Fuse is appealing because there are no pre-hydration steps required for a pod to pick up new files in a given bucket. Note, however, that if you switch the bucket the pod is connected to, you will need to restart the pod with an updated Cloud Storage Fuse configuration. To further improve performance, you can enable parallel downloads, which spawns multiple workers to download a model, significantly improving model pull performance.
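To make this concrete, here is a minimal sketch of a pod that mounts a bucket of model weights through the Cloud Storage FUSE CSI driver. The bucket, image, and resource names are placeholders, the CSI driver must be enabled on the cluster, and the pod's Kubernetes service account needs read access to the bucket via Workload Identity.

# Hypothetical example: mount a Cloud Storage bucket of model weights into an inference pod.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: inference-server
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
  - name: inference
    image: us-docker.pkg.dev/my-project/inference/server:latest
    volumeMounts:
    - name: model-weights
      mountPath: /models
      readOnly: true
  volumes:
  - name: model-weights
    csi:
      driver: gcsfuse.csi.storage.gke.io
      readOnly: true
      volumeAttributes:
        bucketName: my-model-weights-bucket
        mountOptions: "implicit-dirs"
EOF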
Hyperdisk ML gives you better performance and scalability than downloading files directly to the pod from Cloud Storage or another online location. Additionally, you can attach up to 2,500 nodes to a single Hyperdisk ML instance, with aggregate bandwidth of up to 1.2 TiB/s. This makes it a strong choice for inference workloads that span many nodes and where the same data is downloaded repeatedly in a read-only fashion. To use Hyperdisk ML, load your data onto the Hyperdisk ML disk prior to using it, and again upon each update. Note that this adds operational overhead if your data changes frequently.
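A minimal sketch of the first step, with a placeholder disk name, size, and zone, might look like the following; the disk is then populated with model data and exposed to pods as a read-only PersistentVolume.

# Hypothetical example: create a Hyperdisk ML volume that will hold the model weights.
gcloud compute disks create model-weights-disk \
    --type=hyperdisk-ml \
    --size=1TB \
    --zone=us-central1-a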
Which model and weight loading product you use depends on your use case, and there are considerations beyond throughput to take into account when architecting a performant model loading strategy. For example, Hyperdisk ML volumes are zonal (data can be made regional with an automated GKE clone feature to make it available across zones), and updating their data requires creating a new persistent volume, loading the new data, and redeploying pods whose PVC references the new volume.
Conclusion
Loading large AI models, weights, and container images into GKE-based AI workloads can delay startup times. By using a combination of the three methods described above (secondary boot disks for container images, and Hyperdisk ML or Cloud Storage Fuse for models and weights), you can accelerate data load times for your AI/ML inference applications.
As generative AI evolves, we’re beginning to see the transformative potential it is having across industries and our lives. And as large language models (LLMs) increase in size — current models are reaching hundreds of billions of parameters, and the most advanced ones are approaching 2 trillion — the need for computational power will only intensify. In fact, training these large models on modern accelerators already requires clusters that exceed 10,000 nodes.
With support for 15,000-node clusters — the world’s largest — Google Kubernetes Engine (GKE) has the capacity to handle these demanding training workloads. Today, in anticipation of even larger models, we are introducing support for 65,000-node clusters.
With support for up to 65,000 nodes, we believe GKE offers more than 10X larger scale than the other two largest public cloud providers.
Unmatched scale for training or inference
Scaling to 65,000 nodes provides much-needed capacity to the world’s most resource-hungry AI workloads. Combined with innovations in accelerator computing power, this will enable customers to reduce model training time or scale models to multi-trillion parameters or more. Each node is equipped with multiple accelerators (e.g., Cloud TPU v5e node with four chips), giving the ability to manage over 250,000 accelerators in one cluster.
To develop cutting-edge AI models, customers need to be able to allocate computing resources across diverse workloads. This includes not only model training but also serving, inference, conducting ad hoc research, and managing auxiliary tasks. Centralizing computing power within the smallest number of clusters provides customers the flexibility to quickly adapt to changes in demand from inference serving, research and training workloads.
With support for 65,000 nodes, GKE now allows running five jobs in a single cluster, each matching the scale of Google Cloud’s previous world record for the world’s largest training job for LLMs.
Customers on the cutting edge of AI welcome these developments. Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems, and is excited for GKE’s expanded scale.
“GKE’s new support for larger clusters provides the scale we need to accelerate our pace of AI innovation.” – James Bradbury, Head of Compute, Anthropic
Innovations under the hood
This achievement is made possible by a variety of enhancements. For one, we are transitioning GKE from etcd, the open-source distributed key-value store, to a new, more robust key-value store based on Spanner, Google's distributed database that delivers virtually unlimited scale. On top of supporting larger GKE clusters, this change will usher in new levels of reliability for GKE users, providing improved latency of cluster operations (e.g., cluster startup and upgrades) and a stateless cluster control plane. By implementing the etcd API for our Spanner-based storage, we help ensure backward compatibility and avoid having to make changes in core Kubernetes to adopt the new technology.
In addition, thanks to a major overhaul of the GKE infrastructure that manages the Kubernetes control plane, GKE now scales significantly faster, meeting the demands of your deployments with fewer delays. This enhanced cluster control plane delivers multiple benefits, including the ability to run high-volume operations with exceptional consistency. The control plane now automatically adjusts to these operations, while maintaining predictable operational latencies. This is particularly important for large and dynamic applications such as SaaS, disaster recovery and fallback, batch deployments, and testing environments, especially during periods of high churn.
We’re also constantly innovating on IaaS and GKE capabilities to make Google Cloud the best place to build your AI workloads. Recent innovations in this space include:
Secondary boot disk, which provides faster workload startups through container image caching
Custom compute classes, which offer greater control over compute resource allocation and scaling
Support for Trillium, our sixth-generation TPU, the most performant and most energy-efficient TPU to date
Support for A3 Ultra VM powered by NVIDIA H200 Tensor Core GPUs with our new Titanium ML network adapter, which delivers non-blocking 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE). A3 Ultra VMs will be available in preview next month.
A continued commitment to open source
Guided by Google’s long-standing and robust open-source culture, we make substantial contributions to the open-source community, including when it comes to scaling Kubernetes. With support for 65,000-node clusters, we made sure that all necessary optimizations and improvements for such scale are part of the core open-source Kubernetes.
Our investments to make Kubernetes the best foundation for AI platforms go beyond scalability. Here is a sampling of our contributions to the Kubernetes project over the past two years:
Incubated the K8S Batch Working Group to build a community around research, HPC and AI workloads, producing tools like Kueue.sh, which is becoming the de facto standard for job queueing on Kubernetes
Created the JobSet operator that is being integrated into the Kubeflow ecosystem to help run heterogeneous jobs (e.g., driver-executor)
For multihost inference use cases, created the Leader Worker Set controller
Published a highly optimized internal model server of JetStream
Incubated the Kubernetes Serving Working Group, which is driving multiple efforts including model metrics standardization, Serving Catalog and Inference Gateway
At Google Cloud, we’re dedicated to providing the best platform for running containerized workloads, consistently pushing the boundaries of innovation. These new advancements allow us to support the next generation of AI technologies. For more, listen to the Kubernetes podcast, where Maciek Rozacki and Wojtek Tyczynski join host Kaslin Fields to talk about GKE’s support for 65,000 nodes. You can also see a demo on 65,000 nodes on a single GKE cluster here.
Rapidly evolving generative AI models place unprecedented demands on the performance and efficiency of hardware accelerators. Last month, we launched our sixth-generation Tensor Processing Unit (TPU), Trillium, to address the demands of next-generation models. Trillium is purpose-built for performance at scale, from the chip to the system to our Google data center deployments, to power ultra-large scale training.
Today, we present our first MLPerf training benchmark results for Trillium. The MLPerf 4.1 training benchmarks show that Trillium delivers up to 1.8x better performance-per-dollar compared to prior-generation Cloud TPU v5p and an impressive 99% scaling efficiency (throughput).
In this blog, we offer a concise analysis of Trillium’s performance, demonstrating why it stands out as the most performant and cost-efficient TPU training system to date. We begin with a quick overview of system comparison metrics, starting with traditional scaling efficiency. We introduce convergence scaling efficiency as a crucial metric to consider in addition to scaling efficiency. We assess these two metrics along with performance per dollar and present a comparative view of Trillium against Cloud TPU v5p. We conclude with guidance that you can use to make an informed choice for your cloud accelerators.
Traditional performance metrics
Accelerator systems can be evaluated and compared across multiple dimensions, ranging from peak throughput, to effective throughput, to throughput scaling efficiency. Each of these metrics is a helpful indicator, but none of them takes convergence time into consideration.
Hardware specifications and peak performance
Traditionally, comparisons focused on hardware specifications like peak throughput, memory bandwidth, and network connectivity. While these peak values establish theoretical boundaries, they are poor predictors of real-world performance, which depends heavily on architectural design and software implementation. Since modern ML workloads typically span hundreds or thousands of accelerators, the key metric is the effective throughput of an appropriately sized system for a specific workload.
Utilization performance
System performance can be quantified through utilization metrics like effective model FLOPS utilization (EMFU) and memory bandwidth utilization (MBU), which measure achieved throughput versus peak capacity. However, these hardware efficiency metrics don’t directly translate to business-value measures like training time or model quality.
Scaling efficiency and trade-offs
A system’s scalability is evaluated through both strong scaling (performance improvement with system size for fixed workloads) and weak scaling (efficiency when increasing both workload and system size proportionally). While both metrics are valuable indicators, the ultimate goal is to achieve high-quality models quickly, sometimes making it worthwhile to trade scaling efficiency for faster training time or better model convergence.
The need for convergence scaling efficiency
While hardware utilization and scaling metrics provide important system insights, convergence scaling efficiency focuses on the fundamental goal of training: reaching model convergence efficiently. Convergence refers to the point where a model’s output stops improving and the error rate becomes constant. Convergence scaling efficiency measures how effectively additional computing resources accelerate the training process to completion.
We define convergence scaling efficiency (CSE) using two key measurements: the base case, where a cluster of N₀ accelerators achieves convergence in time T₀, and a scaled case with N₁ accelerators taking time T₁ to converge. The ratio of the speedup in convergence time to the increase in cluster size gives us:

CSE = (T₀ / T₁) / (N₁ / N₀)
A convergence scaling efficiency of 1 indicates that time-to-solution improves by the same ratio as the cluster size. It is therefore desirable to have convergence scaling efficiency as close to 1 as possible.
Now let’s apply these concepts to understand our ML Perf submission for GPT3-175b training task using Trillium and Cloud TPU v5p.
Trillium performance
We submitted GPT3-175b training results for four different Trillium configurations, and three different Cloud TPU v5p configurations. In the following analysis, we group the results by cluster sizes with the same total peak flops for comparison purposes. For example, the Cloud TPU v5p-4096 configuration is compared to 4xTrillium-256, and Cloud TPU v5p-8192 is compared with 8xTrillium-256, and so on.
All results presented in this analysis are based on MaxText, our high-performance reference implementation for Cloud TPUs and GPUs.
Weak scaling efficiency
For increasing cluster sizes with proportionately larger batch-sizes, both Trillium and TPU v5p deliver near linear scaling efficiency:
Figure-1: Weak scaling comparison for Trillium and Cloud TPU v5p. v5p-4096 and 4xTrillium-256 are considered as base for scaling factor measurement. n x Trillium-256 corresponds to n Trillium pods with 256 chips in one ICI domain. v5p-n corresponds to n/2 v5p chips in a single ICI domain.
Figure 1 demonstrates relative throughput scaling as cluster sizes increase from the base configuration. Trillium achieves 99% scaling efficiency even when operating across data-center networks using Cloud TPU multislice technology, outperforming the 94% scaling efficiency of Cloud TPU v5p cluster within a single ICI domain. For these comparisons, we used a base configuration of 1024 chips (4x Trillium-256 pods), establishing a consistent baseline with the smallest v5p submission (v5p-4096; 2048 chips). When measured against our smallest submitted configuration of 2x Trillium-256 pods, Trillium maintains a strong 97.6% scaling efficiency.
Convergence scaling efficiency
As stated above, weak scaling is useful but not a sufficient indicator of value, while convergence scaling efficiency brings time-to-solution into consideration.
Figure-2: Convergence scaling comparison for Trillium and Cloud TPU v5p.
For the largest cluster size, we observed comparable convergence scaling efficiency for Trillium and Cloud TPU v5p. In this example, a CSE of 0.8 means that for the rightmost configuration, the cluster size was 3x the (base) configuration, while the time to convergence improved by 2.4x with respect to the base configuration (2.4/3 = 0.8).
While the convergence scaling efficiency is comparable between Trillium and TPU v5p, where Trillium really shines is by delivering the convergence at a lower cost, which brings us to the last metric.
Cost-to-train
While weak scaling efficiency and convergence scaling efficiency indicate scaling properties of systems, we’ve yet to look at the most crucial metric: the cost to train.
Figure-3: Comparison of cost-to-train based on the wall-clock time and the on-demand list price for Cloud TPU v5p and Trillium.
Trillium lowers the cost to train by up to 1.8x (45% lower) compared to TPU v5p while delivering convergence to the same validation accuracy.
Making informed cloud accelerator choices
In this article, we explored the complexities of comparing accelerator systems, emphasizing the importance of looking beyond simple metrics to assess true performance and efficiency. We saw that while peak performance metrics provide a starting point, they often fall short in predicting real-world utility. Instead, metrics like Effective Model Flops Utilization (EMFU) and Memory Bandwidth Utilization (MBU) offer more meaningful insights into an accelerator’s capabilities.
We also highlighted the critical importance of scaling characteristics — both strong and weak scaling — in evaluating how systems perform as workloads and resources grow. However, the most objective measure we identified is the convergence scaling efficiency, which ensures that we’re comparing systems based on their ability to achieve the same end result, rather than just raw speed.
Applying these metrics to our benchmark submission with GPT3-175b training, we demonstrated that Trillium achieves comparable convergence scaling efficiency to Cloud TPU v5p while delivering up to 1.8x better performance per dollar, thereby lowering the cost-to-train. These results highlight the importance of evaluating accelerator systems through multiple dimensions of performance and efficiency.
For ML-accelerator evaluation, we recommend a comprehensive analysis combining resource utilization metrics (EMFU, MBU), scaling characteristics, and convergence scaling efficiency. This multi-faceted approach enables you to make data-driven decisions based on your specific workload requirements and scale.
Every November, we start sharing forward-looking insights on threats and other cybersecurity topics to help organizations and defenders prepare for the year ahead. The Cybersecurity Forecast 2025 report, available today, plays a big role in helping us accomplish this mission.
This year’s report draws on insights directly from Google Cloud’s security leaders, as well as dozens of analysts, researchers, responders, reverse engineers, and other experts on the frontlines of the latest and largest attacks.
Built on trends we are already seeing today, the Cybersecurity Forecast 2025 report provides a realistic outlook of what organizations can expect to face in the coming year. The report covers a lot of topics across all of cybersecurity, with a focus on various threats such as:
Attacker Use of Artificial Intelligence (AI): Threat actors will increasingly use AI for sophisticated phishing, vishing, and social engineering attacks. They will also leverage deepfakes for identity theft, fraud, and bypassing security measures.
AI for Information Operations (IO): IO actors will use AI to scale content creation, produce more persuasive content, and enhance inauthentic personas.
The Big Four: Russia, China, Iran, and North Korea will remain active, engaging in espionage operations, cyber crime, and information operations aligned with their geopolitical interests.
Ransomware and Multifaceted Extortion: Ransomware and multifaceted extortion will continue to be the most disruptive form of cyber crime, impacting various sectors and countries.
Infostealer Malware: Infostealer malware will continue to be a major threat, enabling data breaches and account compromises.
Democratization of Cyber Capabilities: Increased access to tools and services will lower barriers to entry for less-skilled actors.
Compromised Identities: Compromised identities in hybrid environments will pose significant risks.
Web3 and Crypto Heists: Web3 and cryptocurrency organizations will increasingly be targeted by attackers seeking to steal digital assets.
Faster Exploitation and More Vendors Targeted: The time to exploit vulnerabilities will continue to decrease, and the range of targeted vendors will expand.
Be Prepared for 2025
Read the Cybersecurity Forecast 2025 report for a more in-depth look at these and other threats, as well as other security topics such as post-quantum cryptography, and insights unique to the JAPAC and EMEA regions.
For an even deeper look at the threat landscape next year, register for our Cybersecurity Forecast 2025 webinar, which will be hosted once again by threat expert Andrew Kopcienski.
For even more insights, hear directly from our security leaders: Charles Carmakal, Sandra Joyce, Sunil Potti, and Phil Venables.