The Amazon Connect Contact Control Panel (CCP) now features an updated user interface using Cloudscape Design System components to enhance agent productivity and focus. The refreshed user interface includes an updated visual style for colors and buttons, and more visually consistent UI elements across interfaces. This visual refresh provides a more intuitive and streamlined experience for your contact center agents while maintaining the familiar layout and functionality to minimize impact and training requirements.
This updated user interface is available in all AWS regions where Amazon Connect is offered.
Organizations need ML compute resources that can accommodate bursty peaks and periodic troughs. That means the consumption models for AI infrastructure need to evolve to be more cost-efficient, provide term flexibility, and support rapid development on the latest GPU and TPU accelerators.
Calendar mode is currently available in preview as the newest feature of Dynamic Workload Scheduler. This mode provides short-term ML capacity — up to 90 days of reserved capacity — without requiring long-term commitments.
Calendar mode extends the capabilities of Compute Engine future reservations to provide co-located GPU and TPU capacity that’s a good fit for model training, fine-tuning, experimentation and inference workloads.
Similar to a flight or hotel booking experience, Calendar mode makes it easy to search for and reserve ML capacity. Simply define your resource type, number of instances, expected start date and duration, and in a few seconds, you’ll be able to see the available capacity and reserve it. Once the capacity reservation is confirmed and delivered to your project, you can consume it via Compute Engine, Google Kubernetes Engine (GKE), Vertex AI custom training, and Google Batch.
What customers are saying
Over the past year, early access customers have used Calendar mode to reserve ML compute resources for a variety of use cases, from drug discovery to training new models.
“To accelerate drug discovery, Schrödinger relies on large-scale simulations to identify promising, high-quality molecules. Reserving GPUs through Google Cloud’s DWS Calendar Mode provides us the crucial flexibility and assurance needed to cost-effectively scale our compute environment for critical, time-sensitive projects.” – Shane Brauner, EVP/CIO, Schrödinger
“For Vilya, Dynamic Workload Scheduler has delivered on two key fronts: affordability and performance. The cost efficiency received was a significant benefit, and the reliable access to GPUs has empowered our teams to complete projects much faster, and it’s been invaluable for our computationally intensive tasks. It’s allowed us to be more efficient and productive without breaking the budget.” – Patrick Salveson, co-founder and CTO
“Databricks simplifies the deployment and management of machine learning models, enabling fine tuning and real-time inference for scalable production environments. DWS Calendar Mode alleviated the burden of GPU capacity planning and provided seamless access to the latest generation GPU hardware for dynamic demand for testing and ongoing training.” – Ravi Gadde, Sr. Director, Serverless Platform
Using Calendar mode
With these concepts and use cases under our belts, let’s take a look at how to find and reserve capacity via the Google Cloud console. Navigate to Cloud console -> Compute Engine -> Reservation. Then, on the Future Reservation tab, click Create a Future Reservation. Selecting a supported GPU or TPU will expose the Search for capacity section as shown below.
Proceed to the Advanced Settings to determine whether the reservation should be shared across multiple projects. The final step is to name the reservation upon creation.
The reservation is approved within minutes and can be consumed once it is in the Fulfilled status at the specified start time.
Get started today
Calendar mode with AI Hypercomputer makes finding, reserving, consuming, and managing capacity easy for ML workloads. Get started today with Calendar mode for TPUs. Contact your account team for GPU access in Compute Engine, GKE, or Slurm. To learn more, see the Calendar mode documentation and Dynamic Workload Scheduler pricing.
As the excitement around AI agents reaches enterprise customers, a critical question emerges: How can we empower these agents to securely and intelligently interact with enterprise data systems like Google Cloud BigQuery?
Until now, developers building agentic applications have been forced to build and maintain their own custom tools, a process that is slow and risky and that distracts from building innovative applications. It also introduces considerable development overhead, as they become responsible for everything from authentication and error handling to keeping pace with BigQuery’s evolving capabilities.
To solve this, we are introducing a new, first-party toolset for BigQuery that includes tools to fetch metadata and execute queries (and we have more on the way):
list_dataset_ids: Fetches BigQuery dataset ids present in a GCP project.
get_dataset_info: Fetches metadata about a BigQuery dataset.
list_table_ids: Fetches table ids present in a BigQuery dataset.
get_table_info: Fetches metadata about a BigQuery table.
execute_sql: Runs a SQL query in BigQuery and fetches the result.
These official, Google-maintained tools provide a secure and reliable bridge to your data, and you can use them in two powerful ways: as a built-in toolset in Google’s Agent Development Kit (ADK) or through the flexible, open-source MCP Toolbox for Databases. This frees you to focus on creating value, not on building foundational plumbing.
In this post, we’ll explore these first-party tools for BigQuery and walk you through how they can be used to build a conversational analytics agent in ADK that can answer natural language questions.
Tutorial: Build a Conversational Analytics Agent using BigQuery’s first-party tools
Our agent will query BigQuery’s public dataset: thelook_ecommerce, a synthetic e-commerce dataset that includes customer details, product inventories, and order histories. The agent’s primary role will be to generate SQL queries and provide meaningful responses to common business questions, such as: What are the top-selling products? Which products are frequently ordered together? And how many customers do we have in Colombia?
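For context, here is the kind of SQL the agent’s execute_sql tool would ultimately run against that dataset to answer the first question. This standalone snippet is only an illustration of the workload, not part of the agent itself; it assumes the google-cloud-bigquery library is installed and application default credentials are configured.

```python
# Illustrative query for "What are the top-selling products?" against the
# public thelook_ecommerce dataset.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT p.name AS product_name, COUNT(*) AS units_sold
    FROM `bigquery-public-data.thelook_ecommerce.order_items` AS oi
    JOIN `bigquery-public-data.thelook_ecommerce.products` AS p
      ON oi.product_id = p.id
    GROUP BY product_name
    ORDER BY units_sold DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(f"{row.product_name}: {row.units_sold}")
```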
If you’re new to ADK, this page provides an overview of its core concepts and components; otherwise, let’s dive in!
Choose your model, select Vertex AI as the backend, and confirm your project id and region:
You should now have a new folder named bq-agent-app. Navigate to agent.py and update the root LLM-Agent to reflect our conversational analytics agent:
```python
root_agent = Agent(
    model="gemini-2.0-flash",
    name="bigquery_agent",
    description=(
        "Agent that answers questions about BigQuery data by executing SQL queries"
    ),
    instruction="""You are a data analysis agent with access to several BigQuery tools.
    Make use of those tools to answer the user's questions.""",
    tools=[bigquery_toolset],
)
```
When defining your agent, you provide a unique name, specify the underlying LLM model, and can optionally include a description that helps other agents understand its purpose. The agent’s core task or goal is defined in the instructions.
Finally, to enable the agent to interact with your data, it must be equipped with tools that allow it to interact with BigQuery so it can understand the available datasets and tables, and of course, execute queries. Let’s consider our options when it comes to using BigQuery’s first-party toolset.
Option 1: Use ADK’s new built-in toolset for BigQuery
This first-party toolset is owned and maintained by Google. To assign these tools to your agent, import the BigQueryToolset from the google.adk.tools.bigquery module and then initialize the toolset:
```python
from google.adk.auth import AuthCredentialTypes
from google.adk.tools.bigquery import BigQueryCredentialsConfig
from google.adk.tools.bigquery import BigQueryToolset
from google.adk.tools.bigquery.config import BigQueryToolConfig, WriteMode
import google.auth
import os

# Define an appropriate credential type
CREDENTIALS_TYPE = AuthCredentialTypes.OAUTH2

# Write modes define the agent's BigQuery access control:
# ALLOWED: Tools will have full write capabilities.
# BLOCKED: Default mode. Effectively makes the tool read-only.
# PROTECTED: Only allows writes on temporary data for a given BigQuery session.
tool_config = BigQueryToolConfig(write_mode=WriteMode.ALLOWED)

if CREDENTIALS_TYPE == AuthCredentialTypes.OAUTH2:
    # Initialize the tools to do interactive OAuth
    credentials_config = BigQueryCredentialsConfig(
        client_id=os.getenv("OAUTH_CLIENT_ID"),
        client_secret=os.getenv("OAUTH_CLIENT_SECRET"),
    )
elif CREDENTIALS_TYPE == AuthCredentialTypes.SERVICE_ACCOUNT:
    # Initialize the tools to use the credentials in the service account key.
    creds, _ = google.auth.load_credentials_from_file("service_account_key.json")
    credentials_config = BigQueryCredentialsConfig(credentials=creds)
else:
    # Initialize the tools to use the application default credentials.
    application_default_credentials, _ = google.auth.default()
    credentials_config = BigQueryCredentialsConfig(
        credentials=application_default_credentials
    )

bigquery_toolset = BigQueryToolset(
    credentials_config=credentials_config,
    tool_filter=[
        "list_dataset_ids",
        "get_dataset_info",
        "list_table_ids",
        "get_table_info",
        "execute_sql",
    ],
)
```
You can use the tool_filter parameter to filter the tools you’d like to expose to the agent.
Provide an OAuth 2.0 client_id and secret. This approach is typically used when an application needs a user to grant it permission to access their BigQuery data.
For more granular control over your interaction with BigQuery, you can of course create your own custom function tools, which are implemented as Python functions that you expose to your agent.
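As a rough illustration (not part of the first-party toolset), a custom function tool might look like the sketch below; the function name and query are hypothetical, and it assumes the google-cloud-bigquery client library is installed. ADK exposes plain Python functions as tools, so the docstring and type hints tell the model when and how to call it.

```python
# A minimal sketch of a custom function tool; the query and function name are
# illustrative, not part of the official toolset.
from google.adk.agents import Agent
from google.cloud import bigquery


def count_customers_by_country(country: str) -> int:
    """Returns the number of customers in the given country."""
    client = bigquery.Client()
    query = """
        SELECT COUNT(*) AS total
        FROM `bigquery-public-data.thelook_ecommerce.users`
        WHERE country = @country
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("country", "STRING", country)]
    )
    rows = client.query(query, job_config=job_config).result()
    return next(rows).total


# Plain Python functions can be passed directly in the agent's tools list.
root_agent = Agent(
    model="gemini-2.0-flash",
    name="bigquery_agent",
    instruction="Answer questions about customers using the available tools.",
    tools=[count_customers_by_country],
)
```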
When tools are implemented directly within an agent, even with built-in toolsets, the agent or application is responsible for managing its authentication to BigQuery, as well as the logic and implementation for each tool. This tight coupling creates challenges: updates to a tool or changes in its BigQuery connection method will require manual modification and redeployment for every agent, which can lead to inconsistencies and maintenance overhead.
Option 2: Use BigQuery’s pre-built tools in MCP Toolbox for Databases
The MCP (Model Context Protocol) Toolbox for Databases is an open-source server that centralizes the hosting and management of toolsets, decoupling agentic applications from direct BigQuery interaction. Instead of managing tool logic and authentication themselves, agents act as MCP clients, requesting tools from the Toolbox. The MCP Toolbox handles all the underlying complexities, including secure connections to BigQuery, authentication and query execution.
This centralized approach simplifies tool reuse across multiple agents, streamlines updates (tool logic can be modified and deployed on the Toolbox without requiring changes to every agent), and provides a single point for enforcing security policies.
Want to host your own custom tools in MCP Toolbox for Databases?
You can define your own custom tools in SQL within a tools.yaml configuration file and provide the --tools-file option when starting your server. You cannot, however, use the --prebuilt and --tools-file options together. If you want to use custom tools alongside prebuilt tools, you must use the --tools-file option and manually specify the prebuilt tools you want to include in the configuration file.
To connect your ADK application to the MCP Toolbox for Databases, you need to install toolbox-core:
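With toolbox-core installed and a Toolbox server already running (for example, started with the --prebuilt option for BigQuery), loading the tools might look like the minimal sketch below; the server URL is a placeholder, and the exact loading call may differ by library version.

```python
# A minimal sketch, assuming a local MCP Toolbox server is already serving the
# BigQuery prebuilt tools; the URL is a placeholder.
from toolbox_core import ToolboxSyncClient

toolbox = ToolboxSyncClient("http://127.0.0.1:5000")

# Load every tool exposed by the server (or pass a toolset name to load a subset).
bigquery_toolset = toolbox.load_toolset()
```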
Assign either the built-in ADK toolset, or the MCP toolset to your agent, and you’re ready to go!
```python
root_agent = Agent(
    model="gemini-2.0-flash",
    name="bigquery_agent",
    description=(
        "Agent that answers questions about BigQuery data by executing SQL queries"
    ),
    instruction="""You are a data analysis agent with access to several BigQuery tools.
    Make use of those tools to answer the user's questions.""",
    tools=[bigquery_toolset],
)
```
You can now run your agent using the adk run or adk web command and start asking questions about your data!
Your agent will leverage pre-built tools to extract dataset metadata, and then generate and execute a SQL query in BigQuery to retrieve your result:
Get started
Dive into these tutorials and start building your conversational analytics agent today:
Anthropic’s Claude models on Vertex AI now have improved overall availability with the global endpoint for Claude models. Now generally available, the global endpoint unlocks the ability to dynamically route your requests to any region with available capacity supported by the Claude model you’re using. This helps you deploy Claude-powered applications and agents with more uptime and dependability.
During the preview period, customers like Replicate experienced firsthand the benefits of the global endpoint. Zeke Sikelianos, founding designer at Replicate, noted, “People use Replicate because they want to deploy AI models at scale. Claude on Vertex AI fits perfectly with that — we get one of the best language models available, with Google’s solid infrastructure and the global endpoint that delivers fast responses worldwide. It just works.”
The global endpoint is launching with support for pay-as-you-go traffic for the following Claude models:
Claude Opus 4
Claude Sonnet 4
Claude Sonnet 3.7
Claude Sonnet 3.5 v2
What are global endpoints and when should you use them?
When you send a request to Anthropic’s Claude models on Vertex AI, you typically specify a region (e.g., us-central1). This is a regional endpoint, which keeps your data and processing within that geographical boundary—ideal for applications with strict data residency requirements.
The global endpoint, by contrast, does not tie your request to a single region. Instead, it directs traffic to a global entry point that dynamically routes your request to a region with available capacity. This multi-region approach is designed to maximize availability and reduce errors that can arise from high traffic in a given region.
So, when is the global endpoint the right choice?
If your application requires the highest possible availability and your data is not subject to residency restrictions, the global endpoint is an excellent fit.
It is also a strong option if your services are facing regional capacity limits, or if you are architecting for maximum resilience against regional disruptions.
However, if you have data residency requirements (specifically for ML processing), you should continue to use regional endpoints, as the global endpoint does not guarantee that requests will be processed in any specific location. Here’s a simple breakdown of global versus regional endpoints:
Global versus regional endpoints

| | Global endpoint | Regional endpoint |
| --- | --- | --- |
| Availability | Maximized by leveraging multi-region resources | Dependent on single-region capacity and quota |
| Latency | May be higher in some cases due to dynamic global routing | Optimized for low latency within the specified region |
| Quota | Uses a separate, independent global quota | Uses the quota assigned to the specific region |
| Use case | High-availability applications without data residency needs | Applications with strict data residency requirements |
| Traffic type | Pay-as-you-go | Pay-as-you-go & Provisioned Throughput (PT) |
By giving you the choice between global and regional endpoints, Vertex AI empowers you to build more sophisticated, resilient, and scalable generative AI applications and agents that meet your specific architectural and business needs.
Prompt caching and pay-as-you-go pricing
As part of this launch, prompt caching is fully supported with global endpoints. When a prompt is cached, subsequent identical requests will be routed to the region holding the cache for the lowest latency. If that region is at capacity, the system will automatically try the next available region to serve the request. This integration ensures that users of global endpoints still receive the benefits of prompt caching (lower latency and lower costs).
Note that at this point, the global endpoint for Claude models only supports pay-as-you-go traffic. Provisioned Throughput is available on regional endpoints only.
Global endpoint requests are charged the same price as regional endpoint requests.
Best practices
To get the most out of this new feature, we recommend routing your primary traffic to the global endpoint. Use regional endpoints as a secondary option, specifically for workloads that must adhere to data residency rules. To ensure the best performance and avoid unnecessary cost, please do not submit the same request to both a global and a regional endpoint simultaneously.
A new, separate global quota is available for this feature. You can view and manage this quota on the “Quotas & System Limits” page in your Google Cloud console and request an increase if needed. The pricing for requests made to the global endpoint remains the same as for regional endpoints.
How to get started
To get started with the global endpoint for Anthropic’s Claude models on Vertex AI, there are only two steps:
Step 1: Select and enable a global endpoint supported Claude model on Vertex AI (Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, Claude Sonnet 3.5 v2).
Step 2: In the configuration, set “global” as the location value and use the global endpoint URL:
```
https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/PUBLISHER_NAME/models/MODEL_NAME
```
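If you call Claude through Anthropic’s Python SDK for Vertex AI, a minimal sketch might look like the following; the SDK client class follows Anthropic’s published library, and the model ID shown is an assumption, so check the Vertex AI model card for the exact identifier of the Claude model you enabled.

```python
# A minimal sketch using Anthropic's SDK for Vertex AI (pip install "anthropic[vertex]").
# The model ID below is an assumption; verify it on the model card.
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id="PROJECT_ID", region="global")

message = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the benefits of global endpoints."}],
)
print(message.content[0].text)
```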
An overwhelming volume of threats and data combined with the shortage of skilled threat analysts has left many security and IT leaders believing that their organizations are vulnerable to cyberattacks and stuck in a reactive state.
That’s according to the new Threat Intelligence Benchmark, a commissioned study conducted by Forrester Consulting on behalf of Google Cloud, on the threat intelligence practices of more than 1,500 IT and cybersecurity leaders from eight countries and across 12 industries.
Operationalizing threat intelligence remains a major challenge, said a majority of the survey’s respondents.
“Rather than aiding efficiency, myriad [threat intelligence] feeds inundate security teams with data, making it hard to extract useful insights or prioritize and respond to threats. Security teams need visibility into relevant threats, AI-powered correlation at scale, and skilled defenders to use actionable insights, enabling a shift from a reactive to a proactive security posture,” said the study.
Data and analytical challenges organizations face in improving their threat intelligence capabilities.
Organizations today face a multifaceted, compound problem: They have too few analysts who can effectively interpret and act on threat intelligence, who are facing too many data feeds supplying that raw intelligence. This has led many security and IT leaders to worry that they are missing critical needles in the haystack, ultimately making it harder to take action against legitimate cyberattacks.
82% of respondents worry about missing threats due to the amount of alerts and data they are faced with.
We believe the key is to embed threat intelligence directly into security workflows and tools, so it can be accessed and analyzed quickly and effectively. AI has a vital role in this integration, helping to synthesize the raw data, manage repetitive tasks, and reduce toil to free human analysts to focus their efforts on critical decision-making.
Key takeaways from the survey
Organizations value threat intelligence: More than 80% of organizations already use threat intelligence or are planning to do so across eight major use cases.
Improving threat intelligence capabilities is challenging: Too many feeds (61%), too few analysts (60%), hard to derive clear action from threat intelligence data (59%), and difficulty determining which threats are valid (59%) were cited as the top challenges to actioning threat intelligence. All told, 82% are concerned about missing threats due to the volume of alerts and data.
Organizational blind spots: 80% of respondents said their senior leadership team underestimates threats to the organization, and 66% said they struggle to share threat intelligence with relevant teams.
Stuck in reactive mode: Too much data leaves security teams struggling to prioritize threats, creating significant security gaps. As a result, 86% of respondents said that their organization needs to improve its understanding of the threat landscape, 85% of respondents say that their organization could focus more time and energy on emerging critical threats, and 72% of respondents said they are mostly reactive to threats.
Helping defenders with AI: 86% of respondents agreed that they “must” use AI to improve their ability to operationalize threat intelligence. When asked about the benefits of using AI in threat intelligence, improving efficiency by generating easy-to-read summaries was cited most frequently (69%).
Organizations are using AI to help in a number of ways including summarization, prioritization, and communication.
The Threat Intelligence Benchmark study underscores how complex the problem is, but we also see a path forward for even under-resourced organizations to get the most out of their threat intelligence. Through our engagements with customers and the broader threat intelligence community, we’ve developed suggestions on how organizations can maximize the resources they’ve already dedicated to threat intelligence.
How to operationalize threat intelligence more effectively
At Google Cloud, we’re strong advocates for security and IT leaders to integrate threat intelligence into their security environments as part of a comprehensive layered defense. The raw data of threat intelligence can be used to prevent, detect, and respond to attacks — as well as to inform broader strategic decision-making across the organization.
Here are four tactical steps to help you get started.
Step 1: Identify high-stakes intelligence needs
Security teams should use threat intelligence as a strategic tool to focus on the threats that are most relevant to their organization. It can be crucial in shaping the organization’s cyber threat profile (a structured way to identify, analyze, and prioritize potential cyber threats), and can help to better protect against the threats that matter most.
Define your crown jewels: Identify your most critical assets, data, and business functions, and calculate the impact if they’re compromised. This directly informs your Priority Intelligence Requirements (PIR).
Know your adversaries: Pinpoint the threat actors most likely to target your IT environment and the industry that your organization operates in. Study their common tactics, techniques, and procedures (TTPs). Focus on intelligence related to these groups and their methods of intrusion.
Establish a feedback loop: Regularly ask your incident response (IR) and security operations center (SOC) teams about the threat intelligence that could have helped them prevent, detect, and respond faster to recent incidents. Their answers can be used to refine PIRs.
Understand how security enables the organization: Developing robust threat intelligence analysis is all about supporting smarter, faster decisions. Security should be a close partner to leadership and other teams, focused on enabling the organization to achieve its goals while minimizing risk.
Step 2: Build a tactical threat intelligence pipeline
In cybersecurity, efficiency is key. The goal is to get threat intelligence from source to action as quickly as possible.
Centralized aggregation: Implement a Threat Intelligence Platform (TIP) and use existing security information and event management (SIEM) capabilities to ingest, normalize, and de-duplicate threat intelligence from all sources (OSINT, commercial feeds, ISACs, dark web monitoring).
Automated enrichment: Automatically enrich incoming indicators (IPs, domains, hashes) with context such as geolocation, reputation scores, and associated threat actors. Tools should do the heavy lifting.
Prioritization engine: Instead of letting analysts manually triage thousands of alerts, develop rules in your TIP and SIEM to automatically score and prioritize intelligence based on its relevance to PIRs and its severity (a simplified sketch follows this list).
Direct integration with controls: Push relevant, high-fidelity indicators and detection rules directly to firewalls and proxies, endpoint and extended detection and response (EDR and XDR), Intrusion detection and prevention systems (IDS and IPS), and SIEM systems.
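Here is that prioritization idea as a deliberately simplified sketch; the field names, actor and technique lists, weights, and threshold are all illustrative, and in practice this logic would live as rules inside your TIP or SIEM rather than standalone code.

```python
# Deliberately simplified sketch: score enriched indicators against Priority
# Intelligence Requirements (PIRs). All values below are illustrative.
PRIORITY_ACTORS = {"FIN7", "APT29"}   # actors named in your PIRs
PRIORITY_TTPS = {"T1566", "T1078"}    # MITRE ATT&CK techniques you care about


def score_indicator(indicator: dict) -> int:
    score = 0
    if indicator.get("actor") in PRIORITY_ACTORS:
        score += 50
    if set(indicator.get("ttps", [])) & PRIORITY_TTPS:
        score += 30
    score += {"critical": 20, "high": 10}.get(indicator.get("severity", ""), 0)
    return score


indicators = [
    {"value": "198.51.100.7", "actor": "FIN7", "ttps": ["T1566"], "severity": "high"},
    {"value": "203.0.113.9", "actor": None, "ttps": [], "severity": "low"},
]

# Only indicators above the threshold get pushed to detection controls.
for ioc in sorted(indicators, key=score_indicator, reverse=True):
    if score_indicator(ioc) >= 50:
        print("push to SIEM/EDR:", ioc["value"])
```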
Step 3: Empower security teams
Two important ways to make threat intelligence more useful to IT and security professionals are freeing analysts from toil and focusing on training and tooling.
Analyst focus: Free up your SOC and IR analysts from data ingestion and basic correlation. Their time is better spent on proactive threat hunting, contextualizing alerts, developing custom detections, and augmenting incident response.
Training and expertise: 79% of survey respondents said that external threat intelligence providers should help “uplevel junior staff or embed a threat intelligence (CTI) analyst” into their team. Give analysts focused training on shifting to a more intelligence-led approach and providing threat intel expertise tailored to your organization.
Step 4: Measure and adapt continuously
Threat intelligence operationalization is an ongoing cycle, not a one-time project.
Key metrics: Track these key threat intelligence metrics and ask the following questions for each:
Mean time to detect (MTTD) and mean time to respond (MTTR) reduction: Does threat intelligence help us detect and respond to threats faster?
Alert fidelity: Are we seeing fewer false positives due to better-contextualized alerts from threat intelligence?
Blocked threats: How many threats were proactively blocked by systems fed with threat intelligence?
Hunting success: How many new threats were identified through intelligence-led hunting?
Regular reviews: Monthly or quarterly review of PIRs, threat intelligence sources, and the effectiveness of integrations can help keep your threat intelligence strategy current.
Incident-driven refinement: After every significant incident, conduct a lessons-learned session specifically on the contributions that threat intelligence made to the incident response.
How Google Threat Intelligence can help
Despite concerns about data overload, 80% of survey respondents said that threat intelligence providers should offer information sources that are both broad and deep. Security teams should feel confident that they have a holistic view of available intelligence.
Augmented by advanced AI, Google Threat Intelligence provides unparalleled visibility into threats, enabling us to deliver detailed and timely threat intelligence to security teams around the world. It combines Mandiant frontline expertise, the global reach of the VirusTotal community, and the breadth of visibility only Google can deliver.
Our Advanced Intelligence Access (AIA) and Essential Intelligence Access (EIA) programs provide organizations with access to embedded and targeted intelligence experts, as well as early access to threat data. Mandiant Academy offers training courses for security professionals, including many focused on how to best consume and apply threat intelligence to improve tactical defenses and overall security posture.
Amazon Connect now simplifies forecast editing with a new UI experience, enabling planners to make adjustments quickly and better respond to changing contact patterns. With this launch, users can select a forecast; make edits—such as increasing contact volume by a percentage or setting exact values—across specific date ranges, queues, and channels; and preview and apply changes within the forecasting UI. For example, if there’s an upcoming marketing campaign expected to drive higher traffic, a planner can increase the short-term forecast by 15% for Tuesdays and Wednesdays between 12 PM and 2 PM for the next two weeks. With this feature, planners can simplify the process of managing forecast changes, improve planning accuracy, and respond faster to demand fluctuations.
This feature is available in all AWS Regions where Amazon Connect agent scheduling is available. To learn more about agent scheduling, click here.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C7i instances powered by custom 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) are available in the Asia Pacific (Jakarta) Region. C7i instances are supported by custom Intel processors, available only on AWS, and offer up to 15% better performance over comparable x86-based Intel processors utilized by other cloud providers.
C7i instances deliver up to 15% better price-performance versus C6i instances and are a great choice for all compute-intensive workloads, such as batch processing, distributed analytics, ad-serving, and video encoding. C7i instances offer larger instance sizes, up to 48xlarge, and two bare metal sizes (metal-24xl, metal-48xl). These bare-metal sizes support built-in Intel accelerators (Data Streaming Accelerator, In-Memory Analytics Accelerator, and QuickAssist Technology), which facilitate efficient offload and acceleration of data operations and optimize performance for these workloads.
C7i instances support new Intel Advanced Matrix Extensions (AMX) that accelerate matrix multiplication operations for applications such as CPU-based ML. Customers can attach up to 128 EBS volumes to a C7i instance, versus up to 28 EBS volumes on a C6i instance. This allows you to process larger amounts of data, scale workloads, and improve performance over C6i instances.
Amazon Connect now supports AWS CloudFormation for Outbound Campaign message template attachments, enabling you to create, manage, and deploy these template attachments using AWS CloudFormation. This enhancement allows customers to define and deploy attachments, such as images or documents, as part of the existing MessageTemplate CloudFormation resource.
Message template attachments are used in outbound email campaigns to enrich content and improve engagement. With this launch, customers can now manage attachments programmatically through infrastructure as code, ensuring consistency, repeatability, and automation across staging, test, and production environments.
This feature is available in all AWS Regions where Amazon Connect Outbound Campaigns are supported. For more information, see the AWS Region table. To learn more, see Message templates in the Amazon Connect Administrator Guide or visit the Amazon Connect product page.
At Google Cloud Security, our mission is to empower organizations to strengthen their defenses with innovative security capabilities, all while simplifying and modernizing their cybersecurity. In a world of evolving threats and increasing complexity, we believe true security comes from clarity, not more noise.
We’re excited to bring this commitment to innovation and simplification to Black Hat USA 2025, where you can discover how Google Cloud Security and Mandiant can help you navigate the complex threat landscape, adopt agentic security, and make Google an extension of your security team.
From connecting with our security experts to witnessing innovative cloud security technology in action, we’re offering Black Hat attendees a packed schedule of booth activities, insightful sessions, and exclusive events.
Visit our booth and connect with experts
Booth #2240 is where you can meet the Google Cloud Security team. Discover our latest innovations and learn directly from Mandiant experts about the techniques and tactics from their most recent investigations. See firsthand how agentic security can help you detect and remove threats more effectively and make your security team more productive.
Experience our expanded demo landscape
Catch our on-demand product and service demos during Business Hall/Expo hours to learn how Google Cloud Security can protect your organization. Plus, connect with our security experts and partners to discuss your specific needs.
Google Threat Intelligence: Experience how you can get ahead of the latest threats with Google Threat Intelligence. Know who’s targeting you and focus on the most relevant threats to your organization.
Google Security Operations: Discover how our intelligence-driven and AI-powered security operations platform, Google Security Operations, combines Google’s hyper-scale infrastructure with unparalleled visibility and understanding of cyber adversaries to enable security teams to uncover the latest cyber threats in near real-time.
AI for Defenders: Learn how AI agents in Google Cloud Security products can autonomously investigate threats, triage alerts, and resolve misconfigurations. Join us as we demo how AI agents can automate manual and repetitive tasks to help you move from insight to action faster.
Cloud Security: Explore how Google Cloud provides built-in, secure controls to help you maintain a strong cloud security posture. See in action how Google Cloud’s Security Foundation recommended products help address most common cloud adoption use cases.
Mandiant Incident Response: Learn how Mandiant uses frontline experience with threat intelligence and incident response to help organizations like yours tackle top cloud security challenges.
Chrome Enterprise: Stop by to find out why Chrome is the most trusted enterprise browser, meeting the secure enterprise browsing needs of today’s workforce.
Join us at Google Cloud Security Hub
Beyond the main expo hall, make your way to the Google Cloud Security Hub, located conveniently in The Cove next to Libertine Social at Mandalay Bay. From the expo hall, head past the Starbucks, and our Customer Hub will be on your right. Here’s a detailed map for easy way-finding:
How to find Google Cloud at the conference.
The Hub is home to several exclusive events and spaces:
Enjoy the exclusive Customer Lounge
Looking for a place to recharge and connect in a more relaxed setting? If you schedule a meeting with our team, you’ll gain exclusive access to our Customer Lounge at the Google Hub. We’ll have snacks, beverages, and a comfortable space for you to take a break from the conference floor. Reach out to your sales representative to schedule your meeting and get on the guest list.
Unwind at the Google Cloud Security Happy Hour
Join us for the Google Cloud Security Happy Hour on Wednesday, Aug. 6, from 5:00 p.m. to 7:00 p.m., at the Google Hub for a relaxed evening of networking. It’s the perfect opportunity to unwind after a day of briefings and connect with our team and your peers.
Attend the Threat Briefing and dinner
Customers are invited to join us for an exclusive Threat Briefing and Dinner on Tuesday, Aug. 5, from 6:00 p.m. to 9:00 p.m., at the Google Hub. You’ll gain deep insights from Mandiant Intelligence, with a special briefing from Luke McNamara, chief deputy analyst.
Enhance your skills with Mandiant Academy training
Improve your expertise with hands-on training directly from Mandiant’s frontline cybersecurity experts. Mandiant Academy is offering the following courses during Black Hat (requires prior registration):
With your Briefing conference pass, you can attend these sessions where Google Cloud Security and Mandiant experts will share their insights:
Bridging the AI reality gap: Join Vijay Ganti (director, product management, Google Cloud Security) and Spencer Lichtenstein (group product manager, Google Security Operations) as they pull back the curtain on AI in security. In the session, they’ll dive deep into how Google is integrating AI into its security products. You’ll learn about the rigorous data science processes we use to measure every task of the end-to-end system, and why this meticulous approach is crucial for giving you an edge against threat actors. We’ll also share the latest, most impactful agent demos.
Participate in an OT Incident Response: Join Tim Gallo (head of global solution architects, Google Cloud Security) and Paul Shaver (global OT security lead, Google Cloud Security) for a unique, interactive session where you can experience what it’s truly like to navigate a critical operational technology (OT) incident. In this live session, you’ll step into the shoes of a Mandiant Incident Responder as we guide you through a simulated OT incident. You’ll see firsthand the crucial decision points, compare your choices with those of our experts, and gain invaluable insights into the complexities of real-world OT incident response.
Autonomous Timeline Analysis and Threat Hunting: An AI Agent for Timesketch: In this talk, we will present the first AI-powered agent capable of autonomously performing digital forensic analysis on the large and varied log volumes typically encountered in real-world incidents. We will demonstrate the agent’s proficiency in threat hunting and evaluate our technique on a dataset of 100 diverse, real-world compromised systems.
The Ransomware Response Playbook: Join this session where security experts will discuss how best to prepare for and handle a ransomware extortion attack against your business. This panel discussion will explore critical questions such as: Where is the malicious payload and how is it spreading? How do you interact and barter with your attacker (or not)? Who do you call? Are your backups protected?
At its core, FACADE is a novel self-supervised ML system that detects suspicious actions by analyzing the context of corporate logs, leveraging a unique contrastive learning strategy. This, combined with an innovative clustering approach, leads to unparalleled accuracy: a false positive rate under 0.01%, and as low as 0.0003% for single rogue actions. This session will not only present the underlying technology but also demonstrate how to use the recently released FACADE open-source version to protect your own organization.
Threat Space Workshop: Join Nadean Tanner for this hands-on experience with Harbinger, an AI-powered red teaming platform for streamlined operations and enhanced decision-making.
Learn about open-source solutions at Arsenal
Harbinger: An AI-Powered Red Teaming Platform for Streamlined Operations and Enhanced Decision-Making: Harbinger is an AI-powered platform that streamlines your workflow by integrating essential components, automating tasks, and providing intelligent insights. It consolidates data from various sources, automates playbook execution, and uses AI to suggest your next moves, making red teaming more efficient and effective. With Harbinger, you can focus on what matters most – achieving your objectives and maximizing the impact of your assessments.
Timesketch: AI-Powered Super Timeline Analysis: Timesketch is a leading, free open-source software (licensed under Apache-2.0) for collaborative forensic-timeline analysis, with more than 2.6k stars on GitHub. In this Arsenal session, we announce and showcase the Timesketch AI extension, designed to drastically speed up human analysts, identify the root cause of compromises, and improve incident reaction time. This demo will showcase AI-driven investigations in Timesketch, highlighting its ability to:
Autonomously analyze timelines, answer investigative questions, identify key events, and find the root cause of compromises.
Provide interactive review, empowering analysts to verify, edit, and refine AI-generated findings with clear links to supporting facts, emphasizing human validation.
Facilitate collaborative timeline analysis by integrating with Timesketch’s collaborative environment, enabling teamwork on AI-powered investigations.
Meet you there
Black Hat USA 2025 promises to be an impactful week, and Google Cloud Security is ready to share valuable knowledge and innovative solutions. We encourage you to make the most of your time by visiting our booth, attending our sessions, re-energizing at the Google Cloud Security Hub, and connecting with our team.
We’re eager to discuss your security challenges and demonstrate how Google can be your strategic security partner in the face of evolving threats.
Developers building with gen AI are increasingly drawn to open models for their power and flexibility. But customizing and deploying them can be a huge challenge. You’re often left wrestling with complex dependencies, managing infrastructure, and fighting for expensive GPU access.
Don’t let that complexity slow you down.
In this guide, we’ll walk you through the end-to-end lifecycle of taking an open model from discovery to a production-ready endpoint on Vertex AI, using the fine-tuning and deployment of Qwen3 as our example and showing how the platform handles the heavy lifting so you can focus on innovation.
Part 1: Quickly choose the right base model
So you’ve decided to use an open model for your project: which model, on what hardware, and with which serving framework? The open model universe is vast, and the “old way” of finding the right model is time-consuming. You could spend days setting up environments, downloading weights, and wrestling with requirements.txt files just to run a single test.
This is a common place for projects to stall. But with Vertex AI, your journey starts in a much better place: the Vertex AI Model Garden, a curated hub that simplifies the discovery, fine-tuning, and deployment of cutting-edge open models, with more than 200 validated options (and growing), including popular choices like Gemma, Qwen, DeepSeek, and Llama. Comprehensive model cards offer crucial information, including details on recommended hardware (such as GPU types and sizes) for optimal performance. Additionally, Vertex AI has default quotas for dedicated on-demand capacity of the latest Google Cloud accelerators to make it easier to get started.
Qwen 3 Model card on Vertex AI Model Garden
Importantly, Vertex AI conducts security scans on these models and their containers, adding a layer of trust and mitigating potential vulnerabilities from the outset. Once you’ve found a model for your use case, like Qwen3, Model Garden provides one-click deployment options and pre-configured notebooks, making it easy to deploy the model as an endpoint with Vertex AI Inference, ready to be integrated into your application.
Qwen3 Deployment options from Model Garden
Additionally, Model Garden provides optimized serving containers—often leveraging vLLM, SGLang, or Hex-LLM for high-throughput inference—specifically designed for performant model serving. Once your model is deployed (via an experimental endpoint or notebook), you can start experimenting and establishing a baseline for your use case. This baseline lets you benchmark your fine-tuned model later on.
Model Inference framework options
Qwen3 quick deployment on Endpoint
It’s important that you incorporate evaluation early on in the process. You can leverage Vertex AI’s Gen AI evaluation service to assess the model against your own data and criteria, or integrate open-source frameworks. This essential early validation ensures you confidently select the right base model.
By the end of this experimentation and research phase, you’ll have efficiently navigated from model discovery to initial evaluation ready for the next step.
Part 2: Start parameter efficient fine-tuning (PEFT) with your data
You’ve found your base model – in this case Qwen3. Now for the magic: making it yours by fine-tuning it on your specific data. This is where you can give the model a unique personality, teach it a specialized skill, or adapt it to your domain.
Step 1: Get your data ready
Reading data can often be a bottleneck, but Vertex AI makes it simple. You can seamlessly pull your datasets directly from Google Cloud Storage (GCS) and BigQuery (BQ). For more complex data-cleaning and preparation tasks, you can build an automated Vertex AI Pipeline to orchestrate the preprocessing work for you.
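As a minimal sketch of that staging step, the snippet below pulls examples from BigQuery and writes them to Cloud Storage as JSONL; the project, table, bucket, and column names are placeholders, and the JSONL format should match whatever your fine-tuning framework expects.

```python
# Minimal sketch: stage fine-tuning examples from BigQuery into Cloud Storage.
# Table, bucket, and column names are placeholders.
import json
from google.cloud import bigquery, storage

bq = bigquery.Client()
rows = bq.query(
    "SELECT prompt, response FROM `my-project.my_dataset.finetune_examples`"
).result()

jsonl = "\n".join(
    json.dumps({"instruction": r.prompt, "output": r.response}) for r in rows
)

storage.Client().bucket("my-tuning-bucket").blob(
    "qwen3/train.jsonl"
).upload_from_string(jsonl)
```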
Step 2: Hands-on tuning in the notebook
Now you can start fine-tuning your Qwen3 model. Model Garden provides a pre-configured notebook that uses Axolotl, a popular framework for fine-tuning. This notebook already includes optimized settings for techniques like:
QLoRA: A highly memory-efficient tuning method, perfect for running experiments without needing massive GPUs.
FSDP (Fully Sharded Data Parallel): A technique for distributing a large model across multiple GPUs for larger-scale training.
You can run the Qwen3 fine-tuning process directly inside the notebook. This is the perfect “lab environment” for quick experiments to discover the right configuration for the fine-tuning job.
Step 3: Scaling up with Vertex AI Training
Experimenting and getting started in a notebook is great, but you might need more GPU resources and flexibility for customization. This is when you graduate from the notebook to a formal Vertex AI Training job.
Instead of being limited by a single notebook instance, you submit your training configuration (using the same container) to Vertex AI’s managed training service, which offers more scalability, flexibility, and control. Here’s what that gives you:
On-demand accelerators: Access an on-demand pool of the latest accelerators (like H100s) when you need them, or choose DWS Flex Start, Spot GPUs, or BYO-reservation options for more flexibility or stability.
Managed infrastructure: No need to provision or manage servers or containers. Vertex AI handles it all. You just define your job, and it runs.
Reproducibility: Your training job is a repeatable artifact, making it easier to use in an MLOps workflow.
Once your job is running, you can monitor its progress in real-time with TensorBoard to watch your model’s loss and accuracy improve. You can also check in on your tuning pipeline.
Beyond Vertex AI Training jobs, you can also use Ray on Vertex AI, or build your own setup on GKE or Compute Engine, depending on the flexibility and control you need.
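To make the notebook-to-training-job transition concrete, here is a minimal sketch using the Vertex AI SDK; the container image, bucket, Axolotl config path, and accelerator choices are assumptions, and in practice you would reuse the image and arguments from the Model Garden notebook.

```python
# Minimal sketch of submitting the fine-tuning container as a managed
# Vertex AI Training job. URIs, buckets, and hardware below are placeholders.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-tuning-bucket",
)

job = aiplatform.CustomContainerTrainingJob(
    display_name="qwen3-axolotl-finetune",
    container_uri="us-docker.pkg.dev/my-project/training/axolotl-qwen3:latest",
)

job.run(
    args=["--config=gs://my-tuning-bucket/qwen3/qlora.yaml"],
    replica_count=1,
    machine_type="a3-highgpu-8g",
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
)
```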
Part 3: Evaluate your fine-tuned model
After fine-tuning your Qwen3 model on Vertex AI, robust evaluation is crucial to assess its readiness. You compare the evaluation results to your baseline created during experimentation.
For complex generative AI tasks, Vertex AI’s Gen AI Evaluation Service uses a ‘judge’ model to assess nuanced qualities (coherence, relevance, groundedness) and task-specific criteria, supporting side-by-side (SxS) human reviews. Using the GenAI SDK, you can programmatically evaluate and compare your models. This service provides deep, actionable insights into model performance—going far beyond simple metrics like perplexity by also incorporating automated side-by-side comparisons and human review.
In the evaluation notebook, we evaluated our fine-tuned Qwen3 model against the base model using the GenAI Evaluation Service. For each query, we provided responses from both models and used the pairwise_summarization_quality metric to let the judge model determine which performed better.
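A minimal sketch of that pairwise comparison with the Gen AI evaluation SDK might look like the following; the column names and metric identifier mirror the evaluation notebook but may need adjusting for your SDK version and data.

```python
# Minimal sketch of a pairwise comparison with the Gen AI evaluation service,
# assuming you already collected responses from both models.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask

vertexai.init(project="my-project", location="us-central1")

eval_dataset = pd.DataFrame({
    "prompt": ["Summarize this support ticket ..."],
    "response": ["<fine-tuned Qwen3 answer>"],
    "baseline_model_response": ["<base Qwen3 answer>"],
})

eval_task = EvalTask(dataset=eval_dataset, metrics=["pairwise_summarization_quality"])
result = eval_task.evaluate()
print(result.summary_metrics)
```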
For evaluation on other popular models, refer to this notebook.
Part 4: Deploy to a production endpoint
Your model has been fine-tuned and validated. It’s time for the final, most rewarding step: deploying it as an endpoint. This is where many projects hit a wall of complexity. With Vertex AI Inference, it’s a streamlined process. When you deploy to a Vertex AI Endpoint, you’re not just getting a server; you’re getting a fully managed, production-grade serving stack optimized for two key things:
1. Fast performance
Optimized serving: Your model is served using a container built with cutting-edge frameworks like vLLM, ensuring high throughput and low latency.
Rapid start-up: Techniques like fast VM startup, container image streaming, model weight streaming, and prefix caching mean your model can start up quickly.
2. Cost-effective and flexible scaling
You have full control over your GPU budget. You can:
Use on-demand GPUs for standard workloads.
Apply existing Committed Use Discounts (CUDs) and reservations to lower your costs.
Use Dynamic Workload Scheduler (DWS) Flex Start to acquire capacity for up to 7 days at a discount.
Leverage Spot VMs for fault-tolerant workloads to get access to compute at a steep discount.
In short, Vertex AI Inference handles the scaling, the infrastructure, and the performance optimization. You just focus on your application.
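As a rough sketch of that final step with the Vertex AI SDK, registering the tuned model and deploying it to an endpoint might look like the following; the serving image, artifact location, and hardware are assumptions, and the Model Garden deployment notebook provides the exact vLLM container and arguments for Qwen3.

```python
# Minimal sketch: register the tuned model and deploy it to a managed endpoint.
# Image URIs, paths, and hardware below are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="qwen3-finetuned",
    artifact_uri="gs://my-tuning-bucket/qwen3/merged-weights",
    serving_container_image_uri="us-docker.pkg.dev/my-project/serving/vllm-qwen3:latest",
)

endpoint = model.deploy(
    machine_type="a3-highgpu-8g",
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
    min_replica_count=1,
    max_replica_count=2,
)

print(endpoint.predict(instances=[{"prompt": "Hello, Qwen3!"}]))
```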
Get started
Successfully navigating the lifecycle of an open model like Qwen on Vertex AI, from initial idea to production-ready endpoint, is a significant achievement. You’ve seen how the platform provides robust support for experimentation, fine-tuning, evaluation, and deployment.
Want to explore your own open model workload? The Vertex AI Model Garden is a great place to start.
AWS HealthOmics announces third-party Git repository integration via AWS CodeConnections for workflow creation, enabling bioinformaticians and researchers to seamlessly connect their existing source code management repositories to HealthOmics. This new capability allows customers to automatically pull workflow definitions, parameter templates, and README files directly from GitHub, GitLab, or Bitbucket repositories without intermediate steps. AWS HealthOmics is a HIPAA-eligible service that helps healthcare and life sciences customers accelerate scientific breakthroughs with fully managed biological data stores and workflows.
Git integrations can streamline bioinformatics workflow management by eliminating multiple manual steps required to stage and update workflow files. Customers can now specify a particular branch, tag, or commit ID to ensure version control and reproducibility while maintaining their existing development processes. This feature is particularly valuable for organizations with established bioinformatics pipelines that want to leverage HealthOmics’ scalable compute capabilities while preserving their third-party Git-based collaboration workflows. By connecting directly to versioned source repositories, teams can maintain a single source of truth for their workflow code, simplifying change management and enhancing reproducibility.
Git integration for workflow creation is now supported in all regions where AWS HealthOmics is available: US East (N. Virginia), US West (Oregon), Europe (Frankfurt, Ireland, London), Asia Pacific (Singapore), and Israel (Tel Aviv).
To learn more about integrating Git repositories with your HealthOmics workflows, see the AWS HealthOmics documentation.
Today, AWS HealthOmics introduces enhanced workflow documentation capabilities with the addition of readme file support. This new feature allows bioinformaticians and researchers to attach comprehensive documentation, implementation details, and diagrams directly to their bioinformatics workflows and workflow versions. AWS HealthOmics is a HIPAA-eligible service that helps healthcare and life sciences customers accelerate scientific breakthroughs with fully managed biological data stores and workflows.
The readme file feature improves workflow management by providing a centralized location for critical workflow documentation, enabling more effective knowledge sharing across research teams. Users can now document parameters, input requirements, output formats, and usage instructions within the workflow itself, eliminating the need for separate documentation systems. This capability is particularly valuable for organizations with shared workflows, where multiple scientists need to understand and correctly execute complex bioinformatics pipelines. Readme files can be viewed directly in the AWS Management Console or programmatically accessed via GetWorkflow API calls, and can be updated as workflows evolve.
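For example, a minimal boto3 sketch for retrieving a workflow’s details is shown below; the workflow ID is a placeholder, and the exact response field that carries the readme content may differ, so inspect the full response.

```python
# Minimal sketch: read workflow details (including documentation fields) with boto3.
import boto3

omics = boto3.client("omics")
workflow = omics.get_workflow(id="1234567")  # placeholder workflow ID

print(workflow.get("name"), workflow.get("description"))
# The readme content (or a reference to it) is returned alongside the other
# workflow metadata in this response.
```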
Readme file support is now available in all regions where AWS HealthOmics is available: US East (N. Virginia), US West (Oregon), Europe (Frankfurt, Ireland, London), Asia Pacific (Singapore), and Israel (Tel Aviv).
To learn more about creating and managing workflow readme files, see the AWS HealthOmics documentation.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) M8g and R8g instances are available in AWS Asia Pacific (Hong Kong) region. These instances are powered by AWS Graviton4 processors and deliver up to 30% better performance compared to AWS Graviton3-based instances. Amazon EC2 M8g instances are built for general-purpose workloads, such as application servers, microservices, gaming servers, midsize data stores, and caching fleets. Amazon EC2 R8g instances are ideal for memory-intensive workloads such as databases, in-memory caches, and real-time big data analytics. These instances are built on the AWS Nitro System, which offloads CPU virtualization, storage, and networking functions to dedicated hardware and software to enhance the performance and security of your workloads.
AWS Graviton4-based Amazon EC2 instances deliver the best performance and energy efficiency for a broad range of workloads running on Amazon EC2. These instances offer larger instance sizes with up to 3x more vCPUs and memory compared to Graviton3-based instances. AWS Graviton4 processors are up to 40% faster for databases, 30% faster for web applications, and 45% faster for large Java applications than AWS Graviton3 processors.
AWS is expanding service reference information to indicate which service actions are supported by the IAM Last Accessed and IAM Access Analyzer policy generation features. These features help you on your journey toward least-privilege permissions, and you can now easily check which service actions they support in machine-readable files.
You can automate the retrieval of service reference information, eliminating manual effort and ensuring your policies align with the latest service updates. You can also incorporate this service reference directly into your policy management tools and processes for a seamless integration. This feature is offered at no additional cost. To get started, refer to the documentation on programmatic service reference information.
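As a hedged sketch, the snippet below pulls the machine-readable service reference over plain HTTP; the base URL follows AWS’s published service reference endpoint, but the JSON field names (including the new fields that indicate support for IAM Last Accessed and policy generation) are assumptions, so inspect the payload before depending on specific keys.

```python
# Hedged sketch: fetch the machine-readable service reference information.
# JSON field names below are assumptions; inspect the actual payload.
import requests

BASE_URL = "https://servicereference.us-east-1.amazonaws.com/"

services = requests.get(BASE_URL, timeout=30).json()
# Each entry is expected to include the service name and a URL to its reference document.
s3_entry = next(s for s in services if s.get("service") == "s3")
s3_reference = requests.get(s3_entry["url"], timeout=30).json()

for action in s3_reference.get("Actions", [])[:5]:
    print(action)
```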
Amazon Connect now supports AWS CloudFormation for quick responses, enabling customers to deploy and manage quick responses using AWS CloudFormation templates. Quick responses allow contact center agents to access pre-configured messages to respond consistently and efficiently to common customer inquiries. With AWS CloudFormation, administrators can now define and deploy these quick responses across environments in a scalable and repeatable way.
Using AWS CloudFormation, organizations can standardize agent communications and reduce manual configuration between Amazon Connect instances. For example, you can use AWS CloudFormation templates to roll out updated response sets for seasonal campaigns or regulatory compliance across multiple Connect environments. This launch simplifies deployment and integrates seamlessly with continuous delivery pipelines.
This feature is available in all AWS Regions where Amazon Connect is offered. For a full list of supported Regions, see the AWS Region table. To learn more, see Quick responses in the Amazon Connect Administrator Guide or visit the Amazon Connect product page.
Amazon ElastiCache now supports Bloom filters as a new data type in ElastiCache version 8.1 for Valkey and above. Bloom filters are a space-efficient probabilistic data structure that lets you quickly check whether an item is possibly in a set. This new feature is fully compatible with the valkey-bloom module and API-compatible with the Bloom filter command syntax of the Valkey client libraries, such as valkey-py, valkey-java, and valkey-go. Previously, to find whether elements were added to your cache, you used the Set data type to write items to a set and then check if that item already existed. Bloom filters achieve the same outcome using a probabilistic approach and are over 98% more memory efficient than using sets without compromising performance.
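Here is a minimal sketch with a Valkey client; the endpoint is a placeholder, and execute_command is used so the snippet does not depend on client-specific Bloom filter helpers.

```python
# Minimal sketch: Bloom filter commands against an ElastiCache for Valkey endpoint.
# The hostname is a placeholder.
import valkey

r = valkey.Valkey(
    host="my-cache.xxxxxx.serverless.use1.cache.amazonaws.com",
    port=6379,
    ssl=True,
)

# Create a filter sized for ~1M items with a 1% false-positive rate.
r.execute_command("BF.RESERVE", "seen:items", 0.01, 1_000_000)
r.execute_command("BF.ADD", "seen:items", "item-42")
print(r.execute_command("BF.EXISTS", "seen:items", "item-42"))  # 1 (possibly present)
print(r.execute_command("BF.EXISTS", "seen:items", "item-99"))  # 0 (definitely absent)
```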
Bloom filters are available today in Amazon ElastiCache version 8.1 for Valkey in all AWS Regions, for both serverless and node-based offerings, at no additional cost. To learn more about Bloom filters on ElastiCache for Valkey, check out the ElastiCache documentation. For the full documentation and list of supported commands, see the Bloom filter documentation.
Amazon Aurora PostgreSQL Limitless Database is now available in the US West (N. California), Africa (Cape Town), Asia Pacific (Hyderabad, Jakarta, Malaysia, Melbourne, Mumbai, Osaka, Seoul, Thailand), Canada (Central), Canada West (Calgary), Europe (London, Milan, Paris, Spain, Zurich), Israel (Tel Aviv), Mexico (Central), Middle East (Bahrain, UAE), and South America (Sao Paulo) Regions.
Aurora PostgreSQL Limitless Database makes it easy for you to scale your relational database workloads by providing a serverless endpoint that automatically distributes data and queries across multiple Amazon Aurora Serverless instances while maintaining the transactional consistency of a single database. Aurora PostgreSQL Limitless Database offers capabilities such as distributed query planning and transaction management, removing the need for you to create custom solutions or manage multiple databases to scale. As your workloads increase, Aurora PostgreSQL Limitless Database adds additional compute resources while staying within your specified budget, so there is no need to provision for peak, and compute automatically scales down when demand is low.
Aurora PostgreSQL Limitless Database is available with PostgreSQL 16.6 and 16.8 compatibility in these Regions.
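From an application's perspective, the shard group endpoint behaves like a single PostgreSQL database. Here is a minimal sketch, assuming an existing Limitless Database shard group and standard PostgreSQL tooling such as psycopg2; the hostname, database name, credentials, and table are placeholders.

```python
# A minimal sketch, assuming an existing Limitless Database shard group; the
# endpoint, database name, credentials, and table below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-shard-group.limitless-example.us-west-1.rds.amazonaws.com",  # placeholder
    port=5432,
    dbname="postgres_limitless",  # placeholder database name
    user="admin_user",
    password="example-password",
)

with conn, conn.cursor() as cur:
    # Ordinary SQL; Aurora plans and runs the query across the Aurora
    # Serverless instances behind the shard group endpoint.
    cur.execute("SELECT count(*) FROM orders WHERE status = %s", ("shipped",))
    print(cur.fetchone()[0])
```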
AWS Glue now offers a new native connector for Microsoft Dynamics 365, enabling data engineers to easily integrate data from this enterprise resource planning (ERP) and customer relationship management (CRM) platform. This connector allows AWS Glue users to build efficient extract, transform, and load (ETL) jobs that seamlessly connect to Microsoft Dynamics 365 as a data source.
With this new connector, users can streamline their data integration processes, reducing the complexity and time required to incorporate Microsoft Dynamics 365 data into their AWS-based analytics and business intelligence workflows. Organizations can now leverage the power of AWS Glue’s fully-managed ETL service in conjunction with their Microsoft Dynamics 365 data, enabling more comprehensive insights and data-driven decision-making.
The AWS Glue connector for Microsoft Dynamics 365 is available in all regions where AWS Glue is supported.
To learn more about this new connector and how to get started, visit the AWS Glue documentation.
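For a sense of what a job might look like, here is a minimal sketch of a Glue PySpark script that reads from Dynamics 365 and lands the data in Amazon S3. The connection_type string, connection name, and ENTITY_NAME option are assumptions for illustration; refer to the connector documentation for the exact option names.

```python
# A minimal sketch of a Glue PySpark job using the connector; the
# connection_type string, connection name, and ENTITY_NAME option are
# assumptions for illustration.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read Dynamics 365 records into a DynamicFrame via the native connector.
accounts = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamics365",  # assumed connector identifier
    connection_options={
        "connectionName": "my-dynamics365-connection",  # Glue connection holding credentials
        "ENTITY_NAME": "account",                       # assumed option name
    },
)

# Land the extracted data in S3 as Parquet for downstream analytics.
glue_context.write_dynamic_frame.from_options(
    frame=accounts,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/dynamics365/accounts/"},
    format="parquet",
)
```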
In April, we released Cluster Director, a unified management plane that makes deploying and managing large-scale AI infrastructure simpler and more intuitive than ever before, putting the power of an AI supercomputer at your fingertips. Today, we're excited to release new features in preview, including an intuitive interface, a managed Slurm experience, and an observability dashboard that detects performance anomalies.
From complex configuration to easy creation
AI infrastructure users can spend weeks wrestling with complex configurations for compute, networking, and storage. Because distributed training workloads are highly synchronized jobs across thousands of nodes and are highly sensitive to network latency, performance bottlenecks can be difficult to diagnose and resolve. Cluster Director solves these challenges with a single, unified interface that automates the complex setup of AI and HPC clusters, integrating Google Cloud’s optimized compute, networking, and storage into a cohesive, performant, and easily managed environment.
LG AI Research uses Google Cloud to train its large language models, most recently Exaone 3.5. The team has significantly reduced the time it takes to get a cluster running with its code, from over a week to less than one day. That's hundreds of GPU hours saved for real workloads.
“Thanks to Cluster Director, we’re able to deploy and operate large-scale, high-performance GPU clusters flexibly and efficiently, even with minimal human resources.” – Jiyeon Jung, AI Infra Sr Engineer, LG AI Research
Biomatter uses Google Cloud to scale its in silico design processes. Cluster Director has made cluster deployment and management smooth, letting the team dedicate more focus to the scientific challenges at the core of its work.
“Cluster Director on Google Cloud has significantly simplified the way we create, configure, and manage Slurm-based AI and HPC clusters. With an intuitive UI and easy access to GPU-accelerated instances, we’ve reduced the time and effort spent on infrastructure.” – Irmantas Rokaitis, Chief Technology Officer, Biomatter
Read on for what’s new in the latest version of Cluster Director.
Simplified cluster management across compute, network, and storage
Use a new intuitive view in the Google Cloud console to easily create, update, and delete clusters. Instead of a blank slate, you start with a choice of validated, optimized reference architectures. You can add one or more machine configurations from a range of VM families (including A3 and A4 GPUs) and specify the machine type, the number of GPUs, and the number of instances. You can choose your consumption model, selecting on-demand capacity (where supported), DWS Calendar or Flex start modes, Spot VMs for cost savings, or attaching a specific reservation for capacity assurance.
Cluster Director also simplifies networking by allowing you to deploy the cluster on a new, purpose-built VPC network or an existing one. If you create a new network, the firewall rules required for internal communication and SSH access are configured automatically, removing a common pain point. For storage, you can create and attach a new Filestore or Google Cloud Managed Lustre instance, or connect to an existing Cloud Storage bucket. These integrations help ensure that your high-performance file system is correctly mounted and available to all nodes in the cluster from the moment they launch.
Powerful job scheduling with Managed Slurm
Cluster Director provides fault-tolerant and highly scalable job scheduling out of the box with a managed, pre-configured Slurm environment. The controller node is managed for you, and you can easily configure the login nodes, including machine type, source image, and boot-disk size. Partitions and nodesets are pre-configured based on your compute selections, but you retain the flexibility to customize them, now or in the future.
Topology-aware placement
To maximize performance, Cluster Director is deeply integrated with Google's network topology. This begins at cluster creation, when VMs are placed in close physical proximity. Crucially, this intelligence is also built directly into the managed Slurm environment. The Slurm scheduler is natively topology-aware, meaning it understands the underlying physical network and automatically co-locates your job's tasks on nodes with the lowest-latency paths between them. This integration of initial placement and ongoing job scheduling is a key performance enhancer, dramatically reducing network contention during large, distributed training jobs.
Comprehensive visibility and insights
Cluster Director’s integrated observability dashboard provides a clear view of your cluster’s health, utilization, and performance, so you can quickly understand your system’s behavior and diagnose issues in a single place. The dashboard is designed to easily scale to tens of thousands of VMs.
Advanced diagnostics to detect performance anomalies
In distributed ML training, stragglers are the small number of faulty or slow nodes that eventually slow down the entire workload. Cluster Director makes it easy to quickly find and replace stragglers to avoid performance degradation and wasted spend.
Try out Cluster Director today!
We are excited to invite you to be among the first to experience Cluster Director. To learn more and express your interest in joining the preview, talk to your Google Cloud account team or sign up here. We can’t wait to see what you will build.
Building applications is sometimes messy, it’s always iterative, and it often works best when it’s collaborative. As a developer, you regularly experience the frustration of a cryptic error message and the quiet triumph of finding a clever workaround. Either way, finding help or sharing success is best facilitated by a community of builders.
That's why we are excited to launch the Google Developer Program forums at discuss.google.dev. The new forums are designed to help people build with Google technology. You will find discussion groups to engage with other developers and Google experts; how-to articles, reference architectures, and use cases; and a community of users looking to help.
We’re also migrating the existing Google Cloud, Workspace Developer, AppSheet, and Looker communities, channels and content from googlecloudcommunity.com over to discuss.google.dev. So, existing knowledge isn’t lost – it’s just moving to a new home. And by migrating the community we’re able to focus on two core principles in the new design: high trust and high utility.
Signal over noise
Your Google Developer Program profile is how you will access the forums. By unifying our sign-in and connecting forum profiles directly to Google Developer Program profiles, we can programmatically display the credentials and reputation you've earned through learning, events, and meetups across the Google ecosystem.
We’re starting with the Google Developer Expert flair icon next to a user’s name and we plan to extend this to other programs in the near future. Additionally, if you are part of a private product beta or Early Access Program (EAP), your forum account is automatically granted access to the corresponding private discussion groups. No more filling out forms or waiting for permissions. Your Developer Program profile is your passport.
Why we chose Discourse for our new forums
While we were tempted to build a custom solution from scratch, we chose Discourse for a few key reasons:
Built by and for developers: Discourse is an open-source platform that prioritizes function over flash with markdown, code formatting, keyboard navigation, and structured conversations.
Extensibility: Its robust API and plugin architecture allow us to integrate our own Google technologies—like Gemini-powered spam filtering and the Google Developer Program—without reinventing the wheel.
This is your invitation!
This new community is a space for all of us. Come say hello! Ask a question, or answer one. Share what you’re working on, or get help with what you’re stuck on. This is where the real work happens, and we want to be a part of it with you.
In the coming months, you'll see more of our engineers, product managers, and developer advocates join the conversation to not only help answer questions, but also ask them, share their own ideas, and engage with the same passion as you do. They won't always have a perfect solution to a tricky question, but they're committed to listening, engaging, and working with the community to find the best path forward.
How to get started
Explore now: Visit https://discuss.google.dev. Browse the categories, read ongoing discussions, and find your community.
Join the conversation: If you're a Google Developer Program member, sign in and dive in! Ask those tough questions, share your solutions, and contribute your expertise. Not a member yet? Visit developers.google.com/program to learn more and join at no cost.
For googlecloudcommunity.com users: We’re working to make the transition as smooth as possible. You’ll find familiar topics and a wealth of historical discussions here. We encourage you to explore and continue your conversations on this new, unified platform.