Azure – Infrastructure and quality enhancements for Azure Container Registry
ACR now supports expanded registry capacity of up to 40TiB and optimized geo-replication performance.
Read More for the details.
Enhancing network and resource health visualization with a unified, dynamic topology across subscriptions, regions, and resource groups, integrated with actionable connectivity and traffic insights.
Read More for the details.
Introducing a redeploy capability for Azure Dedicated Hosts
Read More for the details.
Traditional Kubernetes networking excels at basic Pod-to-Pod connectivity, but can fall short when addressing the security, performance, and compliance demands of telecom workloads. This limits telecom providers’ ability to fully leverage Kubernetes’ scalability and agility benefits.
Google Cloud’s multi-networking approach empowers telecom providers to overcome these limitations, enabling capabilities that include:
Strict network isolation: Enforce regulatory compliance and enhance security by isolating management, signaling, and media traffic within Kubernetes.
Maximum performance: For telecom applications that need hardware acceleration (e.g., SR-IOV), achieve the high throughput and low latency you need for demanding 5G Mobile Core, RAN, and data-plane workloads.
One existing solution is Multus, a meta-plugin for multi-homed Pods in Kubernetes, but it has some drawbacks:
Difficult to use: Multus relies on unstructured string-based annotations, making configuration complex and error-prone.
Inflexible: It doesn’t allow for dynamic addition or removal of networks to Pods, creating operational overhead and limiting adaptability.
Limited Kubernetes integration: Multus lacks native support for:
Network Policies: Enforcing security across multiple networks is difficult.
Network Services: Load balancing and health checks for applications using secondary interfaces are unsupported.
Poor observability: Multus makes it hard to monitor and troubleshoot multi-network setups.
Google Cloud’s approach natively integrates multi-networking into Kubernetes. This makes it easy to use Kubernetes services, load balancers, Border Gateway Protocol (BGP), and network policies – all vital for building robust telecom networks. We’re also working on this concept with the Kubernetes community as part of a Kubernetes Enhancement Proposal in the Cloud Native Computing Foundation (CNCF).
Partners such as Ericsson are supportive of Google Cloud’s multi-networking approach with a focus on addressing the specific needs of telecom use cases.
“The challenge of ensuring security and operational efficiency in telecom deployments through network separation, while preserving Kubernetes’ standard networking capabilities, is solved by Google’s multi-network function. It’s noteworthy to observe how Google’s multi-network function provides a solution to this challenge without compromising on functionality.” – Ericsson
In this blog, we explore critical telecom use cases enabled by multi-networking. We also delve into two specific implementation examples in Google Distributed Cloud (GDC): isolating signaling traffic between a Mobility Management Entity (MME) and a Home Subscriber Server (HSS) in a 4G/5G mobile core deployment, and isolating signaling traffic between the Network Exposure Function (NEF) and its Application Function (AF) peer.
Use case #1: A Cloud Native Network Function (CNF) application requires an additional interface to be provisioned in its Kubernetes Pod. Each of these interfaces has to be in an isolated network for regulatory compliance purposes, and the isolation has to be done on a Layer-2 network.
Use case #2: A CNF application requires a hardware-based interface (e.g., an SR-IOV VF) to be provisioned to the workload Pod. The hardware is leveraged by a user-space application (e.g., DPDK-based) for performance purposes (high bandwidth, low latency).
Multi-network use case: S6a Traffic
Now, let’s take a look at a specific use of the multi-network framework for networking between the Mobility Management Entity (MME) and Home Subscriber Server (HSS) functions of a 4G mobile core network. In a 4G mobile network, the MME and the HSS are key network elements responsible for managing user connectivity and security:
MME:
Handles user attach/detach procedures, registering and deregistering devices on the network
Manages handovers between cell towers as users move around
Routes incoming and outgoing calls and data traffic to/from the user
HSS:
Stores subscriber information like authentication credentials, subscription details, and roaming agreements
Performs authentication and authorization procedures to verify user identity and access control
Provides user location information to the MME for emergency services and other purposes
Together, the MME and HSS provide a secure and smooth user experience while maintaining network efficiency and subscriber information integrity.
This connectivity scenario is a specific application of use case #1 above: connecting the MME and HSS Pods via an isolated signaling network using GDC in connected configuration, a fully managed hardware and software product that delivers modern applications equipped with AI, security, and open source at the edge. In this implementation, the MME uses a MACVLAN-type interface and the HSS Pods use a separate multi-network interface to steer the Diameter traffic onto the same signaling network.
Here’s a snippet of the signaling network definition and the modified HSS Pod spec to reference this additional Pod network on the respective Pods:
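The following is a minimal sketch of what these specs can look like, using the Kubernetes multi-network API (networking.gke.io) available in GKE and GDC; the network name, node interface, and container image are illustrative assumptions, not the exact specs from the deployment:

```yaml
# Hypothetical Layer-2 signaling network (name and node interface assumed).
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: signaling-network
spec:
  type: L2
  nodeInterfaceMatcher:
    interfaceName: eth1
---
# HSS Pod requesting an additional interface on the signaling network.
apiVersion: v1
kind: Pod
metadata:
  name: hss
  annotations:
    networking.gke.io/default-interface: eth0
    networking.gke.io/interfaces: |
      [
        {"interfaceName": "eth0", "network": "default"},
        {"interfaceName": "eth1", "network": "signaling-network"}
      ]
spec:
  containers:
  - name: hss
    image: example.com/hss:latest   # placeholder image
```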
Multi-network use case: N33 interface on NEF
Now, let’s look at a specific use of the multi-network framework for networking between a Network Exposure Function (NEF) and its Application Function (AF) Peer — two nodes in the 5G Network Architecture:
NEF – This network function provides a means to securely expose the services and capabilities provided by 3GPP network functions to external applications.
AF Peer – This network function plays a key role in traffic management and QoS assignments.
In this scenario, the AF peer of the NEF resides in the customer’s trusted domain and is connected to the Signaling network (VRF). By default, the NEF Pods are connected to the default network (VRF) via their primary interface. To keep the traffic on the same VRF, we enable signaling multi-networking on the NEF Pods to connect them to the Signaling network.
For the above two use cases, the 3GPP endpoints of the HSS and NEF respectively, illustrated as Service VIPs (Virtual IPs), are exposed as Services of type LoadBalancer using the bundled BGP load balancer in GDC. This is a BGP-based L3 load balancer with a BGP speaker as the control plane and an eBPF-based datapath. The BGP speaker automatically advertises Service IPs from Kubernetes Services (type=LoadBalancer) to the configured eBGP peers, and the eBPF-based datapath distributes the incoming Service VIP traffic to the backend Pods.
Here’s a snippet of the load balancer spec:
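At its core this is a standard Kubernetes Service of type LoadBalancer; a hedged sketch for the HSS Diameter endpoint (the name, VIP, and selector are assumptions) might look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hss-s6a
spec:
  type: LoadBalancer
  loadBalancerIP: 10.200.0.10   # assumed VIP from the signaling address pool
  selector:
    app: hss
  ports:
  - name: diameter
    port: 3868                  # standard Diameter port
    targetPort: 3868
    protocol: TCP
```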
And here we can see the config-map spec that allows us to deterministically choose the External IPs to be allocated to the 3GPP Services VIPs.
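The exact schema of the bundled load balancer’s config map depends on the GDC release; purely as an illustration, a MetalLB-style address pool that pins the external IPs for the 3GPP Service VIPs could look like the following (pool name, namespace, and address range are assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config
  namespace: metallb-system
data:
  config: |
    address-pools:
    - name: signaling-pool
      protocol: bgp
      addresses:
      - 10.200.0.10-10.200.0.20   # deterministic VIP range for 3GPP Services
```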
In this blog, we showed how Google Cloud’s multi-networking approach can unlock critical telecom use cases on Kubernetes. To learn more about using Google Distributed Cloud and GKE for telecom workloads, check out the following resources:
Kubernetes Enhancement Proposal for Multi-network, KEP-3698 Multi-network
Google Distributed Cloud overview: Google Distributed Cloud
Using Network Function Optimizer for GKE and GDC: Network Function Optimizer for GKE and GDC Edge
Multinetwork support and configuration in GKE: About the multi-network support for Pods | Google Kubernetes Engine (GKE)
Multinetwork Support and Configuration in GDC: Network Function operator | Distributed Cloud
Read More for the details.
BigQuery’s integrated speech-to-text functionality offers a powerful tool for unlocking valuable insights hidden within audio data. This service transcribes audio files, such as customer review calls, into text format, making them ready for analysis within BigQuery’s robust data platform. By combining speech-to-text with BigQuery’s analytics capabilities, you can delve into customer sentiment, identify recurring product issues, and gain a better understanding of the voice of your customer.
BigQuery speech-to-text transforms audio data into actionable insights, offering potential benefits across industries and enabling a deeper understanding of customer interactions across multiple channels. You can also use BigQuery ML with Gemini 1.0 Pro to layer additional insights and data formatting, such as entity extraction and sentiment analysis, onto the text extracted from audio files using BigQuery ML’s native speech-to-text capability. Below are some use cases and the business value for specific industries:
Retail/E-commerce
Use cases: Analyzing customer call recordings to identify common pain points, product preferences, and overall sentiment.
Business potential: Improved product development by addressing issues mentioned in feedback; enhanced customer service through personalization and targeted assistance; enhanced marketing campaigns based on insights discovered in customer calls.

Healthcare
Use cases: Transcribing patient-doctor interactions to automatically populate medical records, summarize diagnoses, and track treatment progress.
Business potential: More streamlined workflows for healthcare providers, reducing administrative burden; comprehensive patient records for better decision-making; potential identification of trends in patient concerns for research and improved care.

Finance
Use cases: Analyzing earnings calls and shareholder meetings to gauge market sentiment, identify potential risks, and extract key insights.
Business potential: Support for more informed investment decisions; prompt identification of emerging trends or potential issues; proactive risk management strategies.

Media & Entertainment
Use cases: Transcribing podcasts, interviews, and focus groups for content analysis and audience insights.
Business potential: Earlier identification of trending topics and themes for new content creation; understanding audience preferences for program development or advertising; accessibility improvements through automated closed-captioning.
Even when using advanced AI features such as BigQuery ML, you still have access to all of BigQuery’s built-in governance features, including access-control passthrough, so you can restrict insights from customer audio files based on the row-level security you have on your BigQuery object table.
Ready to turn your audio data into insights? Let’s dive into how you can use speech-to-text in BigQuery:
Imagine you have a collection of customer feedback calls stored as audio files in a Google Cloud Storage bucket. BigQuery’s ML.TRANSCRIBE function, connected to a pre-trained speech-to-text model hosted on Google’s Vertex AI platform, lets you automatically convert these audio files into readable text within BigQuery. Think of it as a specialized translator for audio data. You tell the ML.TRANSCRIBE function where your audio files are located (in your object table) and which speech-to-text model to use. It then handles the transcription process, using the power of machine learning, and delivers the text results directly into BigQuery. This makes it easy to analyze customer conversations alongside other business data.
Let’s walk through the process together in BigQuery.
Setup instructions:
Before starting, choose your Google Cloud project, link a billing account, and enable the necessary API (full instructions here)
Optionally, create a recognizer; a recognizer stores the configuration for speech recognition
Create a cloud resource connection and get the connection’s service account, full guide here
Grant access to the service account by following the steps here.
Create a dataset that will contain the model and the object table by following the steps here
Download and store the audio files in Google Cloud Storage:
Download 5 audio files from here
Create a bucket in Google Cloud Storage and a folder within the bucket
Upload the downloaded audio files to the folder
Create a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_SPEECH_TO_TEXT_V2. The model makes the Speech-to-Text API available within BigQuery.
Example query:
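A sketch of the model creation step, where the dataset, connection, and recognizer names are placeholders to replace with your own:

```sql
CREATE OR REPLACE MODEL `my_dataset.speech_to_text_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (
    remote_service_type = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
    -- Optional: reference the recognizer created during setup.
    speech_recognizer = 'projects/my-project/locations/us/recognizers/my-recognizer'
  );
```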
Sample code:
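Next, an object table gives BigQuery a SQL view over the audio files in Cloud Storage. A sketch, with the dataset and connection names again as placeholders:

```sql
CREATE OR REPLACE EXTERNAL TABLE `my_dataset.audio_files`
  WITH CONNECTION `us.my_connection`
  OPTIONS (
    object_metadata = 'SIMPLE',
    uris = ['gs://BUCKET_PATH/*']
  );
```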
Please replace ‘BUCKET_PATH’ with your Google Cloud Storage bucket/folder path where audio files are stored
Sample query:
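A sketch of the transcription call, assuming the model and object table names used above:

```sql
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `my_dataset.speech_to_text_model`,
  TABLE `my_dataset.audio_files`
);
```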
The results of ML.TRANSCRIBE include these columns:
transcripts: Contains the text transcription of the processed audio files
ml_transcribe_result: JSON value that contains the result from the Speech-to-Text API
ml_transcribe_status: Contains a string value that indicates the success or failure of the transcription process for each row. It will be empty if the process is successful
The object table columns
The ML.TRANSCRIBE function eliminates the need for manual transcription, saving time and effort. Transcribed text becomes easily searchable and analyzable within BigQuery, enabling you to extract valuable insights from your audio data.
Follow-up Ideas
Take the text extracted from the audio files and use Gemini 1.0 Pro with BigQuery ML’s ML.GENERATE_TEXT function to extract entities such as product names, stock prices, or other types of entity data you are looking for, and structure them in JSON (see the sketch after this list).
Use Gemini 1.0 Pro with BigQuery ML to measure sentiment analysis of the extracted text, and structure positive & negative sentiments in JSON.
Join customer feedback verbatims and sentiment scores with Customer Lifetime Value scores or other relevant customer data to see how quantitative and qualitative data relate to each other.
Generate embeddings over the extracted text, and use vector search to search the audio files for specific content.
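As a sketch of the first follow-up idea, assuming the transcripts have been saved to a table my_dataset.transcriptions and that a Gemini 1.0 Pro remote model my_dataset.gemini_pro_model already exists (both names are assumptions):

```sql
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.gemini_pro_model`,
  (
    SELECT CONCAT(
      'Extract the product names mentioned in this transcript ',
      'and return them as a JSON array: ', transcripts
    ) AS prompt
    FROM `my_dataset.transcriptions`
  ),
  STRUCT(0.2 AS temperature, 1024 AS max_output_tokens, TRUE AS flatten_json_output)
);
```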
Curious to learn more? The official Google Cloud documentation on ML.TRANSCRIBE has all the details. Please also check out the blog on Gemini 1.0 Pro support for BigQuery ML to see other GenAI use cases as outlined in the Follow-up ideas.
Read More for the details.
One of Kubernetes’ big selling points is that each Pod has its own network address. This makes the Pod behave a bit like a VM, and frees developers from worrying about pesky things like port conflicts. It’s a property of Kubernetes that makes things easier for developers and operators, and has been credited as one of the design features that made it so popular as a container orchestrator. Google Kubernetes Engine (GKE) additionally adopts a flat network structure for all clusters in a VPC, which means that each Pod in each cluster has its own IP in the VPC and can communicate with Pods in other clusters directly (without needing NAT), a useful property which enables advanced features like container-native load balancing.
While this addressing layout has many advantages, the trade-off is that you can consume IPs rather quickly. With every Pod in every cluster being allocated IPs on the same VPC, and allowing space in those ranges for expansion, IPs get used very fast. IPv6 has long been proposed as the industry-wide solution to all these problems, and one day that will no doubt be true, but GKE doesn’t support single-stack IPv6 for Pod addressing, and not everyone is ready to drop IPv4 in any case, so how do you solve this problem with IPv4 ranges today?
In my travels as a product manager on GKE, the best solution I have seen is to use a non-RFC1918 IP range for Pods. While there are alternative approaches to solving this problem, what follows is the specific solution I have seen deployed successfully by multiple customers on GKE. Let’s take a closer look.
Most GKE clusters are created with the nodes in RFC 1918 space, specifically 10.0.0.0/8. Did you know that you can still have all the benefits of a flat network structure, like container-native load balancing, while preserving your 10.0.0.0/8 space? The solution is to keep nodes in that CIDR range and to allocate just the Pod ranges (which use by far the most IPs) out of a larger non-RFC 1918 private address range like 100.64.0.0/10 or 240.0.0.0/4. Google Cloud VPC has native support for these other ranges, so within Google Cloud everything “just works”: Pods can, for example, connect to services like Cloud SQL and communicate directly with Pods in other clusters.
By keeping nodes in RFC 1918 space, IP masquerading can be used to mask the Pod’s address with that of the node, so that the rest of your off-VPC endpoints never need to see a non-RFC 1918 address. Every endpoint outside of Google Cloud (or however you configure your masquerading rules) will see the 10.0.0.0/8 IP that it expects. Best of both worlds. The 100.64.0.0/10 range is reserved for shared address space, making it a great candidate to use first, giving you 4 million Pod addresses right off the bat, and with a potential quarter-billion IPs for your Pods in 240.0.0.0/4, there’s plenty of room to grow beyond that.
It may not be immediately apparent that 240.0.0.0/4 is an acceptable range to use in your VPC for Kubernetes Pods. After all, on the public internet, this range has been reserved (since 1989 in fact) for future use, and could in theory be assigned one day. Private use of this range in your VPC doesn’t affect the public internet (no routes using this range will be advertised outside your VPC), but is there any downside? In the event this range was allocated one day, what it means is that hosts in your network wouldn’t be able to initiate outbound connections to hosts in that range. There’d be no impact to inbound connections that utilize load balancing (as most do). In other words, should that range ever be allocated, you can still serve customers on those addresses.
The other concern I’ve heard about using 240.0.0.0/4 ranges is that on-prem routers don’t support them, and neither do Windows hosts. There is a really simple solution to both those concerns, as you can easily configure IP masquerading for any destinations that don’t support it, meaning the only IP those services will see is from your 10.0.0.0/8 primary range.
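With GKE’s ip-masq-agent, for example, traffic to destinations listed under nonMasqueradeCIDRs keeps the Pod IP, while traffic to everything else (such as on-prem ranges) is NATed to the node’s 10.0.0.0/8 address. A minimal sketch of such a config (the exact CIDRs are assumptions to adapt to your network):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system
data:
  config: |
    nonMasqueradeCIDRs:
    - 100.64.0.0/10   # Pod-to-Pod traffic inside the VPC keeps Pod IPs
    - 240.0.0.0/4
    resyncInterval: 60s
```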
Some Kubernetes platforms outside Google Cloud offer an “island mode” network design where you reuse the same Pod IP ranges in every cluster, and I’ve heard requests for this in GKE as well. The approach documented here is better in my view: you get the advantage of the flat network within the VPC (enabling things like container-native load balancing), while traffic can still be NATed over the Node’s IP when needed. By comparison, an “island mode” design will NAT all traffic that leaves the cluster (including Pod-to-Pod traffic between clusters), limiting what you can do inside the VPC.
So that’s Pods, but what about Services? Service ranges are another concern for IP allocation. GKE in Autopilot mode now automatically reuses the same /20 range for every cluster, giving you 4k services without allocating any of your network (service IPs are virtual and have no meaning outside the cluster, so there is no need to give them unique identifiers). On node-based GKE Standard mode, or if you need more than 4k services, you can create your own named subnet of whatever size you need (including out of 240.0.0.0/4 space), and reuse it for every cluster in the region as well (by passing it in the --services-secondary-range-name parameter when creating the cluster).
In summary, a recommended way to reduce IP usage while maintaining all the benefits of a flat network structure is to:
Allocate Node IP ranges from your main ranges (like 10.0.0.0/8)
Allocate Pod IP ranges from non-RFC 1918 space, like 100.64.0.0/10 and 240.0.0.0/4, while utilizing IP masquerading to NAT with the node’s IP for on-prem destinations or anywhere that expects an RFC 1918 range (see the sketch after this list).
Use Autopilot mode, which automatically provides 4k IPs for your services, or create a named subnet for services and reuse it for all clusters by passing it during cluster creation
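As an illustrative sketch only (the cluster, network, subnet, and range names are assumptions), a cluster creation command along these lines might look like:

```sh
# Nodes draw IPs from a 10.0.0.0/8 subnet; Pods come from non-RFC 1918 space;
# Services reuse a shared named secondary range.
gcloud container clusters create my-cluster \
  --region=us-central1 \
  --enable-ip-alias \
  --network=my-vpc \
  --subnetwork=my-subnet \
  --cluster-ipv4-cidr=240.10.0.0/13 \
  --services-secondary-range-name=shared-services-range
```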
With these steps, I have seen several customers solve their IP constraints, and adopt this strategy as a bridge to one day running a cluster with single-stack IPv6 Pod addressing.
Next steps:
Plan your IP address management
Use non-RFC 1918 IP address ranges in GKE
Learn about the supported IPv4 ranges by Cloud VPC
Learn about IP Masquerading in GKE
Try GKE’s Autopilot mode for a workload-based API that also improves operational efficiency
Read More for the details.
The explosive growth of malware continues to challenge traditional, manual analysis methods, underscoring the urgent need for improved automation and innovative approaches. Generative AI models have become invaluable in some aspects of malware analysis, yet their effectiveness in handling large and complex malware samples has been limited. The introduction of Gemini 1.5 Pro, capable of processing up to 1 million tokens, marks a significant breakthrough. This advancement not only empowers AI to function as a powerful assistant in automating the malware analysis workflow but also significantly scales up the automation of code analysis. By substantially increasing the processing capacity, Gemini 1.5 Pro paves the way for a more adaptive and robust approach to cybersecurity, helping analysts manage the asymmetric volume of threats more effectively and efficiently.
The foundation of automated malware analysis is built on a combination of static and dynamic analysis techniques, both of which play crucial roles in dissecting and understanding malware behavior. Static analysis involves examining the malware without executing it, providing insights into its code structure and unobfuscated logic. Dynamic analysis, on the other hand, involves observing the execution of the malware in a controlled environment to monitor its behavior, regardless of obfuscation. Together, these techniques are leveraged to gain a comprehensive understanding of malware.
Parallel to these techniques, AI and machine learning (ML) have increasingly been employed to classify and cluster malware based on behavioral patterns, signatures, and anomalies. These methodologies have ranged from supervised learning, where models are trained on labeled datasets, to unsupervised learning for clustering, which identifies patterns without predefined labels to group similar malware.
Despite technological advancements, the increasing complexity and volume of malware present substantial challenges. While ML enhances the detection of malware variants, it remains inadequate against completely new threats. This detection gap allows advanced attacks to slip through cybersecurity defenses, compromising system protection.
Code Insight, unveiled at the RSA Conference 2023, marked a significant step forward in leveraging generative AI (gen AI) for malware analysis. This novel feature of Google’s VirusTotal platform specializes in analyzing code snippets and generating reports in natural language, effectively emulating the approach of a malware analyst. Initially supporting PowerShell scripts, Code Insight later expanded to other scripting languages and file formats, including Batch, Shell, VBScript, and Office documents.
By processing the code and generating summary reports, Code Insight assists analysts in understanding the behavior of the code and identifying attack techniques. This includes uncovering hidden functionalities, malicious intent, and potential attack vectors that might be missed by traditional detection methods.
However, due to the inherent constraints of large language models (LLMs) and their limited token input capacity, the size of files that Code Insight could handle was restricted. Although there have been continuous improvements to increase the maximum file size limit and support more formats, analyzing binaries and executables still poses a significant challenge. When these files are disassembled or decompiled, their code size typically surpasses the processing capabilities of the LLMs available at the time. Consequently, gen AI models have functioned primarily as assistants to human analysts, enabling the analysis of specific code fragments from binaries rather than processing the entire code, which is often too voluminous for these models.
Reverse engineering is arguably the most advanced malware analysis technique available to cybersecurity professionals. This process involves disassembling the binaries of malicious software and carrying out a meticulous examination of the code. Through reverse engineering, analysts can uncover the exact functionality of malware and understand its execution flow. However, this method is not without its challenges. It requires an immense amount of time, a deep level of expertise, and an analytical mindset to interpret each instruction, data structure, and function call to reconstruct the malware’s logic and uncover its secrets.
Furthermore, scaling reverse engineering efforts poses a significant challenge. The scarcity of specialized talent in this field exacerbates the difficulty of conducting these analyses at scale. Given the intricate and time-consuming nature of reverse engineering, the cybersecurity community has long sought ways to augment this process, making it more efficient and accessible.
The ability to process prompts of up to 1 million tokens enables a qualitative leap in malware analysis, particularly in the realm of reverse engineering. This advancement finally brings the power of gen AI to the analysis of binaries and executables, a task previously reserved for highly skilled human analysts due to its complexity.
How does Gemini 1.5 Pro achieve this?
Increased capacity: With its expanded token limit, Gemini 1.5 Pro can entirely analyze some disassembled or decompiled executables in a single pass, eliminating the need to break down code into smaller fragments. This is crucial because fragmenting code can lead to a loss of context and important correlations between different parts of the program. When analyzing only small snippets, it is difficult to understand the overall functionality and behavior of the malware, potentially missing key insights into its purpose and operation. By analyzing the entire code at once, Gemini 1.5 Pro gains a holistic understanding of the malware, allowing for more accurate and comprehensive analysis.
Code interpretation: Gemini 1.5 Pro can interpret the intent and purpose of the code, not just identify patterns or similarities. This is possible due to its training on a massive dataset of code, encompassing assembly language from various architectures, high-level languages like C, and pseudo-code produced by decompilers. This extensive knowledge base, combined with its understanding of operating systems, networking, and cybersecurity principles, allows Gemini 1.5 Pro to effectively emulate the reasoning and judgment of a malware analyst. As a result, it can predict the malware’s actions and provide valuable insights even for never-seen-before threats. For more information on this, see the zero day case study section later in this post.
Detailed analysis: Gemini 1.5 Pro can generate summary reports in human-readable language, making the analysis process more accessible and efficient. This goes far beyond the simple verdicts typically provided by traditional machine learning algorithms for classification and clustering. Gemini 1.5 Pro’s reports can include detailed information about the malware’s functionality, behavior, and potential attack vectors, as well as indicators of compromise (IOCs) that can be used to feed other security systems and improve threat detection and prevention capabilities.
Let’s explore a practical case study to examine how Gemini 1.5 Pro performs in analyzing decompiled code with a representative malware sample. We processed two WannaCry binaries automatically using the Hex-Rays decompiler, without adding any annotations or additional context. This approach resulted in two C code files, one 268 KB and the other 231 KB in size, which together amount to more than 280,000 tokens for processing by the LLM.
In our testing with other similar gen AI tools, we faced the necessity of dividing the code into chunks. This fragmentation often compromised the comprehensiveness of the analysis, resulting in vague and non-specific outcomes. These limitations highlight the challenges of using such tools with complex code bases.
Gemini 1.5 Pro, however, marks a significant departure from these constraints. It processes the entire decompiled code in a single pass, taking just 34 seconds to deliver its analysis. The initial summary provided by Gemini 1.5 Pro is notably accurate, showcasing its ability to handle large and complex datasets seamlessly and effectively:
Issues a malicious verdict associated with ransomware
Identifies some files as IOCs (c.wnry and tasksche.exe)
Acknowledges the use of an algorithm to generate IP addresses and perform network scans to find targets on port 445/SMB to spread to other computers
Identifies URL/domain (WannaCry’s “killswitch”) and relevant registry key and mutex
While it might seem that Gemini 1.5 Pro’s report of WannaCry is based on pre-trained knowledge of this specific malware, this isn’t the case. The analysis comes from the model’s ability to independently interpret the code. This will become even clearer as we look at the upcoming examples where Gemini 1.5 Pro analyzes unfamiliar malware samples, demonstrating its wide-ranging capabilities.
In the previous example showcasing WannaCry analysis, there was a crucial step before feeding the code to the LLM: decompilation. This process, which transforms binary code into a higher-level representation like C, is fully automated and mirrors the initial steps taken by malware analysts when manually dissecting malicious software. But what is the difference between disassembled and decompiled code, and how does it impact LLM analysis?
Disassembly: This process converts binary code into assembly language, a low-level representation specific to the processor architecture. While human-readable, assembly code is still quite complex and requires significant expertise to understand. It is also much longer and more repetitive than the original source code.
Decompilation: This process attempts to reconstruct the original source code from the binary. While not always perfect, decompilation can significantly improve readability and conciseness compared to disassembled code. It achieves this by identifying high-level constructs like functions, loops, and variables, making the code easier to understand for analysts.
Given these factors, when using LLMs for binary analysis, decompilation offers several advantages on efficiency and scalability. The shorter and more structured output from decompilation fits more readily within the processing constraints of LLMs, allowing for a more efficient analysis of large or complex binaries. In fact, the output from a decompiler is five to 10 times more concise than that produced by a disassembler.
Disassembly is necessary to perform accurate decompilation and remains an invaluable tool in certain scenarios where detailed, low-level analysis is crucial. Given the structured and higher-level nature of decompiled output, there are specific circumstances where disassembly provides insights that decompilation cannot match.
Fortunately, Gemini 1.5 Pro demonstrates equal capability in processing both high-level languages and assembly across various architectures. Thus, our implementation for automating binary analysis can utilize both strategies or adopt a hybrid approach, as suited to the specific circumstances of each case. This flexibility allows us to tailor our analysis method to the nature of the binary in question, optimizing for efficiency, depth of insight, and the specific objectives of the analysis, whether that means dissecting the logic and flow of the program or diving into the intricate details of its low-level operations.
Next, we’ll examine a case where we directly employ disassembly for analysis. This time, we’re working with a more recent and unknown binary; in fact, the executable submitted to VirusTotal is flagged as malicious by only four out of the 70 VirusTotal anti-malware engines, and only in a generic sense, without providing any details about the malware family that could offer further clues about its behavior.
After automatic preprocessing with Hex-Rays/IDA Pro, the 306.50 KB executable binary produces a 1.5 MB assembly file that Gemini 1.5 Pro can process in a single pass within 46 seconds, thanks to its large token window in the prompt. This capability allows for an analysis of the entire assembly output, offering detailed insights into the binary’s operations.
This case of the unknown binary showcases the remarkable capabilities of Gemini 1.5 Pro. Despite only four out of 70 anti-malware engines on VirusTotal flagging the file as malicious—using only generic signatures—Gemini 1.5 Pro identified the file as malicious, providing a detailed explanation for its verdict. The file is likely a game cheat designed to inject a game hack dynamic-link library (DLL) into the Grand Theft Auto video game process. The designation of “malicious” may depend on perspective: deemed malicious by the game’s developers or their security team focused on anti-cheating measures, yet potentially desirable for some players. Nevertheless, this automated first-pass analysis is not only impressive but also illuminating regarding the nature and intent of the binary.
The true test of any malware analysis tool lies in its ability to identify never-before-seen threats that go undetected by traditional methods and to proactively protect systems from zero-day attacks. Here, we examine a case where an executable file is undetected by any anti-virus or sandbox on VirusTotal.
The 833 KB file, medui.exe, was decompiled into 189,080 tokens and subsequently processed by Gemini 1.5 Pro in a mere 27 seconds to produce a complete malware analysis report in a single pass.
This analysis revealed suspicious functionalities, leading Gemini 1.5 Pro to issue a malicious verdict. Based on its observations, it concluded that the primary goal of this malware is to steal cryptocurrency by hijacking Bitcoin transactions and evading detection through the disabling of security software.
This showcases Gemini’s ability to go beyond simple pattern matching or ML classification and leverage its deep understanding of code behavior to identify malicious intent, even in previously unseen threats. This is a significant advancement in the field of malware analysis, as it allows us to proactively detect and respond to new and emerging threats that traditional methods might miss.
Gemini 1.5 Pro unlocks impressive capabilities, enabling the analysis of large volumes of decompiled and disassembled code. It has the potential to significantly change our approach to fighting malware by enhancing efficiency, accuracy, and our ability to scale in response to a growing number of threats.
However, it’s important to remember that this is just the beginning. While Gemini 1.5 Pro represents a significant leap forward, the field of gen AI is still in its infancy. There are several challenges that need to be addressed to achieve truly robust and reliable automated malware analysis:
Obfuscation and packing: Malware authors are constantly developing new techniques to obfuscate their code and evade detection. In response, there’s a growing need to not only continuously improve gen AI models but also to enhance the preprocessing of binaries before analysis. Adopting dynamic approaches that utilize various preprocessing tools can more effectively unpack and deobfuscate malware. This preparatory step is crucial for enabling gen AI models to accurately analyze the underlying code, ensuring they keep pace with evolving obfuscation techniques and remain effective in detecting and understanding sophisticated malware threats.
Increasing binary size: The complexity of modern software is mirrored in the growing size of its binaries. This trend presents a significant challenge, as the majority of gen AI models are constrained by much lower token window limits. In contrast, Gemini 1.5 Pro stands out by supporting up to 1 million tokens—currently the highest known capacity in the field. Nevertheless, even with this remarkable capability, Gemini 1.5 Pro may encounter limitations when handling exceptionally large binaries. This underscores the ongoing need for advancements in AI technology to accommodate the analysis of increasingly large files, ensuring comprehensive and effective malware analysis as software complexity continues to escalate.
Evolving attack techniques: As attackers continuously innovate, crafting new methods to bypass security measures, the challenge for gen AI models extends beyond simple adaptability. These models must not only learn and recognize new threats but also evolve in conjunction with the efforts of researchers and developers. There’s a need to devise new methods for automating the preprocessing of threat data, which would enrich the context provided to AI models. For instance, integrating additional data from static and dynamic analysis tools, such as sandbox reports, plus the decompiled and disassembled code, can significantly enhance the models’ understanding and detection capabilities.
The journey towards scaling automated malware analysis is ongoing, but Gemini 1.5 Pro marks a significant milestone. At GSEC Malaga, we continue to research and develop ways to apply these AI models effectively, pushing the boundaries of what’s possible in cybersecurity and contributing to a safer digital future.
The following table contains details on the malware samples discussed in this post.
lhdfrgui.exe (WannaCry dropper)
SHA-256: 24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c
Size: 3.55 MB (3723264 bytes)
First seen: 2017-05-12
File type: Win32 EXE

tasksche.exe (WannaCry cryptor)
SHA-256: ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa
Size: 3.35 MB (3514368 bytes)
First seen: 2017-05-12
File type: Win32 EXE

EXEC.exe
SHA-256: 1917ec456c371778a32bdd74e113b07f33208740327c3cfef268898cbe4efbfe
Size: 306.50 KB (313856 bytes)
First seen: 2022-04-18
File type: Win32 EXE

medui.exe
SHA-256: 719b44d93ab39b4fe6113825349addfe5bd411b4d25081916561f9c403599e50
Size: 833.50 KB (853504 bytes)
First seen: 2024-03-27
File type: Win32 EXE
The following is the exact prompt used in all the examples covered in the post. The only exception is the example where the word “disassembled” is used instead of “decompiled” because, as explained, we’re working with disassembled code rather than decompiled code to show that Gemini 1.5 Pro can interpret both.
Act as a malware analyst by thoroughly examining this decompiled executable code. Methodically break down each step, focusing keenly on understanding the underlying logic and objective. Your task is to craft a detailed summary that encapsulates the code’s behavior, pinpointing any malicious functionality. Start with a verdict (Benign or Malicious), then a list of activities including a list of IOCs if any URLs, created files, registry entries, mutex, network activity, etc.
+[attached decompiled.c.txt sample file]
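As a rough sketch of how this prompt plus an entire decompiled file might be submitted programmatically through the Vertex AI SDK (the project, region, and file name are assumptions, and the model identifier may differ):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Assumed project and region.
vertexai.init(project="my-project", location="us-central1")
model = GenerativeModel("gemini-1.5-pro")

prompt = (
    "Act as a malware analyst by thoroughly examining this decompiled "
    "executable code. Methodically break down each step, focusing keenly on "
    "understanding the underlying logic and objective. Your task is to craft "
    "a detailed summary that encapsulates the code's behavior, pinpointing "
    "any malicious functionality. Start with a verdict (Benign or Malicious), "
    "then a list of activities including a list of IOCs if any URLs, created "
    "files, registry entries, mutex, network activity, etc."
)

# The 1M-token window lets the whole decompiled file go in one request.
with open("decompiled.c.txt") as f:
    decompiled_code = f.read()

response = model.generate_content([prompt, decompiled_code])
print(response.text)
```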
Read More for the details.
Viewing changes to your Azure resources just became easier! With Azure Resource Graph’s Change Analysis, you can now view all your resource changes across all your tenants and subscriptions in the Azure Portal.
Read More for the details.
Azure Deployment Environments is adding a new extensibility model that empowers customers to customize deployment workflows using Bicep, Terraform, Pulumi, or any other infrastructure-as-code (IaC) framework of their choice.
Read More for the details.
Azure Data Catalog will now be retired on 15 May 2024 – migrate to Microsoft Purview
Read More for the details.
AWS CodePipeline V2 type pipelines now support stage-level rollback to help customers confidently deploy changes to their production environment. When a pipeline execution fails in a stage due to any action(s) failing, customers can quickly get that stage to a known good state by rolling back to a previously successful pipeline execution in that stage. Customers can roll back changes in any stage, whether succeeded or failed, except the Source stage.
Read More for the details.
Customers can now create and manage default policies across their entire organization or organizational unit (OU) with AWS CloudFormation StackSets. Default policies work in conjunction with customers’ existing backup mechanisms to only create EBS-backed AMIs and EBS Snapshots of instances and volumes without recent backups. This helps administrators ensure that all member accounts have comprehensive backup protection without creating duplicate backups or increasing management overhead and cost.
Read More for the details.
You can now restore your Amazon Managed Service for Apache Flink application to the previous running version and application state from the most recent successful snapshot. This feature works when your application is running and is most useful when you want to immediately roll back to the previous application version to mitigate the downstream impact of an application update. Prior to this launch, you could only roll back applications that were in updating or autoscaling statuses.
Read More for the details.
Network Load Balancer (NLB) now supports Resource Map, a tool in the console that displays all your NLB resources and their relationships in a visual format on a single page, providing you a clear understanding of your NLB architecture.
Read More for the details.
Amid all the excitement around the potential of generative AI to transform business and unlock trillions of dollars in value across the global economy, it is easy to overlook the significant impact that the technology is already having. Indeed, the era of gen AI does not exist at some vague point in the not-too-distant future: it is here and now.
The advent of generative AI marks a significant leap in the evolution of computing. For Media customers, generative AI introduces the ability to generate real time, personalized and unique interactions that weren’t possible before. This technology is not just revolutionizing the way we streamline the content creation process but it is also transforming broadcasting operations, such as discovering and searching media archives.
Simultaneously, in telco, generative AI boosts productivity by creating a knowledge engine that can summarize and extract information from both structured and unstructured data, which employees can use to solve a customer problem or to shorten their learning curve. Furthermore, generative AI can be easily adopted and understood by all levels of the organization without needing to know the model’s complexity.
The telecommunications and media industry is at the forefront of integrating generative AI into their operations, viewing it as a catalyst for growth and innovation. Industry leaders are enthusiastic about its ability to not only enhance the current processes but also spearhead new innovations, create new opportunities, unlock new sources of value and improve the overall business efficiency.
Communication Service Providers (CSPs) are now using generative AI to significantly reduce the time it takes to perform network-outage root-cause analysis. Traditionally, identifying the root cause of an outage involved engineers mining through several logs, vendor documents, past trouble tickets, and their resolutions. Vertex AI Search enables CSPs to extract relevant information across structured and unstructured data, and significantly shorten the time for a human engineer to identify probable causes.
“Generative AI is helping our employees to do their jobs and increase their productivity, allowing them to spend more time strengthening the relationship with our customers,” explains Uli Irnich, CIO of Vodafone Germany.
Media organizations are using generative AI to smoothly and successfully engage and retain viewers by enabling more powerful search and recommendations. With Vertex AI, customers are building an advanced media recommendations application and enabling audiences to discover personalized content, with Google-quality results that are customized by optimization objectives.
While the potential of generative AI is widely recognized, challenges to its widespread adoption still persist. On the one hand, many of these stem from the sheer size of the businesses involved, with legacy architecture, siloed data, and the need for skills training presenting obstacles to more widespread and effective usage of generative AI solutions. On the other hand, many of these risk-averse enterprise-scale organizations want to be sure that the benefits of generative AI outweigh any perceived risks. In particular, businesses seek reassurance around the security of customer data and the need to conform to regulation, as well as around some of the challenges that can arise when building generative AI models, such as hallucinations (more on that below).
As part of our long-standing commitment to the responsible development of AI, Google Cloud puts our AI Principles into practice. Through guidance, documentation, and practical tools, we support customers to help ensure that businesses are able to roll out their solutions in a safe, secure, and responsible way. By tackling challenges and concerns head-on, we are working to empower organizations to leverage generative AI safely and effectively.
One such challenge is “hallucinations,” which are when a generative AI model outputs incorrect or invented information in response to a prompt. For enterprises, it’s key to build robust safety layers before deploying generative AI-powered applications. Models, and the ways that generative AI apps leverage them, will continue to get better, and many methods for reducing hallucinations are available to organizations.
Last year, we introduced grounding capabilities for Vertex AI, enabling large language models to incorporate specific data sources when generating responses. By giving models access to designated data sources, grounding tethers their output to that data, which reduces hallucinations and enhances the trustworthiness of generated content. Grounding also lets the model access information that goes beyond its training data: by linking to designated data stores within Vertex AI Search, the grounded model can produce relevant responses.
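A hedged sketch of what grounding a request against a Vertex AI Search data store can look like with the Vertex AI SDK (the project, data store path, and model version are assumptions):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="my-project", location="us-central1")  # assumed

# Ground responses in a designated Vertex AI Search data store.
retrieval_tool = Tool.from_retrieval(
    grounding.Retrieval(
        grounding.VertexAISearch(
            datastore="projects/my-project/locations/global/collections/"
                      "default_collection/dataStores/my-datastore"
        )
    )
)

model = GenerativeModel("gemini-1.0-pro")
response = model.generate_content(
    "What does our latest product documentation say about data residency?",
    tools=[retrieval_tool],
)
print(response.text)
```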
As AI-generated images become increasingly popular, we offer digital watermarking and verification on Vertex AI, making us the first cloud provider to give enterprises a robust, usable, and scalable approach to create AI-generated images responsibly and identify them with confidence. Digital watermarking on Vertex AI provides two capabilities: Watermarking, which produces a watermark designed to be invisible to the human eye without damaging or reducing image quality, and Verification, which determines whether an image was generated by Imagen, with an associated confidence level. This technology is powered by Google DeepMind SynthID, a state-of-the-art technology that embeds the watermark directly into the pixels of the image, making it imperceptible to the human eye and very difficult to tamper with without damaging the image.
Given the versatility of large language models, predicting unintended or unexpected output is challenging. To address this, our generative AI APIs have safety attribute scoring, enabling customers to test Google’s safety filters and set confidence thresholds suitable for their specific use case and business. These safety attributes include “harmful categories” and topics that can be considered sensitive, each assigned a confidence score between 0 and 1. This score reflects the likelihood of the input or response belonging to a given category. Implementing this measure is a step toward a positive user experience, ensuring outputs align more closely with the desired safety standards.
As we work to develop generative AI responsibly, we keep a close eye on emerging regulatory frameworks. Google’s AI/ML Privacy Commitment outlines our belief that customers should have a higher level of security and control over their data on the cloud. That commitment extends to Google Cloud generative AI solutions: by default Google Cloud doesn’t use customer data (including prompts, responses and adapter model training data) to train its foundation models. We also offer third-party intellectual property indemnity as standard for all customers.
By integrating responsible AI principles and toolkits into all aspects of AI development, we are witnessing a growing confidence among organizations in using Google Cloud generative AI models and the platform. This approach enables them to enhance customer experience, and overall, foster a productive business environment in a secure, safe and responsible manner. As we progress on a shared generative AI journey, we are committed to empowering customers with tools and protection they need to use our services safely, securely and with confidence.
“Google Cloud generative AI is optimizing the flow from ideation to dissemination,” says Daniel Hulme, Chief AI Officer at WPP. “And as we start to scale these technologies, what is really important over the coming years is how we use them in a safe, responsible and ethical way.”
Read More for the details.
A lack of skills holds back tens of millions of people from finding jobs, growing in their careers, and adapting to today’s business opportunities. For example, an estimated 920 million people globally have an education that does not match their job¹, while 60% of workers will require new training before 2027 but only some have access to adequate training opportunities².
Expanding access to continuing education is a great way to level the playing field for everyone and give people a clearer understanding of the skills needed for a given job – and how to build those skills.
Jobspeaker, a Google Cloud EdTech partner, believes that bringing together educators, learners, and employers can significantly reduce the strain on people and businesses caused by economic cycles and the exponentially increasing effects of technology on the job market.
“People need different things at different stages in their careers,” says Jarlath O’Carroll, Founder and Chief Executive Officer of Jobspeaker. “In the past decade, we’ve seen more people looking to re-skill or upskill in response to the quickly evolving economy. We focus on making re-skilling and upskilling as effective and efficient as possible.”
Jobspeaker chose to use Google Cloud and become a Google EdTech partner in building their exploration, learning, and work platform that improves skills matching for learners – including students, job seekers, and professionals – as well as educators and employers.
Mapping skills through a new common language
Since its inception, Jobspeaker has worked to create a complete suite of tools for career planning that focuses on clarifying what skills are required for desired jobs or careers and provides a path to gain those skills.
“We chose to focus on the language of skills because there was such a gap in understanding by both employers and job seekers,” says Richard Varn, Chief Information Officer and board member at Jobspeaker. “Establishing reliable skills descriptions and communications among learners, educators, and employers will lead to better outcomes for everyone.”
To accomplish its goals, Jobspeaker needed IT solutions that would enable it to extract specific information regarding the skills students acquire throughout their academic journeys, as well as those that employers seek. Google Cloud proved to be the best option because it provides the tools to extract vast amounts of information at scale.
“The task we had for AI was to pull out details about skills, competencies, activities, knowledge, and abilities in business and academia from highly unstructured data,” says Varn. “Given the scale and complexity of the data, we needed highly automated processes powered by a configurable AI infrastructure to support our machine learning.”
Jobspeaker chose to work with Vertex AI for its curriculum-to-skills mapping. After achieving initial success with classification work, Jobspeaker saw opportunities to use new generative AI capabilities in Vertex AI. These tools are now applied to extracting data that identifies and aggregates skills developed in education and maps them to job descriptions.
Jobspeaker is working to map every type of learning exercise, from a 15-minute educational YouTube video to a full four-year degree program, as professionals and students continue to learn from a wider array of sources. So far, Jobspeaker has successfully processed over 6,300 programs and 25,000 courses across higher and continuing education.
Speed has also improved by using Vertex AI. Jobspeaker’s processing took three to four weeks when it used a more manual process and on-prem IT. That was reduced to one to two weeks after moving to Google Cloud and now sits at under two days as it fine tunes more models. Jobspeaker expects to see even more complex skill mapping take as little as two or three hours in the near future.
Aligning with the right cloud provider
Jobspeaker also chose to work with Google Cloud because of its scalable infrastructure and expertise in search technologies. The company hopes to do for education and employment what Google does for so many industries.
“Our ultimate goal is to discover and use any kind of information that connects education to careers, understand the information in detail, and articulate the insights to our users,” says O’Carroll.
Jobspeaker now has the underlying infrastructure to scale up its skills mapping processes. Compute Engine powers Jobspeaker’s applications and infrastructure, providing an on-demand, efficient foundation for the company’s IT architecture.
In addition, Jobspeaker believes Google Cloud’s commitment to improving education for all is another strong area of alignment.
“While our initial interest in Google Cloud was based on the technology it offers, we’ve learned a lot more about the impressive things Google has done in education,” says O’Carroll. “We hope to create and deploy a Google Chromebook plug-in version of our service to increase its availability to more learners.”
All eyes on AI
Jobspeaker believes its decision to run on Google Cloud puts it in a strong position to experiment with AI as the technology evolves. The company is planning to use Gemini models, Google’s most capable and general models, designed to be multimodal, as it scales out its skills mapping processes to reach a wider variety of learners, educators, and employers.
“We are committed to helping educators, employers and learners navigate constantly changing economic landscapes,” says O’Carroll. “Recessions, technology advances, and the pandemic have all disrupted careers. We believe our platform gives people the chance to course correct their careers at any point. Google Cloud, through promising AI technologies like Gemini, will help us achieve our goals.”
For more information on how Google Cloud is helping EdTech companies succeed, read more EdTech success stories on the Public Sector blog.
1. International Labor Organization: ILOSTAT, Feb 2023
2. World Economic Forum, Future of Jobs Report, 2023
Read More for the details.
Editor’s Note: The Ford Motor Company, one of the most recognizable auto brands in the world, recently updated its database strategy to modernize its workloads and focus on managed database services from Google Cloud. Ford has seen a large drop in the time spent on database-related operational tasks by managing databases in Google Cloud.
Since 1903, Ford has been a household name when it comes to automotive innovation. From the first moving assembly line to the latest driver-assist technology, we strive to stay at the forefront of the industry so that every person is free to move and pursue their dreams. That overarching vision also extends to our internal IT teams, which are always looking for ways to modernize and improve our technology stack.
At the database level, our goal is to enable always-on products with minimum downtime. By migrating to fully managed Google Cloud databases like Cloud SQL, we significantly reduced our management overhead. We’ve already seen a large drop in database-related operational tasks.
Our database fleet was spread across cloud and on-premises environments, leveraging various technologies. Provisioning and managing resources in those systems posed a big challenge. Every tech refresh was time-consuming, especially upgrading to major versions and applying security patches to each database instance. In addition, scalability was a big — and unpredictable — issue. We had to forecast the amount of resources we would need to keep up and increase our database capacity. Beyond that, we were also managing backups of on-premises resources. All of this work required a global team of database administrators who were busy supporting Day 2 activities just for our on-premises fleet.
Our database strategy needed to be cost-effective, boost resiliency, increase cloud adoption and collaboration, and modernize applications across the business. As we looked for cloud-based alternatives, the Google Cloud portfolio of database services offered a variety of solutions. We saw they would not only quickly address our current requirements, but also help us build for the future.
Google Cloud’s database offerings provide versatility with open-source options — spanning relational, document-based, analytics, and hybrid workloads. Global accessibility and multi-regional distributed databases offered seamless data flow and the elastic nature provided the scalability, resilience, and minimal downtime essential to our always-on product vision.
The way Google envisions the integration of data and artificial intelligence (AI) supports Ford’s modernization plans.
Today, we’ve migrated databases to Google Cloud across Cloud SQL, Spanner, Bigtable, Firestore, AlloyDB, Memorystore, and MongoDB Atlas. The migration was facilitated by both external and native tools for homogeneous and heterogeneous migrations alike, and it went smoothly and efficiently. We’ve put together a standard set of tools within Google Cloud that we call our Opinionated Stack. This tailored framework helps with the migration of additional applications, smooths the transition to managed databases, and provides the ability to leverage Cloud Run and Cloud SQL. All of this helps us meet our goal of a cloud-first architecture.
With the help of Cloud SQL, we have met our database processing requirements, including data protection needs. We’ve seen a reduction in time spent on database-related operational tasks with zero backup failures. The agility afforded by serverless products has enhanced our operational efficiency and saved time in managing the lifecycle of our databases.
Migrating to Google Cloud databases has also led to a significant performance boost, with some products showing a 30% improvement.
Our goal is not just about modernization for the sake of staying current — it’s about redefining what efficiency, collaboration, and innovation mean at Ford.
We’re fostering an environment of continuous learning and adaptation, crucial for keeping pace with the ever-evolving tech frontier and maintaining our leadership in the automotive industry.
As we look to the future, we aim to fully harness the potential of managed services, generative AI, and cloud database technology to drive efficiency, resilience, and innovation. This helps Ford continue to set industry benchmarks and deliver on our promise of freedom of movement in an increasingly connected world.
Dive into the Google Cloud databases portfolio.
Discover the benefits of Cloud SQL and get started with a free trial today!
Read More for the details.
At Google Cloud, we understand you have a diverse set of regulatory, compliance, and sovereignty needs. We strive to provide you with the controls you need and the flexibility to meet your requirements. We offer a range of customizable control packages, so you can choose the level of control that best aligns with your risk tolerance and compliance needs. This flexibility allows you to tailor your approach with minimal tradeoffs. Additionally, we work closely with local partners in select countries to offer Sovereign Controls by Partners to address regional requirements.
At Google Cloud Next, we announced several significant enhancements to further expand your power of choice. These include new Regional Controls and Sovereign Controls by Partners packages, new controls and audit enhancements, and a simplified compliance configuration and management experience for new workloads. These enhancements give you even more options to meet your requirements, at lower cost, and with increased ease of use.
Regional Controls, now in preview, expands Assured Workloads control package availability to 32 regions across 14 countries. Regional Controls includes foundational controls such as data residency (at-rest and during processing) and administrative Access Transparency, at no additional cost. With these updates, controls provided through Assured Workloads are now more accessible than ever to a wider range of Google Cloud customers.
We are also expanding our Sovereign Controls by Partners offering with the preview of Sovereign Controls by PSN in Italy and Sovereign Controls by SIA/Minsait in Spain. These local partners, like T-Systems in Germany and S3NS in France, can provide additional layers of control, including local support personnel, managed External Key Management (EKM) with Key Access Justifications (KAJ), and additional oversight options. EKM with KAJ provides strong control over your data: since keys are stored outside of Google’s infrastructure, you or your local partner have the power to directly approve or deny any access requests.
You can read more about how partnerships like these have met the specific demands of our European customers and have helped to propel their businesses forward.
We also continue to expand the compliance controls and audit capabilities available to Google Cloud customers.
We are thrilled to announce that we now offer data residency core processing commitments to customers using Assured Workloads. This is a major milestone towards additional data residency guarantees, making it possible for enterprise and public sector customers to deploy regulated workloads and helping keep their data within the country while it is processed by the service.
To help customers simplify their compliance audit process, our new Audit Manager can help automate control verification with proof of compliance for your workloads and data on Google Cloud. The compliance assessments and proof can help reduce the time and effort required in costly audit processes. Additionally, the available responsibility matrices clarify the shared responsibility between you and Google Cloud, and help you set the right configurations.
Organizations that need to process sensitive data in the cloud with strong guarantees around confidentiality can continue to use our Confidential Computing portfolio. We offer support for Confidential VMs, Containers, and your entire data processing pipeline, as well as ubiquitous data encryption, which can provide additional security and peace of mind about the encryption and protection of your data.
We’ve worked to make it easier to configure workload controls by default and migrate workloads that were not initially set up in Assured Workloads to a controlled environment. A new onboarding flow is now directly integrated into the Cloud Resource Manager (CRM). When setting up a Google Cloud folder from the CRM, simply choose ‘Assured Workloads Folder’ to automatically apply a chosen set of Regional, Sovereign, or Compliance controls to resources in that folder.
The new “Learn More” panel provides contextual information to help you understand Assured Workloads capabilities during the folder creation process, and it can help you make an informed decision about the right control package for your specific needs. We’ve also streamlined and simplified the setup flow to help you save time.
You can take advantage of our free trial program to check out our premium compliance offerings at no additional cost for a limited time.
If you’re looking to migrate existing Google Cloud workloads into an Assured Workloads controlled environment, we have an Analyze Move API that can assist you by pointing out any incompatibilities in moving your current projects into your chosen Assured Workloads program.
And if you’re not sure where to start your sovereignty journey, you can use our free interactive Digital Sovereignty Explorer to get personalized recommendations on potential cloud controls and other Google Sovereign Cloud solutions based on your unique requirements.
Read More for the details.
Monitoring machine learning (ML) models in production is now as simple as using a function in BigQuery! Today we’re introducing a new set of functions that enable model monitoring directly within BigQuery. Now, you can describe data throughout the model workflow by profiling training or inference data, monitor skew between training and serving data, and monitor drift in serving data over time using SQL — for BigQuery ML models as well as any model whose feature training and serving data is available through BigQuery. With these new functions, you can ensure your production models continue to deliver value while simplifying their monitoring.
In this blog, we present two companion notebooks to help you get hands-on with these features today!
Companion Introduction – a fast introduction to all the new functions
Companion Tutorial – an in-depth tutorial covering many usage patterns for the new functions, including using Vertex AI Endpoints, monitoring feature attributions, and an overview of how monitoring metrics are calculated.
A model is only as good as the data it learns from. Understanding the data deeply is essential for effective feature engineering, model selection, and ensuring quality through MLOps. BigQuery’s table-valued function ML.DESCRIBE_DATA provides a powerful tool for this, allowing you to summarize and describe an entire table with a single query.
Example: Identifying data issues
In the accompanying introduction notebook, we profile the training data (the penguin classification dataset) using the ML.DESCRIBE_DATA function and quickly identify a data issue.
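Here is a minimal sketch of that profiling step, run from Python with the BigQuery client. The dataset and table names are illustrative, and the option names in the STRUCT are assumptions based on the parameters described below; see the notebook for the exact query.

```python
# A hedged sketch of profiling training data with ML.DESCRIBE_DATA.
# Dataset/table names and the STRUCT option names are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.DESCRIBE_DATA(
  TABLE `bqml_tutorial.penguins_train`,
  STRUCT(10 AS num_quantiles, 5 AS top_k)  -- assumed option names
)
"""
profile = client.query(sql).to_dataframe()
print(profile)
```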
In the resulting output table, notice that the min value for the sex column is ‘.’. Ideally, we’d see only the values MALE, FEMALE, or null, as indicated in the top_values.values column. This means that in addition to the 10 null values (indicated by the num_null column), there are also some missing values encoded as the string ‘.’. This should be corrected before the table is used as training data.
The ML.DESCRIBE_DATA function is extra helpful because it summarizes each data type all in one table. There are also optional parameters that can be specified to control the number of quantiles for different numerical column types and the number of top values to return for categorical columns. The input data can be specified as a table or a query statement, allowing you to describe specific subsets of data (e.g., serving timeframes, or groups within your training data). The function’s flexibility extends beyond ML tasks: it even allows you to describe data stored outside of BigQuery, facilitating quick analysis for both model-building and broader data exploration purposes.
A trained model will perform well only when the serving data is similar in distribution to the training data. Model monitoring helps ensure this by comparing training and serving data for shifts known as skew. BigQuery’s ML.VALIDATE_DATA_SKEW table-valued function streamlines this process, allowing you to directly compare serving data to any BigQuery ML model’s training data.
Let’s see it in action:
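Below is a minimal sketch of the skew check, assuming illustrative dataset and table names (the model name comes from the notebook):

```python
# A hedged sketch: compare serving data against the statistics stored with
# the BigQuery ML model at training time. Table names are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.VALIDATE_DATA_SKEW(
  MODEL `bqml_tutorial.classify_species_logistic`,
  TABLE `bqml_tutorial.serving`
)
"""
for row in client.query(sql).result():
    print(dict(row))
```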
This query directly compares the data in the serving table to the BigQuery ML model classify_species_logistic. The accompanying introduction notebook has the full code in an interactive example. In that notebook, the serving data is simulated to introduce change in two of the features: body_mass_g and flipper_length_mm. The results of the ML.VALIDATE_DATA_SKEW function show anomalies detected for each of these features.
The detection of skew is as easy as comparing a model in BigQuery to a table of serving data. During training, BigQuery ML models automatically compute and store relevant statistics. This eliminates the need for reusing the entire training dataset, making skew monitoring simple and cost-efficient. Importantly, the function intelligently focuses on features present in the model, further enhancing efficiency and workflow. With optional parameters, you can customize anomaly detection thresholds, metric types for categorical features, and even set different thresholds for specific features. Later, we’ll demonstrate how easily you can monitor skew for any model!
Beyond comparing serving data to training data, it’s also important to keep an eye on changes within serving data over time. Comparing recent serving data to previous serving data is another type of model monitoring known as drift detection. It uses the same detection technique: metrics that compare distributions between a baseline and a comparison dataset, flagging anomalies that exceed a set threshold. With the table-valued function ML.VALIDATE_DATA_DRIFT, you can compare any two tables, or query results, directly for detection.
Drift detection in action:
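A minimal sketch, assuming an illustrative serving table with a logged_at timestamp column (both names are assumptions):

```python
# A hedged sketch: compare yesterday's serving rows (baseline) against
# today's (comparison) with ML.VALIDATE_DATA_DRIFT.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.VALIDATE_DATA_DRIFT(
  (SELECT * FROM `bqml_tutorial.serving`                -- baseline: yesterday
   WHERE DATE(logged_at) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)),
  (SELECT * FROM `bqml_tutorial.serving`                -- comparison: today
   WHERE DATE(logged_at) = CURRENT_DATE())
)
"""
print(client.query(sql).to_dataframe())
```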
Here, the same serving table is used as both the baseline and the comparison table, with different WHERE clauses filtering the rows to compare today to yesterday as an example. The results show that while the detection values did not surpass the threshold, they are approaching it between two consecutive days for the features with simulated change.
Just like with skew detection, you can also adjust the default detection threshold for anomaly detection as well as the metric type used for categorical features, and specify different thresholds for different columns and feature types. There are additional parameters to control the binning of numerical features for the metrics calculations.
If you’re already familiar with the TensorFlow Data Validation (TFDV) library, you’ll appreciate how these new BigQuery functions enhance your model monitoring toolkit. They bring the power of TFDV directly into your BigQuery workflows, allowing you to generate rich statistics, detect anomalies, and leverage TFDV’s powerful visualization tools — all with SQL. And the best part: it all runs on BigQuery’s scalable, serverless compute, delivering near-instant analysis so you can take rapid action on model monitoring insights.
Let’s explore how it works:
Generate statistics with ML.TFDV_DESCRIBE
You can generate in-depth statistics summaries with the table-valued function ML.TFDV_DESCRIBE for any table, or query, in the same format as the TensorFlow tfdv.generate_statistics_from_csv() API:
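A minimal sketch, again with illustrative names; the result is kept for the TFDV steps that follow:

```python
# A hedged sketch: generate TFDV-format statistics for the training table.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.TFDV_DESCRIBE(TABLE `bqml_tutorial.penguins_train`)
"""
# The row holds the statistics in proto-compatible JSON form; the exact
# column layout is shown in the companion notebook.
train_describe = list(client.query(sql).result())[0]
```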
The ML.TFDV_DESCRIBE function outputs statistics in a structured data format (a ‘proto’) that is directly compatible with TFDV: tfmd.proto.statistics_pb2.DatasetFeatureStatisticsList.
Using a bit of Python code in a BigQuery notebook, we can import the TFDV package as well as the TensorFlow Metadata package, and then call the tfdv.visualize_statistics method after converting the data to the expected format. The ML.TFDV_DESCRIBE results were loaded into Python as train_describe for the training data and today_describe for the current day’s serving data. See the accompanying tutorial for complete details.
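A minimal sketch of that visualization step, assuming train_describe and today_describe hold the single-row query results in proto-compatible JSON form (the exact result shape is an assumption; the notebook shows the precise conversion):

```python
# A hedged sketch: visualize training vs. serving statistics with TFDV.
import tensorflow_data_validation as tfdv
from google.protobuf import json_format
from tensorflow_metadata.proto.v0 import statistics_pb2

def to_stats(row_json):
    # Convert a JSON-shaped statistics result into the TFDV proto.
    # The exact shape of the query output is an assumption here.
    return json_format.ParseDict(
        dict(row_json), statistics_pb2.DatasetFeatureStatisticsList())

tfdv.visualize_statistics(
    lhs_statistics=to_stats(train_describe),
    rhs_statistics=to_stats(today_describe),
    lhs_name="training",
    rhs_name="serving (today)",
)
```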
This generates the visualizations below, which directly highlight shifts in the two features that we purposefully shifted in the serving data for this example: body_mass_g and flipper_length_mm.
This streamlined workflow brings the power and precision of TensorFlow Data Validation directly to BigQuery and enables you to quickly visualize how sets of data differ. This provides deeper insight to model health monitoring and informs how to proceed with model training iterations.
Detect anomalies with ML.TFDV_VALIDATE
You can also precisely detect skew or drift anomalies with the scalar function ML.TFDV_VALIDATE, which compares tables, or queries, pinpointing potential model-breaking shifts.
The results of ML.TFDV_VALIDATE are formatted in a structured data format (‘proto’) that is directly compatible with TFDV’s display tools: tfmd.proto.anomalies_pb2.Anomalies. Passing this as input to the Python method tfdv.display_anomalies presents an easy-to-read table of anomaly detection results, as shown after the code snippet:
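A minimal sketch of that display step, assuming validate_result holds the ML.TFDV_VALIDATE output in proto-compatible JSON form:

```python
# A hedged sketch: render ML.TFDV_VALIDATE output with TFDV's display helper.
import tensorflow_data_validation as tfdv
from google.protobuf import json_format
from tensorflow_metadata.proto.v0 import anomalies_pb2

# validate_result is assumed to be the function's output as JSON-shaped data.
anomalies = json_format.ParseDict(dict(validate_result), anomalies_pb2.Anomalies())
tfdv.display_anomalies(anomalies)
```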
| Feature name | Anomaly short description | Anomaly long description |
| --- | --- | --- |
| ‘culmen_depth_mm’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.0483968 (up to six significant digits), above the threshold 0.03. |
| ‘flipper_length_mm’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.917495 (up to six significant digits), above the threshold 0.03. |
| ‘body_mass_g’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.356159 (up to six significant digits), above the threshold 0.03. |
| ‘island’ | High Linfty distance between training and serving | The Linfty distance between training and serving is 0.118041 (up to six significant digits), above the threshold 0.03. The feature value with maximum difference is: Dream |
| ‘culmen_length_mm’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.0594803 (up to six significant digits), above the threshold 0.03. |
| ‘sex’ | High Linfty distance between training and serving | The Linfty distance between training and serving is 0.0513795 (up to six significant digits), above the threshold 0.03. The feature value with maximum difference is: FEMALE |
The default detection methods for numerical and categorical data, as well as the thresholds, are the same as for the other functions shown above. You can customize detection with the function’s parameters for precise monitoring needs. For a deeper dive, the accompanying tutorial includes a section that demonstrates how these metrics are calculated manually and compares the function’s output to those manual calculations as validation.
BigQuery’s model monitoring functions offer a streamlined solution whether you’re working with models deployed on Vertex AI Prediction Endpoints or using batch serving data stored within BigQuery (as shown above). Here’s how:
Batch serving: For batch prediction data already stored or accessible by BigQuery, the monitoring features are readily accessible just as demonstrated previously in this blog.
Online serving: Directly monitor models deployed on Vertex AI Prediction Endpoints. By configuring request-response logging to BigQuery (see the sketch below), you can easily apply the BigQuery ML model monitoring functions to detect skew and drift.
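As a rough illustration, the Vertex AI SDK lets you enable request-response logging when creating an endpoint. Project, region, and table names below are illustrative assumptions:

```python
# A hedged sketch: create an endpoint that logs requests and responses to a
# BigQuery table, so logged serving data can feed the monitoring functions.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint.create(
    display_name="penguins-endpoint",
    enable_request_response_logging=True,
    request_response_logging_sampling_rate=1.0,  # log every request
    request_response_logging_bq_destination_table=(
        "bq://my-project.bqml_tutorial.serving_logs"
    ),
)
```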
The accompanying tutorial provides a step-by-step walkthrough, demonstrating endpoint creation, model deployment, logging setup (for Vertex AI to BigQuery), and how to monitor both online and batch serving data within BigQuery.
To achieve truly scalable monitoring of shifts and drifts, automation is essential. BigQuery’s procedural language offers a powerful way to streamline this process, as demonstrated in the SQL query from our introductory notebook. This automation isn’t limited to monitoring; it can extend to continuous model retraining. In a production environment, continuous retraining would be accompanied by proactive identification of data quality issues, adaptation to real-world changes, and a rigorous deployment strategy aligned with your organization’s needs.
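A minimal sketch of that pattern, run from Python: retrain only when the skew check flags an anomaly. The is_anomaly column name, the model options, and all object names are assumptions rather than the notebook’s exact code.

```python
# A hedged sketch of monitoring-triggered retraining using BigQuery's
# procedural language.
from google.cloud import bigquery

client = bigquery.Client()

script = """
IF EXISTS (
  SELECT 1
  FROM ML.VALIDATE_DATA_SKEW(
    MODEL `bqml_tutorial.classify_species_logistic`,
    TABLE `bqml_tutorial.serving`)
  WHERE is_anomaly  -- assumed anomaly flag in the function's output
) THEN
  CREATE OR REPLACE MODEL `bqml_tutorial.classify_species_logistic`
  OPTIONS (model_type = 'logistic_reg', input_label_cols = ['species']) AS
  SELECT * FROM `bqml_tutorial.penguins_train`;
END IF;
"""
client.query(script).result()
```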
In the results, a skew anomaly was detected and successfully triggered model retraining, restoring accuracy after the data changed. This demonstrates the value of automated monitoring and retraining for maintaining model performance in dynamic production environments.
To streamline this process, Google Cloud offers several powerful automation options, including BigQuery scheduled queries, Dataform, Workflows, Cloud Composer, and Vertex AI Pipelines.
Want a hands-on demonstration? Our accompanying tutorial dives into BigQuery scheduled queries, including historical backfilling, daily monitoring, and setting up email alerts for detected shifts and drifts. We’ll also be releasing future tutorials covering the other automation tools.
Building trustworthy machine learning systems requires continuous monitoring. BigQuery’s new model monitoring functions streamline this to just a few SQL functions:
Deeply understand your data: ML.DESCRIBE_DATA provides a comprehensive view of your datasets, aiding in feature engineering and quality checks.
Detect skew between training and serving data: ML.VALIDATE_DATA_SKEW directly compares BigQuery ML models against their serving data.
Monitor data drift over time: ML.VALIDATE_DATA_DRIFT empowers you to track changes in serving data, ensuring your model’s performance remains consistent.
Enhance your TFDV workflow: ML.TFDV_DESCRIBE and ML.TFDV_VALIDATE bring the precision of TensorFlow Data Validation directly into BigQuery, enabling more detailed visualizations and anomaly detection while leveraging BigQuery’s scalable, efficient compute.
Getting Started
Extend from BigQuery ML models to Vertex AI Models and automate these new functions with Google Cloud offerings like BigQuery scheduled queries, Dataform, Workflows, Cloud Composer, or Vertex AI Pipelines. Dive into our hands-on notebooks to get started today:
Companion Introduction – a fast introduction to all the new functions
Companion Tutorial – an in-depth tutorial covering many usage patterns for the new functions, including using Vertex AI Endpoints, monitoring feature attributions, and an overview of how monitoring metrics are calculated
Read More for the details.
PyTorch’s flexibility and dynamic nature make it a popular choice for deep learning researchers and practitioners. Developed by Google, XLA is a specialized compiler designed to optimize linear algebra computations – the foundation of deep learning models. PyTorch/XLA offers the best of both worlds: the user experience and ecosystem advantages of PyTorch, with the compiler performance of XLA.
PyTorch/XLA stack diagram
We are excited to launch PyTorch/XLA 2.3 this week. The 2.3 release brings with it even more productivity, performance and usability improvements.
Before we get into the release updates, here’s a short overview of why PyTorch/XLA is great for model training, fine-tuning and serving. The combination of PyTorch and XLA provides key advantages:
Easy performance: Retain PyTorch’s intuitive, Pythonic flow while gaining significant and easy performance improvements through the XLA compiler. For example, PyTorch/XLA produces a throughput of 5,000 tokens/second while fine-tuning Gemma and Llama 2 7B models, and reduces the cost of serving to $0.25 per million tokens.
Ecosystem advantage: Seamlessly access PyTorch’s extensive resources, including tools, pretrained models, and its large community.
These benefits underscore the value of PyTorch/XLA. Lightricks shares the following feedback on their experience with PyTorch/XLA 2.2:
“By leveraging Google Cloud’s TPU v5p, Lightricks has achieved a remarkable 2.5X speedup in training our text-to-image and text-to-video models compared to TPU v4. With the incorporation of PyTorch XLA’s gradient checkpointing, we’ve effectively addressed memory bottlenecks, leading to improved memory performance and speed. Additionally, autocasting to bf16 has provided crucial flexibility, allowing certain parts of our graph to operate on fp32, optimizing our model’s performance. The XLA cache feature, undoubtedly the highlight of PyTorch XLA 2.2, has saved us significant development time by eliminating compilation waits. These advancements have not only streamlined our development process, making iterations faster but also enhanced video consistency significantly. This progress is pivotal in keeping Lightricks at the forefront of the generative AI sector, with LTX Studio showcasing these technological leaps.” – Yoav HaCohen, Research team lead, Lightricks
PyTorch/XLA 2.3 keeps us current with PyTorch Foundation’s 2.3 release from earlier this week, and offers notable upgrades from PyTorch/XLA 2.2. Here’s what to expect:
1. Distributed training improvements
SPMD with FSDP: Fully Sharded Data Parallel (FSDP) support enables you to scale large models. The new Single Program, Multiple Data (SPMD) implementation in 2.3 integrates compiler optimizations for faster, more efficient FSDP; see the sketch after this list for the underlying SPMD API.
Pallas integration: For maximum control, PyTorch/XLA + Pallas lets you write custom kernels specifically tuned for TPUs.
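For a feel of the SPMD programming model that underpins these features, here is a minimal sketch using the PyTorch/XLA 2.x API on an XLA device such as a TPU; the mesh shape and axis names are illustrative.

```python
# A hedged sketch of PyTorch/XLA SPMD: build a device mesh and shard a tensor.
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # enable SPMD execution mode

# Arrange all attached devices into a 2D (data, model) mesh.
num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1), ("data", "model"))

# Shard the batch dimension across the 'data' axis; the XLA compiler
# partitions downstream computation to match.
t = torch.randn(16, 128).to(xm.xla_device())
xs.mark_sharding(t, mesh, ("data", "model"))
```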
2. Smoother development
SPMD auto-sharding: SPMD automates model distribution across devices. Auto-sharding further simplifies this process, eliminating the need for manual tensor distribution. In this release, this feature is experimental, supporting XLA:TPU and single-host training.
PyTorch/XLA autosharding architecture
Distributed checkpointing: This makes long training sessions less risky. Asynchronous checkpointing saves your progress in the background, protecting against potential hardware failures.
3. Hello, GPUs!
SPMD XLA:GPU support: We have extended the benefits of SPMD parallelization to GPUs, making scaling easier, especially when handling large models or datasets.
PyTorch/XLA continues to evolve, streamlining the creation and deployment of powerful deep learning models. The 2.3 release emphasizes improved distributed training, a smoother development experience, and broader GPU support. If you’re in the PyTorch ecosystem and seeking performance optimization, PyTorch/XLA 2.3 is worth exploring!
Stay up-to-date, find installation instructions or get support on the official PyTorch/XLA repository on GitHub: https://github.com/pytorch/xla
PyTorch/XLA is also well integrated into the AI Hypercomputer stack, which optimizes AI training, fine-tuning, and serving performance end-to-end at every layer of the stack.
Ask your sales representative about how you can apply these capabilities within your own organization.
Read More for the details.