Azure – Infrastructure and quality enhancements for Azure Container Registry
ACR now supports expanded registry capacity of up to 40TiB and optimized geo-replication performance.
Read More for the details.
Enhancing network and resource health visualization with a unified, dynamic topology across subscriptions, regions, and resource groups, integrated with actionable connectivity and traffic insights.
Read More for the details.
Introducing a redeploy capability for Azure Dedicated Hosts
Read More for the details.
Traditional Kubernetes networking excels at basic Pod-to-Pod connectivity, but can fall short when addressing the security, performance, and compliance demands of telecom workloads. This limits telecom providers’ ability to fully leverage Kubernetes’ scalability and agility benefits.
Google Cloud’s multi-networking approach empowers telecom providers to overcome these limitations, enabling capabilities that include:
Strict network isolation: Enforce regulatory compliance and enhance security by isolating management, signaling, and media traffic within Kubernetes.
Maximum performance: For telecom applications that need hardware acceleration (e.g., SR-IOV), achieve the high throughput and low latency you need for demanding 5G Mobile Core, RAN, and data-plane workloads.
One existing solution is Multus, a meta-plugin for multi-homed Pods in Kubernetes, but it has some drawbacks:
Difficult to use: Multus relies on unstructured string-based annotations, making configuration complex and error-prone.
Inflexible: It doesn’t allow for dynamic addition or removal of networks to Pods, creating operational overhead and limiting adaptability.
Limited Kubernetes integration: Multus lacks native support for:
Network Policies: Enforcing security across multiple networks is difficult.
Network Services: Load balancing and health checks for applications using secondary interfaces are unsupported.
Poor observability: Multus makes it hard to monitor and troubleshoot multi-network setups.
Google Cloud’s approach natively integrates multi-networking into Kubernetes. This makes it easy to use Kubernetes services, load balancers, Border Gateway Protocol (BGP), and network policies – all vital for building robust telecom networks. We’re also working on this concept with the Kubernetes community as part of a Kubernetes Enhancement Proposal in the Cloud Native Computing Foundation (CNCF).
Partners such as Ericsson are supportive of Google Cloud’s multi-networking approach with a focus on addressing the specific needs of telecom use cases.
“The challenge of ensuring security and operational efficiency in telecom deployments through network separation, while preserving Kubernetes’ standard networking capabilities, is solved by Google’s multi-network function. It’s noteworthy to observe how Google’s multi-network function provides a solution to this challenge without compromising on functionality.” – Ericsson
In this blog, we explore critical telecom use cases enabled by multi-networking. We also delve into two specific implementation examples in Google Distributed Cloud (GDC): isolating signaling traffic between a Mobility Management Entity (MME) and a Home Subscriber Server (HSS) in a 4G/5G mobile core deployment, and isolating signaling traffic between the Network Exposure Function (NEF) and its Application Function (AF) peer.
Use case #1: A Cloud Native Network Function (CNF) application requires an additional interface to be provisioned in its Kubernetes Pod. Each of these interfaces has to be in an isolated network for regulatory compliance purposes, and the isolation has to be done on a Layer-2 network.
Use case #2: A CNF application requires a hardware-based interface (e.g., an SR-IOV VF) to be provisioned to the workload Pod. The hardware is leveraged by a user-space application (e.g., DPDK-based) for performance purposes (high bandwidth, low latency).
Multi-network use case: S6a Traffic
Now, let’s take a look at a specific use of the multi-network framework for networking between the Mobility Management Entity (MME) and Home Subscriber Server (HSS) functions of a 4G mobile core network. In a 4G mobile network, the MME and the HSS are key network elements responsible for managing user connectivity and security:
MME:
Handles user attach/detach procedures, registering and deregistering devices on the network
Manages handovers between cell towers as users move around
Routes incoming and outgoing calls and data traffic to/from the user
HSS:
Stores subscriber information like authentication credentials, subscription details, and roaming agreements
Performs authentication and authorization procedures to verify user identity and access control
Provides user location information to the MME for emergency services and other purposes
Together, the MME and HSS provide a secure and smooth user experience while maintaining network efficiency and subscriber information integrity.
This connectivity scenario is a specific application of use case #1 above: connecting the MME and HSS Pods via an isolated signaling network using GDC in connected configuration, a fully managed hardware and software product that delivers modern applications equipped with AI, security, and open source at the edge. In this implementation, the MME uses a MACVLAN-type interface and the HSS Pods use a separate multi-network interface to steer the Diameter traffic onto the same signaling network.
Here’s a snippet of the signaling network definition and the modified HSS Pod spec to reference this additional Pod network on the respective Pods:
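The following is a minimal sketch of what these specs can look like, using the Kubernetes multi-network API (networking.gke.io) available in GKE and GDC; the network name, node interface, and container image are illustrative assumptions, not the exact specs from the deployment:

```yaml
# Hypothetical Layer-2 signaling network (name and node interface assumed).
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: signaling-network
spec:
  type: L2
  nodeInterfaceMatcher:
    interfaceName: eth1
---
# HSS Pod requesting an additional interface on the signaling network.
apiVersion: v1
kind: Pod
metadata:
  name: hss
  annotations:
    networking.gke.io/default-interface: eth0
    networking.gke.io/interfaces: |
      [
        {"interfaceName": "eth0", "network": "default"},
        {"interfaceName": "eth1", "network": "signaling-network"}
      ]
spec:
  containers:
  - name: hss
    image: example.com/hss:latest   # placeholder image
```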
Multi-network use case: N33 interface on NEF
Now, let’s look at a specific use of the multi-network framework for networking between a Network Exposure Function (NEF) and its Application Function (AF) Peer — two nodes in the 5G Network Architecture:
NEF – This network function provides a means to securely expose the services and capabilities provided by 3GPP network functions to external applications.
AF Peer – This network function plays a key role in traffic management and QoS assignments.
In this scenario, the AF peer of the NEF resides in the customer’s trusted domain and is connected to the Signaling network (VRF). By default, the NEF Pods are connected to the default network (VRF) via their primary interface. To keep the traffic on the same VRF, we enable signaling multi-networking on the NEF Pods to connect them to the Signaling network.
For the above two use cases, the 3GPP endpoints of the HSS and NEF respectively, illustrated as Service VIPs (Virtual IPs), are exposed as Services of type LoadBalancer using the bundled BGP load balancer in GDC. This is a BGP-based L3 load balancer with a BGP speaker as the control plane and an eBPF-based datapath. The BGP speaker automatically advertises Service IPs from Kubernetes Services (type=LoadBalancer) to the configured eBGP peers, and the eBPF-based datapath distributes the incoming Service VIP traffic to the backend Pods.
Here’s a snippet of the load balancer spec:
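At its core this is a standard Kubernetes Service of type LoadBalancer; a hedged sketch for the HSS Diameter endpoint (the name, VIP, and selector are assumptions) might look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hss-s6a
spec:
  type: LoadBalancer
  loadBalancerIP: 10.200.0.10   # assumed VIP from the signaling address pool
  selector:
    app: hss
  ports:
  - name: diameter
    port: 3868                  # standard Diameter port
    targetPort: 3868
    protocol: TCP
```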
And here we can see the config-map spec that allows us to deterministically choose the External IPs to be allocated to the 3GPP Services VIPs.
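The exact schema of the bundled load balancer’s config map depends on the GDC release; purely as an illustration, a MetalLB-style address pool that pins the external IPs for the 3GPP Service VIPs could look like the following (pool name, namespace, and address range are assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config
  namespace: metallb-system
data:
  config: |
    address-pools:
    - name: signaling-pool
      protocol: bgp
      addresses:
      - 10.200.0.10-10.200.0.20   # deterministic VIP range for 3GPP Services
```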
In this blog, we showed how Google Cloud’s multi-networking approach can unlock critical telecom use cases on Kubernetes. To learn more about using Google Distributed Cloud and GKE for telecom workloads, check out the following resources:
Kubernetes Enhancement Proposal for Multi-network, KEP-3698 Multi-network
Google Distributed Cloud overview: Google Distributed Cloud
Using Network Function Optimizer for GKE and GDC: Network Function Optimizer for GKE and GDC Edge
Multinetwork support and configuration in GKE: About the multi-network support for Pods | Google Kubernetes Engine (GKE)
Multinetwork Support and Configuration in GDC: Network Function operator | Distributed Cloud
Read More for the details.
BigQuery’s integrated speech-to-text functionality offers a powerful tool for unlocking valuable insights hidden within audio data. This service transcribes audio files, such as customer review calls, into text format, making them ready for analysis within BigQuery’s robust data platform. By combining speech-to-text with BigQuery’s analytics capabilities, you can delve into customer sentiment, identify recurring product issues, and gain a better understanding of the voice of your customer.
BigQuery speech-to-text transforms audio data into actionable insights, offering potential benefits across industries and enabling a deeper understanding of customer interactions across multiple channels. You can also use BigQuery ML with Gemini 1.0 Pro to layer additional insights and data formatting, such as entity extraction and sentiment analysis, onto the text extracted from audio files using BigQuery ML’s native speech-to-text capability. Below are some use cases and the business value for specific industries:
Retail/E-commerce
Use cases: Analyzing customer call recordings to identify common pain points, product preferences, and overall sentiment.
Business potential: Improved product development by addressing issues mentioned in feedback; enhanced customer service through personalization and targeted assistance; enhanced marketing campaigns based on insights discovered in customer calls.

Healthcare
Use cases: Transcribing patient-doctor interactions to automatically populate medical records, summarize diagnoses, and track treatment progress.
Business potential: More streamlined workflows for healthcare providers, reducing administrative burden; comprehensive patient records for better decision-making; potential identification of trends in patient concerns for research and improved care.

Finance
Use cases: Analyzing earnings calls and shareholder meetings to gauge market sentiment, identify potential risks, and extract key insights.
Business potential: Support for more informed investment decisions; prompt identification of emerging trends or potential issues; proactive risk management strategies.

Media & Entertainment
Use cases: Transcribing podcasts, interviews, and focus groups for content analysis and audience insights.
Business potential: Earlier identification of trending topics and themes for new content creation; understanding audience preferences for program development or advertising; accessibility improvements through automated closed-captioning.
Even when using advanced AI features such as BigQuery ML, you still have access to all of BigQuery’s built-in governance features, including access-control passthrough, so you can restrict insights from customer audio files based on the row-level security you have on your BigQuery object table.
Ready to turn your audio data into insights? Let’s dive into how you can use speech-to-text in BigQuery:
Imagine you have a collection of customer feedback calls stored as audio files in a Google Cloud Storage bucket. BigQuery’s ML.TRANSCRIBE function, connected to a pre-trained speech-to-text model hosted on Google’s Vertex AI platform, lets you automatically convert these audio files into readable text within BigQuery. Think of it as a specialized translator for audio data. You tell the ML.TRANSCRIBE function where your audio files are located (in your object table) and which speech-to-text model to use. It then handles the transcription process, using the power of machine learning, and delivers the text results directly into BigQuery. This makes it easy to analyze customer conversations alongside other business data.
Let’s walk through the process together in BigQuery.
Setup instructions:
Before starting, choose your Google Cloud project, link a billing account, and enable the necessary API (full instructions here)
Optionally, create a recognizer; a recognizer stores the configuration for speech recognition
Create a cloud resource connection and get the connection’s service account, full guide here
Grant access to the service account by following the steps here.
Create a dataset that will contain the model and the object table by following the steps here
Download and store the audio files in Google Cloud Storage:
Download 5 audio files from here
Create a bucket in Google Cloud Storage and a folder within the bucket
Upload the downloaded audio files to the folder
Create a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_SPEECH_TO_TEXT_V2. The model makes the Speech-to-Text API available within BigQuery.
Example query:
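A sketch of the model creation step, where the dataset, connection, and recognizer names are placeholders to replace with your own:

```sql
CREATE OR REPLACE MODEL `my_dataset.speech_to_text_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (
    remote_service_type = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
    -- Optional: reference the recognizer created during setup.
    speech_recognizer = 'projects/my-project/locations/us/recognizers/my-recognizer'
  );
```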
Sample code:
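Next, an object table gives BigQuery a SQL view over the audio files in Cloud Storage. A sketch, with the dataset and connection names again as placeholders:

```sql
CREATE OR REPLACE EXTERNAL TABLE `my_dataset.audio_files`
  WITH CONNECTION `us.my_connection`
  OPTIONS (
    object_metadata = 'SIMPLE',
    uris = ['gs://BUCKET_PATH/*']
  );
```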
Please replace ‘BUCKET_PATH’ with your Google Cloud Storage bucket/folder path where audio files are stored
Sample query:
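A sketch of the transcription call, assuming the model and object table names used above:

```sql
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `my_dataset.speech_to_text_model`,
  TABLE `my_dataset.audio_files`
);
```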
The results of ML.TRANSCRIBE include these columns:
transcripts: Contains the text transcription of the processed audio files
ml_transcribe_result: JSON value that contains the result from the Speech-to-Text API
ml_transcribe_status: Contains a string value that indicates the success or failure of the transcription process for each row. It will be empty if the process is successful
The object table columns
The ML.TRANSCRIBE function eliminates the need for manual transcription, saving time and effort. Transcribed text becomes easily searchable and analyzable within BigQuery, enabling you to extract valuable insights from your audio data.
Follow-up Ideas
Take the text extracted from the audio files and use Gemini 1.0 Pro with BigQuery ML’s ML.GENERATE_TEXT function to extract entities such as product names, stock prices, or other types of entity data you are looking for, and structure them in JSON (see the sketch after this list).
Use Gemini 1.0 Pro with BigQuery ML to measure sentiment analysis of the extracted text, and structure positive & negative sentiments in JSON.
Join customer feedback verbatims and sentiment scores with Customer Lifetime Value scores or other relevant customer data to see how quantitative and qualitative data relate to each other.
Generate embeddings over the extracted text, and use vector search to search the audio files for specific content.
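As a sketch of the first follow-up idea, assuming the transcripts have been saved to a table my_dataset.transcriptions and that a Gemini 1.0 Pro remote model my_dataset.gemini_pro_model already exists (both names are assumptions):

```sql
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.gemini_pro_model`,
  (
    SELECT CONCAT(
      'Extract the product names mentioned in this transcript ',
      'and return them as a JSON array: ', transcripts
    ) AS prompt
    FROM `my_dataset.transcriptions`
  ),
  STRUCT(0.2 AS temperature, 1024 AS max_output_tokens, TRUE AS flatten_json_output)
);
```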
Curious to learn more? The official Google Cloud documentation on ML.TRANSCRIBE has all the details. Please also check out the blog on Gemini 1.0 Pro support for BigQuery ML to see other GenAI use cases as outlined in the Follow-up ideas.
Read More for the details.
One of Kubernetes’ big selling points is that each Pod has its own network address. This makes the Pod behave a bit like a VM, and frees developers from worrying about pesky things like port conflicts. It’s a property of Kubernetes that makes things easier for developers and operators, and has been credited as one of the design features that made it so popular as a container orchestrator. Google Kubernetes Engine (GKE) additionally adopts a flat network structure for all clusters in a VPC, which means that each Pod in each cluster has its own IP in the VPC and can communicate with Pods in other clusters directly (without needing NAT), a useful property which enables advanced features like container-native load balancing.
While this addressing layout has many advantages, the trade-off is that you can consume IPs rather quickly. With every Pod in every cluster being allocated IPs on the same VPC, and allowing space in those ranges for expansion, IPs get used very fast. IPv6 has long been proposed as the industry-wide solution to all these problems, and one day that will no doubt be true, but GKE doesn’t support single-stack IPv6 for Pod addressing, and not everyone is ready to drop IPv4 in any case, so how do you solve this problem with IPv4 ranges today?
In my travels as a product manager on GKE, the best solution I have seen is to use a non-RFC1918 IP range for Pods. While there are alternative approaches to solving this problem, what follows is the specific solution I have seen deployed successfully by multiple customers on GKE. Let’s take a closer look.
Most GKE clusters are created with the nodes in RFC 1918 space, specifically 10.0.0.0/8. Did you know that you can still have all the benefits of a flat network structure, like container-native load balancing, while preserving your 10.0.0.0/8 space? The solution is to keep nodes in that CIDR range and to allocate just the Pod ranges (which use by far the most IPs) out of a larger non-RFC 1918 private address range like 100.64.0.0/10 or 240.0.0.0/4. Google Cloud VPC has native support for these other ranges, so within Google Cloud everything “just works”: Pods can, for example, connect to services like Cloud SQL and communicate directly with Pods in other clusters.
By keeping nodes in RFC 1918 space, IP masquerading can be used to mask the Pod’s address with that of the node, so that the rest of your off-VPC endpoints never need to see a non-RFC 1918 address. Every endpoint outside of Google Cloud (or however you configure your masquerading rules) will see the 10.0.0.0/8 IP that it expects. Best of both worlds. The 100.64.0.0/10 range is reserved for shared address space, making it a great candidate to use first, giving you 4 million Pod addresses right off the bat, and with a potential quarter-billion IPs for your Pods in 240.0.0.0/4, there’s plenty of room to grow beyond that.
It may not be immediately apparent that 240.0.0.0/4 is an acceptable range to use in your VPC for Kubernetes Pods. After all, on the public internet, this range has been reserved (since 1989 in fact) for future use, and could in theory be assigned one day. Private use of this range in your VPC doesn’t affect the public internet (no routes using this range will be advertised outside your VPC), but is there any downside? In the event this range was allocated one day, what it means is that hosts in your network wouldn’t be able to initiate outbound connections to hosts in that range. There’d be no impact to inbound connections that utilize load balancing (as most do). In other words, should that range ever be allocated, you can still serve customers on those addresses.
The other concern I’ve heard about using 240.0.0.0/4 ranges is that on-prem routers don’t support them, and neither do Windows hosts. There is a really simple solution to both those concerns, as you can easily configure IP masquerading for any destinations that don’t support it, meaning the only IP those services will see is from your 10.0.0.0/8 primary range.
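With GKE’s ip-masq-agent, for example, traffic to destinations listed under nonMasqueradeCIDRs keeps the Pod IP, while traffic to everything else (such as on-prem ranges) is NATed to the node’s 10.0.0.0/8 address. A minimal sketch of such a config (the exact CIDRs are assumptions to adapt to your network):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system
data:
  config: |
    nonMasqueradeCIDRs:
    - 100.64.0.0/10   # Pod-to-Pod traffic inside the VPC keeps Pod IPs
    - 240.0.0.0/4
    resyncInterval: 60s
```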
Some Kubernetes platforms outside Google Cloud offer an “island mode” network design where you reuse the same Pod IP ranges in every cluster, and I’ve heard requests for this in GKE as well. The approach documented here is better in my view: you get the advantage of the flat network within the VPC (enabling things like container-native load balancing), while traffic can still be NATed over the Node’s IP when needed. By comparison, an “island mode” design will NAT all traffic that leaves the cluster (including Pod-to-Pod traffic between clusters), limiting what you can do inside the VPC.
So that’s Pods, but what about Services? Service ranges are another concern for IP allocation. GKE in Autopilot mode now automatically reuses the same /20 range for every cluster, giving you 4k services without allocating any of your network (service IPs are virtual and have no meaning outside the cluster, so there is no need to give them unique identifiers). On node-based GKE Standard mode, or if you need more than 4k services, you can create your own named subnet of whatever size you need (including out of 240.0.0.0/4 space), and reuse it for every cluster in the region as well (by passing it in the --services-secondary-range-name parameter when creating the cluster).
In summary, a recommended way to reduce IP usage while maintaining all the benefits of a flat network structure is to:
Allocate Node IP ranges from your main ranges (like 10.0.0.0/8)
Allocate Pod IP ranges from non-RFC 1918 space, like 100.64.0.0/10 and 240.0.0.0/4, while utilizing IP masquerading to NAT with the node’s IP for on-prem destinations or anywhere that expects an RFC 1918 range (see the sketch after this list).
Use Autopilot mode, which automatically provides 4k IPs for your services, or create a named subnet for services and reuse it for all clusters by passing it during cluster creation
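As an illustrative sketch only (the cluster, network, subnet, and range names are assumptions), a cluster creation command along these lines might look like:

```sh
# Nodes draw IPs from a 10.0.0.0/8 subnet; Pods come from non-RFC 1918 space;
# Services reuse a shared named secondary range.
gcloud container clusters create my-cluster \
  --region=us-central1 \
  --enable-ip-alias \
  --network=my-vpc \
  --subnetwork=my-subnet \
  --cluster-ipv4-cidr=240.10.0.0/13 \
  --services-secondary-range-name=shared-services-range
```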
With these steps, I have seen several customers solve their IP constraints, and adopt this strategy as a bridge to one day running a cluster with single-stack IPv6 Pod addressing.
Next steps:
Plan your IP address management
Use non-RFC 1918 IP address ranges in GKE
Learn about the supported IPv4 ranges by Cloud VPC
Learn about IP Masquerading in GKE
Try GKE’s Autopilot mode for a workload-based API that also improves operational efficiency
Read More for the details.
The explosive growth of malware continues to challenge traditional, manual analysis methods, underscoring the urgent need for improved automation and innovative approaches. Generative AI models have become invaluable in some aspects of malware analysis, yet their effectiveness in handling large and complex malware samples has been limited. The introduction of Gemini 1.5 Pro, capable of processing up to 1 million tokens, marks a significant breakthrough. This advancement not only empowers AI to function as a powerful assistant in automating the malware analysis workflow but also significantly scales up the automation of code analysis. By substantially increasing the processing capacity, Gemini 1.5 Pro paves the way for a more adaptive and robust approach to cybersecurity, helping analysts manage the asymmetric volume of threats more effectively and efficiently.
The foundation of automated malware analysis is built on a combination of static and dynamic analysis techniques, both of which play crucial roles in dissecting and understanding malware behavior. Static analysis involves examining the malware without executing it, providing insights into its code structure and unobfuscated logic. Dynamic analysis, on the other hand, involves observing the execution of the malware in a controlled environment to monitor its behavior, regardless of obfuscation. Together, these techniques are leveraged to gain a comprehensive understanding of malware.
Parallel to these techniques, AI and machine learning (ML) have increasingly been employed to classify and cluster malware based on behavioral patterns, signatures, and anomalies. These methodologies have ranged from supervised learning, where models are trained on labeled datasets, to unsupervised learning for clustering, which identifies patterns without predefined labels to group similar malware.
Despite technological advancements, the increasing complexity and volume of malware present substantial challenges. While ML enhances the detection of malware variants, it remains inadequate against completely new threats. This detection gap allows advanced attacks to slip through cybersecurity defenses, compromising system protection.
Code Insight, unveiled at the RSA Conference 2023, marked a significant step forward in leveraging generative AI (gen AI) for malware analysis. This novel feature of Google’s VirusTotal platform specializes in analyzing code snippets and generating reports in natural language, effectively emulating the approach of a malware analyst. Initially supporting PowerShell scripts, Code Insight later expanded to other scripting languages and file formats, including Batch, Shell, VBScript, and Office documents.
By processing the code and generating summary reports, Code Insight assists analysts in understanding the behavior of the code and identifying attack techniques. This includes uncovering hidden functionalities, malicious intent, and potential attack vectors that might be missed by traditional detection methods.
However, due to the inherent constraints of large language models (LLMs) and their limited token input capacity, the size of files that Code Insight could handle was restricted. Although there have been continuous improvements to increase the maximum file size limit and support more formats, analyzing binaries and executables still poses a significant challenge. When these files are disassembled or decompiled, their code size typically surpasses the processing capabilities of the LLMs available at the time. Consequently, gen AI models have functioned primarily as assistants to human analysts, enabling the analysis of specific code fragments from binaries rather than processing the entire code, which is often too voluminous for these models.
Reverse engineering is arguably the most advanced malware analysis technique available to cybersecurity professionals. This process involves disassembling the binaries of malicious software and carrying out a meticulous examination of the code. Through reverse engineering, analysts can uncover the exact functionality of malware and understand its execution flow. However, this method is not without its challenges. It requires an immense amount of time, a deep level of expertise, and an analytical mindset to interpret each instruction, data structure, and function call to reconstruct the malware’s logic and uncover its secrets.
Furthermore, scaling reverse engineering efforts poses a significant challenge. The scarcity of specialized talent in this field exacerbates the difficulty of conducting these analyses at scale. Given the intricate and time-consuming nature of reverse engineering, the cybersecurity community has long sought ways to augment this process, making it more efficient and accessible.
The ability to process prompts of up to 1 million tokens enables a qualitative leap in malware analysis, particularly in the realm of reverse engineering. This advancement finally brings the power of gen AI to the analysis of binaries and executables, a task previously reserved for highly skilled human analysts due to its complexity.
How does Gemini 1.5 Pro achieve this?
Increased capacity: With its expanded token limit, Gemini 1.5 Pro can entirely analyze some disassembled or decompiled executables in a single pass, eliminating the need to break down code into smaller fragments. This is crucial because fragmenting code can lead to a loss of context and important correlations between different parts of the program. When analyzing only small snippets, it is difficult to understand the overall functionality and behavior of the malware, potentially missing key insights into its purpose and operation. By analyzing the entire code at once, Gemini 1.5 Pro gains a holistic understanding of the malware, allowing for more accurate and comprehensive analysis.
Code interpretation: Gemini 1.5 Pro can interpret the intent and purpose of the code, not just identify patterns or similarities. This is possible due to its training on a massive dataset of code, encompassing assembly language from various architectures, high-level languages like C, and pseudo-code produced by decompilers. This extensive knowledge base, combined with its understanding of operating systems, networking, and cybersecurity principles, allows Gemini 1.5 Pro to effectively emulate the reasoning and judgment of a malware analyst. As a result, it can predict the malware’s actions and provide valuable insights even for never-seen-before threats. For more information on this, see the zero day case study section later in this post.
Detailed analysis: Gemini 1.5 Pro can generate summary reports in human-readable language, making the analysis process more accessible and efficient. This goes far beyond the simple verdicts typically provided by traditional machine learning algorithms for classification and clustering. Gemini 1.5 Pro’s reports can include detailed information about the malware’s functionality, behavior, and potential attack vectors, as well as indicators of compromise (IOCs) that can be used to feed other security systems and improve threat detection and prevention capabilities.
Let’s explore a practical case study to examine how Gemini 1.5 Pro performs in analyzing decompiled code with a representative malware sample. We processed two WannaCry binaries automatically using the Hex-Rays decompiler, without adding any annotations or additional context. This approach resulted in two C code files, one 268 KB and the other 231 KB in size, which together amount to more than 280,000 tokens for processing by the LLM.
In our testing with other similar gen AI tools, we faced the necessity of dividing the code into chunks. This fragmentation often compromised the comprehensiveness of the analysis, resulting in vague and non-specific outcomes. These limitations highlight the challenges of using such tools with complex code bases.
Gemini 1.5 Pro, however, marks a significant departure from these constraints. It processes the entire decompiled code in a single pass, taking just 34 seconds to deliver its analysis. The initial summary provided by Gemini 1.5 Pro is notably accurate, showcasing its ability to handle large and complex datasets seamlessly and effectively:
Issues a malicious verdict associated with ransomware
Identifies some files as IOCs (c.wnry and tasksche.exe)
Acknowledges the use of an algorithm to generate IP addresses and perform network scans to find targets on port 445/SMB to spread to other computers
Identifies URL/domain (WannaCry’s “killswitch”) and relevant registry key and mutex
While it might seem that Gemini 1.5 Pro’s report of WannaCry is based on pre-trained knowledge of this specific malware, this isn’t the case. The analysis comes from the model’s ability to independently interpret the code. This will become even clearer as we look at the upcoming examples where Gemini 1.5 Pro analyzes unfamiliar malware samples, demonstrating its wide-ranging capabilities.
In the previous example showcasing WannaCry analysis, there was a crucial step before feeding the code to the LLM: decompilation. This process, which transforms binary code into a higher-level representation like C, is fully automated and mirrors the initial steps taken by malware analysts when manually dissecting malicious software. But what is the difference between disassembled and decompiled code, and how does it impact LLM analysis?
Disassembly: This process converts binary code into assembly language, a low-level representation specific to the processor architecture. While human-readable, assembly code is still quite complex and requires significant expertise to understand. It is also much longer and more repetitive than the original source code.
Decompilation: This process attempts to reconstruct the original source code from the binary. While not always perfect, decompilation can significantly improve readability and conciseness compared to disassembled code. It achieves this by identifying high-level constructs like functions, loops, and variables, making the code easier to understand for analysts.
Given these factors, when using LLMs for binary analysis, decompilation offers several advantages on efficiency and scalability. The shorter and more structured output from decompilation fits more readily within the processing constraints of LLMs, allowing for a more efficient analysis of large or complex binaries. In fact, the output from a decompiler is five to 10 times more concise than that produced by a disassembler.
Disassembly is necessary to perform accurate decompilation and remains an invaluable tool in certain scenarios where detailed, low-level analysis is crucial. Given the structured and higher-level nature of decompiled output, there are specific circumstances where disassembly provides insights that decompilation cannot match.
Fortunately, Gemini 1.5 Pro demonstrates equal capability in processing both high-level languages and assembly across various architectures. Thus, our implementation for automating binary analysis can utilize both strategies or adopt a hybrid approach, as suited to the specific circumstances of each case. This flexibility allows us to tailor our analysis method to the nature of the binary in question, optimizing for efficiency, depth of insight, and the specific objectives of the analysis, whether that means dissecting the logic and flow of the program or diving into the intricate details of its low-level operations.
Next, we’ll examine a case where we directly employ disassembly for analysis. This time, we’re working with a more recent and unknown binary; in fact, the executable submitted to VirusTotal is flagged as malicious by only four out of the 70 VirusTotal anti-malware engines, and only in a generic sense, without providing any details about the malware family that could offer further clues about its behavior.
After automatic preprocessing with Hex-Rays/IDA Pro, the 306.50 KB executable binary produces a 1.5 MB assembly file that Gemini 1.5 Pro can process in a single pass within 46 seconds, thanks to its large token window in the prompt. This capability allows for an analysis of the entire assembly output, offering detailed insights into the binary’s operations.
This case of the unknown binary showcases the remarkable capabilities of Gemini 1.5 Pro. Despite only four out of 70 anti-malware engines on VirusTotal flagging the file as malicious—using only generic signatures—Gemini 1.5 Pro identified the file as malicious, providing a detailed explanation for its verdict. The file is likely a game cheat designed to inject a game hack dynamic-link library (DLL) into the Grand Theft Auto video game process. The designation of “malicious” may depend on perspective: deemed malicious by the game’s developers or their security team focused on anti-cheating measures, yet potentially desirable for some players. Nevertheless, this automated first-pass analysis is not only impressive but also illuminating regarding the nature and intent of the binary.
The true test of any malware analysis tool lies in its ability to identify never-before-seen threats that go undetected by traditional methods and to proactively protect systems from zero-day attacks. Here, we examine a case where an executable file is undetected by any anti-virus or sandbox on VirusTotal.
The 833 KB file, medui.exe, was decompiled into 189,080 tokens and subsequently processed by Gemini 1.5 Pro in a mere 27 seconds to produce a complete malware analysis report in a single pass.
This analysis revealed suspicious functionalities, leading Gemini 1.5 Pro to issue a malicious verdict. Based on its observations, it concluded that the primary goal of this malware is to steal cryptocurrency by hijacking Bitcoin transactions and evading detection through the disabling of security software.
This showcases Gemini’s ability to go beyond simple pattern matching or ML classification and leverage its deep understanding of code behavior to identify malicious intent, even in previously unseen threats. This is a significant advancement in the field of malware analysis, as it allows us to proactively detect and respond to new and emerging threats that traditional methods might miss.
Gemini 1.5 Pro unlocks impressive capabilities, enabling the analysis of large volumes of decompiled and disassembled code. It has the potential to significantly change our approach to fighting malware by enhancing efficiency, accuracy, and our ability to scale in response to a growing number of threats.
However, it’s important to remember that this is just the beginning. While Gemini 1.5 Pro represents a significant leap forward, the field of gen AI is still in its infancy. There are several challenges that need to be addressed to achieve truly robust and reliable automated malware analysis:
Obfuscation and packing: Malware authors are constantly developing new techniques to obfuscate their code and evade detection. In response, there’s a growing need to not only continuously improve gen AI models but also to enhance the preprocessing of binaries before analysis. Adopting dynamic approaches that utilize various preprocessing tools can more effectively unpack and deobfuscate malware. This preparatory step is crucial for enabling gen AI models to accurately analyze the underlying code, ensuring they keep pace with evolving obfuscation techniques and remain effective in detecting and understanding sophisticated malware threats.
Increasing binary size: The complexity of modern software is mirrored in the growing size of its binaries. This trend presents a significant challenge, as the majority of gen AI models are constrained by much lower token window limits. In contrast, Gemini 1.5 Pro stands out by supporting up to 1 million tokens—currently the highest known capacity in the field. Nevertheless, even with this remarkable capability, Gemini 1.5 Pro may encounter limitations when handling exceptionally large binaries. This underscores the ongoing need for advancements in AI technology to accommodate the analysis of increasingly large files, ensuring comprehensive and effective malware analysis as software complexity continues to escalate.
Evolving attack techniques: As attackers continuously innovate, crafting new methods to bypass security measures, the challenge for gen AI models extends beyond simple adaptability. These models must not only learn and recognize new threats but also evolve in conjunction with the efforts of researchers and developers. There’s a need to devise new methods for automating the preprocessing of threat data, which would enrich the context provided to AI models. For instance, integrating additional data from static and dynamic analysis tools, such as sandbox reports, plus the decompiled and disassembled code, can significantly enhance the models’ understanding and detection capabilities.
The journey towards scaling automated malware analysis is ongoing, but Gemini 1.5 Pro marks a significant milestone. At GSEC Malaga, we continue to research and develop ways to apply these AI models effectively, pushing the boundaries of what’s possible in cybersecurity and contributing to a safer digital future.
The following table contains details on the malware samples discussed in this post.
lhdfrgui.exe (WannaCry dropper)
SHA-256: 24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c
Size: 3.55 MB (3723264 bytes)
First seen: 2017-05-12
File type: Win32 EXE

tasksche.exe (WannaCry cryptor)
SHA-256: ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa
Size: 3.35 MB (3514368 bytes)
First seen: 2017-05-12
File type: Win32 EXE

EXEC.exe
SHA-256: 1917ec456c371778a32bdd74e113b07f33208740327c3cfef268898cbe4efbfe
Size: 306.50 KB (313856 bytes)
First seen: 2022-04-18
File type: Win32 EXE

medui.exe
SHA-256: 719b44d93ab39b4fe6113825349addfe5bd411b4d25081916561f9c403599e50
Size: 833.50 KB (853504 bytes)
First seen: 2024-03-27
File type: Win32 EXE
The following is the exact prompt used in all the examples covered in the post. The only exception is the example where the word “disassembled” is used instead of “decompiled” because, as explained, we’re working with disassembled code rather than decompiled code to show that Gemini 1.5 Pro can interpret both.
Act as a malware analyst by thoroughly examining this decompiled executable code. Methodically break down each step, focusing keenly on understanding the underlying logic and objective. Your task is to craft a detailed summary that encapsulates the code’s behavior, pinpointing any malicious functionality. Start with a verdict (Benign or Malicious), then a list of activities including a list of IOCs if any URLs, created files, registry entries, mutex, network activity, etc.
+[attached decompiled.c.txt sample file]
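As a rough sketch of how this prompt plus an entire decompiled file might be submitted programmatically through the Vertex AI SDK (the project, region, and file name are assumptions, and the model identifier may differ):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Assumed project and region.
vertexai.init(project="my-project", location="us-central1")
model = GenerativeModel("gemini-1.5-pro")

prompt = (
    "Act as a malware analyst by thoroughly examining this decompiled "
    "executable code. Methodically break down each step, focusing keenly on "
    "understanding the underlying logic and objective. Your task is to craft "
    "a detailed summary that encapsulates the code's behavior, pinpointing "
    "any malicious functionality. Start with a verdict (Benign or Malicious), "
    "then a list of activities including a list of IOCs if any URLs, created "
    "files, registry entries, mutex, network activity, etc."
)

# The 1M-token window lets the whole decompiled file go in one request.
with open("decompiled.c.txt") as f:
    decompiled_code = f.read()

response = model.generate_content([prompt, decompiled_code])
print(response.text)
```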
Read More for the details.
Viewing changes to your Azure resources just became easier! With Azure Resource Graph’s Change Analysis, you can now view all your resource changes across all your tenants and subscriptions in the Azure Portal.
Read More for the details.
Azure Deployment Environments is adding a new extensibility model that empowers customers to customize deployment workflows using Bicep, Terraform, Pulumi, or any other infrastructure-as-code (IaC) framework of their choice.
Read More for the details.
Azure Data Catalog will now be retired on 15 May 2024 – migrate to Microsoft Purview
Read More for the details.
AWS CodePipeline V2 type pipelines now support stage-level rollback to help customers confidently deploy changes to their production environment. When a pipeline execution fails in a stage due to any action(s) failing, customers can quickly get that stage to a known good state by rolling back to a previously successful pipeline execution in that stage. Customers can roll back changes in any stage, whether succeeded or failed, except the Source stage.
Read More for the details.
Customers can now create and manage default policies across their entire organization or organizational unit (OU) with AWS CloudFormation StackSets. Default policies work in conjunction with customers’ existing backup mechanisms to only create EBS-backed AMIs and EBS Snapshots of instances and volumes without recent backups. This helps administrators ensure that all member accounts have comprehensive backup protection without creating duplicate backups or increasing management overhead and cost.
Read More for the details.
You can now restore your Amazon Managed Service for Apache Flink application to the previous running version and application state from the most recent successful snapshot. This feature works when your application is running and is most useful when you want to immediately roll back to the previous application version to mitigate the downstream impact of an application update. Prior to this launch, you could only roll back applications that were in updating or autoscaling statuses.
Read More for the details.
Network Load Balancer (NLB) now supports Resource Map, a tool in the console that displays all your NLB resources and their relationships in a visual format on a single page, providing you a clear understanding of your NLB architecture.
Read More for the details.
Amid all the excitement around the potential of generative AI to transform business and unlock trillions of dollars in value across the global economy, it is easy to overlook the significant impact that the technology is already having. Indeed, the era of gen AI does not exist at some vague point in the not-too-distant future: it is here and now.
The advent of generative AI marks a significant leap in the evolution of computing. For Media customers, generative AI introduces the ability to generate real time, personalized and unique interactions that weren’t possible before. This technology is not just revolutionizing the way we streamline the content creation process but it is also transforming broadcasting operations, such as discovering and searching media archives.
Simultaneously, in telco, generative AI boosts productivity by creating a knowledge engine that can summarize and extract information from both structured and unstructured data, which employees can use to solve a customer problem or to shorten their learning curve. Furthermore, generative AI can be easily adopted and understood by all levels of the organization without needing to know the model’s complexity.
The telecommunications and media industry is at the forefront of integrating generative AI into their operations, viewing it as a catalyst for growth and innovation. Industry leaders are enthusiastic about its ability to not only enhance the current processes but also spearhead new innovations, create new opportunities, unlock new sources of value and improve the overall business efficiency.
Communication Service Providers (CSPs) are now using generative AI to significantly reduce the time it takes to perform network-outage root-cause analysis. Traditionally, identifying the root cause of an outage involved engineers mining through several logs, vendor documents, past trouble tickets, and their resolutions. Vertex AI Search enables CSPs to extract relevant information across structured and unstructured data, and significantly shorten the time for a human engineer to identify probable causes.
“Generative AI is helping our employees to do their jobs and increase their productivity, allowing them to spend more time strengthening the relationship with our customers,” explains Uli Irnich, CIO of Vodafone Germany.
Media organizations are using generative AI to smoothly and successfully engage and retain viewers by enabling more powerful search and recommendations. With Vertex AI, customers are building an advanced media recommendations application and enabling audiences to discover personalized content, with Google-quality results that are customized by optimization objectives.
While the potential of generative AI is widely recognized, challenges to its widespread adoption still persist. On the one hand, many of these stem from the sheer size of the businesses involved, with legacy architecture, siloed data, and the need for skills training presenting obstacles to more widespread and effective usage of generative AI solutions. On the other hand, many of these risk-averse enterprise-scale organizations want to be sure that the benefits of generative AI outweigh any perceived risks. In particular, businesses seek reassurance around the security of customer data and the need to conform to regulation, as well as around some of the challenges that can arise when building generative AI models, such as hallucinations (more on that below).
As part of our long-standing commitment to the responsible development of AI, Google Cloud puts our AI Principles into practice. Through guidance, documentation, and practical tools, we support customers to help ensure that businesses are able to roll out their solutions in a safe, secure, and responsible way. By tackling challenges and concerns head-on, we are working to empower organizations to leverage generative AI safely and effectively.
One such challenge is “hallucinations,” which are when a generative AI model outputs incorrect or invented information in response to a prompt. For enterprises, it’s key to build robust safety layers before deploying generative AI-powered applications. Models, and the ways that generative AI apps leverage them, will continue to get better, and many methods for reducing hallucinations are available to organizations.
Last year, we introduced grounding capabilities for Vertex AI, enabling large language models to incorporate specific data sources when generating responses. By giving models access to designated data sources, grounding tethers their output to that data, which reduces hallucinations and enhances the trustworthiness of generated content. Grounding also lets the model access information that goes beyond its training data: by linking to designated data stores within Vertex AI Search, the grounded model can produce relevant responses.
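A hedged sketch of what grounding a request against a Vertex AI Search data store can look like with the Vertex AI SDK (the project, data store path, and model version are assumptions):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="my-project", location="us-central1")  # assumed

# Ground responses in a designated Vertex AI Search data store.
retrieval_tool = Tool.from_retrieval(
    grounding.Retrieval(
        grounding.VertexAISearch(
            datastore="projects/my-project/locations/global/collections/"
                      "default_collection/dataStores/my-datastore"
        )
    )
)

model = GenerativeModel("gemini-1.0-pro")
response = model.generate_content(
    "What does our latest product documentation say about data residency?",
    tools=[retrieval_tool],
)
print(response.text)
```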
As AI-generated images become increasingly popular, we offer digital watermarking and verification on Vertex AI, making us the first cloud provider to give enterprises a robust, usable, and scalable approach to create AI-generated images responsibly and identify them with confidence. Digital watermarking on Vertex AI provides two capabilities: Watermarking, which produces a watermark designed to be invisible to the human eye without damaging or reducing image quality, and Verification, which determines whether an image was generated by Imagen, with an associated confidence level. This technology is powered by Google DeepMind SynthID, a state-of-the-art technology that embeds the watermark directly into the pixels of the image, making it imperceptible to the human eye and very difficult to tamper with without damaging the image.
Given the versatility of large language models, predicting unintended or unexpected output is challenging. To address this, our generative AI APIs have safety attribute scoring, enabling customers to test Google’s safety filters and set confidence thresholds suitable for their specific use case and business. These safety attributes include “harmful categories” and topics that can be considered sensitive, each assigned a confidence score between 0 and 1. This score reflects the likelihood of the input or response belonging to a given category. Implementing this measure is a step toward a positive user experience, ensuring outputs align more closely with the desired safety standards.
As we work to develop generative AI responsibly, we keep a close eye on emerging regulatory frameworks. Google’s AI/ML Privacy Commitment outlines our belief that customers should have a higher level of security and control over their data on the cloud. That commitment extends to Google Cloud generative AI solutions: by default Google Cloud doesn’t use customer data (including prompts, responses and adapter model training data) to train its foundation models. We also offer third-party intellectual property indemnity as standard for all customers.
By integrating responsible AI principles and toolkits into all aspects of AI development, we are witnessing a growing confidence among organizations in using Google Cloud generative AI models and the platform. This approach enables them to enhance customer experience, and overall, foster a productive business environment in a secure, safe and responsible manner. As we progress on a shared generative AI journey, we are committed to empowering customers with tools and protection they need to use our services safely, securely and with confidence.
“Google Cloud generative AI is optimizing the flow from ideation to dissemination,” says Daniel Hulme, Chief AI Officer at WPP. “And as we start to scale these technologies, what is really important over the coming years is how we use them in a safe, responsible and ethical way.”
Read More for the details.
A lack of skills holds back tens of millions of people from finding jobs, growing in their careers, and adapting to today’s business opportunities. For example, an estimated 920 million people globally have an education that does not match their job¹, while 60% of workers will require new training before 2027 but only some have access to adequate training opportunities².
Expanding access to continuing education is a great way to level the playing field for everyone and give people a clearer understanding of the skills needed for a given job – and how to build those skills.
Jobspeaker, a Google Cloud EdTech partner, believes that bringing together educators, learners, and employers can significantly reduce the strain on people and businesses caused by economic cycles and the exponentially increasing effects of technology on the job market.
“People need different things at different stages in their careers,” says Jarlath O’Carroll, Founder and Chief Executive Officer of Jobspeaker. “In the past decade, we’ve seen more people looking to re-skill or upskill in response to the quickly evolving economy. We focus on making re-skilling and upskilling as effective and efficient as possible.”
Jobspeaker chose to use Google Cloud and become a Google EdTech partner in building their exploration, learning, and work platform that improves skills matching for learners – including students, job seekers, and professionals – as well as educators and employers.
Mapping skills through a new common language
Since its inception, Jobspeaker has worked to create a complete suite of tools for career planning that focuses on clarifying what skills are required for desired jobs or careers and provides a path to gain those skills.
“We chose to focus on the language of skills because there was such a gap in understanding by both employers and job seekers,” says Richard Varn, Chief Information Officer and board member at Jobspeaker. “Establishing reliable skills descriptions and communications among learners, educators, and employers will lead to better outcomes for everyone.”
To accomplish its goals, Jobspeaker needed IT solutions that would enable it to extract specific information regarding the skills students acquire throughout their academic journeys, as well as those that employers seek. Google Cloud proved to be the best option because it provides the tools to extract vast amounts of information at scale.
“The task we had for AI was to pull out details about skills, competencies, activities, knowledge, and abilities in business and academia from highly unstructured data,” says Varn. “Given the scale and complexity of the data, we needed highly automated processes powered by a configurable AI infrastructure to support our machine learning.”
Jobspeaker chose to work with Vertex AI for its curriculum-to-skills mapping. After achieving initial success with classification work, Jobspeaker saw opportunities to use new generative AI capabilities in Vertex AI. These tools are now applied to extracting data that identifies and aggregates skills developed in education and maps them to job descriptions.
Jobspeaker is working to map every type of learning exercise, from a 15-minute educational YouTube video to a full four-year degree program, as professionals and students continue to learn from a wider array of sources. So far, Jobspeaker has successfully processed over 6,300 programs and 25,000 courses across higher and continuing education.
Speed has also improved by using Vertex AI. Jobspeaker’s processing took three to four weeks when it used a more manual process and on-prem IT. That was reduced to one to two weeks after moving to Google Cloud and now sits at under two days as it fine tunes more models. Jobspeaker expects to see even more complex skill mapping take as little as two or three hours in the near future.
Aligning with the right cloud provider
Jobspeaker also chose to work with Google Cloud because of its scalable infrastructure and expertise in search technologies. The company hopes to do for education and employment what Google does for so many industries.
“Our ultimate goal is to discover and use any kind of information that connects education to careers, understand the information in detail, and articulate the insights to our users,” says O’Carroll.
Jobspeaker now has the underlying infrastructure to scale up its skills mapping processes. Compute Engine powers Jobspeaker’s applications and infrastructure, providing an on-demand, efficient foundation for the company’s IT architecture.
In addition, Jobspeaker believes Google Cloud’s commitment to improving education for all is another strong area of alignment.
“While our initial interest in Google Cloud was based on the technology it offers, we’ve learned a lot more about the impressive things Google has done in education,” says O’Carroll. “We hope to create and deploy a Google Chromebook plug-in version of our service to increase its availability to more learners.”
All eyes on AI
Jobspeaker believes its decision to run on Google Cloud puts it in a strong position to experiment with AI as the technology evolves. The company is planning to use Gemini models, Google’s most capable and general models, designed to be multimodal, as it scales out its skills mapping processes to reach a wider variety of learners, educators, and employers.
“We are committed to helping educators, employers and learners navigate constantly changing economic landscapes,” says O’Carroll. “Recessions, technology advances, and the pandemic have all disrupted careers. We believe our platform gives people the chance to course correct their careers at any point. Google Cloud, through promising AI technologies like Gemini, will help us achieve our goals.”
For more information on how Google Cloud is helping EdTech companies succeed, read more EdTech success stories on the Public Sector blog.
1. International Labor Organization: ILOSTAT, Feb 2023
2. World Economic Forum, Future of Jobs Report, 2023
Read More for the details.
Editor’s Note: The Ford Motor Company, one of the most recognizable auto brands in the world, recently updated its database strategy to modernize its workloads and focus on managed database services from Google Cloud. Ford has seen a large drop in the time spent on database-related operational tasks by managing databases in Google Cloud.
Since 1903, Ford has been a household name when it comes to automotive innovation. From the first moving assembly line to the latest driver-assist technology, we strive to stay at the forefront of the industry so that every person is free to move and pursue their dreams. That overarching vision also extends to our internal IT teams, which are always looking for ways to modernize and improve our technology stack.
At the database level, our goal is to enable always-on products with minimum downtime. By migrating to fully managed Google Cloud databases like Cloud SQL, we significantly reduced our management overhead. We’ve already seen a large drop in database-related operational tasks.
Our database fleet was spread across cloud and on-premises environments, leveraging various technologies. Provisioning and managing resources in those systems posed a big challenge. Every tech refresh was time-consuming, especially upgrading to major versions and applying security patches to each database instance. In addition, scalability was a big — and unpredictable — issue. We had to forecast the amount of resources we would need to keep up and increase our database capacity. Beyond that, we were also managing backups of on-premises resources. All of this work required a global team of database administrators who were busy supporting Day 2 activities just for our on-premises fleet.
Our database strategy needed to be cost-effective, boost resiliency, increase cloud adoption and collaboration, and modernize applications across the business. As we looked for cloud-based alternatives, the Google Cloud portfolio of database services offered a variety of solutions. We saw they would not only quickly address our current requirements, but also help us build for the future.
Google Cloud’s database offerings provide versatility with open-source options — spanning relational, document-based, analytics, and hybrid workloads. Global accessibility and multi-regional distributed databases offered seamless data flow and the elastic nature provided the scalability, resilience, and minimal downtime essential to our always-on product vision.
The way Google envisions the integration of data and artificial intelligence (AI) supports Ford’s modernization plans.
Today, we’ve migrated databases to Google Cloud across Cloud SQL, Spanner, Bigtable, Firestore, AlloyDB, Memorystore, and MongoDB Atlas. The migration was facilitated by both external and native tools for homogeneous and heterogeneous migrations alike, and it went smoothly and efficiently. We’ve put together a standard set of tools within Google Cloud that we call our Opinionated Stack. This tailored framework helps with the migration of additional applications, smooths the transition to managed databases, and provides the ability to leverage Cloud Run and Cloud SQL. All of this helps us meet our goal of a cloud-first architecture.
With the help of Cloud SQL, we have met our database processing requirements, including data protection needs. We’ve seen a reduction in time spent on database-related operational tasks with zero backup failures. The agility afforded by serverless products has enhanced our operational efficiency and saved time in managing the lifecycle of our databases.
Migrating to Google Cloud databases has also led to a significant performance boost, with some products showing a 30% improvement.
Our goal is not just about modernization for the sake of staying current — it’s about redefining what efficiency, collaboration, and innovation mean at Ford.
We’re fostering an environment of continuous learning and adaptation, crucial for keeping pace with the ever-evolving tech frontier and maintaining our leadership in the automotive industry.
As we look to the future, we aim to fully harness the potential of managed services, generative AI, and cloud database technology to drive efficiency, resilience, and innovation. This helps Ford continue to set industry benchmarks and deliver on our promise of freedom of movement in an increasingly connected world.
Dive into the Google Cloud databases portfolio.
Discover the benefits of Cloud SQL and get started with a free trial today!
Read More for the details.
At Google Cloud, we understand you have a diverse set of regulatory, compliance, and sovereignty needs. We strive to provide you with the controls you need and the flexibility to meet your requirements. We offer a range of customizable control packages, so you can choose the level of control that best aligns with your risk tolerance and compliance needs. This flexibility allows you to tailor your approach with minimal tradeoffs. Additionally, we work closely with local partners in select countries to offer Sovereign Controls by Partners to address regional requirements.
At Google Cloud Next, we announced several significant enhancements to further expand your power of choice. These include new Regional Controls and Sovereign Controls by Partners packages, new controls and audit enhancements, and a simplified compliance configuration and management experience for new workloads. These enhancements give you even more options to meet your requirements, at lower cost, and with increased ease of use.
Regional Controls, now in preview, expands Assured Workloads control package availability to 32 regions across 14 countries. Regional Controls includes foundational controls such as data residency (at-rest and during processing) and administrative Access Transparency, at no additional cost. With these updates, controls provided through Assured Workloads are now more accessible than ever to a wider range of Google Cloud customers.
We are also expanding our Sovereign Controls by Partners offering with the preview of Sovereign Controls by PSN in Italy and Sovereign Controls by SIA/Minsait in Spain. These local partners, like T-Systems in Germany and S3NS in France, can provide additional layers of control, including local support personnel, managed External Key Management (EKM) with Key Access Justifications (KAJ), and additional oversight options. EKM with KAJ provides strong control over your data: since keys are stored outside of Google’s infrastructure, you or your local partner have the power to directly approve or deny any access requests.
You can read more about how partnerships like these have met the specific demands of our European customers and have helped to propel their businesses forward.
We also continue to expand the compliance controls and audit capabilities available to Google Cloud customers.
We are thrilled to announce that we now offer data residency core processing commitments to customers using Assured Workloads. This is a major milestone towards additional data residency guarantees, making it possible for enterprise and public sector customers to deploy regulated workloads and helping keep their data within the country while it is processed by the service.
To help customers simplify their compliance audit process, our new Audit Manager can help automate control verification with proof of compliance for your workloads and data on Google Cloud. The compliance assessments and proof can help reduce the time and effort required in costly audit processes. Additionally, the available responsibility matrices clarify the shared responsibility between you and Google Cloud, and help you set the right configurations.
Organizations that need to process sensitive data in the cloud with strong guarantees around confidentiality can continue to use our Confidential Computing portfolio. We offer support for Confidential VMs, Containers, and your entire data processing pipeline, as well as ubiquitous data encryption, which can provide additional security and peace of mind about the encryption and protection of your data.
We’ve worked to make it easier to configure workload controls by default and migrate workloads that were not initially set up in Assured Workloads to a controlled environment. A new onboarding flow is now directly integrated into the Cloud Resource Manager (CRM). When setting up a Google Cloud folder from the CRM, simply choose ‘Assured Workloads Folder’ to automatically apply a chosen set of Regional, Sovereign, or Compliance controls to resources in that folder.
The new “Learn More” panel provides contextual information to help you understand Assured Workloads capabilities during the folder creation process, and it can help you make an informed decision about the right control package for your specific needs. We’ve also streamlined and simplified the setup flow to help you save time.
You can take advantage of our free trial program to check out our premium compliance offerings at no additional cost for a limited time.
If you’re looking to migrate existing Google Cloud workloads into an Assured Workloads controlled environment, we have an Analyze Move API that can assist you by pointing out any incompatibilities in moving your current projects into your chosen Assured Workloads program.
And if you’re not sure where to start your sovereignty journey, you can use our free interactive Digital Sovereignty Explorer to get personalized recommendations on potential cloud controls and other Google Sovereign Cloud solutions based on your unique requirements.
Read More for the details.
Monitoring machine learning (ML) models in production is now as simple as using a function in BigQuery! Today we’re introducing a new set of functions that enable model monitoring directly within BigQuery. Now, you can describe data throughout the model workflow by profiling training or inference data, monitor skew between training and serving data, and monitor drift in serving data over time using SQL — for BigQuery ML models as well as any model whose feature training and serving data is available through BigQuery. With these new functions, you can ensure your production models continue to deliver value while simplifying their monitoring.
In this blog, we present two companion notebooks to help you get hands-on with these features today!
Companion Introduction – a fast introduction to all the new functions
Companion Tutorial – an in-depth tutorial covering many usage patterns for the new functions, including using Vertex AI Endpoints, monitoring feature attributions, and an overview of how monitoring metrics are calculated.
A model is only as good as the data it learns from. Understanding the data deeply is essential for effective feature engineering, model selection, and ensuring quality through MLOps. BigQuery’s table-valued function ML.DESCRIBE_DATA provides a powerful tool for this, allowing you to summarize and describe an entire table with a single query.
Example: Identifying data issues
In the accompanying introduction notebook, we profile the training data (the penguin classification dataset) using the ML.DESCRIBE_DATA function and quickly identify a data issue.
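Here is a minimal sketch of that profiling step, run from Python with the BigQuery client. The dataset and table names are illustrative, and the option names in the STRUCT are assumptions based on the parameters described below; see the notebook for the exact query.

```python
# A hedged sketch of profiling training data with ML.DESCRIBE_DATA.
# Dataset/table names and the STRUCT option names are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.DESCRIBE_DATA(
  TABLE `bqml_tutorial.penguins_train`,
  STRUCT(10 AS num_quantiles, 5 AS top_k)  -- assumed option names
)
"""
profile = client.query(sql).to_dataframe()
print(profile)
```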
In the resulting output table, notice that the min value for the sex column is ‘.’. Ideally, we’d see only the values MALE, FEMALE, or null, as indicated in the top_values.values column. This means that in addition to the 10 null values (indicated by the num_null column), there are also some missing values encoded as the string ‘.’. This should be corrected before the table is used as training data.
The ML.DESCRIBE_DATA function is extra helpful because it summarizes each data type all in one table. There are also optional parameters that can be specified to control the number of quantiles for different numerical column types and the number of top values to return for categorical columns. The input data can be specified as a table or a query statement, allowing you to describe specific subsets of data (e.g., serving timeframes, or groups within your training data). The function’s flexibility extends beyond ML tasks: it even allows you to describe data stored outside of BigQuery, facilitating quick analysis for both model-building and broader data exploration purposes.
A trained model will perform well only when the serving data is similar in distribution to the training data. Model monitoring helps ensure this by comparing training and serving data for shifts known as skew. BigQuery’s ML.VALIDATE_DATA_SKEW table-valued function streamlines this process, allowing you to directly compare serving data to any BigQuery ML model’s training data.
Let’s see it in action:
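Below is a minimal sketch of the skew check, assuming illustrative dataset and table names (the model name comes from the notebook):

```python
# A hedged sketch: compare serving data against the statistics stored with
# the BigQuery ML model at training time. Table names are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.VALIDATE_DATA_SKEW(
  MODEL `bqml_tutorial.classify_species_logistic`,
  TABLE `bqml_tutorial.serving`
)
"""
for row in client.query(sql).result():
    print(dict(row))
```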
This query directly compares the data in the serving table to the BigQuery ML model classify_species_logistic. The accompanying introduction notebook has the full code in an interactive example. In that notebook, the serving data is simulated to introduce change in two of the features: body_mass_g and flipper_length_mm. The results of the ML.VALIDATE_DATA_SKEW function show anomalies detected for each of these features.
The detection of skew is as easy as comparing a model in BigQuery to a table of serving data. During training, BigQuery ML models automatically compute and store relevant statistics. This eliminates the need for reusing the entire training dataset, making skew monitoring simple and cost-efficient. Importantly, the function intelligently focuses on features present in the model, further enhancing efficiency and workflow. With optional parameters, you can customize anomaly detection thresholds, metric types for categorical features, and even set different thresholds for specific features. Later, we’ll demonstrate how easily you can monitor skew for any model!
Beyond comparing serving data to training data, it’s also important to keep an eye on changes within serving data over time. Comparing recent serving data to previous serving data is another type of model monitoring known as drift detection. It uses the same detection technique: metrics that compare distributions between a baseline and a comparison dataset, flagging anomalies that exceed a set threshold. With the table-valued function ML.VALIDATE_DATA_DRIFT, you can compare any two tables, or query results, directly for detection.
Drift detection in action:
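A minimal sketch, assuming an illustrative serving table with a logged_at timestamp column (both names are assumptions):

```python
# A hedged sketch: compare yesterday's serving rows (baseline) against
# today's (comparison) with ML.VALIDATE_DATA_DRIFT.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.VALIDATE_DATA_DRIFT(
  (SELECT * FROM `bqml_tutorial.serving`                -- baseline: yesterday
   WHERE DATE(logged_at) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)),
  (SELECT * FROM `bqml_tutorial.serving`                -- comparison: today
   WHERE DATE(logged_at) = CURRENT_DATE())
)
"""
print(client.query(sql).to_dataframe())
```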
Here, the same serving table is used as both the baseline and the comparison table, with different WHERE clauses filtering the rows to compare today to yesterday as an example. The results show that while the detection values did not surpass the threshold, they are approaching it between two consecutive days for the features with simulated change.
Just like with skew detection, you can also adjust the default detection threshold for anomaly detection as well as the metric type used for categorical features, and specify different thresholds for different columns and feature types. There are additional parameters to control the binning of numerical features for the metrics calculations.
If you’re already familiar with the TensorFlow Data Validation (TFDV) library, you’ll appreciate how these new BigQuery functions enhance your model monitoring toolkit. They bring the power of TFDV directly into your BigQuery workflows, allowing you to generate rich statistics, detect anomalies, and leverage TFDV’s powerful visualization tools — all with SQL. And the best part: it all runs on BigQuery’s scalable, serverless compute, delivering near-instant analysis so you can take rapid action on model monitoring insights.
Let’s explore how it works:
Generate statistics with ML.TFDV_DESCRIBE
You can generate in-depth statistics summaries with the table-valued function ML.TFDV_DESCRIBE for any table, or query, in the same format as the TensorFlow tfdv.generate_statistics_from_csv() API:
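A minimal sketch, again with illustrative names; the result is kept for the TFDV steps that follow:

```python
# A hedged sketch: generate TFDV-format statistics for the training table.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.TFDV_DESCRIBE(TABLE `bqml_tutorial.penguins_train`)
"""
# The row holds the statistics in proto-compatible JSON form; the exact
# column layout is shown in the companion notebook.
train_describe = list(client.query(sql).result())[0]
```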
The ML.TFDV_DESCRIBE function outputs statistics in a structured data format (a ‘proto’) that is directly compatible with TFDV: tfmd.proto.statistics_pb2.DatasetFeatureStatisticsList.
Using a bit of Python code in a BigQuery notebook, we can import the TFDV package as well as the TensorFlow Metadata package, and then call the tfdv.visualize_statistics method after converting the data to the expected format. The ML.TFDV_DESCRIBE results were loaded into Python as train_describe for the training data and today_describe for the current day’s serving data. See the accompanying tutorial for complete details.
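A minimal sketch of that visualization step, assuming train_describe and today_describe hold the single-row query results in proto-compatible JSON form (the exact result shape is an assumption; the notebook shows the precise conversion):

```python
# A hedged sketch: visualize training vs. serving statistics with TFDV.
import tensorflow_data_validation as tfdv
from google.protobuf import json_format
from tensorflow_metadata.proto.v0 import statistics_pb2

def to_stats(row_json):
    # Convert a JSON-shaped statistics result into the TFDV proto.
    # The exact shape of the query output is an assumption here.
    return json_format.ParseDict(
        dict(row_json), statistics_pb2.DatasetFeatureStatisticsList())

tfdv.visualize_statistics(
    lhs_statistics=to_stats(train_describe),
    rhs_statistics=to_stats(today_describe),
    lhs_name="training",
    rhs_name="serving (today)",
)
```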
This generates the visualizations below, which directly highlight shifts in the two features that we purposefully shifted in the serving data for this example: body_mass_g and flipper_length_mm.
This streamlined workflow brings the power and precision of TensorFlow Data Validation directly to BigQuery and enables you to quickly visualize how sets of data differ. This provides deeper insight to model health monitoring and informs how to proceed with model training iterations.
Detect anomalies with ML.TFDV_VALIDATE
You can also precisely detect skew or drift anomalies with the scalar function ML.TFDV_VALIDATE, which compares tables, or queries, pinpointing potential model-breaking shifts.
The results of ML.TFDV_VALIDATE are formatted in a structured data format (‘proto’) that is directly compatible with TFDV’s display tools: tfmd.proto.anomalies_pb2.Anomalies. Passing this as input to the Python method tfdv.display_anomalies presents an easy-to-read table of anomaly detection results, as shown after the code snippet:
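A minimal sketch of that display step, assuming validate_result holds the ML.TFDV_VALIDATE output in proto-compatible JSON form:

```python
# A hedged sketch: render ML.TFDV_VALIDATE output with TFDV's display helper.
import tensorflow_data_validation as tfdv
from google.protobuf import json_format
from tensorflow_metadata.proto.v0 import anomalies_pb2

# validate_result is assumed to be the function's output as JSON-shaped data.
anomalies = json_format.ParseDict(dict(validate_result), anomalies_pb2.Anomalies())
tfdv.display_anomalies(anomalies)
```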
| Feature name | Anomaly short description | Anomaly long description |
| --- | --- | --- |
| ‘culmen_depth_mm’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.0483968 (up to six significant digits), above the threshold 0.03. |
| ‘flipper_length_mm’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.917495 (up to six significant digits), above the threshold 0.03. |
| ‘body_mass_g’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.356159 (up to six significant digits), above the threshold 0.03. |
| ‘island’ | High Linfty distance between training and serving | The Linfty distance between training and serving is 0.118041 (up to six significant digits), above the threshold 0.03. The feature value with maximum difference is: Dream |
| ‘culmen_length_mm’ | High approximate Jensen-Shannon divergence between training and serving | The approximate Jensen-Shannon divergence between training and serving is 0.0594803 (up to six significant digits), above the threshold 0.03. |
| ‘sex’ | High Linfty distance between training and serving | The Linfty distance between training and serving is 0.0513795 (up to six significant digits), above the threshold 0.03. The feature value with maximum difference is: FEMALE |
The default detection methods for numerical and categorical data, as well as the thresholds, are the same as for the other functions shown above. You can customize detection with the function’s parameters for precise monitoring needs. For a deeper dive, the accompanying tutorial includes a section that demonstrates how these metrics are calculated manually and compares the function’s output to those manual calculations as validation.
BigQuery’s model monitoring functions offer a streamlined solution whether you’re working with models deployed on Vertex AI Prediction Endpoints or using batch serving data stored within BigQuery (as shown above). Here’s how:
Batch serving: For batch prediction data already stored or accessible by BigQuery, the monitoring features are readily accessible just as demonstrated previously in this blog.
Online serving: Directly monitor models deployed on Vertex AI Prediction Endpoints. By configuring request-response logging to BigQuery (see the sketch below), you can easily apply the BigQuery ML model monitoring functions to detect skew and drift.
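As a rough illustration, the Vertex AI SDK lets you enable request-response logging when creating an endpoint. Project, region, and table names below are illustrative assumptions:

```python
# A hedged sketch: create an endpoint that logs requests and responses to a
# BigQuery table, so logged serving data can feed the monitoring functions.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint.create(
    display_name="penguins-endpoint",
    enable_request_response_logging=True,
    request_response_logging_sampling_rate=1.0,  # log every request
    request_response_logging_bq_destination_table=(
        "bq://my-project.bqml_tutorial.serving_logs"
    ),
)
```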
The accompanying tutorial provides a step-by-step walkthrough, demonstrating endpoint creation, model deployment, logging setup (for Vertex AI to BigQuery), and how to monitor both online and batch serving data within BigQuery.
To achieve truly scalable monitoring of shifts and drifts, automation is essential. BigQuery’s procedural language offers a powerful way to streamline this process, as demonstrated in the SQL query from our introductory notebook. This automation isn’t limited to monitoring; it can extend to continuous model retraining. In a production environment, continuous retraining would be accompanied by proactive identification of data quality issues, adaptation to real-world changes, and a rigorous deployment strategy aligned with your organization’s needs.
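A minimal sketch of that pattern, run from Python: retrain only when the skew check flags an anomaly. The is_anomaly column name, the model options, and all object names are assumptions rather than the notebook’s exact code.

```python
# A hedged sketch of monitoring-triggered retraining using BigQuery's
# procedural language.
from google.cloud import bigquery

client = bigquery.Client()

script = """
IF EXISTS (
  SELECT 1
  FROM ML.VALIDATE_DATA_SKEW(
    MODEL `bqml_tutorial.classify_species_logistic`,
    TABLE `bqml_tutorial.serving`)
  WHERE is_anomaly  -- assumed anomaly flag in the function's output
) THEN
  CREATE OR REPLACE MODEL `bqml_tutorial.classify_species_logistic`
  OPTIONS (model_type = 'logistic_reg', input_label_cols = ['species']) AS
  SELECT * FROM `bqml_tutorial.penguins_train`;
END IF;
"""
client.query(script).result()
```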
In the results, a skew anomaly was detected and successfully triggered model retraining, restoring accuracy after the data changed. This demonstrates the value of automated monitoring and retraining for maintaining model performance in dynamic production environments.
To streamline this process, Google Cloud offers several powerful automation options, including BigQuery scheduled queries, Dataform, Workflows, Cloud Composer, and Vertex AI Pipelines.
Want a hands-on demonstration? Our accompanying tutorial dives into BigQuery scheduled queries, including historical backfilling, daily monitoring, and setting up email alerts for detected shifts and drifts. We’ll also be releasing future tutorials covering the other automation tools.
Building trustworthy machine learning systems requires continuous monitoring. BigQuery’s new model monitoring functions streamline this to just a few SQL functions:
Deeply understand your data: ML.DESCRIBE_DATA provides a comprehensive view of your datasets, aiding in feature engineering and quality checks.
Detect skew between training and serving data: ML.VALIDATE_DATA_SKEW directly compares BigQuery ML models against their serving data.
Monitor data drift over time: ML.VALIDATE_DATA_DRIFT empowers you to track changes in serving data, ensuring your model’s performance remains consistent.
Enhance your TFDV workflow: ML.TFDV_DESCRIBE and ML.TFDV_VALIDATE bring the precision of TensorFlow Data Validation directly into BigQuery, enabling more detailed visualizations and anomaly detection while leveraging BigQuery’s scalable, efficient compute.
Getting Started
Extend from BigQuery ML models to Vertex AI Models and automate these new functions with Google Cloud offerings like BigQuery scheduled queries, Dataform, Workflows, Cloud Composer, or Vertex AI Pipelines. Dive into our hands-on notebooks to get started today:
Companion Introduction – a fast introduction to all the new functions
Companion Tutorial – an in-depth tutorial covering many usage patterns for the new functions, including using Vertex AI Endpoints, monitoring feature attributions, and an overview of how monitoring metrics are calculated
Read More for the details.
PyTorch’s flexibility and dynamic nature make it a popular choice for deep learning researchers and practitioners. Developed by Google, XLA is a specialized compiler designed to optimize linear algebra computations – the foundation of deep learning models. PyTorch/XLA offers the best of both worlds: the user experience and ecosystem advantages of PyTorch, with the compiler performance of XLA.
PyTorch/XLA stack diagram
We are excited to launch PyTorch/XLA 2.3 this week. The 2.3 release brings with it even more productivity, performance and usability improvements.
Before we get into the release updates, here’s a short overview of why PyTorch/XLA is great for model training, fine-tuning and serving. The combination of PyTorch and XLA provides key advantages:
Easy performance: Retain PyTorch’s intuitive, Pythonic flow while gaining significant and easy performance improvements through the XLA compiler. For example, PyTorch/XLA produces a throughput of 5,000 tokens/second while fine-tuning Gemma and Llama 2 7B models, and reduces the cost of serving to $0.25 per million tokens.
Ecosystem advantage: Seamlessly access PyTorch’s extensive resources, including tools, pretrained models, and its large community.
These benefits underscore the value of PyTorch/XLA. Lightricks shares the following feedback on their experience with PyTorch/XLA 2.2:
“By leveraging Google Cloud’s TPU v5p, Lightricks has achieved a remarkable 2.5X speedup in training our text-to-image and text-to-video models compared to TPU v4. With the incorporation of PyTorch XLA’s gradient checkpointing, we’ve effectively addressed memory bottlenecks, leading to improved memory performance and speed. Additionally, autocasting to bf16 has provided crucial flexibility, allowing certain parts of our graph to operate on fp32, optimizing our model’s performance. The XLA cache feature, undoubtedly the highlight of PyTorch XLA 2.2, has saved us significant development time by eliminating compilation waits. These advancements have not only streamlined our development process, making iterations faster but also enhanced video consistency significantly. This progress is pivotal in keeping Lightricks at the forefront of the generative AI sector, with LTX Studio showcasing these technological leaps.” – Yoav HaCohen, Research team lead, Lightricks
PyTorch/XLA 2.3 keeps us current with PyTorch Foundation’s 2.3 release from earlier this week, and offers notable upgrades from PyTorch/XLA 2.2. Here’s what to expect:
1. Distributed training improvements
SPMD with FSDP: Fully Sharded Data Parallel (FSDP) support enables you to scale large models. The new Single Program, Multiple Data (SPMD) implementation in 2.3 integrates compiler optimizations for faster, more efficient FSDP; see the sketch after this list for the underlying SPMD API.
Pallas integration: For maximum control, PyTorch/XLA + Pallas lets you write custom kernels specifically tuned for TPUs.
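For a feel of the SPMD programming model that underpins these features, here is a minimal sketch using the PyTorch/XLA 2.x API on an XLA device such as a TPU; the mesh shape and axis names are illustrative.

```python
# A hedged sketch of PyTorch/XLA SPMD: build a device mesh and shard a tensor.
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # enable SPMD execution mode

# Arrange all attached devices into a 2D (data, model) mesh.
num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1), ("data", "model"))

# Shard the batch dimension across the 'data' axis; the XLA compiler
# partitions downstream computation to match.
t = torch.randn(16, 128).to(xm.xla_device())
xs.mark_sharding(t, mesh, ("data", "model"))
```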
2. Smoother development
SPMD auto-sharding: SPMD automates model distribution across devices. Auto-sharding further simplifies this process, eliminating the need for manual tensor distribution. In this release, this feature is experimental, supporting XLA:TPU and single-host training.
PyTorch/XLA autosharding architecture
Distributed checkpointing: This makes long training sessions less risky. Asynchronous checkpointing saves your progress in the background, protecting against potential hardware failures.
3. Hello, GPUs!
SPMD XLA:GPU support: We have extended the benefits of SPMD parallelization to GPUs, making scaling easier, especially when handling large models or datasets.
PyTorch/XLA continues to evolve, streamlining the creation and deployment of powerful deep learning models. The 2.3 release emphasizes improved distributed training, a smoother development experience, and broader GPU support. If you’re in the PyTorch ecosystem and seeking performance optimization, PyTorch/XLA 2.3 is worth exploring!
Stay up-to-date, find installation instructions or get support on the official PyTorch/XLA repository on GitHub: https://github.com/pytorch/xla
PyTorch/XLA is also well integrated into the AI Hypercomputer stack, which optimizes AI training, fine-tuning, and serving performance end-to-end at every layer of the stack.
Ask your sales representative about how you can apply these capabilities within your own organization.
Read More for the details.