7 ways networking powers your AI workloads on Google Cloud
When we talk about artificial intelligence (AI), we often focus on the models, the powerful TPUs and GPUs, and the massive datasets. But behind the scenes, there’s an unsung hero making it all possible: networking. While it’s often abstracted away, networking is the crucial connective tissue that enables your AI workloads to function efficiently, securely, and at scale.
In this post, we explore seven key ways networking interacts with your AI workloads on Google Cloud, from accessing public APIs to enabling next-generation, AI-driven network operations.
#1 – Securely accessing AI APIs
Many of the powerful AI models available today, like Gemini on Vertex AI, are accessed via public APIs. When you make a call to an endpoint like *-aiplatform.googleapis.com, you depend on a reliable network connection, and the endpoint requires proper authentication before granting access. This ensures that only authorized users and applications can reach these powerful models, helping to safeguard your data and your AI investments. You can also access these endpoints privately, which we cover in more detail in point #5.
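As a concrete illustration, here is a minimal sketch of an authenticated call to Gemini on Vertex AI using Application Default Credentials. The project ID, region, and model name are placeholders to replace with your own:

```python
# Minimal sketch: authenticated call to Gemini on Vertex AI over its public
# endpoint, using Application Default Credentials (ADC).
# PROJECT_ID, REGION, and the model name are illustrative placeholders.
import google.auth
import google.auth.transport.requests
import requests

PROJECT_ID = "your-project-id"  # placeholder
REGION = "us-central1"          # placeholder

# ADC resolves credentials from the environment (gcloud, service account, etc.)
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION}/publishers/google/models/gemini-2.0-flash:generateContent"
)

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={"contents": [{"role": "user", "parts": [{"text": "Hello, Gemini!"}]}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```

Without a valid bearer token, the same request is rejected with a 401, which is exactly the safeguard described above.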
#2 – Exposing models for inference
Once you’ve trained or tuned your model, you need to make it available for inference. In addition to managed offerings in Google Cloud, you also have the flexibility to deploy your models on infrastructure you control, using specialized VM families with powerful GPUs. For example, you can deploy your model on Google Kubernetes Engine (GKE) and use the GKE Inference Gateway, Cloud Load Balancing, or a ClusterIP to expose it for private or public inference. These networking components act as the entry point for your applications, allowing them to interact with your model deployments seamlessly and reliably.
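To make this concrete, here is a minimal sketch that exposes an existing model-server Deployment inside a GKE cluster with a ClusterIP Service, using the official Kubernetes Python client. The namespace, labels, and ports are assumptions for illustration:

```python
# Minimal sketch: expose an existing model-server Deployment inside the
# cluster via a ClusterIP Service, using the Kubernetes Python client.
# The namespace, selector label, and ports are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="model-inference"),
    spec=client.V1ServiceSpec(
        type="ClusterIP",                  # private, in-cluster access only
        selector={"app": "model-server"},  # placeholder pod label
        ports=[client.V1ServicePort(port=80, target_port=8000)],
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```

For traffic from outside the cluster, you would instead front the same pods with the GKE Inference Gateway or a Cloud Load Balancing resource.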
#3 – High-speed GPU-to-GPU communication
AI workloads, especially training, involve moving massive amounts of data between GPUs. Traditional networking, which relies on CPU copy operations, can create bottlenecks. This is where protocols like Remote Direct Memory Access (RDMA) come in. RDMA bypasses the CPU, allowing for direct memory-to-memory communication between GPUs.
To support this, the underlying network must be lossless and high-performance. Google has built out a non-blocking rail-aligned network topology in its data center architecture to support RDMA communication and node scaling. Several high-performance GPU VM families support RDMA over Converged Ethernet (RoCEv2), providing the speed and efficiency needed for demanding AI workloads.
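From the application side, this communication is typically driven through a collective library such as NCCL. Below is a minimal sketch of a GPU-to-GPU all-reduce using PyTorch's NCCL backend; whether the traffic actually rides RoCE depends on the VM family and NCCL's transport selection, and the script assumes a `torchrun` launch:

```python
# Minimal sketch: GPU-to-GPU all-reduce with PyTorch's NCCL backend.
# On RDMA-capable VM families, NCCL can move these buffers over RoCE
# without staging through the CPU; transport selection is automatic.
# Assumes launch via: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # torchrun sets rank/world-size env vars
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each GPU contributes a tensor; all_reduce sums them across all GPUs.
t = torch.ones(1024, device="cuda") * dist.get_rank()
dist.all_reduce(t, op=dist.ReduceOp.SUM)

print(f"rank {dist.get_rank()}: first element after reduce = {t[0].item()}")
dist.destroy_process_group()
```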
#4 – Data ingestion and storage connectivity
Your AI models are only as good as the data they’re trained on. This data needs to be stored, accessed, and retrieved efficiently. Google Cloud offers a variety of storage options, for example Google Cloud Storage, Hyperdisk ML, and Managed Lustre. Networking is what connects your compute resources to your data. Whether your compute reads data from locally attached disks or across the network, having a high-throughput, low-latency path to your storage is essential for keeping your AI pipeline running smoothly.
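As a small example, here is a sketch of pulling a training shard from Cloud Storage with the `google-cloud-storage` client; the bucket and object names are placeholders:

```python
# Minimal sketch: fetching a training shard from Cloud Storage.
# Bucket and object names are illustrative placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-training-data-bucket")  # placeholder
blob = bucket.blob("shards/train-00000.tfrecord")    # placeholder

# The download travels over Google's network; co-locating the bucket and
# the GPU VMs in the same region keeps latency low for the data pipeline.
blob.download_to_filename("/tmp/train-00000.tfrecord")
print("downloaded", blob.name)
```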
#5 – Private connectivity to AI workloads
Security is paramount, and you often need to ensure that your AI workloads are not exposed to the public internet. Google Cloud provides several ways to achieve private communication with both managed Vertex AI services and your own DIY AI deployments. These include:
- VPC Service Controls: Creates a service perimeter to prevent data exfiltration.
- Private Service Connect: Allows you to access Google APIs and managed services privately from your VPC. You can use PSC endpoints to connect to your own services or to Google services.
- Cloud DNS: Private DNS zones can be used to resolve internal IP addresses for your AI services (see the sketch after this list).
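One nice property of the PSC-plus-private-DNS pattern is that client code does not change: the API hostname simply resolves to an internal address inside your VPC. This sketch is a quick sanity check from a VM in the perimeter, assuming a PSC endpoint with an address from your VPC's internal range and a matching Cloud DNS private zone already exist; the region is a placeholder:

```python
# Minimal sketch: verify that the Vertex AI hostname resolves to an internal
# address via the Cloud DNS private zone. Assumes a PSC endpoint whose
# address was allocated from the VPC's internal (RFC 1918) range.
import ipaddress
import socket

host = "us-central1-aiplatform.googleapis.com"  # placeholder region
ip = socket.gethostbyname(host)

print(host, "resolves to", ip)
print("internal address:", ipaddress.ip_address(ip).is_private)
```

If the resolution check passes, the exact same client code from point #1 works unchanged, with traffic staying on private paths.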
#6 – Bridging the gap with hybrid cloud connections
Many enterprises have a hybrid cloud strategy, with sensitive data remaining on-premises. The Cross-Cloud Network allows you to architect your network for any-to-any connectivity. With design cases covering distributed applications, a global front end, and Cloud WAN, you can build secure paths from on-premises environments, other clouds, or other VPCs to your AI workloads. This hybrid connectivity lets you leverage the scalability of Google Cloud’s AI services while keeping your data secure.
#7 – The Future: AI-driven network operations
The relationship between AI and networking is becoming a two-way street. With Gemini for Google Cloud, network engineers can now use natural language to design, optimize, and troubleshoot their network architectures. This is the first step towards what we call “agentic networking,” where autonomous AI agents can proactively detect, diagnose, and even mitigate network issues. This transforms network engineering from a reactive discipline to a predictive and proactive one, ensuring your network is always optimized for your AI workloads.
Learn more
To learn more about networking and AI on Google Cloud, dive deeper with the following resources:
- Documentation: AI Hypercomputer
- Codelabs: Gemini CLI on GCE with a Private Service Connect endpoint
- White paper: Leveling up with Autonomous Network Operations
Want to ask a question, find out more, or share a thought? Please connect with me on LinkedIn.
