GCP – How Verve achieves 37% performance gains with C4 machines and new GKE features
Earlier this year, Google Cloud launched the highly anticipated C4 machine series, built on the latest Intel Xeon Scalable processors (5th Gen Emerald Rapids), setting a new industry-leading performance standard for both Google Compute Engine (GCE) and Google Kubernetes Engine (GKE) customers. C4 VMs are designed to handle your most performance-sensitive workloads, delivering up to a 25% price-performance improvement over the previous-generation general-purpose VMs, C3 and N2.
C4 VMs are already delivering impressive results for businesses. Companies like Verve, a creator of digital advertising solutions, are integrating C4 into their core infrastructure; in Verve’s case, they’re seeing a 37% improvement in performance. For Verve, C4 isn’t only about better performance; it’s also fueling their revenue growth.
Read on to discover how Verve leveraged C4 to achieve this success, including their evaluation process and the key metrics that demonstrate C4’s impact on their business.
Verve’s Challenge and Business Opportunity
Verve delivers digital ads across the internet with a platform that connects ad buyers to other ad-delivery platforms and allows these advertisers to bid on ad space through a proprietary real-time auction platform. Real-time is the key here, and it’s also why C4 has made such a big impact on their business.
A marketplace for ad bidding is an incredibly latency- and performance-sensitive workload. About 95% of the traffic hitting their marketplace, which runs on GKE, is not revenue generating, because the average ad fill-rate is only 5-7% of bids.
It takes a lot of cloud spend to handle bid requests that never generate revenue, so any increase in performance or reduction in latency can have a tremendous impact on their business. In fact, the more performance Verve can get out of GKE, the more revenue they generate, because the fill-rate for ads (successful bid/ask matching) grows exponentially.
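To make the cost pressure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every bid request costs about the same to serve, which the article does not state, and the function names and the unit cost of 1.0 are purely illustrative.

```python
def requests_per_filled_ad(fill_rate: float) -> float:
    """Average number of bid requests served for each ad that is filled."""
    if not 0 < fill_rate <= 1:
        raise ValueError("fill_rate must be in (0, 1]")
    return 1.0 / fill_rate


def cost_per_filled_ad(cost_per_request: float, fill_rate: float) -> float:
    """Compute spend attributable to one filled ad.

    Assumption (not from the article): every request costs the same to serve.
    """
    return cost_per_request * requests_per_filled_ad(fill_rate)


if __name__ == "__main__":
    # 5% and 7% are the fill-rate bounds quoted in the article.
    for fill_rate in (0.05, 0.07):
        n = requests_per_filled_ad(fill_rate)
        cost = cost_per_filled_ad(1.0, fill_rate)  # 1.0 = hypothetical unit cost
        print(f"fill rate {fill_rate:.0%}: ~{n:.1f} requests, ~{cost:.1f} cost units per filled ad")
```

At a 5% fill rate, roughly 20 requests are served for every filled ad, and at 7% roughly 14. Under this simplified model, any saving in per-request compute is multiplied by that factor when expressed per revenue-generating ad.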
Verve’s GKE Architecture and C4 Evaluation Plan
Verve’s marketplace ran on N2D machines, leveraging an Envoy-based reverse proxy (Contour) for ingress and egress. Verve handles a high volume of traffic, with hundreds of millions of events daily (clicks, impressions, actions, in-app events, etc.).
This means they need to be able to scale their servers fast to handle traffic spikes and to control who has access to their servers and with which permissions. Verve has built its infrastructure on top of Kubernetes to allow elasticity and scalability, and they rely heavily on spot pricing to be cost effective.
To set up the benchmark, Verve ran a canary, meaning one pod of the main application per node type, and measured two values: one related to performance and exported from the application (vCPU per ad request, 99th percentile, in ms), and one related to spot price, given by the total compute price (vCPU + GB RAM).
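One way the two canary measurements could be combined into a price/performance ranking is sketched below in Python. The article does not give Verve’s actual formula, so multiplying the two values is an assumption, and the node-type names and numbers are hypothetical placeholders, not benchmark results.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CanaryResult:
    node_type: str                 # label for the node type under test
    p99_cpu_ms_per_request: float  # performance metric exported by the application
    spot_price_per_hour: float     # total compute price (vCPU + GB RAM) at spot rates


def cost_score(result: CanaryResult) -> float:
    """Lower is better. Assumed scoring: CPU time per request weighted by price."""
    return result.p99_cpu_ms_per_request * result.spot_price_per_hour


def rank_by_price_performance(results: Iterable[CanaryResult]) -> List[CanaryResult]:
    """Return the canary results ordered from best to worst score."""
    return sorted(results, key=cost_score)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    samples = [
        CanaryResult("type-a", 10.0, 0.50),
        CanaryResult("type-b", 8.0, 0.75),
        CanaryResult("type-c", 6.0, 0.70),
    ]
    for r in rank_by_price_performance(samples):
        print(f"{r.node_type}: score {cost_score(r):.2f}")
```

An ordering like this is the kind of input a prioritized list of machine types needs: a faster machine only moves up the list if its speedup outweighs its higher price.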
Leveraging GKE Gateway to Save Costs and Improve Latency
Verve needs to scale their servers fast to handle traffic spikes with the lowest latency possible, and for this they rely on GKE Gateway, which leverages Google’s Envoy-based global load balancers. Their solution optimizes real-time bidding for ads, boosting revenue through improved response times and efficient cost management in a market where latency is correlated to bids and revenue, somewhat similar to High-Frequency Trading (HFT) in financial markets.
By migrating to GKE Gateway, Verve managed to improve its Total Cost of Ownership (TCO). Google only charges for traffic going through the Gateway, so Verve saw significant compute cost savings by not having to spin up GKE nodes for the proxies. Verve also saw a notable reduction in the burden of maintaining this GKE Gateway-based solution compared to an Ingress-based solution, which further contributed to the lower TCO. The cherry on top is that latency in the traffic itself improved by 20-25%, and this generated 7.5% more revenue.
Saving Costs While Achieving Better Performance with Custom Compute Classes
Anticipating their high season, Verve worked with their GCP Technical Account Manager to get onboarded in the Early Access Program of Custom Compute Classes, a new feature which Verve had been eagerly anticipating for years.
Custom Compute Classes (CCC) is a Kubernetes-native, declarative API that can be used to define fallback priorities for autoscaled nodes in case a top priority is unavailable (e.g. a spot VM). It also has an optional automatic reconciliation feature which can move workloads to higher priority node shapes if and when they become available.
This lets GKE customers define a prioritized list of compute preferences by key metrics like price/performance, and GKE automatically handles scale-up and consolidation onto the best options available at any time. Verve is using CCCs to establish C4 as their preferred machine, but they also use them to specify other machine families as fallbacks to maximize obtainability.
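The fallback and reconciliation behavior described above can be illustrated with a small Python sketch. This is a toy model of the decision logic only, not the GKE API: in practice a Custom Compute Class is declared as a Kubernetes resource and GKE does the selection. The function names and the shape labels below are hypothetical.

```python
from typing import AbstractSet, Optional, Sequence


def pick_node_shape(priorities: Sequence[str], available: AbstractSet[str]) -> Optional[str]:
    """Return the highest-priority shape that is currently obtainable, if any."""
    for shape in priorities:
        if shape in available:
            return shape
    return None


def should_migrate(current: str, priorities: Sequence[str], available: AbstractSet[str]) -> bool:
    """Model of reconciliation: move when a higher-priority shape becomes available."""
    best = pick_node_shape(priorities, available)
    if best is None:
        return False  # nothing obtainable, so stay where we are
    if current not in priorities:
        return True   # current shape is not in the preference list at all
    return priorities.index(best) < priorities.index(current)


if __name__ == "__main__":
    # Illustrative preference list: preferred spot shape first, fallbacks after.
    priorities = ["c4-spot", "n2d-spot", "n2d-on-demand"]

    # The preferred spot capacity is out of stock, so the next entry is used.
    print(pick_node_shape(priorities, {"n2d-spot", "n2d-on-demand"}))

    # Later the preferred capacity returns and the workload can move back.
    print(should_migrate("n2d-spot", priorities, {"c4-spot", "n2d-spot"}))
```

The point of the ordered list is that the workload always lands somewhere acceptable, while the reconciliation check keeps pulling it back toward the preferred shape when capacity allows.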
Pablo Loschi, Principal Systems Engineer at Verve, was impressed with the versatility his team was able to achieve:
“With Custom Compute Classes,” Loschi said, “we are closing the circle of cost-optimization. Based on our benchmarks, we established a priority list of spot machine types based on price/performance, and CCC enables us to maximize obtainability and efficiency by providing fall-back compute priorities as a list of preferred machines. We love how when out-of-stock machines become available again CCC reconciles to preferential infrastructure, finally eliminating the false dichotomy of choosing between saving costs and machine availability, even in the busy season.”
Verve’s Results and Business Impact
Verve benchmarked their marketplace running on GKE across several GCE machines. Today their marketplace runs on N2D machines, and by switching to C4 they saw a 37% improvement in performance.
They also switched from a self-managed Contour-Envoy proxy to GKE Gateway, which improved latency by 20% to 25% and translated into 7.5% more revenue, since more bids are auctioned. GKE Gateway also let them save on compute costs, because the load balancer is charged by network traffic rather than by compute. Additionally, they benefited from a reduced manual burden of managing, updating, and scaling this solution.
“We were able to directly attribute the reduced latency to revenue growth — more bids are being accepted because they are coming faster,” Ken Snider, Verve VP of Cloud Infrastructure, said.
The combination of switching to C4 and GKE Gateway is driving their business’ revenue growth. “We started on a path a year ago talking with the product team from Google to help solve this problem, and now we are seeing it come together,” Snider said.
The next phase of Verve’s optimization journey is to improve their compute utilization, ensuring maximal usage of all deployed GKE nodes. GKE features such as Node Autoprovisioning and Custom Compute Classes will continue to play an important role in the team’s efforts to drive top-line growth for the business while being good stewards of their cloud costs.
C4 Brings Unparalleled Performance
C4 VMs are built on the latest Intel Xeon Scalable processors (5th Gen Emerald Rapids), delivering a significant performance leap for mission-critical and performance-sensitive workloads such as databases, gaming, financial modeling, data analytics, and inference.
Leveraging Google’s custom-designed Titanium infrastructure, C4 VMs provide high bandwidth and low latency networking for optimal performance with up to 200 Gbps of bandwidth, as well as high-performance storage with Hyperdisk. With C4, storage is offloaded to the Titanium adapter, reserving the host resources for running your workloads. And by leveraging hitless upgrades and live migration, the vast majority of infrastructure maintenance updates are performed with near-zero impact to your workloads, minimizing disruptions and providing predictable performance. For real-time workloads, C4 offers up to 80% better CPU responsiveness compared to previous generations, resulting in faster trades and a smoother gaming and streaming experience.
But C4 offers more than just powerful hardware; it’s a complete solution for performance-critical workloads. C4 VMs integrate seamlessly with Google Kubernetes Engine (GKE), enabling you to easily deploy and manage containerized applications at scale.
A range of machine types with varying vCPU and memory configurations are available to match your specific needs. And with its superior price-performance, C4 VMs deliver exceptional value, helping you optimize your cloud spend without compromising on performance.
Next Steps
- Learn more about C4
- Learn more about GKE Gateway
- Learn more about CCC
- Learn more about saving costs by using GKE Autopilot