GCP – How we got to 100 million cells in our global Li-ion rack battery fleet
When it comes to data center power systems, batteries play an important role. The applications that run in our data centers require nearly continuous uptime. And while utility power is highly reliable, power outages are unavoidable.
When an outage happens, batteries can supply short-duration power, allowing servers to operate continuously while the facility switches between AC power sources, or to ride through transient power disturbances. If a facility loses both primary and alternate power sources for an extended period, batteries can supply enough power for machines to execute a clean shutdown procedure. This speeds up machine restarts after the outage. More importantly, it helps ensure that critical user data is safely stored to disk and not lost in the power disruption.
At Google, we rely on a 48Vdc rack power system with integrated battery backup units (BBUs), and in 2015, we became one of the first hyperscale data center providers to deploy lithium-ion BBUs. These Li-ion batteries had twice the life, twice the power and half the volume of previous-generation lead-acid batteries. Switching from lead-acid batteries to Li-ion means we deploy only one-quarter as many batteries, greatly reducing the battery waste generated by our data centers.
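The post does not spell out how the one-quarter figure is derived. One plausible reading, offered here as an assumption, is that it combines the two ratios quoted above: a battery with twice the power covers the same load with half as many units, and a battery with twice the life is replaced half as often. The short sketch below works through that arithmetic. The function name `relative_battery_count` and its parameters are hypothetical and only illustrate the calculation.

```python
from fractions import Fraction


def relative_battery_count(power_ratio, life_ratio):
    """Batteries needed over a fixed service period, relative to a baseline.

    Assumption (not stated in the post): the count scales inversely with
    per-battery power (fewer units per rack) and inversely with battery
    life (fewer replacements over the same period).
    """
    return Fraction(1) / (Fraction(power_ratio) * Fraction(life_ratio))


# Ratios quoted in the post for Li-ion versus lead-acid:
# twice the power and twice the life.
li_ion_vs_lead_acid = relative_battery_count(power_ratio=2, life_ratio=2)
print(li_ion_vs_lead_acid)  # 1/4
```

Under this reading, the halved volume affects rack space rather than the number of batteries deployed, so it does not enter the calculation.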
We recently reached an important milestone: Google has more than 100 million cells deployed in battery packs across our global data center fleet. This is remarkable, and only possible thanks to the safety-first approach we take to deploying Li-ion batteries at scale.
The main safety risk associated with Li-ion batteries is thermal runaway, which can occur if a battery is accidentally mishandled, overcharged, or exposed to excessive temperatures. While this is a rare event, the resulting fire is extremely difficult to extinguish because the large amount of heat generated can drive a thermal runaway chain reaction in nearby cells.
To deploy this large fleet of Li-ion cells, we have had to make safety a core principle of our battery design. Specifically, as an early adopter of the UL9540A thermal runaway test method, we subject our Li-ion BBU designs to rigorous flame safety testing that demonstrates their ability to limit thermal runaway. As a result, Google has been granted permits to deploy BBUs in the APAC region, in some of the world’s most stringent jurisdictions.
In addition, our Li-ion BBUs benefit from our distributed UPS architecture that offers significant availability and TCO benefits compared to traditional monolithic UPS systems. The distributed UPS architecture improves machine availability by: 1) reducing the failure-domain blast radius to a single rack, and 2) locating the batteries in the rack to eliminate intermediate points of failure between the UPS and machines. This architecture also provides TCO benefits by scaling the UPS with the deployment, i.e., reducing day-1 UPS cost. Additionally, locating the batteries in the rack on the same DC bus as the machines eliminates intermediate AC/DC power conversion steps that cause efficiency losses. In 2016 we shared the 48V rack power system spec with the Open Compute Project, including specs for the Li-ion BBUs.
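The efficiency argument above can be made concrete with a small calculation: when power passes through several conversion stages in series, the end-to-end efficiency is the product of the per-stage efficiencies, so removing stages raises the total. The sketch below shows that relationship only. The per-stage values and the number of stages in each path are illustrative placeholders, not measurements from Google's systems or figures from the post, and the helper name `chain_efficiency` is hypothetical.

```python
from math import prod


def chain_efficiency(stage_efficiencies):
    """End-to-end efficiency of conversion stages connected in series."""
    return prod(stage_efficiencies)


# ASSUMED, illustrative per-stage efficiencies (not measured values).
# Hypothetical monolithic path: UPS rectifier (AC to DC), UPS inverter
# (DC to AC), then the server power supply (AC to DC).
monolithic_path = [0.96, 0.96, 0.96]

# Hypothetical in-rack path: a single rectifier feeding the DC bus that
# the batteries and machines share.
in_rack_path = [0.96]

print(f"monolithic: {chain_efficiency(monolithic_path):.3f}")
print(f"in-rack:    {chain_efficiency(in_rack_path):.3f}")
```

Whatever the real per-stage numbers are, each stage with an efficiency below 1 lowers the product, which is why placing the batteries on the same DC bus as the machines avoids those losses.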
Li-ion batteries have been crucial to ensuring the uninterrupted operation of Google Cloud data centers. By transitioning from lead-acid to Li-ion BBUs, we’ve significantly improved power availability, efficiency, and lifespan, even as we simultaneously address their critical safety risks. Our commitment to rigorous safety testing and adherence to standards and test methods like UL9540A has enabled us to deploy millions of Li-ion BBUs globally, providing our customers with the high level of reliability they expect from Google Cloud.
Getting to 100 million Li-ion cells is just one of many examples of how we are building a reliable cloud and power-efficient AI. As data center power systems evolve to incorporate new technologies, such as large battery energy storage systems (BESS), and to meet new workload requirements, such as AI workloads, we remain dedicated to exploring and implementing innovative solutions to build the safest and most efficient cloud data centers.
The authors would like to acknowledge Vijay Boovaragavan, Matt Tamashiro, Sandeep Sebastian, Thibault Pelloux-Gervais, Ken Wong, Mike Meakins, Stanley Fung, and Scott Sharp for their contributions.