Supporting research computing at Stanford and UC Riverside with HPC on Google Cloud
We are in the midst of a new era of research and innovation across disciplines and industries. Researchers increasingly rely on computational methods to drive their discoveries, and AI-infused tools supercharge the need for powerful computational resources. Not only is the speed of discovery accelerating; the workflows used to make these discoveries are also rapidly advancing. These growing and changing demands put tremendous pressure on those responsible for providing research groups with the high performance computing (HPC) resources they need. Helping the teams that provide research computing resources to scientists is where Google Cloud steps in.
The cloud is made to readily meet changing computational requirements. At Google Cloud, we apply this flexibility to HPC demands such as those posed by research. To make the provisioning, modifying, and decommissioning of those HPC environments in the cloud more accessible, we have developed the Google Cloud HPC Toolkit, an open-source software product for defining HPC environments in Google Cloud.
The Google Cloud HPC Toolkit encapsulates HPC best practices and cleanly exposes the relevant configuration parameters in a compact infrastructure-as-code configuration file. At the same time, users can inspect the complete set of commands generated from that configuration file to create the infrastructure, giving them control over the process and the ability to customize components as needed. This approach caters to research groups and IT staff from research computing facilities alike, helping them build HPC environments that meet dynamic computational science and AI/ML needs.
The Cloud HPC Toolkit can dramatically simplify an HPC deployment: fewer than 100 lines of YAML in an HPC blueprint can create an HPC environment that would otherwise require more than 40,000 lines of code.
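To give a sense of what such a blueprint looks like, here is a minimal sketch of a Slurm-based cluster definition. It is modeled on the public hpc-toolkit example blueprints; the module paths, setting names, project ID, and machine type below are illustrative and may differ between Toolkit releases, so check the Toolkit repository for the current module catalog.

```yaml
# Hypothetical minimal HPC blueprint, modeled on the public hpc-toolkit
# examples. Module paths and settings may vary by Toolkit release.
blueprint_name: hpc-demo

vars:
  project_id: my-project-id      # placeholder: your GCP project
  deployment_name: hpc-demo
  region: us-central1
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  # Networking for the cluster
  - id: network1
    source: modules/network/vpc

  # Shared /home via Filestore
  - id: homefs
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home

  # Autoscaling compute nodes (scale to zero when idle)
  - id: compute_nodes
    source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
    settings:
      machine_type: n2-standard-2   # placeholder choice
      node_count_dynamic_max: 4

  - id: compute_partition
    source: community/modules/compute/schedmd-slurm-gcp-v5-partition
    use: [network1, compute_nodes]
    settings:
      partition_name: compute

  # Slurm controller and login node
  - id: slurm_controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v5-controller
    use: [network1, compute_partition, homefs]

  - id: slurm_login
    source: community/modules/scheduler/schedmd-slurm-gcp-v5-login
    use: [network1, slurm_controller]
```

A file like this is handed to the Toolkit's `ghpc` command, which expands it into the full Terraform deployment that actually provisions the cluster.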
Stanford Doerr School of Sustainability
Stanford was among the earliest adopters of Google's Cloud HPC Toolkit, using it to connect Google Cloud and on-premises resources as soon as it became available.
As the school began adopting cloud computing, they wanted to ensure researchers could maintain the same experience they had on-premises. To do this, they paired a custom startup script with Chrome Remote Desktop to give researchers the familiar interface they were used to, keeping the process streamlined no matter where the research is being done. A researcher logs in to interactive nodes through Chrome Remote Desktop and is greeted with an experience similar to accessing on-premises clusters. The school's use of the Toolkit has continued to grow: they have developed their own modules to enable quick, secure use of Vertex AI instances for code development for an ever-growing user base.
“Stanford Doerr School of Sustainability is expanding computing capabilities for its researchers through Google Cloud. HPC Toolkit enables us to quickly, securely and consistently deploy HPC systems to run experiments at scale, with blueprints that can be deployed and redeployed with varying parameters, giving us the agility we need to meet the burgeoning needs of our scientists. With the Toolkit, we can stand up clusters with different partitions depending on our users’ needs, so that they can take advantage of the latest hardware like NVIDIA GPUs when needed and leverage Google Cloud’s workload-optimized VMs to reach price-performance targets. Dynamic cluster sizes, the ability to use spot VMs when appropriate in cluster partitions, and the ability to quickly get researchers up and running in environments they are used to have all been enhanced by the Toolkit.”
– Robert Clapp, Senior Research Engineer, Stanford Doerr School of Sustainability
University of California, Riverside
The University of California, Riverside (UCR), has been working to adopt Google Cloud for its infrastructure needs. Ensuring that researchers retain access to high-performance computing was an important consideration with the move.
UCR's HPC admin team works with a number of different research groups, each of which has its own specifications for what it needs clusters to support. To serve these groups, the HPC team needed a common set of parameters that could be customized quickly, while also ensuring the security and reliability of those clusters. To do this, they created a set of Cloud HPC Toolkit blueprints that let users customize specific components while maintaining a common set of standards. This standardization also allows them to share blueprints with other researchers, so that other labs can more easily reproduce their work and validate findings. They followed up by employing auto-scaling measures to ensure their researchers weren't limited by storage or compute capacity.
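To illustrate how this kind of per-lab customization might look, the fragment below sketches two interchangeable partitions for a shared blueprint's module list: a spot-VM partition for interruption-tolerant batch work and a GPU partition for AI/ML workloads. The module sources and setting names follow the public hpc-toolkit examples, and the machine types are placeholder choices; both may differ in current releases.

```yaml
# Hypothetical partition fragment for a shared blueprint (would sit under a
# deployment group's "modules:" list). Names follow the public hpc-toolkit
# examples and may differ by release.
  # Cost-sensitive batch partition on spot VMs
  - id: batch_nodes
    source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
    settings:
      machine_type: c2-standard-60   # compute-optimized VM (placeholder)
      node_count_dynamic_max: 20     # autoscale up to 20 nodes, down to 0
      enable_spot_vm: true           # spot VMs for interruption-tolerant jobs

  - id: batch_partition
    source: community/modules/compute/schedmd-slurm-gcp-v5-partition
    use: [network1, batch_nodes]
    settings:
      partition_name: batch

  # GPU partition for AI/ML workloads
  - id: gpu_nodes
    source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
    settings:
      machine_type: a2-highgpu-1g    # NVIDIA A100 machine type (placeholder)
      node_count_dynamic_max: 4

  - id: gpu_partition
    source: community/modules/compute/schedmd-slurm-gcp-v5-partition
    use: [network1, gpu_nodes]
    settings:
      partition_name: gpu
```

Because the partitions are just modules in the blueprint, an admin team can swap them in and out, or adjust machine types and node counts, and redeploy the same blueprint for different labs.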
“Through the scale and innovation of Google Cloud and the HPC Toolkit, we’ve revolutionized research at UCR. We achieve goals once deemed impossible by our researchers, in extraordinary timeframes. The HPC Toolkit simplifies the design and administration of HPC clusters, making them standardized and extensible. This enables UCR to offer lab-based HPC clusters that are tailored for specific research and workflow needs. Whether it’s AI/ML HPC GPU Clusters, Large Scale Distributed Clusters, High Memory HPC Clusters, or HPC Visualization Clusters, the HPC Toolkit streamlines the process, making it as easy as configuring a YAML file.”
– Chuck Forsyth, Director of Research Computing, UCR
Start your own journey
Stanford and UCR are just two organizations leveraging the power of HPC on Google Cloud to drive innovation forward. Whether you are a research group taking care of your own computing resources or a professional working in a university research computing facility, the Cloud HPC Toolkit provides a powerful approach to manage your infrastructure in the cloud. The Toolkit is open source with a community of users and Google Cloud partners that share ready-made blueprints to help you get started. Visit the Google Cloud HPC Toolkit and the Google Cloud HPC sites to learn more and get started with HPC on Google Cloud today.
What’s more, if you work with a public sector entity, we can help keep costs predictable through our Public Sector Subscription Agreement (PSSA), which reduces the risk of unforeseen expenses by maintaining a fixed monthly price for defined environments, without the variability of month-to-month consumption spikes. The PSSA helps meet the needs of researchers with predetermined budgets and grants that can’t be exceeded.
Connect with your Google Cloud team today to learn more about our Public Sector Subscription Agreement (PSSA), HPC solutions for verticals and HPC tooling aimed at administrative tasks. We’re here to help you get your researchers and their HPC workloads up and running on Google Cloud.