PyTorch / XLA now generally available on Cloud TPUs
The PyTorch machine learning (ML) framework is popular in the ML community for its flexibility and ease of use, and we are excited to support it across Google Cloud. Today, we’re announcing that PyTorch / XLA support for Cloud TPUs is now generally available. This means PyTorch users can access large-scale, low-cost Cloud TPU hardware accelerators through a stable, well-supported PyTorch integration.
PyTorch / XLA combines the intuitive APIs of PyTorch with the strengths of the XLA linear algebra compiler, which can target CPUs, GPUs, and Cloud TPUs, including Cloud TPU Pods. PyTorch / XLA will run most standard PyTorch programs with minimal modifications, falling back to CPU to execute operations that are not yet supported on TPUs. With the help of a detailed report that PyTorch / XLA generates, PyTorch users can find bottlenecks and adapt their programs to run more efficiently on Cloud TPUs.
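To make this concrete, here is a minimal sketch of a single-device training loop on a TPU core; the tiny model and the random batches are placeholders, and it assumes the torch_xla package is installed:

import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()                # acquire a TPU core, analogous to torch.device('cuda')
model = nn.Linear(128, 10).to(device)   # placeholder model; any standard nn.Module works
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(10):
    data = torch.randn(32, 128, device=device)           # placeholder batch
    target = torch.randint(0, 10, (32,), device=device)  # placeholder labels
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    # Steps the optimizer and, with barrier=True, flushes the pending XLA graph;
    # this replaces a plain optimizer.step() in a simple loop like this one.
    xm.optimizer_step(optimizer, barrier=True)

print(met.metrics_report())  # the detailed report mentioned above

The string returned by met.metrics_report() includes per-operation counters, and entries for operations that fell back to CPU are typically the first place to look when tracking down a slow training loop.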
“PyTorch / XLA has enabled me to run thousands of experiments on Cloud TPUs with barely any changes to my PyTorch workflow,” said Jonathan Frankle, a PhD candidate at the Massachusetts Institute of Technology (MIT). “It provides the best of both worlds: the ease of PyTorch and the speed and cost-efficiency of TPUs.” Frankle has used PyTorch / XLA to scale up his latest research related to “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks,” the breakthrough work that won a Best Paper award at ICLR 2019.
The Allen Institute for AI (AI2) recently used PyTorch / XLA on Cloud TPUs across several projects. Matthew Peters, a research scientist at AI2, is currently using PyTorch / XLA to investigate methods to add a visual component to state-of-the-art language models to improve their language understanding capabilities. “While PyTorch / XLA is still a new technology, it provides a promising new platform for organizations that have already invested in PyTorch to train their machine learning models,” Peters said.
To help you get started with PyTorch / XLA, Google Cloud supports a growing set of open-source implementations of widely used deep learning models and associated tutorials. Here are the tutorials for ResNet-50, Fairseq Transformer, Fairseq RoBERTa, and, now, DLRM. We are also developing open-source tools to facilitate continuous testing of ML models, and we have helped the PyTorch Lightning and Hugging Face teams use this framework to run their own tests on Cloud TPUs. (Here’s a related blog post from the PyTorch Lightning team.)
Check out the tutorials linked above, experiment with PyTorch / XLA right in your browser via Colab, and post issues and pull requests to the PyTorch / XLA GitHub repo. We’ve also just released a new Deep Learning VM (DLVM) image that has PyTorch / XLA preinstalled along with PyTorch 1.6—here are instructions on how to get started quickly with this new DLVM image.
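Once the VM is running (or from the Colab linked above), a quick sanity check that the preinstalled stack can see a TPU might look like the snippet below; it assumes a TPU is attached and the environment is configured per those instructions:

import torch
import torch_xla.core.xla_model as xm

print(torch.__version__)             # should report 1.6.x on the new DLVM image
device = xm.xla_device()             # fails if no TPU (or other XLA device) is configured
t = torch.ones(2, 2, device=device)
print(t + t)                         # a small computation executed on the TPU

If this prints a 2x2 tensor of twos, the image, the TPU, and PyTorch / XLA are wired up correctly and you can move on to the tutorials.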
For more technical information about PyTorch / XLA, including sample code, be sure to read this companion post on the official PyTorch Medium site.