GCP – SandboxAQ: Accelerating drug discovery through cloud integration
The traditional drug discovery process involves massive capital investments and prolonged timelines, and it is plagued by daunting failure rates. From initial research to obtaining regulatory approval, bringing a new drug to market can take decades. During this time, many drug candidates that had seemed very promising fail to deliver, either due to inefficacy or safety concerns. Only a small fraction of candidates successfully make it through clinical trials and regulatory hurdles.
Enter SandboxAQ, which is helping researchers explore vast chemical spaces, gain deep insights into molecular interactions, and predict biological outcomes with precision. It does so with cutting-edge computational approaches such as active learning, absolute free energy perturbation solution (AQFEP), generative AI, structural analysis, and predictive data analytics, ultimately reducing drug discovery and development timelines. And it does all this on a cloud-native foundation.
Drug design involves an iterative cycle of designing, synthesizing, and testing molecules, referred to as the Design-Make-Test cycle. Many customers approach SandboxAQ during the design phase, often when their computational methods are falling short. By improving and accelerating this part of the cycle, SandboxAQ helps medicinal chemists bring innovative and effective molecules to market. For example, in a project related to neurodegenerative disease, SandboxAQ’s approach expanded chemical space from 250,000 to 5.6 million molecules, achieving a 30-fold increase in hit rate and dramatically accelerating the discovery of candidate molecules.
Cloud-native development for scientific insight
SandboxAQ’s software relies on large-scale computation. To maximize flexibility and scale, the company uses a cloud strategy that includes Google Cloud infrastructure and tools.
The technologies used in large-scale virtual screening campaigns need to be agile and to scale cost-effectively. Specifically, SandboxAQ engineers need to be able to iterate quickly on scientific code, run that code at scale right away and cost-effectively, and store and organize all of the data it produces.
SandboxAQ achieved a significant boost in efficiency and scalability with Google Cloud infrastructure. They scaled their computational throughput by 100X to leverage tens of thousands of virtual machines (VMs) in parallel. They also improved utilization by reducing idle time by 90%. By consolidating development and deployment on Google Cloud, SandboxAQ streamlined its workflows, from code development and testing to large-scale batch processing and machine-learning model training.
All of SandboxAQ’s development and deployment takes place in the cloud. Code and data live in cloud-based services, and development is done on a cloud-based platform that provides scientists and engineers with self-service VMs with standardized and centrally maintained environments and tools. This is important, because scientific code often requires heavy-duty computing hardware. Scientists have access to hefty 96-core machines, or instances with large GPUs. They can also create new machines with alternate configurations or CPU types as depicted below, enabling low-friction testing and development processes across heterogeneous resources.
SandboxAQ scientists and developers manage and access their Bench machines (see above) using the company’s `bench` client. They can connect to machines via SSH or use any number of managed tools, for example a browser-based VNC service for instant remote desktop, or JupyterLab for a familiar notebook development flow.
When code is ready to be run at a larger scale, researchers can dispatch parameterized sets of computations as jobs through an internal SandboxAQ tool powered by Batch, a fully managed service to schedule, queue, and execute batch jobs on Google infrastructure. With development and batch runtime environments closely synced, changes can be quickly run at scale. Code developed on bench machines is pushed to GitHub and immediately available for batch execution. Then, as tools are reviewed and merged into `main` of the company’s monorepo, they become automatically available on the bench machines of SandboxAQ scientists, who can launch parallel jobs processing millions of molecules on any kind of Google Cloud VM resource in any global zone, utilizing either on-demand or Spot VMs.
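The fan-out pattern described above, where one parameterized job is split across many parallel tasks, can be sketched as follows. This is a minimal illustration and not SandboxAQ’s internal tool: the helpers `shard_for_task` and `current_shard` and the molecule identifiers are hypothetical. It assumes each task learns its position from the `BATCH_TASK_INDEX` and `BATCH_TASK_COUNT` environment variables that Batch sets for the tasks of a job.

```python
import os


def shard_for_task(items, task_index, task_count):
    """Return the slice of `items` that one task should process.

    Items are split into `task_count` contiguous shards whose sizes
    differ by at most one, so every item is handled exactly once.
    """
    if task_count <= 0 or not 0 <= task_index < task_count:
        raise ValueError("task_index must be in [0, task_count)")
    base, extra = divmod(len(items), task_count)
    # The first `extra` shards get one additional item each.
    start = task_index * base + min(task_index, extra)
    end = start + base + (1 if task_index < extra else 0)
    return items[start:end]


def current_shard(items):
    """Pick this task's shard from the environment (assumed to be set by Batch).

    Falls back to a single task (index 0 of 1) when the variables are
    missing, for example on a development machine.
    """
    index = int(os.environ.get("BATCH_TASK_INDEX", "0"))
    count = int(os.environ.get("BATCH_TASK_COUNT", "1"))
    return shard_for_task(items, index, count)


if __name__ == "__main__":
    # Hypothetical molecule identifiers; a real campaign would read them from storage.
    molecules = [f"mol-{i:04d}" for i in range(10)]
    for molecule in current_shard(molecules):
        print("scoring", molecule)
```

Because the same script runs unchanged on a single development machine (one shard containing everything) and as one of many tasks in a job, it reflects the closely synced development and batch environments the article describes.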
SandboxAQ’s implementation of a globally resolved transitive dependency tree enables simple package and dependency management. With this practice, Batch can seamlessly integrate with individual tools developed by engineers to train many instances of a model in parallel.
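To make the idea of a globally resolved transitive dependency tree more concrete, here is a minimal sketch. It assumes a monorepo-style setup in which every package is pinned to exactly one version for the whole organization. The package names, versions, and the helper `resolve` are hypothetical and do not reflect SandboxAQ’s actual tooling.

```python
def resolve(tool, direct_deps, global_pins):
    """Return {package: version} for everything `tool` needs, directly or transitively.

    direct_deps maps each package to the packages it depends on directly.
    global_pins maps each package to the single version used everywhere.
    """
    resolved = {}
    stack = [tool]
    while stack:
        package = stack.pop()
        if package in resolved:
            continue  # already visited; this also guards against cycles
        if package not in global_pins:
            raise KeyError(f"no global pin for {package!r}")
        resolved[package] = global_pins[package]
        stack.extend(direct_deps.get(package, ()))
    return resolved


# Hypothetical data, for illustration only.
DIRECT_DEPS = {
    "docking_tool": ["chem_utils", "numeric_lib"],
    "train_model": ["chem_utils", "ml_lib"],
    "chem_utils": ["numeric_lib"],
    "ml_lib": ["numeric_lib"],
}
GLOBAL_PINS = {
    "docking_tool": "0.3.0",
    "train_model": "1.1.0",
    "chem_utils": "2.4.1",
    "ml_lib": "5.0.2",
    "numeric_lib": "1.26.0",
}

if __name__ == "__main__":
    for tool in ("docking_tool", "train_model"):
        print(tool, "->", resolve(tool, DIRECT_DEPS, GLOBAL_PINS))
```

Because both tools resolve against the same pins, shared packages such as the hypothetical `numeric_lib` come out at the same version wherever the tool runs, which is what keeps a development machine and a batch task consistent.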
Machine learning is a core component of SandboxAQ’s strategy, making easy data access especially important. At the same time, SandboxAQ’s Drug Discovery team also works with clients who have sensitive data. To secure customers’ data, bench and batch workloads read and write data from a unified interface that’s managed via IAM, allowing granular control of different data sources within the organization.
Meanwhile, Google Cloud services like Cloud Logging, Cloud Monitoring, Compute Engine and Cloud Run make it simple to develop tools to monitor these workloads, easily surface logs to SandboxAQ scientists, and comb through huge amounts of output data. As new features are tested or bugs show up, changes are made immediately available to the scientific team, without having to wrangle infrastructure. Then, as code becomes stable, they can incorporate it into downstream production applications, all in a centrally secured, unified way on Google Cloud.
In short, having a unified development, batch compute, and production environment on Google Cloud reduces the friction SandboxAQ faces to develop new workloads and run them at scale. With shared environments for scientific workload development and engineering, SandboxAQ makes it quick and easy for customers to move from experimentation to production, delivering the results customers want, fast.
SandboxAQ solution in the real world
SandboxAQ is already having a profound impact on drug discovery programs targeting a range of hard-to-treat diseases. For example, there are advanced collaborations with Professor Stanley Prusiner’s lab at University of California San Francisco (UCSF), Riboscience, Sanofi, and the Michael J. Fox Foundation, to name a few. With this approach built on Google Cloud, SandboxAQ has achieved a superior hit rate compared to other methods like high-throughput screening, demonstrating its potential to transform drug discovery and bring cures to patients faster.
Visit the Google Cloud AI Hypercomputer web page to learn about Google Cloud AI infrastructure.