“Multislice training has been a game-changer. It’s made it easy to scale our ML workloads beyond a single densely-interconnected slice using data-center networking. JAX XLA made it easy to set up and delivered high performance out-of-the-box.”—Myle Ott, Co-founder, Character AI
Multislice supports the JAX and PyTorch frameworks. For fast out-of-the-box performance, in addition to compiler support for all models, we provide MaxText and PAX for LLMs: open-sourced, well-tested examples written in pure Python and JAX that can be used as starter code. PAX is a framework for training large-scale models that allows advanced, fully configurable experimentation and parallelization, and has demonstrated industry-leading MFU rates. MaxText is a more minimal framework intended for forking and adaptation. The only code change compared to single-slice code is the extra sharding dimension for DCN parallelism, sketched below.
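To make that concrete, here is a minimal sketch of what the extra DCN dimension can look like in JAX. It assumes a simple data-parallel-across-slices setup; the slice count, axis names ("data", "model"), and batch sharding are illustrative, not the exact MaxText or PAX configuration.

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative layout: 2 slices connected over DCN, with each slice's chips
# connected over fast in-slice ICI links.
num_slices = 2
chips_per_slice = jax.device_count() // num_slices

# Arrange all devices into a (dcn, ici) grid: the outer 'data' axis spans
# slices over the data center network, the inner 'model' axis stays inside
# a slice.
devices = np.asarray(jax.devices()).reshape(num_slices, chips_per_slice)
mesh = Mesh(devices, axis_names=("data", "model"))

# Single-slice code typically shards only over in-slice axes; the Multislice
# change is the extra leading 'data' dimension in the mesh and in the
# PartitionSpec used for the global batch.
batch_sharding = NamedSharding(mesh, P("data", None))
```

JAX's jax.experimental.mesh_utils also offers a create_hybrid_device_mesh helper for building this kind of ICI-plus-DCN mesh directly.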
High performance networking
Multislice supports AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter collective communication operations over Google’s Jupiter data center network. As reported in August 2022, Jupiter reduces flow completion time by 10%, improves throughput by 30%, uses 40% less power, incurs 30% lower capex, and delivers 50x less downtime than previous generations of the Google data center network.3
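As a rough illustration of how those collectives surface in user code, the sketch below performs an AllReduce across slices with jax.lax.psum under shard_map. The mesh layout, sizes, and axis names follow the earlier sketch and are assumptions, not a prescribed pattern.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

# Hybrid mesh as before: 'data' spans slices over the DCN, 'model' spans
# chips within a slice over ICI. Sizes are illustrative.
num_slices = 2
devices = np.asarray(jax.devices()).reshape(num_slices, -1)
mesh = Mesh(devices, axis_names=("data", "model"))

def cross_slice_allreduce(partial):
    # AllReduce (sum) over the cross-slice 'data' axis; XLA lowers this
    # psum to a collective that runs over the data center network.
    return jax.lax.psum(partial, axis_name="data")

# Each slice contributes one row of partial results; the reduced result is
# replicated back to every slice.
partials = jnp.ones((num_slices, 4))
reduced = shard_map(cross_slice_allreduce, mesh,
                    in_specs=P("data"), out_specs=P())(partials)
```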
Easy to manage
There are two options for managing Multislice jobs: the Compute Engine Queued Resources CLI and API, or Google Kubernetes Engine (GKE).
Special options allow one-step creation and deletion of the entire collection of slices, and fast recovery means jobs restart quickly even when individual slices are interrupted.
Reliable and fault tolerant
Your model training jobs restart automatically from the previous checkpoint even if individual slices fail. Using Multislice with GKE further improves the failure-recovery experience: a single field change in the YAML file enables automatic retry when errors are encountered.
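The restart path itself is a standard resume-from-latest-checkpoint loop. Below is a minimal sketch using Orbax (the checkpointing library MaxText builds on); the checkpoint directory, state structure, and step counts are placeholders, not a specific production setup.

```python
import jax.numpy as jnp
import orbax.checkpoint as ocp

# Hypothetical checkpoint location and train state; after a slice failure
# the rescheduled job simply re-enters this loop.
CKPT_DIR = "gs://my-bucket/multislice-run/checkpoints"   # placeholder path
manager = ocp.CheckpointManager(CKPT_DIR, ocp.PyTreeCheckpointer())

state = {"step": 0, "params": jnp.zeros((1024,))}        # illustrative state
latest = manager.latest_step()
if latest is not None:
    # On restart (e.g. after GKE reschedules the job), resume from the most
    # recent checkpoint instead of step 0.
    state = manager.restore(latest)

for step in range(int(state["step"]), 10_000):
    # ... run one training step, updating `state` ...
    if step % 1_000 == 0:
        manager.save(step, state)
```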
“Google Cloud’s TPU Multislice provided significant productivity and efficiency gains for us right out-of-the-box, enabling us to scale our language model training reliably. We recommend Multislice to anyone building large generative language AI models.”—Emad Mostaque, CEO, Stability AI
Get started
Multislice was designed to enable efficient large-scale AI model training. To scale AI workloads, hardware and software must work in concert. We have kept AI development productivity top of mind and are excited for you to try Multislice in preview on both Cloud TPU v4 and the newly announced Cloud TPU v5e.
Please contact your Google Cloud account representative to learn more and try Cloud TPU with Multislice using PAX and MaxText.
1. Google internal data as of August 2023
2. Google internal data as of August 2023
3. Google internal data as of August 2023