AWS – Amazon SageMaker AI now supports P6e-GB200 UltraServers
Today, Amazon SageMaker AI announces support for P6e-GB200 UltraServers in SageMaker HyperPod and Training Jobs. With P6e-GB200 UltraServers, you can leverage up to 72 NVIDIA Blackwell GPUs within one NVLink domain to accelerate training and deployment of foundation models at trillion-parameter scale. P6e-GB200 UltraServers are available in two sizes: ml.u-p6e-gb200x72 (72 GPUs within NVLink) and ml.u-p6e-gb200x36 (36 GPUs within NVLink).
P6e-GB200 UltraServers deliver over 20x the compute and over 11x the memory of P5en instances within a single NVIDIA NVLink domain. Within each NVLink domain, you can leverage 360 petaflops of FP8 compute (without sparsity) and 13.4 TB of total high-bandwidth memory (HBM3e). When you use P6e-GB200 UltraServers on SageMaker AI, you get the GB200’s superior performance combined with SageMaker’s managed infrastructure capabilities, such as security, built-in fault tolerance, topology-aware scheduling (SageMaker HyperPod EKS and Slurm), integrated monitoring, and native integration with other SageMaker AI and AWS services.
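As a quick sanity check on those aggregate figures, dividing by the 72 GPUs in the full NVLink domain gives approximate per-GPU numbers. This is a back-of-the-envelope sketch; the per-GPU values are derived here, not quoted from the announcement:

```python
# Back-of-the-envelope per-GPU figures for the ml.u-p6e-gb200x72 size,
# derived from the aggregate numbers above (not official per-GPU specs).
gpus = 72
total_fp8_petaflops = 360   # FP8 compute without sparsity, whole NVLink domain
total_hbm_tb = 13.4         # total HBM3e across the domain

fp8_per_gpu = total_fp8_petaflops / gpus        # petaflops per GPU
hbm_per_gpu_gb = total_hbm_tb * 1000 / gpus     # GB per GPU (decimal TB -> GB)

print(f"{fp8_per_gpu:.1f} PFLOPS FP8 per GPU")   # -> 5.0 PFLOPS FP8 per GPU
print(f"{hbm_per_gpu_gb:.0f} GB HBM3e per GPU")  # -> 186 GB HBM3e per GPU
```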
The UltraServers are available through SageMaker Flexible Training Plans in the Dallas Local Zone (“us-east-1-dfw-2a”), an extension of the US East (N. Virginia) AWS Region. For on-demand reservation of GB200 UltraServers, please reach out to your account manager. Amazon SageMaker AI lets you easily train and deploy machine learning models at scale using fully managed infrastructure optimized for performance and cost. To get started with UltraServers on SageMaker AI, visit the documentation.
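To illustrate how one of these sizes might be targeted from a training job, below is a minimal sketch of a SageMaker CreateTrainingJob request payload (as passed to boto3's `sagemaker` client). The job name, role ARN, image URI, and S3 paths are hypothetical placeholders, and the full workflow, including the Flexible Training Plan reservation, is covered in the SageMaker documentation:

```python
# Hedged sketch: a CreateTrainingJob request payload targeting a P6e-GB200
# UltraServer size. All identifiers below are hypothetical placeholders,
# not values from the announcement.
training_job_request = {
    "TrainingJobName": "gb200-pretrain-example",                 # placeholder
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/trainer:latest",  # placeholder
        "TrainingInputMode": "File",
    },
    "ResourceConfig": {
        "InstanceType": "ml.u-p6e-gb200x72",  # 72 Blackwell GPUs, one NVLink domain
        "InstanceCount": 1,
        "VolumeSizeInGB": 500,
    },
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},  # placeholder
    "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
}

# With boto3 installed and AWS credentials configured, the call would be:
# boto3.client("sagemaker").create_training_job(**training_job_request)
print(training_job_request["ResourceConfig"]["InstanceType"])
```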