AWS – Amazon SageMaker AI announces availability of P5e and G6e instances for Inference
We are pleased to announce the general availability of inference-optimized G6e instances (powered by NVIDIA L40S Tensor Core GPUs) and P5e instances (powered by NVIDIA H200 Tensor Core GPUs) on Amazon SageMaker.
With 1128 GB of high-bandwidth GPU memory across 8 NVIDIA H200 GPUs, 30 TB of local NVMe SSD storage, 192 vCPUs, and 2 TiB of system memory, ml.p5e.48xlarge instances can deliver exceptional performance for compute-intensive AI inference workloads such as large language models with 100B+ parameters, multi-modal foundation models, synthetic data generation, and complex generative AI applications including question answering, code generation, video generation, and image generation.
Powered by 8 NVIDIA L40S Tensor Core GPUs with 48 GB of memory per GPU and third-generation AMD EPYC processors, ml.g6e instances can deliver up to 2.5x better performance compared to ml.g5 instances. Customers can use ml.g6e instances to run AI inference for large language models (LLMs) with up to 13B parameters and diffusion models for generating images, video, and audio.
The ml.p5e and ml.g6e instances are now available for use on SageMaker in US East (Ohio) and US West (Oregon). To get started, simply request a limit increase through AWS Service Quotas. For pricing information on these instances, please visit our pricing page. For more information on deploying models with SageMaker, see the overview here and the documentation here. To learn more about these instances in general, please visit the P5e and G6e product pages.
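Once your service quota for the new instance types is approved, you select them the same way as any other SageMaker inference instance: by naming the instance type in the endpoint configuration. The sketch below builds a `create_endpoint_config` request payload for the SageMaker API using the AWS SDK for Python (boto3); the model and endpoint names are hypothetical placeholders, and the actual API calls (shown commented out) require AWS credentials and an existing SageMaker model.

```python
# Minimal sketch: an endpoint configuration targeting the new ml.p5e.48xlarge
# instance type. "llm-p5e-config" and "my-llm-model" are placeholder names.
endpoint_config = {
    "EndpointConfigName": "llm-p5e-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-llm-model",        # a model already created in SageMaker
            "InstanceType": "ml.p5e.48xlarge",  # H200-backed instance from this launch
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }
    ],
}

# The actual deployment calls need AWS credentials and an approved quota:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(**endpoint_config)
# sm.create_endpoint(
#     EndpointName="llm-p5e-endpoint",
#     EndpointConfigName=endpoint_config["EndpointConfigName"],
# )

print(endpoint_config["ProductionVariants"][0]["InstanceType"])
```

For the L40S-backed instances, you would instead pass a size such as `ml.g6e.48xlarge` as the `InstanceType`; everything else in the request stays the same.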