AWS – Amazon SageMaker HyperPod now provides a new cluster setup experience
SageMaker HyperPod now provides a new cluster creation experience that sets up all the resources needed for large-scale AI/ML workloads, including networking, storage, compute, and IAM permissions, in just a few clicks. SageMaker HyperPod clusters are purpose-built for scalability and resilience, designed to accelerate large-scale distributed training and deployment of complex machine learning models such as LLMs and diffusion models, as well as customization of Amazon Nova foundation models.
The new cluster creation experience for SageMaker HyperPod introduces quick and custom setup paths that make it easier for both beginners and advanced AWS customers to get started. Previously, customers needed to manually configure networking, IAM roles, storage, and compute. With the new quick setup, model builders who may not have AWS infrastructure expertise can launch a fully operational cluster optimized for large-scale AI workloads in just a few clicks. A streamlined single-page interface provisions all dependencies, including VPCs, subnets, FSx storage, an EKS or Slurm orchestrator, and essential Kubernetes operators. For platform engineering teams who want to modify the default settings, the custom setup path provides full control over every configuration, from specific subnet configurations to selective operator installations, within the same console experience. Teams can also export an auto-generated CloudFormation template for repeatable production deployments.
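As a rough illustration of the repeatable-deployment path, the sketch below shows one way an exported template might be submitted to CloudFormation from Python with boto3. It is a minimal sketch under stated assumptions, not the documented HyperPod workflow: the function names, the file path, the stack name, and the parameter keys in the usage note are hypothetical, and the parameters the exported template actually accepts should be read from the template itself. It also assumes the template is small enough to pass inline and that `CAPABILITY_NAMED_IAM` is the only acknowledgement it requires.

```python
from pathlib import Path


def build_stack_request(template_path, stack_name, parameters=None):
    """Build keyword arguments for CloudFormation's create_stack call.

    `parameters` is a plain dict; its keys are hypothetical here and must
    match whatever the exported template really declares.
    """
    template_body = Path(template_path).read_text(encoding="utf-8")
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Assumption: the template creates named IAM roles, so CloudFormation
        # needs this explicit acknowledgement.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
        # Sorted so the request is deterministic across runs.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in sorted((parameters or {}).items())
        ],
    }


def deploy_stack(request, region_name):
    """Submit the request and return the new stack's ID.

    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so build_stack_request works without boto3

    client = boto3.client("cloudformation", region_name=region_name)
    response = client.create_stack(**request)
    return response["StackId"]
```

A call might look like `deploy_stack(build_stack_request("hyperpod-cluster.yaml", "hyperpod-prod", {"InstanceCount": 4}), "us-west-2")`, where the file name, stack name, parameter key, and Region are placeholders. Keeping the request builder separate from the network call lets the same request be reviewed or checked into version control before anything is provisioned. Templates larger than the inline size limit would need to be uploaded to S3 and passed as `TemplateURL` instead of `TemplateBody`.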
You can create clusters using either the AWS Console or CloudFormation in all AWS Regions where SageMaker HyperPod is supported. To learn more, see the user guide.