AWS – EFA Now Supports NVIDIA GPUDirect RDMA
We are excited to announce that Elastic Fabric Adapter (EFA) now supports NVIDIA GPUDirect Remote Direct Memory Access (RDMA). GPUDirect RDMA support on EFA will be available on Amazon Elastic Compute Cloud (Amazon EC2) P4d instances, the next generation of GPU-based instances on AWS. P4d provides the highest performance for machine learning (ML) training and high-performance computing (HPC) in the cloud for applications such as natural language processing, object detection and classification, seismic analysis, and computational drug discovery. GPUDirect RDMA support on EFA enables network interface cards (NICs) to access GPU memory directly. This avoids extra memory copies, which makes remote GPU-to-GPU communication across NVIDIA GPU-based Amazon EC2 instances faster and reduces orchestration overhead on CPUs and user applications. As a result, customers running applications that use the NVIDIA Collective Communications Library (NCCL) on P4d will be able to further accelerate their tightly coupled multi-node workloads.
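As a rough illustration of what opting an NCCL job into this path can look like, the sketch below builds the environment for a training process. It is a minimal example under assumptions, not a configuration taken from this announcement: the helper name `efa_gpudirect_env` is hypothetical, and whether `FI_EFA_USE_DEVICE_RDMA` needs to be set at all depends on the installed libfabric and NCCL plugin versions.

```python
import os


def efa_gpudirect_env(base=None):
    """Return a copy of `base` (or the current environment) with settings
    commonly used to run NCCL over EFA with GPUDirect RDMA.

    Hypothetical helper for illustration; check the variables against the
    EFA and NCCL documentation for your installed versions.
    """
    env = dict(os.environ if base is None else base)
    env.update({
        # Ask libfabric to use the EFA provider.
        "FI_PROVIDER": "efa",
        # Ask the EFA provider to use device (GPU) RDMA where supported.
        "FI_EFA_USE_DEVICE_RDMA": "1",
        # Have NCCL log which network transport it selected.
        "NCCL_DEBUG": "INFO",
    })
    return env


if __name__ == "__main__":
    # Print only the variables this sketch sets.
    env = efa_gpudirect_env()
    for key in ("FI_PROVIDER", "FI_EFA_USE_DEVICE_RDMA", "NCCL_DEBUG"):
        print(f"{key}={env[key]}")
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching a training script. With `NCCL_DEBUG=INFO`, the NCCL startup log is a reasonable place to confirm which network provider was actually selected.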