AWS Neuron introduces NxD Inference GA, new features, and improved tools
Today, AWS announces the release of Neuron 2.23, featuring enhancements across inference, training, and developer tools. This release moves the NxD Inference library (NxDI) to general availability (GA), introduces new training capabilities including Context Parallelism and ORPO, and adds support for PyTorch 2.6 and JAX 0.5.3.
The NxD Inference library moves from beta to general availability and is now recommended for all multi-chip inference use cases. Key enhancements include Persistent Cache support to reduce compilation times, as well as optimized model loading.
For training workloads, the NxD Training library introduces Context Parallelism support (beta) for Llama models, enabling sequence lengths up to 32K. The release also adds support for model alignment using ORPO with DPO-style datasets, and upgrades support for third-party libraries: PyTorch Lightning 2.5, Transformers 4.48, and NeMo 2.1.
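Context parallelism works by sharding the sequence dimension of each batch across devices, so that very long contexts fit in per-device memory. The following is a minimal conceptual sketch of that idea in NumPy; it is an illustration only, not the NxD Training API (the function name and shapes here are hypothetical).

```python
import numpy as np

def shard_sequence(tokens: np.ndarray, world_size: int) -> list:
    """Split a (batch, seq_len) token array into equal sequence chunks,
    one per device rank. seq_len must divide evenly by world_size."""
    _, seq_len = tokens.shape
    assert seq_len % world_size == 0, "sequence length must divide evenly"
    # Each rank receives a contiguous slice of the sequence dimension.
    return np.split(tokens, world_size, axis=1)

# Example: a 32K-token sequence sharded across 8 ranks -> 4K tokens per rank.
tokens = np.arange(2 * 32768).reshape(2, 32768)
shards = shard_sequence(tokens, world_size=8)
print(len(shards), shards[0].shape)  # 8 (2, 4096)
```

In a real context-parallel setup, attention over the full sequence then requires communication between ranks; the sketch above only shows the partitioning step that makes 32K-token training feasible.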
The Neuron Kernel Interface (NKI) introduces new 32-bit integer operations, improved ISA features for Trainium2, and new performance tuning APIs. The Neuron Profiler now offers 5x faster profile result viewing, timeline-based error tracking, and improved multiprocess visualization with Perfetto.
The AWS Neuron SDK supports training and deploying models on Trn1, Trn2, and Inf2 instances, available in AWS Regions as On-Demand Instances, Reserved Instances, Spot Instances, or as part of Savings Plans.
For a full list of new features and enhancements in Neuron 2.23, and to get started with Neuron, see the Neuron 2.23 release notes and the Neuron documentation.