AWS – AWS Neuron introduces speculative decoding and vLLM support
Today, AWS announces the release of Neuron 2.18, introducing stable support (out of beta) for PyTorch 2.1, adding continuous batching with vLLM support, and adding support for speculative decoding with Llama-2-70B sample in Transformers NeuronX library.
Read More for the details.