AWS – AWS Neuron introduces speculative decoding
Today, AWS announces the release of Neuron 2.18, which moves PyTorch 2.1 support out of beta to stable and adds support for speculative decoding, with a Llama-2-70B sample in the Transformers NeuronX library.
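For readers new to the term, speculative decoding pairs a small, fast draft model with the large target model: the draft proposes several tokens ahead, and the target keeps the longest prefix it agrees with. The following is a minimal, framework-free sketch of the greedy variant, not the Transformers NeuronX API. The names `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical, and the toy "models" are plain functions that stand in for real networks.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
NextTokenFn = Callable[[List[int]], int]


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding (illustrative sketch, hypothetical names)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        draft: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            draft.append(draft_next(tokens + draft))

        # 2. The target model checks each proposed position. A real system
        #    scores all positions in one batched forward pass; this sketch
        #    calls the target once per position for clarity.
        accepted: List[int] = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != proposed:
                break  # first disagreement: discard the rest of the draft

        tokens.extend(accepted)
    return tokens


def toy_target(context: List[int]) -> int:
    # Stand-in for the large model: counts upward modulo 10.
    return (context[-1] + 1) % 10


def toy_draft(context: List[int]) -> int:
    # Stand-in for the small model: agrees with the target most of the time.
    if context[-1] % 3 == 2:
        return 0
    return (context[-1] + 1) % 10
```

Calling `speculative_decode(toy_target, toy_draft, [0], 6)` yields the same sequence that the target would produce by decoding alone, because every emitted token is the target's own greedy choice. The draft only determines how many tokens are confirmed per verification round. This sketch omits the extra "bonus" token that full implementations emit when the whole draft is accepted, as well as the sampling-based acceptance rule used for non-greedy decoding.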