AWS – Amazon Bedrock announces general availability of prompt caching
At re:Invent 2024, AWS announced the preview of prompt caching, a new capability that can reduce costs by up to 90% and latency by up to 85% by caching frequently used prompts across multiple API calls. Today, AWS is launching prompt caching to general availability on Amazon Bedrock.
Prompt caching allows you to cache repetitive inputs and avoid reprocessing context such as long system prompts and common examples that help guide the model’s response. When you use prompt caching, fewer computing resources are needed to process your inputs. As a result, not only can we process your request faster, but we can also pass along the cost savings.
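As a minimal sketch of how this looks in practice, the snippet below builds a Converse API request in which a long, reused system prompt is marked as cacheable with a `cachePoint` content block; everything before the cache point can be reused across calls. The model ID and prompt text are placeholders, and the actual network call via boto3 is shown commented out since it requires AWS credentials and model access:

```python
def build_converse_request(model_id, system_prompt, user_message):
    """Build a Bedrock Converse request whose system prompt is cacheable.

    The cachePoint block marks a cache boundary: content before it
    (here, the long system prompt) can be cached and reused across
    repeated API calls instead of being reprocessed each time.
    """
    return {
        "modelId": model_id,
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # cache boundary
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_message}]},
        ],
    }


request = build_converse_request(
    "anthropic.claude-3-5-haiku-20241022-v1:0",  # placeholder model ID
    "You are a support assistant for ExampleCo. ...",  # long, reused instructions
    "How do I reset my password?",
)

# To send the request (not executed here):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
```

Only the static, repeated portion of the input (the system prompt in this sketch) benefits from caching; the per-call user message after the cache point is still processed normally.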
Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies via a single API. Amazon Bedrock also provides a broad set of capabilities customers need to build generative AI applications with security, privacy, and responsible AI capabilities built in. These capabilities help you build tailored applications for multiple use cases across different industries, helping organizations unlock sustained growth from generative AI while providing tools to build customer trust and data governance.
Prompt caching is now generally available for Anthropic’s Claude 3.5 Haiku and Claude 3.7 Sonnet, as well as Amazon Nova Micro, Nova Lite, and Nova Pro models. Customers who were given access to Claude 3.5 Sonnet v2 during the prompt caching preview will retain their access; however, no additional customers will be granted access to prompt caching on the Claude 3.5 Sonnet v2 model. For regional availability or to learn more about prompt caching, please see our documentation and blog.