AWS – Amazon Bedrock now supports Batch inference for Anthropic Claude Sonnet 4 and OpenAI GPT-OSS models
Anthropic’s Claude Sonnet 4 and OpenAI’s GPT-OSS 120B and 20B models are now available for batch inference in Amazon Bedrock. With batch inference, you can run multiple inference requests asynchronously and process large datasets at 50% of on-demand inference pricing. Amazon Bedrock offers select foundation models (FMs) from leading AI providers such as Anthropic, OpenAI, Meta, and Amazon for batch inference, making it easier and more cost-effective to process high-volume workloads.
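As a minimal sketch, a batch job is submitted by pointing the CreateModelInvocationJob API at a JSONL file of requests in Amazon S3. The bucket names, IAM role ARN, and model identifier below are placeholders; verify the exact model ID for your Region in the Bedrock console.

```python
# Sketch: submit a Bedrock batch inference job with boto3.
# Bucket names, role ARN, and model ID are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_invocation_job(
    jobName="claude-sonnet-4-batch-demo",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # hypothetical role
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",          # verify exact ID per Region
    inputDataConfig={
        "s3InputDataConfig": {
            # JSONL file: one {"recordId": ..., "modelInput": {...}} object per line
            "s3Uri": "s3://my-bucket/batch-input/records.jsonl"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}
    },
)

# The job runs asynchronously; poll its status and fetch results
# from the output S3 location once it reaches "Completed".
job_arn = response["jobArn"]
status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"]
print(job_arn, status)
```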
With batch inference on Claude Sonnet 4 and the OpenAI GPT-OSS models, you can process large datasets at scale and at lower cost for scenarios such as document and customer feedback analysis, bulk content generation (e.g., marketing copy, product descriptions), large-scale prompt or output evaluations, automated summarization of knowledge bases and archives, mass categorization of support tickets or emails, and extraction of structured data from unstructured text. We’ve optimized our batch offering to deliver higher overall throughput on these newer models compared to previous ones. In addition, you can now track batch workload progress at the AWS account level with Amazon CloudWatch metrics. For all models, these metrics include total pending records, records processed per minute, and tokens processed per minute; for Claude models, they also include tokens pending processing.
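As a rough sketch of reading those progress metrics with boto3: the namespace, metric name, and dimension below are assumptions based on this announcement, not confirmed names; check the CloudWatch metrics documentation for Amazon Bedrock for the exact identifiers.

```python
# Sketch: poll account-level batch inference metrics from CloudWatch.
# Namespace, metric name, and dimension are ASSUMED; verify against the docs.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock/Batch",                  # assumed namespace
    MetricName="NumberOfRecordsPendingProcessing",  # assumed metric name
    Dimensions=[
        {"Name": "ModelId",                         # assumed dimension
         "Value": "anthropic.claude-sonnet-4-20250514-v1:0"}
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,  # 5-minute datapoints
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```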
To learn more about batch inference in Amazon Bedrock, visit the batch inference documentation. See the Supported Regions and models for batch inference page for details on supported models, and follow the Amazon Bedrock API reference to get started with batch inference.