AWS – Amazon Bedrock Model Evaluation LLM-as-a-judge is now generally available
Amazon Bedrock Model Evaluation’s LLM-as-a-judge capability is now generally available. Amazon Bedrock Model Evaluation allows you to evaluate, compare, and select the right models for your use case. You can choose from several judge LLMs available on Amazon Bedrock, so you have the right combination of evaluator model and models being evaluated. You can select quality metrics such as correctness, completeness, and professional style and tone, as well as responsible AI metrics such as harmfulness and answer refusal. You can evaluate all available models on Amazon Bedrock, including serverless models, Bedrock Marketplace models compatible with the Converse API, customized and distilled models, imported models, and model routers. You can also compare results across evaluation jobs.
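Evaluation jobs can also be created programmatically. The snippet below is a minimal sketch using the boto3 `create_evaluation_job` operation; the nested field names, metric identifiers, task type, model IDs, IAM role ARN, and S3 URIs are illustrative assumptions, so consult the Amazon Bedrock API reference for the exact request shape before running it.

```python
import boto3

# Minimal sketch of an LLM-as-a-judge evaluation job.
# Field names, metric names, ARNs, and S3 URIs are assumptions for
# illustration -- verify against the current Amazon Bedrock API reference.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_evaluation_job(
    jobName="my-llm-judge-eval",  # hypothetical job name
    roleArn="arn:aws:iam::111122223333:role/BedrockEvalRole",  # placeholder IAM role
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Generation",  # assumed task type value
                    "dataset": {
                        "name": "my-prompts",
                        "datasetLocation": {"s3Uri": "s3://my-bucket/prompts.jsonl"},
                    },
                    # Quality and responsible AI metrics; names are assumptions
                    "metricNames": [
                        "Builtin.Correctness",
                        "Builtin.Completeness",
                        "Builtin.Harmfulness",
                    ],
                }
            ],
            # The judge model that scores the responses
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    # The model whose responses are being evaluated
    inferenceConfig={
        "models": [
            {"bedrockModel": {"modelIdentifier": "amazon.nova-pro-v1:0"}}
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-bucket/eval-results/"},
)
print(response["jobArn"])
```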
*Brand new – more flexibility!* Today, you can evaluate any model or system hosted anywhere by bringing inference responses you have already generated into the input prompt dataset for the evaluation job (“bring your own inference responses”). These responses can come from an Amazon Bedrock model or from any model or application hosted outside of Amazon Bedrock, so the evaluation job does not need to invoke an Amazon Bedrock model, and your final responses can incorporate all the intermediate steps of your application.
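The shape of such a dataset can be sketched briefly. The record below assumes the JSONL fields `prompt`, `referenceResponse`, and `modelResponses` (with `response` and `modelIdentifier`) described in the Bedrock Evaluations documentation for precomputed inference responses; treat the names and values as illustrative and check the documentation for the exact format.

```python
import json

# Sketch of one record in a "bring your own inference responses" prompt
# dataset (JSONL, one record per line). Field names are assumptions based
# on the Bedrock Evaluations documentation.
record = {
    "prompt": "Summarize our refund policy in two sentences.",
    "referenceResponse": "Customers may request a refund within 30 days of purchase.",  # optional ground truth
    "modelResponses": [
        {
            # A response you already generated, e.g. from an application
            # hosted outside Amazon Bedrock (RAG pipeline, agent, etc.)
            "response": "You can request a refund within 30 days of purchase.",
            # Free-form label for the response source; it should match the
            # precomputed-inference source identifier configured on the job.
            "modelIdentifier": "my-external-rag-app",
        }
    ],
}

with open("byo_responses.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```

In the evaluation job’s inference configuration, the model entry then references this source identifier instead of a Bedrock model, so the judge scores the supplied responses without invoking any model during the job.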
With LLM-as-a-judge, you can get human-like evaluation quality at lower cost, while saving weeks of time.
To learn more, visit the Amazon Bedrock Evaluations page and documentation. To get started, sign in to the AWS Console or use Amazon Bedrock APIs.