GCP – Gemini 2.5 brings enhanced reasoning to enterprise use cases
We recently announced Gemini 2.5, our most intelligent AI model yet. Gemini 2.5 models are now thinking models, capable of reasoning before responding, resulting in dramatically improved performance. This transparent step-by-step reasoning is crucial for enterprise trust and compliance.
Our first model in this family, Gemini 2.5 Pro, available in public preview on Vertex AI, is now among the world’s best models for coding and tasks requiring advanced reasoning. It delivers state-of-the-art performance across a wide range of benchmarks, is widely regarded as a leading enterprise-ready reasoning model, and tops the LMArena leaderboard by a significant margin.
Building on this momentum, we are launching Gemini 2.5 Flash, our workhorse model optimized for low latency and cost efficiency, on Vertex AI (our comprehensive platform for building and managing AI applications and agents) and in Google AI Studio.
Let’s dive into how these capabilities are transforming AI development on Google Cloud.
Advancing enterprise problem-solving with deep reasoning
Enterprises face challenges that involve navigating intricate information landscapes, performing multi-step analyses, and making nuanced decisions – tasks that demand an AI that doesn’t just process information, but reasons over it. For these situations, we offer Gemini 2.5 Pro on Vertex AI, engineered for maximum quality on the most complex tasks demanding deep reasoning and coding expertise. Coupled with a one-million-token context window, Gemini 2.5 Pro performs deep data analysis, extracts key insights from dense documents like legal contracts or medical records, and handles complex coding tasks by comprehending entire codebases.
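As a rough sketch, a long-context document-analysis request might look like the following, assuming the `google-genai` Python SDK in Vertex AI mode; the project ID, region, and model ID here are placeholders and may differ for your environment or the current preview release:

```python
def build_document_prompt(document_text: str, question: str) -> str:
    """Pack an entire dense document plus a question into a single prompt,
    relying on the model's one-million-token context window rather than
    chunking or retrieval."""
    return (
        "You are a careful analyst. Read the document below in full, then "
        "answer the question, citing the relevant sections.\n\n"
        f"--- DOCUMENT START ---\n{document_text}\n--- DOCUMENT END ---\n\n"
        f"Question: {question}"
    )

def ask_gemini_pro(document_text: str, question: str) -> str:
    # Requires `pip install google-genai` and Google Cloud credentials;
    # project, location, and model ID are placeholders.
    from google import genai
    client = genai.Client(vertexai=True, project="my-project",
                          location="us-central1")
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=build_document_prompt(document_text, question),
    )
    return response.text
```

Because the full document travels in a single request, there is no chunking pipeline to maintain; the trade-off is input-token cost, which context caching (below) can mitigate for repeated queries.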
“At Box, we’re redefining how enterprises apply intelligence to their content. With Box AI extract agents, powered by Gemini, users can instantly streamline tasks by making unstructured data actionable, as seen in millions of extractions supporting a variety of use cases, including procurement and reporting. Gemini 2.5 represents a leap forward in advanced reasoning, enabling us to envision building more powerful agent systems where extracted insights automatically trigger downstream actions and coordinate across multiple steps. This evolution pushes the boundaries of automation, allowing businesses to unlock and act upon their most valuable information with even greater impact and efficiency.” — Yashodha Bhavnani, VP of AI Product Management, Box
“Moody’s leverages Gemini’s advanced reasoning capabilities on Vertex AI within a model-agnostic framework. Our current production system uses Gemini 2.0 Flash for intelligent filtering and Gemini 1.5 Pro for high-precision extraction, achieving over 95% accuracy and an 80% reduction in processing time for complex PDFs. Building on this success, we are now in the early stages of testing Gemini 2.5 Pro. Its potential for deeper, structured reasoning across extensive document sets, thanks to features like its large context window, looks very promising for tackling even more complex data challenges and enhancing our data coverage further. While it’s not in production, the initial results are very encouraging.” — Wade Moss, Sr. Director, AI Data Solutions, Moody’s
To tailor Gemini for specific needs, businesses can soon leverage Vertex AI features like supervised tuning (for unique data specialization) and context caching (for efficient long context processing), enhancing performance and reducing costs. Both these features are launching in the coming weeks for Gemini 2.5 models.
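A hedged sketch of what context caching could look like once it ships for Gemini 2.5, based on the existing `google-genai` cache API; the exact surface for 2.5 models may differ at launch, and all IDs are placeholders:

```python
def ttl_string(hours: int) -> str:
    """Context caches expire after a TTL expressed as a seconds string,
    e.g. '3600s' for one hour."""
    return f"{hours * 3600}s"

def cache_and_query(big_document: str, question: str) -> str:
    # Requires `google-genai` and credentials; caching for Gemini 2.5 is
    # described above as launching in the coming weeks, so treat this as
    # illustrative only.
    from google import genai
    from google.genai import types
    client = genai.Client(vertexai=True, project="my-project",
                          location="us-central1")
    # Pay once to ingest the large shared prefix...
    cache = client.caches.create(
        model="gemini-2.5-pro",
        config=types.CreateCachedContentConfig(
            contents=big_document, ttl=ttl_string(1)),
    )
    # ...then reference it cheaply on each follow-up question.
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=question,
        config=types.GenerateContentConfig(cached_content=cache.name),
    )
    return response.text
```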
Building responsive and efficient AI applications at scale
While Gemini 2.5 Pro targets peak quality for complex challenges, many enterprise applications prioritize speed, low latency, and cost-efficiency. To meet this need, we will soon offer Gemini 2.5 Flash on Vertex AI. This workhorse model is optimized specifically for low latency and reduced cost, delivering impressive and well-balanced quality for high-volume scenarios like customer service or real-time information processing. It’s the ideal engine for responsive virtual assistants and real-time summarization tools where efficiency at scale is key.
Gemini 2.5 Flash will also feature dynamic and controllable reasoning. The model automatically adjusts processing time (‘thinking budget’) based on query complexity, enabling faster answers for simple requests. You also gain granular control over this budget, allowing explicit tuning of the speed, accuracy, and cost balance for your specific needs. This flexibility is key to optimizing Flash performance in high-volume, cost-sensitive applications.
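One way the thinking budget control might be applied, sketched with the `google-genai` SDK; the complexity tiers and budget values below are illustrative choices, not official guidance, and the model ID is a placeholder:

```python
def thinking_budget_for(complexity: str) -> int:
    """Map a rough query-complexity tier to a thinking budget in tokens.
    A budget of 0 asks the model to skip extended reasoning entirely;
    the specific tiers and values here are made up for illustration."""
    budgets = {"simple": 0, "moderate": 1024, "complex": 8192}
    return budgets[complexity]

def ask_flash(prompt: str, complexity: str = "moderate") -> str:
    # Requires `google-genai` and Google Cloud credentials.
    from google import genai
    from google.genai import types
    client = genai.Client(vertexai=True, project="my-project",
                          location="us-central1")
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(
                thinking_budget=thinking_budget_for(complexity))
        ),
    )
    return response.text
```

Routing high-volume, simple traffic to a zero or small budget while reserving large budgets for hard queries is how the speed/accuracy/cost balance described above would be tuned in practice.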
“Gemini 2.5 Flash’s enhanced reasoning ability, including its insightful responses, holds immense potential for Palo Alto Networks, including detection of future AI-powered threats and more effective customer support across our AI portfolio. We are focused on evaluating the latest model’s impact on AI-assistant performance, including its summaries and responses, with the intention of migrating to this model to unlock its advanced capabilities.” — Rajesh Bhagwat, VP of Engineering, Palo Alto Networks
Optimizing your experience on Vertex AI
Choosing between powerful models like Gemini 2.5 Pro and 2.5 Flash depends on your specific needs. To make it easier, we’re introducing Vertex AI Model Optimizer, available experimentally, which automatically generates the highest-quality response for each prompt based on your desired balance of quality and cost. For customers whose workloads do not require processing in a specific location, our Vertex AI Global Endpoint provides capacity-aware routing for our Gemini models across multiple regions, maintaining application responsiveness even during peak traffic or regional service fluctuations.
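Opting into the Global Endpoint is, at the client level, largely a matter of which location you request. A minimal sketch, assuming the `google-genai` SDK and a placeholder project ID:

```python
def endpoint_location(region_pinned: bool, region: str = "us-central1") -> str:
    """Pick 'global' for capacity-aware multi-region routing, or a fixed
    region when data-residency requirements pin processing to one location."""
    return region if region_pinned else "global"

def make_client(region_pinned: bool):
    # Requires `google-genai` and credentials; project ID is a placeholder.
    from google import genai
    return genai.Client(
        vertexai=True,
        project="my-project",
        location=endpoint_location(region_pinned),
    )
```

Workloads with data-residency constraints keep `region_pinned=True`; everything else can take advantage of global capacity-aware routing.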
Powering the future with sophisticated agents and multi-agent ecosystems
Gemini 2.5 Pro’s advanced multimodal reasoning enables sophisticated, real-world agent workflows. It interprets visual context (maps, flowcharts), integrates text understanding, performs grounded actions like web searches, and synthesizes diverse information – allowing agents to interact meaningfully with complex inputs.
Building on this potential, today we are also announcing a number of innovations in Vertex AI to enable multi-agent ecosystems. One key innovation supporting dynamic, real-time interactions is the Live API for Gemini models. This API allows agents to process streaming audio, video, and text with low latency, enabling human-like conversations, participation in live meetings, or monitoring real-time situations (such as understanding spoken instructions mid-task).
Key Live API features further enhance these interactions: support for long, resumable sessions (greater than 30 minutes), multilingual audio output, time-stamped transcripts for analysis, dynamic instruction updates within sessions, and powerful tool integrations (search, code execution, function calling). These advancements pave the way for leveraging models like Gemini 2.5 Pro in highly interactive applications.
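A hedged sketch of a text-only Live API session using the `google-genai` SDK; the model ID is a placeholder, the session methods may vary across SDK releases, and running it requires Google Cloud credentials:

```python
def user_turn(text: str) -> dict:
    """A single user turn in the wire format the SDK accepts
    (plain dicts are converted to typed Content objects)."""
    return {"role": "user", "parts": [{"text": text}]}

async def live_text_session(prompts):
    # Requires `google-genai` and credentials; illustrative only.
    from google import genai
    from google.genai import types
    client = genai.Client(vertexai=True, project="my-project",
                          location="us-central1")
    config = types.LiveConnectConfig(response_modalities=["TEXT"])
    # The Live API holds one streaming session open across many turns,
    # which is what enables resumable, conversation-length interactions.
    async with client.aio.live.connect(model="gemini-live-model-id",
                                       config=config) as session:
        for prompt in prompts:
            await session.send_client_content(turns=user_turn(prompt))
            async for message in session.receive():
                if message.text:
                    print(message.text, end="")

# Example (needs credentials):
# import asyncio
# asyncio.run(live_text_session(["Summarize the discussion so far."]))
```

The same session object would carry streaming audio or video frames instead of text turns; the text-only form above just keeps the sketch small.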
Get started
Ready to tackle complex problems, build efficient applications, and create sophisticated AI agents? Try Gemini 2.5 on Vertex AI now!