Cloud

2025 10 07

GCP – Announcing quantum-safe Key Encapsulation Mechanisms in Cloud KMS

Quantum computing presents a new frontier for technology, and a new set of security challenges, too. A sufficiently powerful quantum computer could break the public-key cryptography systems we rely on today, posing a significant risk to individuals and organizations.

Although that threat might be years away, building appropriate defenses can equally take years to build. Following our recent announcement of quantum-safe digital signatures in Google Cloud Key Management Service (Cloud KMS), we are excited to expand our quantum-safe capabilities. We are now announcing support for post-quantum Key Encapsulation Mechanisms (KEMs) in Cloud KMS in preview, enabling customers to begin their migration to a post-quantum world.

This new capability provides a critical building block to start protecting your applications from Harvest Now, Decrypt Later attacks. Adversaries can capture and store encrypted data today with the intention of decrypting it years from now, once a cryptographically-relevant quantum computer (CRQC) is available. This makes it crucial to protect sensitive data requiring long-term confidentiality, even if the quantum threat seems distant.

The challenge: Migrating to a post-quantum world

While the bulk of data is protected using symmetric encryption, asymmetric encryption is the key to exchanging those symmetric keys securely. It’s this key exchange process that’s fundamentally changing with post-quantum cryptography.

Migrating from classical asymmetric encryption to post-quantum KEMs is more than a simple algorithm swap; it requires a shift in architectural thinking. Developers will face two primary challenges: adapting to a new cryptographic paradigm, and managing new performance characteristics.

The most significant hurdle is that a KEM is not a direct replacement for traditional asymmetric encryption. With classical algorithms such as RSA, a developer often encrypts data they already have — for example, a symmetric session key they’ve generated.

A KEM completely inverts this model: the sender does not choose the shared secret. Instead, the secret is a fresh, random value generated as an output of the KEM’s encapsulation process itself.

This architectural shift means that you cannot simply replace an Encrypt() function call. To handle this complexity securely, developers should adopt a high-level standard like Hybrid Public Key Encryption (HPKE), defined in RFC 9180. HPKE is an extensible standard that can be used with different KEMs, and is available in libraries such as Tink to simplify the integration of post-quantum algorithms.

Second, developers will have to manage new performance characteristics. While the computational speed of post-quantum operations is often comparable to their classical counterparts, the size of public keys and ciphertexts is substantially larger.

This is not a trivial increase. We are often talking about an order of magnitude difference. For example, a standard ML-KEM-768 key is about 18 times larger than a P-256 key.

This size increase has direct implications for application performance, impacting bandwidth, storage, and memory usage. A successful migration requires architects to account for these larger payloads, particularly in resource-constrained environments like IoT devices.

A final challenge is the novelty of these algorithms. While the new standards have undergone intense scrutiny, they have not yet endured the decades of real-world analysis that classical algorithms have.

For this reason, for most general purpose applications, we strongly recommend using a hybrid approach. A hybrid deployment hedges against risks in the new PQC algorithms by combining a classical and a post-quantum algorithm. This ensures that any unexpected flaw in a new post-quantum algorithm, or its implementation, does not introduce a new vulnerability. It can help provide a safe and gradual transition to a post-quantum world.

Getting started with KEMs in Cloud KMS

To provide a flexible and secure path to quantum resistance, Cloud KMS now offers several KEM algorithms.

ML-KEM-768 and ML-KEM-1024: These are implementations of the Module-Lattice-based Key-Encapsulation Mechanism standard, published by NIST as FIPS 203. For customers who must comply with standards like CNSA 2.0, ML-KEM-1024 offers a higher security level, though at the cost of larger keys (1568 as compared to 1184 bytes) and ciphertexts (1568 versus 1088 bytes) which can impact performance.
X-Wing: For most general purpose applications, we strongly recommend using a hybrid approach. The X-Wing KEM, which combines the classical X25519 algorithm with the post-quantum ML-KEM-768 algorithm, provides layered defense against both classical and potential quantum adversaries.

Integrating these new quantum-safe KEMs into your workflows is designed to be a straightforward process using the Cloud KMS API. You can find detailed instructions and code samples in the official Cloud KMS documentation. Our underlying implementations will be available as part of Google’s open-source cryptographic libraries, BoringCrypto and Tink, to ensure full transparency.

The post-quantum road ahead

The migration to post-quantum cryptography is a marathon, not a sprint. This preview of post-quantum KEMs in Cloud KMS is a critical building block that can enable you to start preparing your applications for a quantum-safe future.

To make this transition easier, our open-source library Tink will provide support for Hybrid Public Key Encryption (HPKE) offering a more user-friendly way to integrate these KEMs in languages such as Java, C++, Golang, and Python. For some of these languages, this will be ready by the end of this year.

In the meantime, direct use of the KEMs in Cloud KMS is available for developers with deep cryptographic expertise. To provide more protection against Harvest Now, Decrypt Later attacks, Google Cloud is upgrading its infrastructure to use post-quantum cryptography for connections, with this capability expected to be fully available in 2026.

Our work in this space is ongoing. We are committed to keeping pace with developments in post-quantum cryptography, including future standards from NIST, and we plan to adapt to any changes in the quantum cryptanalytic landscape. While the industry continues to develop standards for hybrid digital signatures, we are actively monitoring progress and may offer support for such schemes as consensus emerges.

We encourage you to explore these new capabilities in Cloud KMS and begin the process of making your applications quantum-safe. We welcome your feedback and are eager to collaborate with you on your specific cryptographic needs.

Read More for the details.

2025 10 07

AWS – AWS Marketplace now supports new currencies for usage-based private offers

Tibor Kiss AWS, Cloud AWS

AWS Marketplace now supports usage-based private offers in four new currencies: EUR, GBP, AUD, and JPY. AWS Marketplace Sellers and Channel Partners can now reach buyers globally without currency conversion complexity or foreign exchange risk by pricing their offers in these new currencies. For sellers, this means faster deal cycles, simplified cash flow management in local currency, and the ability to close larger deals with confidence. For AWS Marketplace buyers, software and services can now be procured in their preferred currency, eliminating foreign exchange risk in invoice amounts, and streamlining the procurement process for private offers.

Sellers can now create private offers in EUR, GBP, AUD, and JPY, and receive their disbursements in the offer currency, regardless of pricing types, including contract with consumption-based pricing and usage-based (pay-as-you-go) pricing, in addition to contract-only pricing. For Channel Partner Private Offers (CPPO), the seller, channel partner, and buyer must all transact in the same currency. Sellers need to issue a resale authorization in the negotiated currency, and the channel partner then creates the CPPO in that currency.

This functionality is available worldwide for all AWS Marketplace Sellers for all private offers. Public offers remain in USD only.

AWS Marketplace Sellers can choose to get disbursed to a bank account in one of the eligible jurisdictions by updating your banking information and set your currency preferences in the AWS Marketplace Management Portal Settings page. To learn more, visit the documentation on local currency offers and disbursements.

Read More for the details.

2025 10 07

AWS – AWS Marketplace expands Japan consumption tax support for Channel Partner Private Offers

Tibor Kiss AWS, Cloud AWS

Starting today, AWS Marketplace expands its Japan consumption tax (JCT) support for Channel Partner Private Offers (CPPOs), enhancing the tax experience for Independent Software Vendors (ISVs) and Channel Partners. For transactions where Japan ISVs authorize Japan Channel Partners to resell to Japan addressed buyers, AWS Japan G.K. (“AWS Japan”) will now collect the 10% JCT for the first leg of the transaction between ISVs and Channel Partners, issue a tax qualified invoice (TQI) to the Channel Partners and disburse the JCT to ISVs.

AWS Japan will continue to collect the 10% JCT for the second leg of the transaction between Japan Channel Partners and Japan buyers and issue a TQI to the buyers, as previously established under the Japan Marketplace Facilitator rule. This launch unifies the compliance across both transactions, creating a seamless tax experience.

This feature is applicable for Japan ISVs and Japan Channel Partners when transacting via the AWS Japan Marketplace Operator. To learn more, please visit the AWS Japan FAQ or AWS Marketplace Seller Guide.

Read More for the details.

2025 10 07

GCP – 150 of the latest AI use cases from leading startups and digital natives

Tibor Kiss Cloud, Google Cloud gcp

We recently hosted our first-ever AI Builders Forum, where we gathered with hundreds of the top founders, VCs, advisors, researchers, and teams powering the startups who are building the future with AI. And now, this week, many of us are together again in San Francisco for Tech Week.

A big reason for us to gather with our peers and the next generation of groundbreaking companies is that at Google Cloud, we’ve always worked to provide companies with the very same tools and infrastructure that power Google’s own services and drive our continued growth. In recent years, we have been able to take that to an entirely new level, making our leading generative AI technology available to even the youngest startups.

During the AI Builders Forum, we showcased the work of dozens of startups who have taken that technology in new and exciting directions. There’s so many more hard at work with Google Cloud that we wanted to highlight as many as we could. Here, we’ve gathered 150 to be exact, breaking them out by the sectors and specialties in which they’re creating exciting new offerings that are changing the way we work, create, collaborate, and cure.

For those looking to join their ranks, we encourage you to join our Google for Startups Cloud program, with access to valuable credits and advice from Google experts and experienced VCs. And when it comes to the especially hot segment of AI agents, we’ve just introduced our Startup Technical Guide: AI Agents. It’s an in-depth collection of best practices and step-by-step guidance for building and scaling agentic AI.

We hope this list serves as inspiration for your own work — and that you’ll find yourself being recognized soon enough, too. All it takes is the right AI, the right partners, a good team, and a great idea.

1_150-startup-ai-use-cases-ai-coding-dev

Arize AI, an AI agent and engineering platform, partners with Google Cloud to help organizations successfully develop, evaluate, and observe their generative AI applications. The Arize AX platform seamlessly integrates with Vertex AI models and runs on Google Kubernetes Engine (GKE), which allows a very lean operations team to easily scale services and provide deep visibility into every layer of AI systems.
Augment Code, an AI coding assistant, integrated Anthropic’s Claude 3.5 Sonnet via Vertex AI to power its codebase chat feature. This delivered an instant improvement to chat performance after quick implementation, enhancing security and allowing customers to build and troubleshoot code faster.
Aviator, an engineering productivity platform, uses Google Kubernetes Engine, Vertex AI, and Gemini to scale its engineering productivity platform to thousands of users while accelerating the development of new generative AI features.
Cursor is an AI-powered code editor that integrates advanced AI features like autonomous agents and codebase-aware chat. It aims to create a highly effective human-AI programmer by automating tasks, understanding entire codebases, and accelerating development velocity, using models like Gemini and Anthropic’s Claude on Google Cloud’s Vertex AI.

aside_block: <ListValue: [StructValue([(‘title’, ‘Get started’), (‘body’, <wagtail.rich_text.RichText object at 0x7f6374b1f430>), (‘btn_text’, ‘Get in touch.’), (‘href’, ‘https://cloud.google.com/contact/form?hl=nl’), (‘image’, <GAEImage: Startup Program>)])]>

DeepSource, a platform for code quality and security, leverages Gemini and Google Kubernetes Engine (GKE) to help developers automatically analyze and remediate code. The platform uses Gemini-powered AI agents for its Autofix™ remediation engine, which increases the accuracy of its static analysis and provides automated fixes. Running on GKE, DeepSource can automatically scale to process tens of millions of lines of code per day, reducing operational costs and accelerating time to market for over 6,000 companies.
Factory AI, a platform for agent-driven software development, accelerates engineering by unifying context from sources like GitHub and Jira to delegate tasks like feature development and migrations. It uses Gemini 2.5 Flash for data ingestion and Gemini 2.5 Pro for advanced code/document generation.
Fireworks AI, a generative AI platform, uses Google Kubernetes Engine and Compute Engine to run its fast and efficient inference engine. This enables the company to process over 140 billion tokens daily, offering customers high uptime and throughput with lower latency and costs.
Lovable, an AI software engineer, leverages Google Cloud’s Vertex AI to deploy and unify its core language models, including Gemini and Anthropic’s models. This unique orchestration enables users to create complete, full-stack web applications from plain English descriptions, effectively cutting the time required for prototype development from weeks to minutes and app development from months to hours.
Naologic, an AI application platform, uses Gemini APIs, Google Kubernetes Engine , and MongoDB Atlas on Google Cloud to build apps on top of legacy ERPs. The solution delivers fast query responses regardless of complexity, enabling powerful, natural-language chat and scaling for complex AI workloads like RAG and image search.
Qodo addresses critical code quality concerns with an agentic platform that works within existing developer workflows. Qodo integrates into Vertex AI Model Garden and provides automated pull request reviews at no cost to open-source projects.
Resolve AI, an always-on AI SRE, autonomously investigates incidents and helps run production systems using code, infrastructure, and observability data. With the intelligence and performance of Gemini on Vertex AI, Resolve AI improves MTTR, reliability, and engineering velocity for its customers.
Sieve, an AI research lab focused on video data, builds multimodal AI systems to automate dataset creation, improve data quality, and provide relevant metadata that powers frontier model training. Sieve uses Vertex AI for large-scale video processing, content moderation, and indexing, including tasks like text recognition, captioning, and enrichment.
Systalyze, an enterprise AI deployment platform, reduces the cost and complexity of AI. Through a partnership with Google Cloud, it reduces deployment costs by up to 90% and accelerates fine-tuning, inference, and agentic AI by 2–15x, while keeping data fully private and secure.
Turbopuffer, a startup offering serverless vector and full-text search, uses Google Cloud Storage, Google Kubernetes Engine, and Google Compute Engine to help AI businesses overcome the high costs and complexity of traditional database architectures. Its solution has reduced AI database cost by up to 90% for customers, manages over 1 trillion documents, and handles more than 10 million writes and 10,000 queries every second.
Vercel democratizes access to AI models through its AI SDK and AI Gateway, making it seamless for developers to integrate agentic capabilities into their applications. By providing unified access to leading models like Google’s Gemini through a single interface, Vercel has enabled teams to build AI-powered features faster and more reliably.
VESSL AI, an MLOps platform, uses Google Cloud to accelerate AI model development and reduce costs for its users. By leveraging Google Kubernetes Engine, VESSL AI can dynamically scale ML workloads, helping users create AI models up to four times faster and realize up to 80% savings on cloud expenditures. The platform’s integration with Vertex AI provides users with access to powerful models and AutoML solutions, further streamlining the MLOps lifecycle.
Windsurf provides an AI-powered code completion and generation tool for developers. Their AI integrates directly into IDEs, offering suggestions, generating code, and refactoring existing code. Windsurf uses Gemini 2.5 Pro to power its coding assistance IDE, and to support integrations with Cognition’s Devin AI.

2-150-startup-ai-use-cases-healthcare-life-sciences

Atropos Health, a healthcare data analytics company, optimized its GENEVA OS to work with Google Cloud’s Healthcare Data Engine (HDE) and BigQuery. This enables customers to efficiently and securely convert data into valuable insights and evidence.
Cerebra AI develops AI software for analyzing non-contrast CT (NCCT) scans, with a focus on early stroke and cancer detection. It fine-tunes MedGemma on NCCT images and leverages Gemini’s few-shot generalization capabilities to rapidly adapt the model for various diagnostic tasks.
CitiusTech, a global Healthcare technology services firm, uses Google Cloud to improve patient experience, reduce administrative burden on clinical staff, and save costs for healthcare systems. The company has developed an AI search solution using Vertex AI to efficiently connect patients with the right specialists and automate critical workflows.
Congruence Therapeutics, a computationally driven biotechnology company, uses its proprietary platform, Revenir, to build a pipeline of small molecule correctors. The platform identifies novel allosteric and cryptic pockets in proteins to rescue aberrant function.
CoVet is an AI assistant built by veterinary professionals, for veterinary professionals, that uses Gemini, Cloud Functions, and other GCP solutions to help veterinary teams automate administrative work, save hours every day, and refocus on what matters most: exceptional patient care.
Digital Diagnostics, a healthcare diagnostics company, uses Google Cloud’s secure infrastructure to enhance the reach of LumineticsCore, its AI-powered diagnostic tool for diabetic retinopathy. This approach protects sensitive health data and ensures patient privacy and regulatory compliance.
DNAstack, a leading genomics data management and analysis platform, leverages Google Cloud’s scalable infrastructure and advanced analytics tools to accelerate research and discovery in personalized medicine.
Evogene uses Google Cloud and Vertex AI to replace life sciences’ costly “spray and pray” molecular discovery — testing millions of molecules hoping to stumble into effective ones — with their computational platform. They now process 40 billion molecules versus previous millions, while using Vertex AI to develop a cutting-edge small-molecule foundation model that dramatically accelerates drug discovery timelines.
GenBio AI, a computational biology company, uses Google Cloud to power six specialized AI models in developing AI-driven digital organism simulators. These models simulate biological programming to address critical challenges in medicine and biology.
Immunai tackles drug development’s decade-long timeline with AMICA, the world’s largest immune-focused single-cell database containing hundreds of millions of cells. Using Google Cloud GPU clusters, they train models that transform complex immune mechanisms into actionable recommendations for 30+ biopharmaceutical partners.
Infinitus, the first trusted agentic healthcare communications platform, automates clinical and administrative conversations at scale. Our AI agents powered by Gemini’s multimodal capabilities have completed over 5x more conversations than any other solution with payors, patients, and providers to drive revenue and improve health outcomes
iSono Health, a medical imaging company, developed a Virtual Sonographer, an intelligent, automated 3D Ultrasound platform powered by Google Cloud AI. The platform brings breast imaging directly to the point of care, providing fast, accessible, and repeatable imaging.
Menten AI, a biotechnology company, uses Google Cloud’s high-performance compute and machine learning capabilities to accelerate the development of peptide therapeutics. This allows the company to rapidly design and optimize novel drug candidates.
Moonwalk Bio, a preclinical-stage biotechnology company, leverages epigenetic biology and AI to pioneer new medicines for obesity and cardiometabolic disease. Their platform determines the causal relationships between genes and disease pathways for therapeutic targeting.
Pear Health Labs, a health & fitness AI platform, develops personalized interventions to prevent chronic conditions. It powers recommendations, content search, & dynamic audio coaching. It uses Vertex AI Voice Generation, Vector Search in BigQuery, & the engineering team leverages Gemini Code Assist.
Sami, a tech-enabled health insurance company in Brazil, built its entire infrastructure on Google Cloud from day one. The company uses AI tools such as Gemini, Notebook LM, and Vertex AI to transform healthcare delivery and accelerate critical processes like prescription validation to seconds. Sami enables information flow across its partners, delivering high-quality, accessible healthcare at scale.
SandboxAQ is expanding its usage of Google Cloud and running a new AI drug discovery simulation platform on Google Cloud.
Sully.ai, a healthcare AI company, has built an app store for AI agents designed specifically for healthcare professionals. The platform provides support to clinicians on administrative tasks, so they can focus on patients.
Tali.ai is the leading medical AI scribe platform, designed to reduce the administrative burden of clinicians. Integrated with multiple EMRs across the U.S. and Canada, it leverages Google’s Vertex and Gemini models to automate clinical note-taking during patient encounters and extract key insights.
Think Research, a provider of knowledge-based digital health software solutions, uses Google Cloud’s scalable infrastructure and analytics tools to power its platform. This enables the company to deliver more efficient patient care and improve health outcomes.
Ubie, a healthcare-focused startup founded in Japan, is using Gemini models via Google Cloud to power its physician assistance tool.
Ufonia helps physicians deliver care Ufonia, a clinical AI company, helps physicians deliver care by using AI to automate clinical consultations with patients. Google Cloud’s full AI stack, including infrastructure, models on Vertex AI Model Garden, BigQuery, and Google Kubernetes Engine, powers its platform by using AI to automate clinical consultations with patients. It is using Google Cloud’s full AI stack to power its platform, including infrastructure, models on Vertex AI Model Garden, BigQuery, and GKE.
Via Scientific, a bioinformatics company, partners with Google Cloud to deliver Via Foundry, an enterprise-grade platform that uses Gemini and Vertex AI to make the drug discovery process more efficient. The platform transforms complex biological data into actionable insights that can accelerate discoveries.
Virgo Surgical, a medical video solutions provider, uses Google Cloud Storage and Google Kubernetes Engine to host and process over 1.75 petabytes of video data. This data has been used to create EndoDINO, an AI foundation model for endoscopy that achieves high performance in medical imaging applications.

Gobii provides AI Agents that automate complex web tasks like forms and workflows directly in the browser. To power these intelligent agents, Gobii leverages Google Cloud, utilizing Vertex AI and our scalable GKE infrastructure.
Hebbia, a legal and financial research company, integrated Gemini models into its Matrix platform to help organizations build AI agents capable of working across all of their data.
LiveX, customer service AI agents, uses Google Kubernetes Engine Autopilot and NVIDIA GPUs to power its platform. These technologies reduced total cost of ownership by 50% and operational costs by 66% while supporting an 85% reduction in customer support costs for one of its clients.
Parallel is using Gemini models to power new products including an API for AI agents to perform high-value tasks using web data.
Qualia Clear is an agentic system that transforms real estate closings by automating manual title & escrow workflows. It uses tool calling, Gemini 2.5 Flash & Pro, and Google Agent Development Kit to process emails and documents and simplify reporting, improving efficiency and customer service.
Replicant automates customer conversations for enterprise brands using voice and chat AI agents that replicate the expertise of your very best agents. Gemini helps Replicant deliver consistent service 24/7 that deploys quickly, scales effortlessly and continuously improves to boost ROI and CX.
Skyvern helps companies automate browser-based workflows with AI. Skyvern uses Large Language Models (LLMs) like Gemini 2.5 Pro and computer vision, to interact with websites, enabling it to automate tasks like filling out forms, procuring materials, and downloading invoices. Skyvern’s AI agents can adapt to website changes, making automation more robust.
Torq uses agentic AI to automate the entire security operations lifecycle through Socrates, an AI SOC analyst that coordinates specialized agents. Running on Google Cloud’s infrastructure, teams achieve 90% automation of tier-1 analyst tasks auto-remediated without human involvement, 95% decrease in manual tasks and10x faster response times.

4-150-startup-ai-use-cases-business-professional-services

Altumatim, a legal tech startup, uses a platform powered by Gemini on Vertex AI to analyze millions of documents for eDiscovery. This accelerates the process from months to hours, improves accuracy to over 90%, and enables attorneys to focus on building compelling legal arguments.
Anara, a generative AI research assistant, helps users find and understand scientific documents with verifiable AI summaries and insights. It uses Google Cloud’s scalable infrastructure, AI Studio, and Cloud Functions to power its models and data processing for a global user base.
Clavata.ai delivers an integrated AI governance and safety platform with intelligent, multi-modal, real-time evaluation engine powered by Gemini. Our tools enable proactive policy enforcement, dynamic debugging, iteration, observability, and problem prevention.
Harvey, a legal AI company, uses Gemini 2.5 Pro on Vertex AI to automate complex document reviews, a major pain point in the legal industry. The platform provides domain-specific AI that can reason over hundreds of pages of materials, enabling legal professionals to maximize efficiency and focus on strategic work.
Inspira, a legal tech company, tackles the time-intensive challenge of legal document analysis by providing lawyers with an AI-powered solution built on Google Cloud. Leveraging Gemini, Vertex AI, and BigQuery, Inspira’s platform automates legal document search, analysis, and drafting to reduce workflow times by 80%, allowing lawyers to find answers and relevant decisions in minutes or hours instead of weeks.
Instalily uses Google Cloud with Gemini 2.5 and Vertex AI to power InstaWorkers™ that transform sales, service, and operations. At a leading field service provider, InstaWorkers™ cut the technician’s diagnosis time from 15 minutes to under 10 seconds, lowered serving costs by 98 percent, and drove a 99 percent improvement in end-to-end workflow speed.
Markups.ai, an AI contract negotiation agent, turns a days-long human legal review into a minute(s) automated process. By simply emailing a contract, clients receive customized revisions and analysis almost instantly. Gemini 2.5 Pro enabled us to go from handling only first revisions of NDAs, to any revision of any contract (MSAs, DPAs, etc.).
monday.com, a work management platform trusted by more than 245,000 customers worldwide, leverages Veo to produce training videos, social content, and internal communications in a fraction of the time — empowering all employees, not just designers, to move faster and focus on impact.
NoBroker, a real estate platform, uses its ConvoZen AI, powered by Gemini and L4 GPUs, to automate customer support across multiple Indian languages. The platform processes 10,000 hours of recordings daily, with AI agents projected to handle 25-40% of future calls and save customers $1 billion annually.
ObraJobs, a job platform, connects candidates with relevant opportunities, streamline the hiring process, and deliver personalized recommendations for both job seekers and employers. Obra uses Vertex AI to power candidate matching and personalized job recommendations, supported by Cloud Storage, Cloud Run, Cloud Scheduler, Cloud Tasks, and Firestore.
Provenbase has built its talent recruitment tool for businesses on Google Cloud and is now powering its transformative Deep Search for talent feature using Google Cloud AI.
Story, an intellectual property startup that powers licensing and monetization services, is working with Google Cloud’s web3 services and infrastructure to bring new capabilities to developers on its platform.
Upwork, the world’s human and AI-powered work marketplace, connects businesses with independent professionals. By leveraging GCP’s Vertex AI Text to Speech API, Upwork delivers faster, more accurate talent matching and hiring efficiency for clients and freelancers.
Wotter, an employee engagement platform, uses a Gemini-powered smart assistant and Google Cloud’s robust AI capability to provide real-time insights into employee sentiment. It accurately predicts flight risks and offers actionable “Wott-if” scenarios, enabling leaders to build a data-driven people strategy.

5-150-startup-ai-use-cases-manufacturing-industrial

Bynry‘s SMART360 leverages Google Cloud’s robust infrastructure to empower small and mid-sized utilities to enhance operational efficiency and customer satisfaction.
Kraftblock, a green tech company, uses Google Cloud Compute Engine to run simulations for its high-temperature thermal energy storage systems, helping energy-intensive industries like steel and ceramics decarbonize. This support helps the green tech startup optimize its solution and scale faster.
Labellerr, a data labeling engine, uses Vertex AI and Cloud Run to automate annotation and smart QA to help ML teams process millions of images and thousands of hours of videos in just a few weeks.
Physical Intelligence recently partnered with Google Cloud to support model development, using our secure and scalable AI infrastructure.
tulanā, an intelligent decision support provider, has a highly customizable platform that uses forecasting, optimization, simulation, and AI to help enterprise clients make better decisions across supply chains and physical infrastructure. tulanā is using Cloud Run to horizontally scale its optimization workloads, Gemini for intelligent ETL processes, and Cloud SQL and Big Query to store customer data.

6-150-startup-ai-use-cases-finaicial-services

Albo, a Mexican neobank, uses Gemini models to power its “Albot” AI chatbot that provides 24/7 financial advice, customer onboarding, and support to millions of first-time banking users. The platform advances financial inclusion while streamlining regulatory compliance and improving operational efficiency.
Bud Financial, a data intelligence provider focused on banking clients, leverages DataStax Astra DB on Google Cloud with Gemini. The platform enables ultra-fast processing of complex financial data, helping clients reduce fraud by over 90% and shortening the time required to access critical data analytics from weeks to minutes.
Causal, a financial planning platform, uses Cloud SQL and Gemini models to power its data foundation and accelerate innovation. By offloading database management, the company built an AI-powered wizard that helps users connect data, analyze patterns, and generate financial models in just five minutes.
CERC, a financial infrastructure company, manages more than 500 million daily transactions using Databricks, BigQuery, and Gemini. This increased processing capacity by 10x without adding to the workforce, allowing the company to process millions in revenue forecasts in just two minutes and accelerate analytics for customers.
DataCurve, a frontier coding data analytics provider, addresses complex data challenges by combining Web3 and generative AI on Google Cloud. Its platform uses AI agents for deep data analysis and Web3 for data authenticity, delivering insights that help customers take action and improve engagement.
eToro, a global trading and investing platform, has pioneered a groundbreaking approach to marketing by using Veo to create advertising campaigns. This enables eToro to rapidly generate professional-quality, culturally specific video content across the global markets it serves, which would traditionally require extensive production timelines and significant budgets.
Fiscal.ai is reinventing financial data infrastructure. Its AI-native platform transforms unstructured public filings into clean, standardized data in minutes, replacing the slow, error-prone legacy of manual aggregation and delivering the mission-critical insights that today’s top investors demand.
Rogo, a generative AI platform built for the financial industry, uses Google Cloud solutions like Dataflow, Spanner, and Vertex AI to automate complex research and analysis for the world’s leading investment banks and private equity firms. Gemini 2.5 Flash enabled the platform to cut AI modeling time from months to hours and reduce hallucination rates from 34.1% to 3.9%.
Stax AI, a retirement administration platform, uses Google Cloud’s generative AI and MongoDB to automate data extraction from complex financial documents. The solution processes thousands of brokerage statements in minutes, not hours, helping administrators respond to client inquiries and meet compliance deadlines in a fraction of the time.
Stream, offers financial tools to employers and employees and is using Gemini models to handle more than 80% of its internal customer inquiries, including questions about pay dates, balances, and more.
WealthAPI, a German fintech company, uses DataStax Astra DB on Google Cloud with Gemini to deliver real-time financial insights to millions of users. The platform’s scalability allows it to analyze hundreds of thousands of transactions per second and has reduced response times by 90% for its customers.

7-150-startup-ai-use-cases-design-creativity-media-gen

Afooga, an AI-powered content experimentation factory, enables businesses to generate, test, and distribute content at massive scale from a single hypothesis, automatically optimizing across TikTok, Meta, YouTube, and more. Afooga leverages Vertex AI and Veo for generative video capabilities, and is architected entirely on Google Cloud.
Alson AI, a creative platform provider, uses Veo and Gemini to power its creativity platform that helps creators turn ideas into illustrated books and animations, reducing production time from months to minutes and costs from thousands of dollars to $25.
Cartwheel, a generative animation platform, helps users tell stories faster and more creatively. Its tool uses Gemini Flash for character creation prompts, Imagen for reference-image creation for 3D character development, and Veo 3 for video-to-animation input control that makes output editable by artists.
ComfyUI, an open-source engine for visual AI, helps creators prototype and automate media generation with pre-set models and more than 20,000 extensions. The platform integrates Gemini 2.5 and Veo 3 for multimodal creation and runs on Google Cloud infrastructure.
Connected-Stories, an AI creative platform, uses Gemini and Imagen on Vertex AI to help brands overcome manual content personalization challenges. The platform’s AI creative assistant analyzes briefs and turns them into sophisticated strategies, enabling brands to create personalized content at scale and optimize campaigns in real-time.
fal, a generative media platform for developers, accelerates generative AI model inference to improve the speed with which content is generated. The fal team is working with Google Cloud to leverage its Veo 2 technology to help its users create videos with realistic motion and high-quality output.
HeyGen, an AI-powered video generation platform, makes creating, translating, and personalizing high-quality videos simple and accessible. HeyGen’s core product leverages Gemini 2.5 Pro, Flash, and Flash-Lite to streamline content creation. With one prompt, HeyGen automates video planning, intelligently analyzes user-generated footage, and optimizes content through advanced visual and audio processing.
Higgsfield.ai, a generative AI video startup, uses its foundational model on Google Cloud and Vertex AI to enable video creation, helping smaller companies create realistic videos faster and more cost-effectively compared to traditional methods.
Krea.ai, a creative suite of AI tools, offers real-time image/video generation and personalized model training for artists and marketers. It integrates with Google Cloud, including models as Veo3 and Nano Banana, to provide access to advanced models, enabling users to create high-quality ads, product photos, and game assets.
Mosaic lets you build and run multimodal video-editing AI agents. A canvas of creative tools becomes your building blocks, enabling simultaneous edits and many versions on autopilot—powered by Gemini 2.5 Pro’s video understanding and Google Cloud (Storage + Cloud Run) for scalable pipelines.
Nim.video is an AI-first platform for instant short-form video generation from a single prompt. The multimodal platform uses top generative models, including Veo3 and Veo3 Fast, for text-to-video synthesis. It runs on Vertex AI, enabling scaled experiments and orchestration of services like speech recognition and TTS.
OpenArt empowers social media creators and SMBs to turn ideas into stunning videos in minutes – complete with motion, music, and a narrative arc in one click. Powered by Gemini image models and Google’s Veo3 video model, it makes creating viral posts and brand content fast and effortless.
Photoroom, provides generative AI photo-editing and design capabilities to consumers and businesses, uses Veo 2 and Imagen 3 to improve the quality of its offering and accelerate its development.
Prodia offers APIs to integrate generative AI into creative tools. Built on Google Cloud, Prodia relies on GPUs & DWS to serve the fastest text-to-image and instruct-to-edit models in the world, as verified by Artificial Analysis benchmarks. Prodia uses Veo 3 and Nano Banana to further power multimodal AI features.
Reclip, a “real” social media application, leverages GCP’s Generative Media tools (Veo, Imagen) to create short, engaging animated videos from real time audio clips, captured by their proprietary app. Consumers love sharing these precious, funny and real “Reclips” with their friends and family.
Scope3 is enhancing its ad-tech platform with AI-powered features, using Gemini 2.5 Flash to offer features like real-time content classification, ensuring content is aligned with brand preference.
Spot AI, a video AI agent builder, leverages Google Cloud PubSub to turn security cameras into intelligent agents to enhance safety, security, and operations without needing AI/ML teams.
Wondercraft, an AI-powered content studio that helps users create engaging audio ads, podcasts and more, is leveraging Gemini models for some of its core functionalities and will soon release a Veo 2 integration.
Writer, an enterprise generative AI platform, builds and trains its 17-plus LLMs on Google Cloud using Google Kubernetes Engine and high-performance NVIDIA GPUs. This allows the company to scale efficiently and cost-effectively, delivering fast, low-latency answers for enterprise customers who demand accurate and on-brand AI-generated content.

8-150-startup-ai-use-cases-marketing-media

AndesML, a retail media platform, helps large enterprises launch and monetize their own ad networks by showing the right ad to the right customer at the right time. Built on Vertex AI, BigQuery, and Gemini models, the AndesML platform has delivered a 30% performance lift in customer campaigns, accelerated production time by more than 30 days, and reduced operational costs.
Audiomob, an in-game audio ad platform, replaced its legacy business intelligence system with BigQuery and Looker to gain real-time insights from its global data. This move enabled the company to handle billions of monthly transactions, contributing to triple-digit yearly revenue growth and significant savings in engineering time.
Hedra, an end-to-end marketing creation platform, is designed to generate high-quality content at scale. Hedra Studio combines its proprietary multimodal models with other leading models like Veo and Imagen, enabling users to produce polished marketing content for any use case. Hedra’s Live Avatars use Gemini to deliver dynamic, real-time interactive video experiences.
Inworld, an AI platform for builders of consumer applications, uses Google Cloud and Gemini to cost-effectively handle tens of millions of concurrent users with response times measured in milliseconds, meeting strict requirements for quality, cost control, and security.
Koolio.ai helps creators produce high-quality podcasts and audio content. Koolio.ai integrates Gemini, Lyria, and Veo to power features such as AI-generated dialogue, accurate transcription, intelligent sound effects and music selection, and audio enhancement, streamlining the entire audio creation workflow from concept to final production.
MNTN uses Google Cloud to power its Connected TV ad platform, making TV campaigns as measurable as search or social. With AI-driven tools like MNTN Matched and Security Command Center, MNTN scales creative and targeting securely and at speed.
Napster, a metaverse company, is building a no-code 3D e-commerce platform on Google Cloud using Vertex AI and Gemini. This supported 20-85% infrastructure cost reductions and saved over 3,600 developer hours, making immersive 3D web experiences accessible and affordable for its customers.
Potrero Labs, a creator-focused platform, has launched Jams, an AI-first video social network empowering authentic self-expression. Its platform simplifies video creation, allowing users to record short videos and let Jams enhance them. Jams offers a simple UI with a variety of models under the hood, including Gemini 2.5 Pro for script creation, multi-modal Gemini for video analysis, and Veo 3 for backgrounds, b-rolls, and audio.
Producer.ai (formerly Riffusion), an AI music platform, trains generative music models and builds products that empower anyone to create the music they imagine. “The Producer” music collaboration agent helps users create original, studio-quality songs from text, audio, or visual prompts. Gemini on Vertex AI assists with prompt augmentation and data pipelines, while Vertex AI APIs offer access to advanced multimodal models for experimentation.
Rembrand is an AI-powered advertising platform that facilitates in-video product placements for content creators and advertisers on social media and CTV. Powered by Google Cloud’s AI Infrastructure, Rembrand enables brands to genuinely connect with audiences without disrupting the content.
Satisfi Labs will begin using Gemini models to power a new agentic platform for hundreds of customers in sports, entertainment, and tourism. The Agentic Platform delivers specialized agents for guest experiences, ticketing, on-site, safety, and merchandise tuned by industry experts to help live experience businesses sell more, service faster, and gain real-time insights from every guest conversation
Scorpion, a digital marketing company for SMBs, uses Google’s VEO AI to scale video ad production. By integrating this technology into its toolkit, Scorpion makes creating professional videos for websites and advertising faster and more accessible for all businesses.
StatusPro builds NFL virtual reality experiences for gaming and training and is now developing its newest game on Google Cloud, including a new Gemini-powered in-game coach.
Synthesia, an AI video enterprise platform, helps businesses create instructional videos for employee training, customer support, sales enablement, and product marketing. The company is using Veo 3 to contextually adapt visuals to the content delivered by its AI avatars and voices.
Tinuiti, a performance marketing agency, used Google’s VertexAI to develop an AI-powered service that develops and optimizes ad copy to increase performance. The tool embodies a philosophy of maximizing growth by minimizing waste, and a recent experiment showed significant ROAS performance improvements compared to human-curated copy.
Toonsutra, an India-based webcomic platform, is using Google’s Gemini AI to go global. By making stories accessible in regional languages and adding Lyria 2 for music, Gemini for voices, and Veo 3 for animation, they’re creating next-gen immersive comics.
Velin.ai, a content creation platform for small businesses, offers an AI agent that explains the content and its underlying strategic implications while acting as a unified content workspace. Gemini 2.5 drafts everything from scripts to social campaigns, while Imagen 4 and Veo 3 generate aligned visuals and video clips, ensuring a consistent brand narrative across all content.
Visla is an AI-powered video creation platform that helps businesses and creators produce pro videos in minutes. Using Google’s Imagen 4, Gemini Flash Image 2.5, Veo 3, and Visla’s AI Video Agent with Avatars, it adapts visuals, narration, and automates polished content for learning, training, and marketing.

LitLab.ai, a reading platform, leverages Veo3 and Vertex AI to generate personalized, curriculum-aligned stories and provide real-time oral reading analysis. They create decodable content and employ voice recognition for instant teacher feedback on student fluency.
Savvy revolutionizes learning by automatically generating flashcards and quizzes from PDFs, notes, videos, and podcasts. As students answer, Savvy leverages Gemini to dynamically grade their answers, providing instant feedback and personalized learning.
Studyhall AI, an AI research platform, graduated from Google Cloud’s UK Growth Accelerator program and built a mobile application that uses Gemini models to help coach students on reading, writing, and exam prep.
Subject.com, an AI-powered platform for grades 6-12, blends cinematic storytelling with superintelligent AI so students and teachers never get stuck. VertexAI, CloudSQL & BigQuery power Subject’s teacher assistant tool Spark, instant feedback, “ExplainThis” text simplifier, 24/7 Homework Helper, and personalized learning tied to student interests.

Deeli AI, an AI-powered platform, helps companies discover and evaluate emerging technologies to make informed investment decisions. The company builds its product and data pipeline on various services such as GCE, Cloud Run, and Dataflow, and uses models from the Vertex AI Model Garden.
Nectar’s AI-driven community agents with Gemini to handle customer conversations on social platforms, influencer marketing, and real-time product feedback at scale for leading brands & retailers. By transforming unstructured social data into actionable insights and powering customer conversations, Nectar helps brands deepen relationships and drive measurable growth.
Simbe, a multimodal, retail-focused computer vision company, built its AI-powered Store Intelligence platform on Google Cloud. By deploying autonomous robots and sensors, Simbe provides real-time insights into shelf inventory and price accuracy. This helps retailers reduce out-of-stock instances, improve pricing and promotion execution to the high 90% range, and achieve a 4x return on investment within 90 days.
Zapia AI, a retail technology company, uses AI agents to support millions of users with product discovery, local business searches, and purchase assistance, resulting in over 90% positive user feedback. Its multi-agent orchestration is powered by Gemini to improve agent reasoning, reduce latency, and lower operational costs.
Zazzle is a global platform for custom products and designs made on demand. Zazzle uses Gemini ADK and CCaaS to facilitate chat-based product discovery and enhance customer experience, making it easier to find the right designs across a wide range of products.

11-150-startup-ai-use-cases-security-responsible-ai

Aptori, an AI security company, detects vulnerabilities in AI-generated code, prioritizes risks, and automates code fixes in real-time. Aptori uses Gemini to analyze code for security weaknesses and generate context-aware fixes, integrating its AI Security Engineer directly into developer workflows.
Chainguard, a software supply chain security company, uses Google Cloud Run and Google Kubernetes Engine to provide developers with secure open-source building blocks. Its serverless architecture streamlines operations and product development, reducing infrastructure management costs and enabling them to scale effortlessly to meet increasing user demand.
Eon, a cloud data protection platform, uses Google Cloud Storage and BigQuery to transform backups into AI-ready data lakes. This allows its customers to eliminate fragmented data silos, reduce secondary storage costs by up to 98%, and improve data recovery times by up to 90%.
Galileo, an AI observability and evaluation platform for building trustworthy AI applications, addresses the critical challenge of mitigating LLM unpredictability and hallucinations. Using Gemini to build its “evaluation agents” and running on a scalable Google Cloud infrastructure with NVIDIA GPUs, Galileo provides a holistic “trust layer” for reliable AI. This has enabled customers to de-risk over 1,000 AI applications, while go-to-market support from the Google for Startups Cloud Program helps Galileo accelerate growth and unlock new opportunities.
Prediction Guard is using Google Cloud services like Confidential Computing and Vertex AI to support its platform for added gen AI safety.
Resistant AI, an AI-powered security company, uses Google Cloud to build solutions that combat fraud in financial services documentation and workflows. Running on Google Cloud infrastructure, the company’s specialized document fraud detectors scrutinize financial documents in 500 different ways, helping to protect automated workflows like those using Google’s Document AI from sophisticated financial crime.
Specular, an offensive cybersecurity platform, builds AI agents using Gemini 2.5 Pro to automate attack surface management and penetration testing. Their platform automates traditional workflows to identify, assess, and remediate cybersecurity, helping enterprises proactively prioritize and respond to threats.
Wyze Labs is rolling out new AI-powered anomaly detection features for its security camera systems, powered by Google’s vision AI tool.
Zefr, a global leader in responsible AI, powers Fortune 500 brand advertising with safety and suitability on platforms like YouTube and TikTok. Using patented Cognitive AI with Gemini Flash and Vertex AI, Zefr analyzes video, image, audio, and text to deliver trusted, scalable solutions.

12-150-startup-ai-use-cases-analytics-other-ai

aSim, an AI-powered mobile app development tool, allows you to quickly generate, share, and discover mini-apps. Users can instantly generate an app from a prompt, leveraging APIs/LLMs like Google Maps and Gemini, as well as image and video generation from Nano Banana and Veo 3.
Bud Financial, a data intelligence provider focused on banking clients, leverages DataStax Astra DB on Google Cloud with Gemini. The platform enables ultra-fast processing of complex financial data, helping clients reduce fraud by over 90% and shortening the time required to access critical data analytics from weeks to minutes.
Bynry’s SMART360 leverages Google Cloud’s robust infrastructure to empower small and mid-sized utilities to enhance operational efficiency and customer satisfaction.
Citylitics, a predictive intelligence platform, transforms public infrastructure investment for municipalities, utilities, and engineering firms. By automating data processing with Dataflow and Cloud Run, it cuts analysis time by 71% and boosts data sources by 400%, helping customers proactively identify and win new business.
ContextQA, an Agentic AI software testing platform, enables accurate, user-centric test automation from development start. The platform uses Gemini models to compare application behavior with expected outcomes, adapting automatically to changes.
Flockx, an AI-powered event app, migrated to Google Cloud to support its growth and control costs. Using Google Kubernetes Engine and autoscaling on Compute Engine, the company accelerated product development and successfully demonstrated scalability by processing over 100,000 AI runs in a single weekend.
Macro, an AI productivity platform, uses Gemini to modernize knowledge work by offering a unified workspace with features like multi-document chat and editable mind maps. The platform simplifies complex workflows and scales with demand for over 125,000 users in legal, finance, and education, while offering enterprise-grade security, data privacy, and compliance.
MaestroQA, a conversation analytics data platform, is leveraging Gemini to enhance its AI-powered conversation analytics. By using Gemini, MaestroQA is improving its ability to analyze customer interactions across every channel, providing deeper insights that help businesses boost customer satisfaction and drive growth and retention.
MLtwist, an AI data pipeline services company, processes, transcribes, translates, and labels large, complex data streams for enterprise applications. It uses Gemini and AI Studio for transcription and labeling tasks, saving approximately 63% of the time required to process even highly illegible documents.
Moii.AI, a vision AI startup, uses Gemini and BigQuery to analyze CCTV footage for safety and productivity insights. This approach reduced video review time from days to minutes, saved the company $10,000 monthly on AI processing costs, and doubled its engineering team’s efficiency.
Owl.AI, a sports technology company, delivers AI-powered solutions to professional sports leagues. Their offerings, which include judging and scoring, aim to enhance accuracy, consistency, and eliminate bias. Owl.AI achieves this by leveraging AI models built on Gemini and fine-tuned on Google Cloud to analyze real-time video footage of athletic performances.
Provenbase has built its talent recruitment tool for businesses on Google Cloud and is now powering its transformative Deep Search for talent feature using Google Cloud AI.
SandboxAQ is expanding its usage of Google Cloud and running a new AI drug discovery simulation platform on Google Cloud.
Satlyt, a space compute leader, enables in-orbit AI workloads by orchestrating intersatellite communication and routing. It uses Google Cloud’s Kubernetes Engine, Vertex AI, and scalable data infrastructure to deploy AI agents and plans to deploy Google’s Gemma models in orbit.
SE3 Labs, a 3D computer vision and AI company, uses Cloud Run to deploy advanced AI model technologies that create “Spatial GPTs,” which are essentially AI models that can understand and interact with the world in 3D.
Temporal, a durable execution platform for developers, uses Vertex AI to enhance its customer support operations. The solution provides improved visibility into support trends by automatically categorizing 80% of support tickets, allowing the team to anticipate customer needs and identify new opportunities.

Read More for the details.

2025 10 07

GCP – Five Best Practices for Using AI Coding Assistants

Tibor Kiss Cloud, Google Cloud gcp

Does owning a kitchen knife mean you know how to effectively dice onions or julienne carrots? Of course not. Access to a tool doesn’t guarantee profenciency. To get the results you’re looking for, you need to learn the right techniques.

AI coding assistants are no different. These are new and powerful additions to your developer toolbox that you can access today. But like any tool, you need to know when and how to use them effectively.

So, how can you get the most out of AI coding assistants? To find out, we asked several engineers from our Google Cloud Developer Experiences team to actively use Gemini CLI, Gemini Code Assist, and Jules while they completed complex app development and migration projects. Based on those engineering sprints, we found five best practices that led to better outcomes, and we’re sharing these best practices so you can turn access to tooling into AI expertise as well.

Consider your use case when choosing a tool

Getting the most out of AI requires deliberate planning. Before you begin coding, think about the requirements for your project and associated tasks you’ll need to complete. This up-front thought exercise will clarify which AI tool you should start with and which might be helpful later. Each Google developer tool has different strengths, so you want to choose the right one for the job.

Depending on the complexity of your work, you might not use agentic frameworks to accomplish your task. Writing a new function? Use inline generation. Taking an app from v1 to v2? Use an agent to help plan out all the file changes needed. Throughout our engineering sprints, we used tools like Gemini CLI to help on larger migrations where multiple files were involved. When we were writing specific functions, inline generation through Gemini Code Assist fit our developer flow better. Once you choose the right tool, you can start to train that tool toward your goals.

Train the tool with foundational work

Most generative models act based on natural language. In many examples, AI performs better on generation once you’ve leveraged it to also document the code base. “Shift left” in your AI assistance workflow by documenting early to produce higher-quality output later. This includes things like generating READMEs where needed and even generating unit tests based on the existing code.

As you check this work, you can also find idiosyncrasies in your code where AI may struggle with comprehension, and as a result, potentially future generation. Do this training before you get to the next best practice we recommend: using AI to help you plan your code.

Make a plan

A large part of a developer’s job is planning, and AI models are no different. Spending extra time with AI tools to build and revise an execution plan generally gives you better code output on complex tasks. You can create a strong plan in several ways:

Iterate on a requirements document to fully understand the problem you will solve.
Use source code analysis to understand the current code structure, including package dependencies and other runtime details.
Create a set of tests that will determine if the generated code works based on your requirements.
Create the execution plan, detailing which files and folders need to be modified based on the AI’s understanding of the problem.

You should also ask the AI to build and save a step-by-step plan (like in a plan.md file). This encourages both you and the AI to pause and think through the upcoming steps before execution. AI tools work best when you manage them. Break down a complex, high-level assignment into several manageable components. For example, during a large migration, we recommend migrating one service at a time instead of several at once. Finally, instruct the AI to ask for your approval before executing on new plan milestones. This crucial step keeps you, the developer, in control of your project.

Prioritize prompt engineering

Take time to make your prompts as relevant as possible, just as you would when helping a new teammate scope a task. Consider what details you need to share for a person to succeed, and provide all those details to your AI tool. You can even ask Gemini or other chatbot tools to help improve your prompt before you send it to the AI assistant.

Even when you’re in the agentic coding flow, prompting is still essential. Understanding the specific models you use helps you get better iterative results. Be specific about your requests and desired outcomes. Models are token predictors, so grouping your thoughts and clearly stating what you want to happen next are good practices, whether you are prompting a model or reviewing a task for a teammate.

Connect the dots between sessions

We’ve all encountered that one piece of code you can’t touch without breaking the whole application—that is context. The most effective way to get better performance day after day is to create a context file at the end of each working session, for example GEMINI.md. This file can include high-level instructions, specific details around dependency versions, architecture diagrams, and more. This GEMINI.md file gives the tool a “cheat sheet” it can use to kick off your next AI-assisted session.

Documenting context significantly improves the planning and execution accuracy of the AI coding tools. It also ensures the tool understands your project and your specific working style. During our sprints, we saved all key learnings into a file at the end of every working day and instructed the AI to access it the next morning. This allowed us to pick up exactly where we left off, and now we’re even exploring the idea of storing more layered context files based on repository and general user preferences. The key to creating usable code is giving models the right context.

We’re just scratching the surface of what’s possible with AI coding assistants, and we’re excited to keep learning with you. Join an upcoming AgentVerse event to learn more about how to take an AI idea from concept to reality.

Read More for the details.

2025 10 07

AWS – Automatic quota management is now generally available for AWS Service Quotas

Tibor Kiss AWS, Cloud AWS

Today, AWS announces the general availability of a new capability of AWS Service Quotas called automatic quota management. AWS Service Quotas helps you view and manage your quotas from a central location. This new feature monitors quota usage, and notifies customers before they run out of their allocated quotas supported on AWS Service Quotas. This helps customers with better visibility and proactive awareness about their quota usage, enabling them to scale their applications without interruptions.

AWS customers can get notified of their quota usage with automatic quota management. Customers can configure their preferred notifications channels, such as email, SMS, or Slack, through Service Quotas console or API. Notifications are also available in AWS Health, and customers can subscribe to related AWS Cloudtrail events for automation workflows.

This new capability is now available at no additional cost in all AWS commercial regions. To explore this feature and for details, please visit Service Quotas console and AWS Service Quotas documentation.

Read More for the details.

2025 10 07

AWS – AWS Marketplace announces enhanced pricing dimension capabilities for sellers

Tibor Kiss AWS, Cloud AWS

Today, AWS Marketplace announces enhanced pricing dimension capabilities, increasing limits and improving flexibility for sellers managing their product pricing. These enhancements increase the maximum pricing dimensions from 24 to 200, enable immediate use of new SaaS dimensions, and remove the 90-day price update restriction for dimensions without active subscriptions.

These enhancements address key product pricing needs for sellers offering complex enterprise software. With 200 dimensions each for contract and usage-based pricing, sellers can now fully represent pricing across multiple features, user types, and consumption metrics in a single listing; matching the same pricing structures they offer outside of AWS Marketplace. When sellers add new usage dimensions to their public offers, these become available immediately for use. For instance, when a seller launches a new feature, subscribers can now instantly access it. Similarly, for dimensions without active subscriptions, sellers can adjust prices to align with their external pricing strategies without waiting through multiple 90-day periods.

These enhancements to pricing dimensions are now available in all AWS Regions where AWS Marketplace is supported.

To learn more, visit the AWS Marketplace Seller Guide, or access the AWS Marketplace Management Portal to try the new capabilities.

Read More for the details.

2025 10 07

AWS – Amazon EC2 Im4gn instances now available in AWS Asia Pacific (Mumbai) region

Tibor Kiss AWS, Cloud AWS

Starting today, Amazon EC2 Im4gn Instances are available in Asia Pacific (Mumbai) region. Im4gn instances are built on the AWS Nitro System and are powered by AWS Graviton2 processors. They feature up to 30TB of instance storage with the 2nd Generation AWS Nitro SSDs that are custom-designed by AWS for the storage performance of I/O intensive workloads such as SQL/NoSQL databases, search engines, distributed file systems and data analytics. These instances help with transactions processed per second (TPS) for I/O intensive workloads such as relational databases (e.g. MySQL, MariaDB, PostgreSQL), and NoSQL databases (KeyDB, ScyllaDB, Cassandra) which have medium-large size data sets and can benefit from high compute performance and high network throughput. They are also an ideal fit for search engines, and data analytics workloads requiring fast access to data sets on local storage.

The Im4gn instances also feature up to 100 Gbps networking and support for Elastic Fabric Adapter (EFA) for applications requiring high levels of inter-node communication.

Get started with Im4gn instances by visiting the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDKs. To learn more, visit the Im4gn instances page.

Read More for the details.

2025 10 07

AWS – Amazon Redshift Serverless with lower base capacity available in the AWS Asia Pacific (Seoul) and Canada (Central) Regions

Tibor Kiss AWS, Cloud AWS

Amazon Redshift now allows you to get started with Amazon Redshift Serverless with a lower data warehouse base capacity configuration of 8 Redshift Processing Units (RPUs) in the AWS Asia Pacific (Seoul) and Canada (Central) regions. Each RPU provides 16 GB of memory. Amazon Redshift Serverless measures data warehouse capacity in RPUs, and you pay only for the duration of workloads you run in RPU-hours on a per-second basis. Previously, the minimum base capacity required to run Amazon Redshift Serverless was 32 RPUs. With the new lower base capacity minimum of 8 RPUs, you now have even more flexibility to support a diverse set of workloads of small to large complexity based on your price performance requirements. You can increment or decrement the RPU in units of 8 RPUs.

Amazon Redshift Serverless allows you to run and scale analytics without having to provision and manage data warehouse clusters. With Amazon Redshift Serverless, all users, including data analysts, developers, and data scientists, can use Amazon Redshift to get insights from data in seconds. With the new lower capacity configuration, you can use Amazon Redshift Serverless for production environments, test and development environments at an optimal price point when a workload needs a small amount of compute.

To get started, see the Amazon Redshift Serverless feature page, user documentation, and API Reference.

Read More for the details.

2025 10 06

AWS – Amazon EKS and Amazon EKS Distro now supports Kubernetes version 1.34

Tibor Kiss AWS, Cloud AWS

Kubernetes version 1.34 introduced several new features and bug fixes, and AWS is excited to announce that you can now use Amazon Elastic Kubernetes Service (EKS) and Amazon EKS Distro to run Kubernetes version 1.34. Starting today, you can create new EKS clusters using version 1.34 and upgrade existing clusters to version 1.34 using the EKS console, the eksctl command line interface, or through an infrastructure-as-code tool.

Kubernetes version 1.34 introduces several key improvements, including projected service account tokens for kubelet image credential providers helping improve security for container image pulls, and Pod-level resource requests and limits for simplified multi-container resource management. The release also introduces Dynamic Resource Allocation (DRA) prioritized alternatives, enabling workloads to define prioritized device requirements for improved resource scheduling. To learn more about the changes in Kubernetes version 1.34, see our documentation and the Kubernetes project release notes.

EKS now supports Kubernetes version 1.34 in all the AWS Regions where EKS is available, including the AWS GovCloud (US) Regions.

You can learn more about the Kubernetes versions available on EKS and instructions to update your cluster to version 1.34 by visiting EKS documentation. You can use EKS cluster insights to check if there any issues that can impact your Kubernetes cluster upgrades. EKS Distro builds of Kubernetes version 1.34 are available through ECR Public Gallery and GitHub. Learn more about the EKS version lifecycle policies in the documentation.

Read More for the details.

2025 10 06

AWS – AWS Incident Detection and Response is now available in the AWS GovCloud (US) Regions

Tibor Kiss AWS, Cloud AWS

AWS Incident Detection and Response is now available in both AWS GovCloud (US-West) and AWS GovCloud (US-East) Regions.

AWS Incident Detection and Response offers eligible AWS Enterprise Support customers proactive incident engagement to reduce the potential for failure and accelerate recovery of critical workloads from disruption. Incident Detection and Response facilitates your collaboration with AWS to develop runbooks and response plans customized to each onboarded workload.

Read More for the details.

2025 10 06

GCP – 11 ways to reduce your Google Cloud compute costs today

Tibor Kiss Cloud, Google Cloud gcp

As the saying goes, “a penny saved is a penny earned,” and this couldn’t be more true when it comes to cloud infrastructure. In today’s competitive business landscape, you need to maintain the performance to meet your business needs. Luckily, Google Cloud’s Compute Engine and block storage services offer numerous opportunities to reduce costs without sacrificing performance, especially in the context of your migration and modernization initiatives.

In this article, we’ll explore 11 key ways to optimize your infrastructure spending on Google Cloud, from simple adjustments to strategic decisions that can result in significant long-term savings.

1. Choose the right VM instances

One of the most effective ways to reduce Compute Engine costs is to ensure that you’ve properly selected and right-sized your virtual machines (VMs) for their workloads to support your migration and modernization efforts. Whether you’re new to Google Cloud or already using Compute Engine, adopting the latest-generation VMs — such as N4, C4, C4D, and C4A — can deliver substantial savings and improved price-performance.

Powered by Google Cloud’s Titanium architecture, our latest-generation VMs offer faster CPUs, higher memory bandwidth, and more efficient virtualization than their predecessors, so you can handle the same workloads with fewer resources. For existing customers, migrating from older VM generations to the newest VMs can significantly lower total costs while helping you exceed current performance levels. Organizations that have made the switch often report 20–40% better performance along with meaningful reductions in cloud compute spend. For example, Elastic leveraged the general-purpose C4A machine series based on Google Cloud’s Arm-based Axion CPUs, to achieve a compelling efficiency and performance uplift for their workloads.

Beyond general-purpose VMs, we also offer specialized machine types to address unique customer requirements. Compute-optimized HPC VMs like H4D are designed for high-performance computing and data analytics, offering extreme performance for demanding workloads. M4 and X4 instances cater to memory-intensive applications, while Z3 instances are ideal for storage-intensive workloads. Furthermore, if you need complete control over your hardware environment and maximum performance isolation, we offer bare metal instances.

These options help ensure that even the most specialized and performance-sensitive workloads can find an optimal and cost-effective home within the Compute Engine portfolio.

2. Optimize your block storage selections

The best way to lower your block storage TCO, while ensuring your workloads remain successful, is to drive high resource efficiency. Hyperdisk makes it simple to drive high performance and high efficiency by enabling you to optimize your block storage to your workload and through Storage Pools. We’ll discuss each of these capabilities, and how you can use them to lower your block storage TCO below.

Workload Optimized: With Hyperdisk, you can independently tune capacity and performance to match your block storage resources to your workload. Hyperdisk enables you to independently provision performance and capacity at the volume level. You can leverage this capability to purchase just the capacity and performance you need, no more and no less. You can also take advantage of Hyperdisk Balanced’s “baseline” performance (i.e. included free with every volume), you can serve the vast majority of your VMs without purchasing any extra performance.

Storage Pools: Hyperdisk is the only hyperscale cloud block storage to offer thin-provisioned performance and capacity. With Hyperdisk Storage Pools, you can provision the aggregate performance and capacity your workload requires, while still provisioning the volume level capacity performance your workloads request (also known as thin-provisioning). This allows you to pay for the resources you need, not the sum of the volumes you’ve provisioned. As a result, you can lower your overall block storage TCO by as much as 50%.

For more information on how to select the right block storage for your workload and to see how customers have benefitted from Hyperdisk, read this blog.

3. Consider custom compute classes

To get the most out of our latest-generation VMs, Google Kubernetes Engine (GKE) custom compute classes (CCC) offer an advanced way to optimize compute choices and provide high availability. Instead of being limited to a single machine type for your workloads, you can define a prioritized list of VM instance types. This allows you to set the newest, most price-performant VMs — including our latest-generation VMs — as your top priority. GKE custom compute classes provide the capability to automatically and seamlessly spin up instances based on your specified priority list. This feature helps you maximize the availability of your compute capacity while still aiming for the most cost-effective options, so your workloads can scale reliably without manual intervention.

Here are some specific use cases for how custom compute classes can help you optimize costs:

Autoscaling cost-performant fallbacks: When demand peaks, you might be tempted to autoscale using a highly available but less cost-efficient VM type. CCC allows you to take a tiered approach. You can set up several cost-efficient fallback alternatives, so that as demand increases, GKE first attempts to use the most cost-effective options, and progressively moves to the other choices in your list when necessary to meet demand.
AI/ML inference: Running AI/ML inference workloads often involves significant compute resources. Instead of maintaining a large, static reservation that might sit idle during off-peak times, CCC lets you provision a minimal base reservation and leverage more cost-effective capacity types, such as Spot VMs, to handle peak inference demand — all orchestrated through your CCC configuration.
Adopting new VM generations: Combine the power of GKE custom compute classes with Compute Flexible committed use discounts (Flex CUDs) to de-risk the adoption of new, cost-efficient VM series like N4 and C4. With CCC, you can define fallback options, providing workload resilience, while Flex CUDs offer financial adaptability, as the discounts apply across your total eligible compute spend, regardless of the specific VM series you use. This dual approach is a safe, cost-effective strategy for leveraging the latest hardware without disruption. For more information, read this blog.
Using flexible Spot VMs: Spot VMs offer significant savings but can be preempted. Being constrained to a single Spot VM shape increases the risk that capacity will not be available. With CCC, you can define multiple fallback Spot VM types. This “spot surfing” capability allows the application to remain on cost-efficient Spot capacity by automatically pivoting to alternative Spot instance types if the primary choice is unavailable.

In short, by leveraging GKE CCC, you can artfully mix and match various VM types and consumption models, including On-Demand, Spot, DWS FlexStart, and instances covered by CUDs, to build a resilient and highly cost-optimized infrastructure that adapts to the unique needs and patterns of your workloads.

4. Leverage custom machine types (CMT)

Custom machine types, available on N4 VMs, allow you to precisely configure virtual machines to your exact specifications. Rather than selecting from predefined machine types that might include excess capacity, you can tailor the CPU-to-memory ratio specifically for your workloads, so you only pay for resources you actually use. This targeted approach minimizes waste and can significantly reduce your cloud spend, especially when migrating from on-premises to Google Cloud or from other cloud providers.

This flexibility becomes particularly valuable if your applications have unique resource profiles that don’t align well with our standard offerings. Custom machine types let you create the perfect environment for your needs. By avoiding the compromise of over-provisioning certain resources while potentially constraining others, you can achieve both better performance and more efficient spending across your Compute Engine deployment.

As an example, take a memory-intensive workload that runs best with 16 vCPU, and 70 GB memory. Normally, you would need to pick a VM with 128 GB memory with our standard shapes, or in other cloud contexts, resulting in higher costs to run your workload due to the extra provisioned resources. Instead, with custom machine types, you can easily launch a VM with 16 vCPU and 70 GB memory, resulting in an 18% cost savings vs standard N4-highmem-16 VMs.

5. Make the most of committed use discounts

CUDs are a strategic cost-saving opportunity for organizations with steady, predictable computing needs. By committing to resource usage over one- or three-year periods, you can reduce cloud costs by up to 70% compared to on-demand pricing. This approach not only helps ensure budget predictability but also converts fixed infrastructure spending into a financial advantage, making it ideal for stable workloads that support core business functions.

Google Cloud offers flexible CUD structures to align with various operational models. Resource-based commitments target specific machine types and regions, flexible commitments apply discounts across projects, regions, and machine series — great for dynamic environments. By analyzing historical usage and forecasting future needs, you can identify workloads suited for these discounts, reinvesting the savings into innovation and scaling initiatives.

6. Manage unused disk space

You pay for the total provisioned disk space, regardless of how much you actually use. Many organizations tend to over-provision storage “just in case,” which often leads to unnecessary and costly waste. For instance, if you provision a 100GB disk but only use 20GB, you’re still paying for the entire 100GB. Being intentional and precise with your storage allocations — rather than rounding up to common sizes — can lead to significant cost savings.

To optimize spending, it’s important to adopt a few best practices. Using Ops Agent, regularly audit disk usage across your infrastructure to identify and eliminate inefficiencies. Resize disks to align with actual consumption, allowing a reasonable buffer for growth. Implement automated alerts in Cloud Monitoring to detect underutilized disks and take corrective action. For stateless applications, consider using smaller boot disk images to minimize overhead and reduce costs even further.

In addition, consider the following optimization strategies to further reduce costs and improve efficiency:

Use Google Cloud’s monitoring tools to track CPU, memory, and disk usage over time.
Establish a regular review cycle to identify and right-size over-provisioned resources.
Test workloads across different VM configurations to find the optimal balance between cost and performance.

7. Use Spot VMs

Spot VMs provide the same machine types and configuration options as standard virtual machines but at a significantly reduced cost — typically offering a 60% to 91% discount. This cost efficiency comes with the tradeoff of potential preemption at short notice, making them most suitable for workloads that are fault-tolerant and can recover quickly from unexpected interruptions. Spot VMs are designed to take advantage of unused compute capacity, allowing you to optimize your cloud spending without compromising access to high-performance resources.

Strong use cases for Spot VMs include batch processing jobs, big data and analytics workloads, continuous integration and deployment (CI/CD) pipelines, stateless web servers running in autoscaling groups, and compute-heavy tasks. When properly architected to handle interruptions — for example, by using job checkpointing, load balancing, task queues, or via GKE custom compute classes (see more above) — Spot VMs can play a critical role in minimizing infrastructure costs while maintaining high availability and system resilience. Leveraging Spot VMs in these scenarios lets you scale cost-effectively, especially when compute demand is variable or time-flexible.

8. Use optimization recommendations

Google Cloud’s Recommenders are a powerful tool designed to help you optimize your cloud resources efficiently. When browsing the Google Cloud console, you may see lightbulb icons next to specific resources — these indicate potential improvements identified by Google’s recommendation engine. By analyzing real-time usage patterns and current resource configurations, the Recommender delivers actionable insights tailored to each user’s unique environment. This intelligent system highlights opportunities not only to reduce costs but also to enhance security, performance, reliability, management efficiency, and environmental sustainability.

For example, there are idle VM recommendations to help you identify VM instances that have not been used over the last 1 to 14 days. Common recommendations include switching to more suitable machine types, rightsizing underutilized compute instances, or adopting more cost-effective storage solutions. The tool allows you to apply many of these changes directly, streamlining the optimization process. By continuously evaluating workloads and offering these automated, data-driven suggestions, the Recommendation Hub helps organizations maintain cloud performance while managing costs more effectively.

9. Take advantage of auto-scaling and scheduling

Matching your compute resources to actual demand patterns is one of the most effective ways to reduce cloud waste and improve overall cost efficiency. Many organizations over-provision their resources to handle peak workloads, leaving machines underutilized during off-peak periods. By aligning compute capacity more closely with real-time or predictable usage patterns, such as business hours or seasonal trends, you can significantly cut unnecessary spending without sacrificing performance.

Autoscaling is the key to achieving this efficiency. In fact, customers who leverage Google Compute Engine autoscaling for their virtual machines have seen average infrastructure cost savings of more than 40%.

You can implement autoscaling strategies to dynamically adjust resources based on CPU utilization, load balancing capacity, or custom application metrics, so that workloads receive the necessary compute power when needed, while scaling down automatically during low-demand periods.

For workloads with predictable patterns, such as those that fluctuate with business hours or planned seasonal events, schedule-based scaling is a particularly powerful tool. This approach allows you to proactively increase resources in anticipation of high demand and scale them down during lulls, for the performance you need without constant over-provisioning.

In addition to autoscaling, several practical implementation techniques can further optimize your resource usage. Setting up instance scheduling lets you automatically start and stop development and test environments according to business hours — a simple yet highly effective approach that can lead to cost savings of up to 70%. You can also leverage maintenance windows to reduce disruptions and resource consumption, by concentrating updates and system changes into low-usage periods. Together, these tactics help maintain high availability and performance while keeping infrastructure costs under control.

10. Understand your spend with detailed billing analysis

Before implementing any cost-saving strategies in Google Cloud, it’s essential to understand your current spending in detail. Google Cloud’s billing panel offers granular visibility into your expenses, including costs broken down by individual SKUs. This level of transparency lets you track where your money is going and identify potential inefficiencies. Begin by regularly reviewing your billing dashboard to monitor usage trends and spot anomalies. Applying labels and tags to your resources can further help categorize and attribute costs accurately, especially in complex environments with multiple projects or departments.

In addition, setting up budget alerts is a practical way to stay ahead of overspending by notifying you when costs approach or exceed predefined thresholds. It’s also important to identify and eliminate unused or idle resources, such as virtual machines or persistent disks that are no longer in active use — these can often be shut down or deleted to immediately reduce costs. By thoroughly analyzing your cost structure, you can uncover “low-hanging fruit” — resources that provide little or no value — and make data-driven decisions to optimize your cloud usage efficiently.

11. Consider serverless alternatives

Last but not least, Google Cloud’s serverless computing offerings provide a compelling alternative to traditional virtual machines, can deliver better cost efficiency, simplified operations, and greater scalability. By abstracting away infrastructure management, serverless platforms allow teams to focus on writing and deploying code without worrying about provisioning, scaling, or maintaining servers. This shift can not only reduce operational overhead but also cut costs by aligning compute spending directly with application usage.

There are multiple serverless options available, each tailored to different workloads. Cloud Run is designed for running containerized applications that need rapid scaling and flexible deployment. Cloud Run Functions supports lightweight, event-driven code execution for microservices or automation tasks. GKE (Autopilot Mode) simplifies Kubernetes operations by automatically managing nodes and scaling, allowing you to run Kubernetes workloads without handling the underlying infrastructure. All these options charge based on usage not allocation, significantly reducing costs associated with idle resources and over-provisioning. This makes them especially beneficial for variable or unpredictable workloads. Cloud Run and GKE both support GPU’s and flexibility to move between the two. You can start with Cloud Run then move to GKE or vice-versa. Some customers also leverage both offerings for workloads. The rule of thumb is to start with GKE if you need access to the Kubernetes API. Otherwise, start with Cloud Run.

Start reducing your costs today

Migrate to Google Cloud and optimize your infrastructure costs without compromising on what your workloads need. If you are new to Google Cloud, start with a migration assessment. Google Cloud’s Migration Center can help you with a clear understanding of your potential savings by migrating to Google Cloud, with detailed recommended paths for your workloads, along with TCO reports. Apply the strategies in this article and unlock substantial cost savings.

Read More for the details.

2025 10 06

AWS – Amazon Connect launches new case APIs to link related cases, add custom related items, and search across them

Tibor Kiss AWS, Cloud AWS

Amazon Connect now allows you to programmatically enrich case data by linking related cases, attaching custom related items, and searching across them, so agents have the full context they need to resolve issues faster. For example, an airline can link all customer cases tied to a single flight cancellation to coordinate rebookings and send proactive updates, while a retailer can attach order and shipment details to a refund request to deliver faster resolutions and keep customers informed.

Amazon Connect Cases is available in the following AWS regions: US East (N. Virginia), US West (Oregon), Canada (Central), Europe (Frankfurt), Europe (London), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Africa (Cape Town) AWS regions. To learn more and get started, visit the Amazon Connect Cases webpage and documentation.

Read More for the details.

2025 10 06

GCP – More choice, more control: self-deploy proprietary models in your VPC with Vertex AI

Tibor Kiss Cloud, Google Cloud gcp

Building the best AI applications requires both the freedom to choose the most powerful, specialized model for the task at hand, and a platform that can handle them all. This flexibility is core to the Vertex AI platform, and today, we’re taking a significant step forward in our commitment to giving you unparalleled choice and control.

We are excited to announce that you can now securely deploy a growing selection of leading proprietary models from industry partners, including AI21 Labs, CAMB.AI, CSM, Mistral AI, Qodo, and Virtue AI, with models from Contextual AI and WRITER coming soon. You can deploy these models — including closed-source models and those with restricted commercial licenses — directly into your own Virtual Private Cloud (VPC).

You will find all of these models in the Vertex AI Model Garden, our central gateway to over 200 foundation models, including Google’s versatile Gemini family, leading open models, and third-party models. We provide a single, curated catalog where you can discover, test, and deploy the ideal model for your application.

Announcing self-deployable proprietary models in your VPC

For organizations that require maximum control over their data and infrastructure, you can now self-deploy powerful proprietary models from leading AI model builders directly within your VPC. With this new capability, you can acquire commercial licenses via Google Cloud Marketplace and deploy models securely within your environment, all while meeting Google Cloud’s high standards for security and compliance. Self-deploying proprietary models with Google Cloud gives you a number of benefits:

Deploy models within your VPC with full adherence to your VPC-SC policies, providing the highest assurance that your proprietary business data never leaves your environment. You can evaluate and deploy third-party models to production on a trusted platform.
You can optimize for performance or cost by selecting from a range of available machine types. Scale your replica count up or down manually to meet workload demands, or configure auto-scaling policies for hands-free management. Deploy to specific Google Cloud regions of your choice to achieve data compliance in your target markets or select locations for low latency delivery to your customers.
Discover, license, and deploy a curated selection of proprietary models from industry-leading providers, all in one place. We’re launching with models from eight partners—AI21 Labs, CAMB.AI, Contextual AI, CSM, Mistral AI, Qodo, Virtue AI, and WRITER. These models cover a wide range of use-cases and specializations. This is just the beginning, and you’ll see us continue to expand our catalog with the latest generative AI models.
Go from model discovery to production with ease. You can procure commercial licenses and deploy the models with just a few clicks directly from the Model Garden console. Our fully managed AI inference service handles the underlying infrastructure, so you’re free to focus on building your application.
Get started with simple pay-as-you-go pricing, so you only pay for what you use. You control your costs by scaling your deployment to meet your needs, and not deal with artificial limits or quota caps. You can further optimize costs by applying your existing Google Cloud committed-use discounts (CUDs) or reservations.

Meet Our Launch Models

Explore the new models available today for self-deployment in your VPC:

AI21 Labs – Jamba Large 1.6: Delivers leading model quality at fast speed, making it an excellent choice for private enterprise deployment.
CAMB.AI – MARS7: Enables you to ship production-ready voice applications with hyper-realistic, multilingual text-to-speech (TTS) outputs featuring optional voice cloning and fine-grained emotional control.
(Coming soon!) Contextual AI – Reranker: Designed to significantly enhance the relevance and quality of Retrieval-Augmented Generation (RAG) systems.
CSM – Cube: A generative AI model that transforms 2D images into detailed 3D models with remarkable precision and efficiency.
Mistral AI – Codestral (25.01): Explicitly designed for code generation tasks, helping developers write and interact with code through a shared instruction.
Qodo – Embed-1: A suite of large-scale code embedding models that enhance search accuracy for RAG by enabling efficient code and text retrieval.
Virtue AI – VirtueGuard: An enterprise-ready AI guardrail model that enables real-time content security, policy enforcement, and regulatory compliance with multilingual support for generative AI systems.
(Coming soon!) WRITER – Palmyra X4: Enterprise-grade LLM that combines a 128K token context window with a suite of capabilities, including advanced reasoning, tool calling, LLM delegation, built-in RAG, code generation, structured outputs, multi-modality, and multilingual

How to Get Started

You can deploy these new models in three simple steps:

Visit the Vertex AI Model Garden. On the left-hand navigation tab, under “Model Collections,” select “Self-deploy partner models”.
Select the model from a partner that you choose to deploy. To use the selected model, purchase a license by clicking “Enable”.
Your license is active in a few seconds. Simply click “Deploy” to configure and deploy the model endpoint in your VPC using Model Garden’s one-click deployment workflow.

We are committed to providing the most open and flexible AI platform for the enterprise. With even more choice and the fine-grained control and security of your own environment, you have everything you need to innovate responsibly. Explore the new models in Model Garden today and start building today!

Read More for the details.

2025 10 06

AWS – Amazon Connect now enables you to customize service level calculations

Tibor Kiss AWS, Cloud AWS

Amazon Connect now enables you to customize service level calculations to your specific needs. Supervisors and managers can define time thresholds for when a contact is considered to meet service level standards and select which contact outcomes to include in the calculation. For example, managers can choose to count callback contacts, exclude contacts transferred out while waiting in queue, and exclude short abandons using a configurable time threshold. Customization of service level calculation is available from the metric configuration section on the analytics dashboards.

With this feature supervisors and managers can now create a service level metric calculation that better aligns with their business operations. With a customized view of service level performance, operations managers can assess how effectively they have met their service standards.

This new feature is available in all AWS regions where Amazon Connect is offered. To learn more about customizing your service level calculation, visit the Admin Guide. To learn more about Amazon Connect, the easy-to-use cloud contact center, visit the Amazon Connect website.

Read More for the details.

2025 10 06

AWS – New Compute Optimized Amazon EC2 C8i and C8i-flex instances

Tibor Kiss AWS, Cloud AWS

AWS is announcing the general availability of new compute optimized Amazon EC2 C8i and C8i-flex instances. These instances are powered by custom Intel Xeon 6 processors, available only on AWS, delivering the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. These C8i and C8i-flex instances offer up to 15% better price-performance, and 2.5x more memory bandwidth compared to previous generation Intel-based instances. They deliver up to 20% better performance than C7i and C7i-flex instances, with even higher gains for some workloads. The C8i and C8i-flex instances are up to 60% faster for NGINX web applications, up to 40% faster for AI deep learning recommendation models, and 35% faster for Memcached stores compared to C7i and C7i-flex instances.

C8i-flex are the easiest way to get price performance benefits for a majority of compute intensive workloads like web and application servers, databases, caches, Apache Kafka, Elasticsearch, and enterprise applications. They offer the most common sizes, from large to 16xlarge, and are a great first choice for applications that don’t fully utilize all compute resources.

C8i instances are a great choice for all compute intensive workloads, especially for workloads that need the largest instance sizes or continuous high CPU usage. C8i instances offer 13 sizes including 2 bare metal sizes and the new 96xlarge size for the largest applications.

C8i and C8i-flex instances are available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Spain).

To get started, sign in to the AWS Management Console. Customers can purchase these instances via Savings Plans, On-Demand instances, and Spot instances. For more information about the new C8i and C8i-flex instances visit the AWS News blog.

Read More for the details.

2025 10 03

AWS – AWS Glue adds write operations for SAP OData, Adobe Marketo Engage, Salesforce Marketing Cloud, and HubSpot connectors

Tibor Kiss AWS, Cloud AWS

AWS Glue adds write operations support for SAP OData, Adobe Marketo Engage, Salesforce Marketing Cloud, and HubSpot connectors. This allows you to not only extract data from those applications, but also write data to them directly from your AWS Glue ETL jobs.

With the new write functionality you can create and update records in SAP systems; sync leads into Adobe Marketo Engage; updating subscriber and campaign data in Salesforce Marketing Cloud; manage contacts, companies, and deals in HubSpot; and more.

This feature simplifies building end-to-end ETL pipelines that both extract data from and write processed results back to target applications, eliminating the need for custom scripts or intermediate systems.

Write operations support for SAP OData, Adobe Marketo Engage, Salesforce Marketing Cloud, and HubSpot connectors is available in all Regions where AWS Glue is available. To learn more and see the list of supported entities, visit AWS Glue documentation.

Read More for the details.

2025 10 03

AWS – Amazon OpenSearch Ingestion now supports batch AI inference

Tibor Kiss AWS, Cloud AWS

You can now perform batch AI inference within Amazon OpenSearch Ingestion pipelines to efficiently enrich and ingest large datasets for Amazon OpenSearch Service domains.

Previously, customers used OpenSearch’s AI connectors to Amazon Bedrock, Amazon SageMaker, and 3rd-party services for real-time inference. Inferences generate enrichments such as vector embeddings, predictions, translations, and recommendations to power AI use cases. Real-time inference is ideal for low-latency requirements such as streaming enrichments. Batch inference is ideal for enriching large datasets offline, delivering higher performance and cost efficiency. You can now use the same AI connectors with Amazon OpenSearch Ingestion pipelines as an asynchronous batch inference job to enrich large datasets such as generating and ingesting up to billions of vector embeddings.

This feature is available in all regions that support Amazon OpenSearch Ingestion and 2.17+ domains. Learn more from the documentation.

Read More for the details.

2025 10 03

GCP – Connect Spark data pipelines to Gemini and other AI models with Dataproc ML library

Tibor Kiss Cloud, Google Cloud gcp

Many data science teams rely on Apache Spark running on Dataproc managed clusters for powerful, large-scale data preparation. As these teams look to connect their data pipelines directly to machine learning models, there’s a clear opportunity to simplify the integration. But running inference on a Spark DataFrame using a model from Vertex AI typically requires custom development, making it complex to build a single, end-to-end workflow.

To solve this problem, we are developing a new open-source Python library designed to simplify AI/ML inference for Dataproc. This library connects your Apache Spark jobs to use popular ML frameworks and Vertex AI features, starting with model inference. Because the library is open-sourced, you will be able to use it directly in your application code with full transparency into its operation.

How it works

Dataproc ML is built to feel familiar to Spark users, following a SparkML-style builder pattern. You configure the model you want to use, and then call .transform() on your DataFrame. Let’s look at a few common inference use cases.

Apply Gemini models to your Spark data

You can apply generative AI models, like Gemini, to columns in your Spark DataFrame. This is useful for tasks like classification, extraction, or summarization at scale. In this example, we take a DataFrame with “city” & “country” columns and use Gemini to create a new column by providing a simple prompt.

You can test in your local environment by installing from PyPi:

code_block: <ListValue: [StructValue([(‘code’, ‘pip install dataproc-ml’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x7f487c55db20>)])]>

To deploy/test at scale, create a Dataproc version 2.3-ml cluster:

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud dataproc clusters create my-ml-cluster \rn –project=”YOUR_PROJECT_ID” \rn –region=”YOUR_REGION” \rn –image-version=2.3-ml-ubuntu \rn –properties=’dataproc:pip.packages=dataproc-ml==0.1”), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x7f487c55d8e0>)])]>

Copy this example to a file gemini_spark.py.

code_block: <ListValue: [StructValue([(‘code’, ‘from pyspark.sql import SparkSessionrnfrom google.cloud.dataproc_ml.inference import GenAiModelHandlerrnrnspark = SparkSession.builder.getOrCreate()rnrn# Create a sample DataFramerndf = spark.createDataFrame([rn (“London”, “UK”),rn (“Bengaluru”, “India”),rn (“Paris”, “France”),rn (“Tokyo”, “Japan”)rn], [“city”, “country”])rnrn# Configure the model handler. It uses gemini-2.5-flash by default.rngenai_handler = GenAiModelHandler().prompt(rn “Write a short, one-line rhyming poem about the experience of visiting {city} in {country}.”rn)rnrn# Apply the model, which will output to a new `predictions` columnrngenai_handler.transform(df).show(truncate=False)rnrn# Outputrn# +———+——-+———————————————-+rn# |city |country|predictions |rn# +———+——-+———————————————-+rn# |London |UK |Big Ben’s loud chime, a magical time! |rn# |Bengaluru|India |Bengaluru’s green, a vibrant tech scene. |rn# |Paris |France |In Paris, I fell for romance at first glance. |rn# |Tokyo |Japan |In Tokyo’s vibrant pace, a smile upon my face.|rn# +———+——-+———————————————-+’), (‘language’, ‘lang-py’), (‘caption’, <wagtail.rich_text.RichText object at 0x7f487c55d520>)])]>

The handler is flexible to support customized options, as explained in the documentation.

code_block: <ListValue: [StructValue([(‘code’, ‘from google.cloud.dataproc_ml.inference import GenAiModelHandlerrnfrom vertexai.generative_models import GenerationConfigrnrn# Configure the model handlerrngenai_handler = (rn GenAiModelHandler()rn .prompt(“Write a short, one-line rhyming poem about the experience of visiting {city} in {country}.”)rn .model(“gemini-2.5-pro”)rn .output_col(“city_poem”)rn .generation_config(GenerationConfig(temperature=0.7))rn)’), (‘language’, ‘lang-py’), (‘caption’, <wagtail.rich_text.RichText object at 0x7f487c55d550>)])]>

Submit this job to your Dataproc cluster:

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud dataproc jobs submit pyspark gemini_spark.py \ rn –cluster=my-ml-cluster \rn –region=”YOUR_REGION”‘), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x7f487c55d130>)])]>

2. Run inference with PyTorch and TensorFlow models

In addition to calling Gemini endpoints, the library also allows you to run inference with model files loaded directly from Google Cloud Storage. You can use the PyTorchModelHandler (and a similar handler for TensorFlow) to load your model weights, define a pre-processor, and run inference directly on your worker nodes. This is useful when you want to run batch inference at scale without managing a separate model serving endpoint.

code_block: <ListValue: [StructValue([(‘code’, ‘from google.cloud.dataproc_ml.inference import PyTorchModelHandlerrnrn# Get weights and transforms for the modelrnweights = ResNet50_Weights.DEFAULTrnrnimage_df = spark.read.format(“binaryFile”).load(“gs://cloud-samples-data/generative-ai/image/”)rnrndef vectorized_preprocessor(image_bytes_series: pd.Series) -> pd.Series:rn “””Applies ResNet50 transforms to a series of image bytes.”””rn return image_bytes_series.apply(rn lambda b: weights.transforms()(Image.open(io.BytesIO(b)).convert(“RGB”))rn )rnrnpytorch_handler = (rn PyTorchModelHandler()rn .model_path(“gs://<bucket>/resnet50_full_model.pt”)rn .input_cols(“content”)rn .pre_processor(vectorized_preprocess)rn .set_return_type(ArrayType(FloatType()))rn)rnrnpytorch_handler.transform(image_df).show()’), (‘language’, ‘lang-py’), (‘caption’, <wagtail.rich_text.RichText object at 0x7f487c55d970>)])]>

Built for performance

This library isn’t just a simple wrapper. It’s designed for running inference on large Dataproc clusters and includes several optimizations for inference:

Vectorized data transfer: We use pandas_udf to efficiently move data between Spark and the Python worker processes.
Connection re-use: Connections to the endpoint are re-used across partitions to reduce overhead.
Retry logic: The library automatically handles errors like HTTP 429 (resource exhausted) with exponential backoff and retries.

Get started

You can start using it today by checking out the open-source repository and reading our documentation.

Looking ahead, we plan to add the following features to this library in the coming months.

Spark Connect support: This would also allow using above functionalities within BigQuery Studio notebooks.
Vertex AI integrations: To ease inference, we plan to add more ModelHandlers to:
1. Directly call a vertex model endpoint for online inference
2. Refer to Vertex models and localize them to Spark workers
3. Refer to models hosted in Vertex Model Garden including embedding models
More Optimizations: Auto-repartition input dataframes to enhance inference runtime
Third-party integrations: Refer to open sourced models in HuggingFace

We are actively working on including this library by default in Dataproc on Google Compute Engine ML images and Google Cloud Serverless for Apache Spark runtimes.

We look forward to seeing what you build! Have feedback or feature requests to further simplify your AI/ML experience on spark? Reach us at dataproc-feedback@google.com.

Read More for the details.

2025 10 03

AWS – Amazon Kinesis Video Streams now supports IPv6 for Streams capability

Tibor Kiss AWS, Cloud AWS

Today, AWS announces Internet Protocol version 6 (IPv6) addressing support for Amazon Kinesis Video Streams (KVS). With this enhancement, KVS now offers dual-stack endpoints that let customers use both IPv4 and IPv6 addresses to stream video from millions of devices. This means that existing IPv4 implementations continue to work seamlessly while gaining the benefits of IPv6 connectivity.

As customers increasingly encounter IPv4 address exhaustion in their private networks, this enhancement delivers much-needed flexibility. Organizations can now seamlessly stream videos using IPv4, IPv6, or dual-stack clients. This advancement simplifies IPv6-based system transitions, ensures compliance requirements are met, and eliminates dependency on costly address translation equipment.

IPv6 support is available in all commercial AWS Regions where Amazon KVS is available except Ap-Southeast-1 and GovCloud regions . To learn more about Amazon KVS, refer to the developer guide.

Read More for the details.