GCP – Expanding Vertex AI with the next wave of generative AI media models
Today, we are introducing the next wave of generative AI media models on Vertex AI: Imagen 4, Veo 3, and Lyria 2.
We’ve already seen customers generate stunning, photorealistic images with Imagen 3, Google’s image generation model. Customers have taken these images and transformed them into high-quality videos and assets with Veo 2. We’ve even seen customers take these remarkable videos and bring them to life with professional-grade audio using Lyria, Google’s advanced AI music generation model.
With generative AI media gaining momentum across marketing, media, and more, storytelling has never been easier. Users are creating campaign assets more quickly and building breakthrough creative content. Let’s take a look at each model and the ways you can get started today.
Imagen 4: Higher quality image generation
Today we’re introducing Imagen 4 text-to-image generation on Vertex AI in public preview. As Google’s highest quality image generation model, Imagen 4 delivers:
- Outstanding text rendering and prompt adherence
- Higher overall image quality across all styles
- Multilingual prompt support to help creators globally
Prompt: Capture an intimate close-up bathed in warm, soft, late-afternoon sunlight filtering into a quintessential 1960s kitchen. The focal point is a charmingly designed vintage package of all-purpose flour, resting invitingly on a speckled Formica countertop. The packaging itself evokes pure nostalgia: perhaps thick, slightly textured paper in a warm cream tone, adorned with simple, bold typography (a friendly serif or script) in classic red and blue “ALL-PURPOSE FLOUR”, featuring a delightful illustration like a stylized sheaf of wheat or a cheerful baker character. In smaller bold print at the bottom of the package: “NET WT 5 LBS (80 OZ) 2.27kg”. Focus sharply on the package details – the slightly soft edges of the paper bag, the texture of the vintage printing, the inviting “All-Purpose Flour” text. Subtle hints of the 1960s kitchen frame the shot – the chrome edge of the counter gleaming softly, a blurred glimpse of a pastel yellow ceramic tile backsplash, or the corner of a vintage metal canister set just out of focus. The shallow depth of field keeps attention locked on the beautifully designed package, creating an aesthetic rich in warmth, authenticity, and nostalgic appeal.
Prompt: This four-panel comic strip uses a charming, deliberately pixelated art style reminiscent of classic 8-bit video games, featuring simple shapes and a limited, bright color palette dominated by greens, blues, browns, and the dinosaur’s iconic grey/black. The setting is a stylized pixel beach. Panel one shows the familiar Google Chrome T-Rex dinosaur, complete with its characteristic pixelated form, wearing tiny pixel sunglasses and lounging on a pixelated beach towel under a blocky yellow sun. Pixelated palm trees sway gently in the background against a blue pixel sky. A caption box with pixelated font reads, “Even error messages need a vacation.” Panel two is a close-up of the T-Rex attempting to build a pixel sandcastle. It awkwardly pats a mound of brown pixels with its tiny pixel arms, looking focused. Small pixelated shells dot the sand around it. Panel three depicts the T-Rex joyfully hopping over a series of pixelated cacti planted near the beach, mimicking its game obstacle avoidance. Small “Boing! Boing!” sound effect text appears in a blocky font above each jump. A pixelated crab watches from the side, waving its pixel claw. The final panel shows the T-Rex floating peacefully on its back in the blocky blue pixel water, sunglasses still on, with a contented expression. A small thought bubble above it contains pixelated “Zzz…” indicating relaxation.
Prompt: Filmed cinematically from the driver’s seat, offering a clear profile view of the young passenger on the front seat with striking red hair. Her gaze is fixed ahead, concentrated on navigating the dusty, lonely highway visible through her side window, which shows a blurred expanse of dry earth and perhaps distant, hazy mountains. Her arm rests on the window ledge or steering wheel. The shot includes part of the aged truck interior beside her – the door panel, maybe a glimpse of the worn seat fabric. The lighting could be late afternoon sun, casting long shadows and warm highlights across her face and the truck’s interior. This angle emphasizes her individual presence and contemplative state within the vast, empty landscape.
To get started with Imagen 4 in public preview on Vertex AI, you can use Media Studio or run the following code sample, which uses the Google Gen AI SDK for Python.
```python
from google import genai

# TODO(developer): Replace with your Google Cloud project ID
project_id = "PROJECT_ID"
client = genai.Client(vertexai=True, project=project_id, location="us-central1")

prompt = """
A white wall with two Art Deco travel posters mounted. First poster has the text: "NEPTUNE", tagline: "The jewel of the solar system!" Second poster has the text: "JUPITER", tagline: "Travel with the giants!"
"""

image = client.models.generate_images(
    model="imagen-4.0-generate-preview-05-20",
    prompt=prompt,
)

# OPTIONAL: View the generated image in a notebook
# image.generated_images[0].image.show()
```
Veo 3: Higher-quality video generation with audio and speech
Veo 3 is our latest state-of-the-art video generation model from Google DeepMind. With Veo 3, you can generate videos with:
- Improved quality when generating videos from text and image prompts
- Speech, such as dialogue and voice-overs
- Audio, such as music and sound effects
Here’s what a few of our customers have to say about productivity and creative gains with Veo:
Klarna, a leader in digital payments, is leveraging Veo and Imagen on Vertex AI to boost content creation efficiency. From b-roll to YouTube bumpers, the company is significantly reducing production timelines.
“At Klarna, we’re constantly exploring ways to push the boundaries of innovation in our marketing efforts, and Veo has been a game-changer in our creative workflows. With Veo and Imagen, we’ve transformed what used to be time-intensive production processes into quick, efficient tasks that allow us to scale content creation rapidly. Whether it’s producing engaging b-roll, crafting eye-catching YouTube bumpers, or developing dynamic social media animations, these tools have empowered our teams to be more agile and creative. The results speak for themselves, driving increased engagement and content performance. With Google Cloud, we’re laying the groundwork for the future of commerce and revolutionizing how we bring our brand to life.” – David Sandström, Chief Marketing Officer, Klarna
Jellyfish, a renowned digital marketing company within The Brandtech Group, has integrated Veo into their top-performing AI marketing platform, Pencil, and teamed up with Japan Airlines to offer AI-generated in-flight entertainment.
“The addition of Veo 2 in Pencil reinforces our commitment to empowering marketers with sophisticated AI, enabling them to produce campaigns that are not only smarter and faster but also bolder and more artistically inspired. Our pilots have shown incredible results, with an average 50% reduction in costs and time-to-market efficiencies. This step change in control and quality turns previously impossible ideas into real marketing content in minutes. Japan Airlines is leading the way in applying Gen AI to the travel industry, and we’re excited to see how other brands follow suit.” – David Jones, Founder & CEO, Brandtech
Kraft Heinz’s Tastemaker platform empowers their teams with access to Imagen and Veo, dramatically accelerating creative and campaign development processes.
“With Veo and Imagen on Vertex AI as part of our Tastemaker platform, Kraft Heinz has unlocked unprecedented speed and efficiency in our creative workflows. What once took us eight weeks is now only taking eight hours, resulting in substantial cost savings.” – Justin Thomas, Head of Digital Experience & Growth, Kraft Heinz
Envato, a global leader in digital creative assets and templates, used Veo 2 to develop their newly launched video generation feature, VideoGen, which enables creative professionals to turn text or images into hyperrealistic, cinematic video content.
“We’ve tried many of the top video models, and Veo 2 has driven the most impressive results in terms of speed and quality across a diverse set of text and image inputs. Within the first few days of launch, tens of thousands of Envato subscribers were already accessing VideoGen, with nearly 60% of their generated videos being downloaded for use in creative projects. Since March, Envato has seen VideoGen usage surpass 100%+ month over month. It’s been a pleasure working with Google Cloud to bring Envato’s VideoGen feature to life with Veo.” said Aaron Rutley, Head of Product for AI at Envato.
See how it works: Veo 3 is capable of handling intricate prompt details, as demonstrated in the following examples.
Prompt: A medium shot, historical adventure setting: Warm lamplight illuminates a cartographer in a cluttered study, poring over an ancient, sprawling map spread across a large table. Cartographer: “According to this old sea chart, the lost island isn’t myth! We must prepare an expedition immediately!”
Prompt: A low-angle shot shows an open, light purple door leading from a room with light purple walls and a gray floor to a vibrant outdoor scene. Lush green grass and wildflowers spill from the doorway onto the indoor floor, creating a whimsical transition between spaces. Beyond the door, rolling green hills dotted with more wildflowers stretch towards a bright, clear sky. A single tree stands prominently in the foreground of the outdoor scene, its leaves adding depth to the view. The sunlight and natural elements contrast with the simplicity of the indoor space, inviting a sense of wonder and escape.
Veo 3 is in private preview on Vertex AI and will be available more broadly in the coming weeks. If you’re interested in early access, please fill out this form.
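For readers who already have access, a request through the Google Gen AI SDK for Python might look roughly like the sketch below. It follows the scene-plus-dialogue pattern of the cartographer prompt above. Several things here are assumptions rather than details from this announcement: the model ID `veo-3.0-generate-preview`, the Cloud Storage output location, and the helper names `build_dialogue_prompt` and `generate_video`, which are hypothetical. Treat it as a starting point and check the model list available to your project.

```python
import time


def build_dialogue_prompt(scene: str, speaker: str, line: str) -> str:
    """Join a scene description and one spoken line into a single prompt."""
    return f'{scene.strip()} {speaker.strip()}: "{line.strip()}"'


def generate_video(
    prompt: str,
    project_id: str,
    output_gcs_uri: str,
    model: str = "veo-3.0-generate-preview",  # ASSUMPTION: not stated in this post
) -> str:
    """Submit a text-to-video request and wait for it to finish.

    Returns the Cloud Storage URI of the first generated video.
    """
    # Imported lazily so the prompt helper works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project_id, location="us-central1")
    operation = client.models.generate_videos(
        model=model,
        prompt=prompt,
        config=types.GenerateVideosConfig(output_gcs_uri=output_gcs_uri),
    )
    # Video generation runs as a long-running operation, so poll until done.
    while not operation.done:
        time.sleep(15)
        operation = client.operations.get(operation)
    return operation.response.generated_videos[0].video.uri


prompt = build_dialogue_prompt(
    "A medium shot, historical adventure setting: a cartographer pores over a map.",
    "Cartographer",
    "We must prepare an expedition immediately!",
)
print(prompt)
```

Keeping the spoken line in quotes after the speaker's name mirrors how the example prompts mark dialogue; whether this is the best format for speech generation is something to confirm by experiment.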
Lyria 2: Greater creative control with music generation
At Google Cloud Next 2025, we announced Lyria in Vertex AI, Google’s text-to-music model. Today, we’re announcing Lyria 2 is generally available in Vertex AI. As Google’s latest music generation model, Lyria 2 features high-fidelity music across a range of styles. As your next creative collaborator, Lyria 2 provides:
- High-quality audio content from text prompts
- Greater creative control over instruments, BPM, and other characteristics
To start creating content with Lyria 2, check out Media Studio on Vertex AI. Once there, you can start generating music from text prompts or access the model API via Vertex AI. For inspiration, check out some of the music clips and prompts below.
Prompt: Upbeat, Rhythmic Peruvian Cumbia with a psychedelic edge, LA, Live performance at a Latin music Festival, incorporating electric guitars, bass, and often utilizing a prominent timbales percussion section, creating a powerful and danceable vibe. Vibrant and energetic.
Prompt: Sweeping Orchestral Film Score, Pristine Studio recording, London, 100-piece Orchestra, Majestic and profound. A blend of soaring melodies, dramatic harmonic shifts, and powerful percussive elements, with instruments such as french horns, strings, and timpani, and a thematic approach, featuring intricate orchestrations, dynamic range, and emotional depth, evoking a cinematic and awe-inspiring atmosphere.
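Because Lyria 2 takes plain text prompts, one way to exercise control over instruments and BPM is to assemble the prompt from structured fields. The sketch below is illustrative only: `build_music_prompt` and `build_predict_request` are hypothetical helpers, the tempo value is made up, and the request body shape (a `prompt` key inside `instances`) is an assumption that should be checked against the Lyria API reference before use.

```python
from typing import Optional, Sequence


def build_music_prompt(
    style: str,
    instruments: Sequence[str] = (),
    bpm: Optional[int] = None,
    mood: Optional[str] = None,
) -> str:
    """Compose a text prompt that spells out style, instruments, tempo, and mood."""
    if bpm is not None and bpm <= 0:
        raise ValueError("bpm must be a positive integer")
    sentences = [style.strip().rstrip(".")]
    if instruments:
        sentences.append("Instruments: " + ", ".join(instruments))
    if bpm is not None:
        sentences.append(f"Tempo: {bpm} BPM")
    if mood:
        sentences.append("Mood: " + mood.strip().rstrip("."))
    return ". ".join(sentences) + "."


def build_predict_request(prompt: str) -> dict:
    """Wrap the prompt in a Vertex AI predict-style request body.

    ASSUMPTION: the field names below are a guess at the Lyria request
    schema, not something this post documents.
    """
    return {"instances": [{"prompt": prompt}], "parameters": {}}


prompt = build_music_prompt(
    "Upbeat Peruvian cumbia with a psychedelic edge",
    instruments=["electric guitars", "bass", "timbales"],
    bpm=100,  # illustrative value, not taken from the example prompts
    mood="vibrant and energetic",
)
print(prompt)
```

Building prompts this way makes it easy to vary one characteristic at a time, for example the tempo, while holding the rest of the description fixed.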
See what some of our customers have to say about Lyria 2 so far:
Captions is an AI-powered video creation tool that allows users to create studio-grade talking videos quickly and easily. They have integrated Lyria 2 into their Mirage Edit feature, enabling customers to quickly generate complete videos with customized sound.
“At Captions, our Mirage Edit feature already gives subscribers the power to go from prompt to fully-edited AI talking video — complete with images, B-roll clips, voiceovers, and transitions. Now, we’re adding a keystone element: adaptive music powered by Google’s Lyria 2. With a single prompt, Lyria composes a score that syncs to the script, pacing, and transitions at every emotional beat, so our customers can publish cinematic short-form videos without ever leaving Captions or shuffling through stock libraries.” said Dwight Churchill, Co-Founder and COO, Captions.ai
Dashverse, owner of digital content platforms such as Dashtoon and DashReels, is leveraging Google’s Lyria 2 on Vertex AI to provide the next generation of AI-native creators with advanced music generation capabilities. This integration allows users to craft dynamic and emotionally responsive soundtracks that seamlessly adapt to the narrative and pacing of their content on platforms like DashReels.
“We’ve always believed in empowering everyday creators at Dashverse — whether they’re making comics with Dashtoon or short dramas on DashReels. Our move into dynamic, emotionally resonant storytelling with DashReels needed a music engine that was just as expressive and responsive. Lyria 2 on Vertex AI delivers exactly that. It gives our users studio-level control over music — adapting to emotion, scene, and pacing — without the overhead. It’s not just a soundtrack generator; it’s a storytelling amplifier. We’re incredibly excited about what this unlocks for the next generation of AI-native creators.” said Soumyadeep Mukherjee, CTO, Dashverse
Create securely and share responsibly
The security and safety of any AI-generated content is crucial. Therefore, these models are designed with built-in safeguards, allowing you to concentrate on your creative work. Veo 3, Imagen 4, and Lyria 2 are all built with safety as a fundamental design principle in partnership with Google DeepMind.
Watermarking: By default, all creations generated with Veo, Imagen, and Lyria utilize SynthID, a technology that embeds an invisible watermark directly into the generated output. This watermark allows for the identification of AI-generated media, ensuring transparency.
Safety filters: Both input prompts and output content for all generative AI media models are assessed against a list of safety filters. Because you can configure how aggressively content is filtered, you can ensure the assets meet your brand values. For visual output, you also have control over person generation.
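As a rough illustration of these controls for image generation, the sketch below maps two plain-language choices onto the `safety_filter_level` and `person_generation` options of `GenerateImagesConfig` in the Google Gen AI SDK for Python. The helper names and the "strict/medium/relaxed" labels are hypothetical, and it is an assumption that the Imagen 4 preview model accepts every value shown; some settings may require allowlisting, so verify against the current documentation.

```python
# Hypothetical friendly labels mapped to SDK option values.
FILTER_LEVELS = {
    "strict": "BLOCK_LOW_AND_ABOVE",
    "medium": "BLOCK_MEDIUM_AND_ABOVE",
    "relaxed": "BLOCK_ONLY_HIGH",
}
PERSON_OPTIONS = {
    "none": "DONT_ALLOW",
    "adults": "ALLOW_ADULT",
    "all": "ALLOW_ALL",
}


def build_safety_settings(strictness: str = "medium", people: str = "adults") -> dict:
    """Translate friendly labels into keyword arguments for GenerateImagesConfig."""
    if strictness not in FILTER_LEVELS:
        raise ValueError(f"unknown strictness: {strictness!r}")
    if people not in PERSON_OPTIONS:
        raise ValueError(f"unknown people option: {people!r}")
    return {
        "safety_filter_level": FILTER_LEVELS[strictness],
        "person_generation": PERSON_OPTIONS[people],
    }


def generate_with_safety(prompt: str, project_id: str, **labels):
    """Generate an image with the chosen safety settings applied."""
    # Imported lazily so the settings helper works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project_id, location="us-central1")
    return client.models.generate_images(
        model="imagen-4.0-generate-preview-05-20",
        prompt=prompt,
        config=types.GenerateImagesConfig(**build_safety_settings(**labels)),
    )


print(build_safety_settings("strict", "none"))
```

A brand with conservative guidelines might call `generate_with_safety(prompt, project_id, strictness="strict", people="none")`, while a team producing lifestyle imagery might keep the defaults.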
Get started
You can learn more about these new models by checking out the resources below: