Amazon ECS customers can configure automated failure detection and remediation for their ECS service rolling updates using the deployment circuit breaker and CloudWatch alarms. The deployment circuit breaker automatically detects task launch failures, while CloudWatch alarms let you detect degradation in infrastructure metrics (e.g., CPU utilization) or performance metrics (e.g., response latency). Previously, in scenarios where a failing deployment was not detected by either of these mechanisms, customers had to manually trigger a new deployment to roll back to a previous safe state. With today’s release, customers can simply use the new stopDeployment API action, and ECS automatically rolls back the service to the last service revision that reached steady state.
You can use the new stop-deployment API to roll back deployments for your ECS services using the AWS Management Console, API, SDK, and CLI in all AWS Regions. To learn more, visit our documentation.
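As a minimal sketch, here is what invoking the rollback might look like from the AWS SDK for Python (boto3). The operation and parameter names below are assumptions based on the announced stopDeployment action, so check the ECS API reference for the exact names:

```python
import boto3

ecs = boto3.client("ecs")

# Hypothetical ARN of the in-progress service deployment to stop.
deployment_arn = (
    "arn:aws:ecs:us-east-1:111122223333:"
    "service-deployment/my-cluster/my-service/EXAMPLE"
)

# Ask ECS to stop the deployment and roll back to the last service revision
# that reached steady state. Operation and parameter names are assumptions.
response = ecs.stop_service_deployment(
    serviceDeploymentArn=deployment_arn,
    stopType="ROLLBACK",
)
print(response)
```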
Amazon Bedrock Data Automation (BDA) now supports extraction of custom GenAI-powered insights from audio by specifying the desired output configuration through blueprints. BDA is a GenAI-powered capability of Bedrock that streamlines the development of generative AI applications and automates workflows involving documents, images, audio, and videos. Developers can now extract custom insights from audio using blueprints, which specify the desired output: a list of field names, the data format in which each field should be returned, and natural language instructions for each field. Developers can get started with blueprints by either using a catalog blueprint or creating a blueprint tailored to their needs.
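To make the idea concrete, here is an illustrative sketch of creating an audio blueprint with boto3. The schema layout, field names, and instructions are assumptions meant only to show the shape of a blueprint (field names, output format, and per-field natural language instructions); consult the BDA documentation for the exact schema the service expects.

```python
import json
import boto3

bda = boto3.client("bedrock-data-automation")

# Illustrative blueprint schema: field names, output types, and per-field
# natural language instructions are assumptions for demonstration only.
audio_blueprint_schema = {
    "class": "customer_call",
    "description": "Custom insights extracted from recorded customer calls",
    "properties": {
        "call_summary": {
            "type": "string",
            "inferenceType": "inferred",
            "instruction": "Summarize the call in 3-4 sentences.",
        },
        "customer_sentiment": {
            "type": "string",
            "inferenceType": "inferred",
            "instruction": "Overall customer sentiment: positive, neutral, or negative.",
        },
        "next_steps": {
            "type": "string",
            "inferenceType": "inferred",
            "instruction": "List any follow-up actions the agent committed to.",
        },
    },
}

response = bda.create_blueprint(
    blueprintName="customer-call-insights",
    type="AUDIO",
    schema=json.dumps(audio_blueprint_schema),
)
print(response["blueprint"]["blueprintArn"])
```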
With this launch, developers can extract custom insights such as summaries, key topics, intents, and sentiment from a variety of voice conversations such as customer calls, clinical discussions, and meetings. Insights from BDA can be used to improve employee productivity, reduce compliance costs, and enhance customer experience, among other benefits. For example, customers can improve productivity of sales agents by extracting insights such as summaries, key action items, and next steps from conversations between sales agents and customers.
Amazon Bedrock Data Automation is available in US West (Oregon) and US East (N. Virginia) AWS Regions.
AWS Billing and Cost Management Console’s Payments page now features a Payments Account Summary that helps you view your AWS account’s financial status more efficiently. Critical account balance information is now summarized in a single, easy-to-access location on your Payments page.
Payments Account Summary shows your total outstanding balance, including current and past due amounts, alongside your total unapplied funds from credit memos, unapplied cash, and Advance Pay balance. You can use these unapplied funds to pay outstanding invoices by sending remittance instructions via the email address on your invoice, or by contacting AWS Customer Service. Customers with Advance Pay will have their balances automatically applied to eligible future invoices.
To start reviewing your Payments Account Summary, visit the Payments page in the AWS Billing and Cost Management Console.
Amazon Connect now supports Outbound Campaign calling to Poland in the Europe (Frankfurt) and Europe (London) Regions, making it easier to proactively communicate across voice, SMS, and email for use cases such as delivery notifications, marketing promotions, appointment reminders, and debt collection. Outbound Campaigns offers real-time audience segmentation using unified customer data from Customer Profiles, along with an intuitive UI for campaign management, targeting, and analytics. It eliminates the need for complex integrations or direct AWS Console access. Outbound Campaigns can be enabled within the Amazon Connect Console.
With Outbound Campaigns, Amazon Connect becomes the only CCaaS platform offering native, seamless support for both inbound and outbound engagement across voice and digital channels in a single, business-friendly application. To learn more, visit our webpage.
AWS Marketplace now supports software as a service (SaaS) products deployed on AWS, on other cloud infrastructures, and on-premises. This will allow independent software vendors to list more SaaS products in AWS Marketplace, offering customers a broader selection of products.
By listing SaaS products in AWS Marketplace, sellers can streamline their sales processes and scale operations more efficiently. Customers can now identify products, including SaaS products, that are 100% deployed on AWS infrastructure with a new “Deployed on AWS” badge in AWS Marketplace. The badge is visible on product detail pages, and customers can also see whether products are “Deployed on AWS” on procurement pages. Products with the “Deployed on AWS” badge leverage the strong security posture and operational excellence of AWS infrastructure, can be deployed quickly, and may qualify for additional AWS customer benefits.
This feature is available in all AWS Regions where AWS Marketplace is available.
To learn more about the expansion of the SaaS product catalog and “Deployed on AWS” badge, read this blog. If you are a seller and want to learn more about the SaaS product listing guidelines, visit the AWS Marketplace Seller Guide.
Across industries, enterprises need efficient and proactive solutions. Imagine frontline professionals using voice commands and visual input to diagnose issues, access vital information, and initiate processes in real-time. The Gemini 2.0 Flash Live API empowers developers to create next-generation, agentic industry applications.
This API extends these capabilities to complex industrial operations. Unlike solutions relying on single data types, it leverages multimodal data – audio, visual, and text – in a continuous livestream. This enables intelligent assistants that truly understand and respond to the diverse needs of industry professionals across sectors like manufacturing, healthcare, energy, and logistics.
In this post, we’ll walk you through a use case focused on industrial condition monitoring, specifically motor maintenance, powered by the Gemini 2.0 Flash Live API. The Live API enables low-latency, bidirectional voice and video interactions with Gemini. With this API, we can give end users the experience of natural, human-like voice conversations, along with the ability to interrupt the model’s responses using voice commands. The model can process text, audio, and video input, and it can provide text and audio output. Our use case highlights the API’s advantages over conventional AI and its potential for strategic collaborations.
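As a rough illustration of the interaction model (not the demo’s actual code), the sketch below opens a live session with the google-genai Python SDK and streams back a text response; the model ID and the exact session method names may differ in your SDK version.

```python
import asyncio
from google import genai

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

MODEL = "gemini-2.0-flash-live-001"   # assumed Live API model ID
CONFIG = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # Send one user turn; in the demo this would be live audio/video frames.
        await session.send_client_content(
            turns={"role": "user",
                   "parts": [{"text": "Inspect this motor for visual defects."}]}
        )
        # Stream the model's response as it is generated.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```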
Demonstrating multimodal intelligence: A condition monitoring use case
The demonstration features a live, bi-directional multimodal streaming backend driven by Gemini 2.0 Flash Live API, capable of real-time audio and visual processing, enabling advanced reasoning and life-like conversations. Utilizing the API’s agentic and function calling capabilities alongside Google Cloud services allows for building powerful live multimodal systems with a clean, mobile-optimized user interface for factory floor operators. The demonstration uses a motor with a visible defect as a real-world anchor.
Real-time visual identification: Pointing the camera at a motor, Gemini identifies the model and instantly summarizes relevant information from its manual, providing quick access to crucial equipment details.
Real-time visual defect identification: With a voice command like “Inspect this motor for visual defects,” Gemini analyzes the live video, identifies and localizes the defect, and explains its reasoning.
Streamlined repair initiation: Upon identifying defects, the system automatically prepares and sends an email with the highlighted defect image and part information, directly initiating the repair process.
Real-time audio defect identification: Analyzing pre-recorded audio of healthy and defective motors, Gemini accurately distinguishes the faulty one based on its sound profile and explains its analysis.
Multimodal QA on operations: Operators can ask complex questions about the motor while pointing the camera at specific components. Gemini intelligently combines visual context with information from the motor manual to provide accurate voice-based answers.
Under the hood: The technical architecture
The demonstration leverages the Gemini Multimodal Livestreaming API on Google Cloud Vertex AI. The API manages the core workflow and agentic function calling, while the regular Gemini API handles visual and audio feature extraction.
The workflow involves:
Agentic function calling: The API interprets user voice and visual input to determine the desired action.
Audio defect detection: Upon recognizing the user’s intent, the system records motor sounds, stores them in GCS, and triggers a function that uses a prompt with examples of healthy and defective sounds, analyzed by the Gemini 2.0 Flash API to diagnose the motor’s health.
Visual inspection: The API recognizes the intent to detect visual defects, captures images, and calls a function that uses zero-shot detection with a text prompt, leveraging the spatial understanding of the Gemini 2.0 Flash API to identify and highlight defects.
Multimodal QA: When users ask questions, the API identifies the intent for information retrieval, performs RAG on the motor manual, combines it with multimodal context, and uses the Gemini API to provide accurate answers.
Sending repair orders: Recognizing the intent to initiate a repair, the API extracts the part number and defect image, using a pre-defined template to automatically send a repair order via email.
Such a demo can be built with minimal custom integration by referring to the guide here and incorporating the features mentioned in the diagram above. The majority of the effort is in adding custom function calls for various use cases, along the lines of the sketch below.
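For instance, the agentic function calling step could be wired up by declaring tools for the Live API session; the tool names and parameters below are illustrative assumptions rather than the demo’s actual code.

```python
from google.genai import types

# Illustrative tool declarations for the condition-monitoring assistant.
motor_tools = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="diagnose_motor_audio",
        description="Analyze a recorded motor sound clip stored in GCS and "
                    "report whether the motor sounds healthy or defective.",
        parameters={
            "type": "OBJECT",
            "properties": {
                "gcs_uri": {"type": "STRING", "description": "GCS URI of the audio clip"},
            },
            "required": ["gcs_uri"],
        },
    ),
    types.FunctionDeclaration(
        name="send_repair_order",
        description="Email a repair order with the part number and the highlighted defect image.",
        parameters={
            "type": "OBJECT",
            "properties": {
                "part_number": {"type": "STRING"},
                "defect_image_uri": {"type": "STRING"},
            },
            "required": ["part_number"],
        },
    ),
])

# The tools are then passed in the Live API session config, for example:
# config = {"response_modalities": ["AUDIO"], "tools": [motor_tools]}
```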
Key capabilities and industrial benefits with cross-industry use cases
Real-time multimodal processing: The API’s ability to simultaneously process live audio and visual streams provides immediate insights in dynamic environments, crucial for preventing downtime and ensuring operational continuity.
Use case: In healthcare, a remote medical assistant could use live video and audio to guide a field paramedic, receiving real-time vital signs and visual information to provide expert support during emergencies.
Advanced audio & visual reasoning: Gemini’s sophisticated reasoning interprets complex visual scenes and subtle auditory cues for accurate diagnostics.
Use Case: In manufacturing, AI can analyze the sounds and visuals of machinery to predict failures before they occur, minimizing production disruptions.
Agentic function calling for automated workflows: The API’s agentic nature enables intelligent assistants to proactively trigger actions, like generating reports or initiating processes, streamlining workflows.
Use case: In logistics, a voice command and visual confirmation of a damaged package could automatically trigger a claim process and notify relevant parties.
Seamless integration and scalability: Built on Vertex AI, the API integrates with other Google Cloud services, ensuring scalability and reliability for large-scale deployments.
Use case: In agriculture, drones equipped with cameras and microphones could stream live data to the API for real-time analysis of crop health and pest detection across vast farmlands.
Mobile-optimized user experience: The mobile-first design ensures accessibility for frontline workers, allowing interaction with the AI assistant at the point of need using familiar devices.
Use case: In retail, store associates could use voice and image recognition to quickly check inventory, locate products, or access product information for customers directly on the store floor.
Proactive maintenance and efficiency gains: By enabling real-time condition monitoring, industries can shift from reactive to predictive maintenance, reducing downtime, optimizing asset utilization, and improving overall efficiency across sectors.
Use case: In the energy sector, field technicians can use the API to diagnose issues with remote equipment like wind turbines through live audio and visual streams, reducing the need for costly and time-consuming site visits.
Get started
Explore the cutting edge of AI interaction with the Gemini Live API, as showcased by this solution. Developers can leverage its codebase – featuring low-latency voice, webcam/screen integration, interruptible streaming audio, and a modular tool system via Cloud Functions – as a robust starting point. Clone the project, adapt the components, and begin creating transformative, multimodal AI solutions that feel truly conversational and aware. The future of the intelligent industry is live, multimodal, and within reach for all sectors.
For AI developers building cutting-edge applications with large model sizes, a reliable foundation is non-negotiable. You need your AI to perform consistently, delivering results without hiccups, even under pressure. This means having dedicated resources that won’t get bogged down by other users’ activity. While existing Vertex AI Prediction Endpoints – managed pools of resources to deploy AI models for online inference – provide a capable serving solution, developers need better ways to reach consistent performance and resource isolation in case of shared resource contention.
Today, we are pleased to announce Vertex AI Prediction Dedicated Endpoints, a new family of Vertex AI Prediction endpoints designed to address the needs of modern AI applications, including those related to large-scale generative AI models.
Dedicated endpoints architected for generative AI and large models
Serving generative AI and other large-scale models introduces unique challenges related to payload size, inference time, interactivity, and performance demands. The new Vertex AI Prediction Dedicated Endpoints have been specifically engineered to help you build more reliably with the following new integrated features:
Native support for streaming inference: Essential for interactive applications like chatbots or real-time content generation, Vertex AI Endpoints now provide native support for streaming, simplifying development and architecture, via the following APIs:
streamRawPredict: Utilize this dedicated API method for bidirectional streaming to send prompts and receive sequences of responses (e.g., tokens) as they become available.
OpenAI Chat Completion: To facilitate interoperability and ease migration, endpoints serving compatible models can optionally expose an interface conforming to the widely used OpenAI Chat Completion streaming API standard.
gRPC protocol support: For latency-sensitive applications or high-throughput scenarios often encountered with large models, endpoints now natively support gRPC. Leveraging HTTP/2 and Protocol Buffers, gRPC can offer performance advantages over standard REST/HTTP.
Customizable request timeouts: Large models can have significantly longer inference times. We now provide the flexibility, via API, to configure custom timeouts for prediction requests, accommodating a wider range of model processing durations beyond the default settings.
Optimized resource handling: The underlying infrastructure is designed to better handle the resource demands (CPU/GPU, memory, network bandwidth) of large models, contributing to the overall stability and performance, especially when paired with Private Endpoints.
The newly integrated capabilities of Vertex AI Prediction Dedicated Endpoints offer a unified and robust serving solution tailored for demanding modern AI workloads. From today, Vertex AI Model Garden will use Vertex AI Prediction Dedicated Endpoints as the standard serving method for self-deployed models.
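As one hedged example of consuming a Dedicated Endpoint through the OpenAI Chat Completions-compatible surface with streaming: the base URL and model value below are placeholders you would copy from your endpoint’s details, and a Google access token serves as the bearer credential.

```python
import openai
import google.auth
from google.auth.transport.requests import Request

# Obtain a Google Cloud access token to authenticate the request.
creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(Request())

client = openai.OpenAI(
    base_url="https://<your-dedicated-endpoint-host>/v1",  # placeholder
    api_key=creds.token,
)

# Stream tokens back as they are generated (native streaming support).
stream = client.chat.completions.create(
    model="<your-deployed-model>",  # placeholder; some deployments ignore this
    messages=[{"role": "user", "content": "Summarize the latest deployment logs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```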
Optimized networking via Private Service Connect (PSC)
While Dedicated Endpoints Public remain available for models accessible over the public internet, we are enhancing networking options on Dedicated Endpoints utilizing Google Cloud Private Service Connect (PSC). The new Dedicated Endpoints Private (via PSC) provide a secure and performance-optimized path for prediction requests. By leveraging PSC, traffic routes entirely within Google Cloud’s network, offering significant benefits:
Enhanced security: Requests originate from within your Virtual Private Cloud (VPC) network, eliminating public internet exposure for the endpoint.
Improved performance consistency: Bypassing the public internet reduces latency variability.
Reduced performance interference: PSC facilitates better network traffic isolation, mitigating potential “noisy neighbor” effects and leading to more predictable performance, especially for demanding workloads.
For production workloads with strict security requirements and predictable latency, Private Endpoints using Private Service Connect are the recommended configuration.
How Sojern is using the new Vertex AI Prediction Dedicated Endpoints to serve models at scale
Sojern is a marketing company focusing on the hospitality industry, matching potential customers to travel businesses around the globe. As part of their growth plans, Sojern turned to Vertex AI. Leaving their self-managed ML stack behind, Sojern can focus more on innovation, while scaling out far beyond their historical footprint.
Given the nature of Sojern’s business, their ML deployments follow a unique deployment model, requiring several high throughput endpoints to be available and agile at all times, allowing for constant model evolution. Using Public Endpoints would cause rate limiting and ultimately degrade user experience; moving to a Shared VPC model would have required a major design change for existing consumers of the models.
With Private Service Connect (PSC) and Dedicated Endpoint, Sojern avoided hitting the quotas / limits enforced on Public Endpoints, while also avoiding a network redesign to accommodate Shared VPC.
The ability to quickly promote tested models, take advantage of Dedicated Endpoint’s enhanced feature set, and improve latency for their customers strongly aligned with Sojern’s goals. The Sojern team continues to onboard new models, always improving accuracy and customer satisfaction, powered by Private Service Connect and Dedicated Endpoint.
Get started
Are you struggling to scale your prediction workloads on Vertex AI? Check out the resources below to start using the new Vertex AI Prediction Dedicated Endpoints:
Your experience and feedback are important as we continue to evolve Vertex AI. We encourage you to explore these new endpoint capabilities and share your insights through the Google Cloud community forum.
Today, AWS announces the preview of the Amazon Q Developer integration in GitHub. With this launch, developers can use the power of Amazon Q Developer agents for feature development, code review, and Java transformation within GitHub.com and GitHub Enterprise Cloud projects to streamline their developer experience.
After installing the Amazon Q Developer application from GitHub, developers can use labels to assign issues to Amazon Q Developer. Then, Amazon Q Developer agents automatically implement new features, generate bug fixes, run code reviews on new pull requests, and modernize legacy Java applications, all within the GitHub projects. While generating new code, the agents will automatically use any pull request workflows, refining the solution and ensuring all checks are passing. Developers can also collaborate with the agents by directly commenting on the pull request, and Amazon Q Developer will respond with improvements, allowing all teammates to stay in the loop. By bringing Amazon Q Developer into GitHub, development teams can confidently deliver high-quality software faster while maintaining their organization’s security and compliance standards.
The Amazon Q Developer integration is available on GitHub, and you can get started today for free—no AWS account needed. To learn more, check out the Amazon Q Developer Integrations page or read the blog.
When’s the last time you watched a race for the braking?
It’s the heart-pounding acceleration and death-defying maneuvers that keep most motorsport fans on the edge of their seats. Especially when it comes to Formula E — and really all EVs — the explosive, near-instantaneous acceleration of an electric motor is part of the appeal.
A less considered, yet no less important feature, is how EVs can regeneratively brake, turning friction into fuel. Part of Formula E’s mission is to make EVs a compelling automotive choice for consumers, not just world-class racers; highlighting this powerful aspect of the vehicles has become a priority. The question remained: How do you get others to feel the same exhilaration from deceleration?
The answer came from the mountains above Monaco, as well as some prompts in Gemini 2.5.
In the lead up to the Monaco E-Prix, Formula E and Google undertook a project dubbed Mountain Recharge. The challenge: Whether a Formula E GENBETA race car, starting with only 1% battery, could regenerate enough energy from braking during a descent through France’s coastal Alps to then complete a full lap of the iconic Monaco circuit.
More than just a stunt, this experiment is testing the boundaries of technology — and not just in EVs, but on the cloud, too. Without the live analytics and plenty of AI-powered planning, the Mountain Recharge might not have come to pass. In fact, AI even helped determine which mountain pass would be best suited for this effort. (Read on to find out which one, and see if we made it to the bottom.)
Mountain Recharge is exciting not only for the thrills on the course but also for the potential it shows for AI across industries. In addition to its role in helping to execute tasks, AI proved valuable in the brainstorming, experimentation, and rapid-fire simulations that helped get Mountain Recharge to the finish line.
Planning the charge up the mountain
Before even setting foot or wheel to the course, the team at Formula E and Google Cloud turned to Gemini to try and figure out if such an endeavor was possible.
To answer the fundamental question of feasibility, the team entered a straightforward prompt into Google’s AI Studio: “Starting with just 1% battery, could the GENBETA car potentially generate enough recharge by descending a high mountain pass to do a lap of the Circuit of Monaco?”
The AI Studio validator, running Gemini 2.5 Pro with its deep reasoning functionality, analyzed first-party data that had been uploaded by Formula E on the GENBETA’s capabilities; we then grounded the model with Google Search to further improve accuracy and reliability by connecting to the universe of information available online.
AI Studio shared its “thinking” in a detailed eight-step process, which included identifying the key information needed; consulting the provided documents; gathering external information through a simulated search; performing calculations and analysis; and finally synthesizing the answer based on the core question.
The final output: “theoretically feasible.” In other words, the perfect challenge.
Navigating the steep turns above Monaco helped generate plenty of power for Mountain Recharge.
Still working in AI Studio, we then used a new feature, the ability to build custom apps such as the Maps Explorer, to determine the best route, which turned out to be the Col de Braus. AI Studio then mapped out a route for the challenge. This rigorous, data-backed validation, facilitated by AI Studio and Gemini’s ability to incorporate technical specifications and estimations, transformed the project from a speculative what-if into something Formula E felt confident attempting.
AI played an important role away from the course, as well. To aid in coordination and planning, teams at Formula E and Google Cloud used NotebookLM to digest the technical regulations and battery specifications and locate relevant information within them, which, given the complexity of the challenge and the number of parties involved, helped ensure cross-functional teams were kept up to date and grounded with sourced data to help make informed decisions.
Smart cars, smart drivers, and a smartphone
During the mountain descent, real-time monitoring of the car’s progress and energy regeneration would be crucial. Firebase and BigQuery were instrumental in visualizing this real-time telemetry. Data from multiple sensors and Google Maps was streamed to BigQuery, Google Cloud’s data warehouse, from a high-performance mobile phone connected to the car (a Pixel 9 was well suited to the task).
This data stream proved to be yet another challenge to overcome, because of the patchy mobile signal in the mountainous terrain of the Maritime Alps. When data couldn’t be sent, it was cached locally on the phone until the signal was available again.
BigQuery’s capacity for real-time data ingestion and in-platform AI model creation enabled speedy analysis and the calculation of essential metrics. A web-based dashboard was developed using Firebase that connected to BigQuery to display both data and insights. AI Studio greatly facilitated the development of the application by translating a picture of a dashboard mockup into fully functional code.
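For a sense of what the phone-side relay could look like, here is a simplified sketch of streaming telemetry rows into BigQuery with local buffering for when the signal drops; the table name and row fields are made up for illustration.

```python
from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.mountain_recharge.telemetry"  # placeholder table

buffer = []  # rows cached locally while the mobile signal is unavailable

def send_telemetry(row: dict) -> None:
    buffer.append(row)
    try:
        errors = client.insert_rows_json(TABLE_ID, buffer)
        if not errors:
            buffer.clear()  # only drop rows once BigQuery has accepted them
    except Exception:
        pass  # keep rows buffered and retry with the next reading

send_telemetry({"timestamp": "2025-05-03T10:15:00Z",
                "soc_percent": 12.4,
                "speed_kmh": 58.0})
```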
“From figuring out if our crazy Mountain Recharge idea was even possible, to giving us live insights during the descent, AI was our guide,” said Alex Aidan, Formula E’s VP of Marketing. “It’s what turned an ambitious ‘what if’ into a reality we could track moment by moment.”
After completing its descent, the car stored up enough energy that it is expected to complete its lap of the Monaco circuit on Saturday, as part of the E-Prix’s pre-race festivities.
A different kind of push start.
Benefits beyond the finish line
Both the success and the development of the Mountain Recharge campaign offer valuable lessons to others pursuing ambitious projects. It shows that AI doesn’t have to be central to a project — it can be just as powerful at facilitating and optimizing something we’ve been doing for years, like racing cars. Our results in the Mountain Recharge only underscore the potential benefits of AI for a wide range of industries:
Enhanced planning and exploration: Just as Gemini helped Formula E explore unconventional ideas and identify the optimal route, businesses can leverage large language models for innovative problem-solving, market analysis, and strategic planning, uncovering unexpected angles and accelerating the journey from “what if” to “we can do that”.
Streamlined project management: NotebookLM’s ability to centralize and organize vast amounts of information demonstrates how AI can significantly improve efficiency in complex projects, from logistics and resource allocation to research and compliance. This reduces the risk of errors and ensures smoother coordination across teams.
Data-driven decision making: The real-time data analysis capabilities showcased in the Mountain Recharge underscore the power of cloud-based data platforms like BigQuery. Organizations can leverage these tools to gain immediate insights from their data, enabling them to make agile adjustments and optimize performance on the fly. This is invaluable in dynamic environments where rapid responses are critical.
Deeper understanding of complex systems: By applying AI to analyze intricate data streams, teams can gain a more profound understanding of the factors influencing performance.
Such capabilities certainly impressed James Rossiter, a former Formula E Team Principal, current test driver, and broadcaster for the series. “I was really surprised at the detail of the advice and things to consider,” Rossiter said. “We always talk about these things as a team, but as this is so different to racing, I had to totally rethink the drive.”
The Formula E Mountain Recharge campaign is more than just an exciting piece of content; it’s a testament to the power of human ingenuity amplified by intelligent technology. It’s also the latest collaboration between Formula E and Google Cloud and our shared commitment to use AI to push the boundaries of what’s possible in the sport and in the world.
We’ve already developed an AI-powered digital driving coach to help level the field for EV racing. Now, with the Mountain Recharge, we can inspire everyday drivers well beyond the track with the capabilities of electric vehicles.
It’s thinking big, even if it all starts with a simple prompt on a screen. You just have to ask the right questions, starting with the most important ones: Is this possible, and how can we make it so?
Today, AWS Organizations is making resource control policies (RCPs) available in both AWS GovCloud (US-West) and AWS GovCloud (US-East) Regions. RCPs help you centrally establish a data perimeter across your AWS environment. With RCPs, you can centrally restrict external access to your AWS resources at scale.
RCPs are a type of authorization policy in AWS Organizations that you can use to centrally enforce the maximum available permissions for resources in your organization. For example, an RCP can help enforce the requirement that “no principal outside my organization can access Amazon S3 buckets in my organization,” regardless of the permissions granted through individual S3 bucket policies.
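For illustration, an RCP along the lines of that S3 example might look like the following, shown here as a Python dict for readability; the exact condition keys to use depend on your requirements, so check the RCP documentation.

```python
# Illustrative resource control policy: deny S3 access to principals outside
# the organization, regardless of individual bucket policies.
example_rcp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyS3AccessOutsideMyOrg",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:PrincipalOrgID": "o-exampleorgid"
                }
            },
        }
    ],
}
```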
AWS Graviton4-based R8g database instances are now generally available for Amazon Aurora with PostgreSQL compatibility and Amazon Aurora with MySQL compatibility in the AWS GovCloud (US-West) Region. R8g instances offer larger instance sizes, up to 48xlarge, and feature an 8:1 ratio of memory to vCPU and the latest DDR5 memory. Graviton4-based instances provide up to a 40% performance improvement and up to a 29% price/performance improvement for on-demand pricing over Graviton3-based instances of equivalent sizes on Amazon Aurora databases, depending on database engine, version, and workload.
AWS Graviton4 processors are the latest generation of custom-designed AWS Graviton processors built on the AWS Nitro System. R8g DB instances are available with new 24xlarge and 48xlarge sizes. With these new sizes, R8g DB instances offer up to 192 vCPU, up to 50Gbps enhanced networking bandwidth, and up to 40Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS).
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Elastic Container Registry (ECR) announces IPv6 support for API and Docker/OCI endpoints for both ECR and ECR Public. This makes it easier to standardize on IPv6 and remove IP address scalability limitations for your container build, deployment, and orchestration infrastructure.
With today’s launch, you can pull your private or public ECR images via the AWS SDK or Docker/OCI CLI using ECR’s new dual-stack endpoints which support both IPv4 and IPv6. When you make a request to an ECR dual-stack endpoint, the endpoint resolves to an IPv4 or an IPv6 address, depending on the protocol used by your network and client. This helps you meet IPv6 compliance requirements, and modernize your applications without expensive network address translation between IPv4 and IPv6 addresses.
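A brief sketch of opting a boto3 ECR client into the dual-stack endpoints, assuming a recent boto3/botocore release:

```python
import boto3
from botocore.config import Config

# Request ECR's dual-stack endpoint so the connection can resolve over IPv6
# where the network and client support it.
ecr = boto3.client(
    "ecr",
    region_name="us-west-2",
    config=Config(use_dualstack_endpoint=True),
)

# Fetch a registry auth token as usual; docker login can then use it.
token = ecr.get_authorization_token()
print(token["authorizationData"][0]["proxyEndpoint"])
```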
ECR’s new dual-stack endpoints are generally available in all AWS commercial and AWS GovCloud (US) regions at no additional cost. Currently, ECR’s dual-stack endpoints do not serve AWS PrivateLink traffic originating from your Amazon Virtual Private Cloud (VPC). To get started with ECR IPv6, visit ECR documentation or ECR Public documentation.
AWS Graviton3-based R7g database instances are now generally available for Amazon Aurora with PostgreSQL compatibility and Amazon Aurora with MySQL compatibility in Middle East (Bahrain) and AWS GovCloud (US-West) Regions. Graviton3 instances provide up to 30% performance improvement over Graviton2 instances for Aurora depending on database engine, version, and workload.
Graviton3 processors offer several improvements over Graviton2 processors. Graviton3-based R7g are the first AWS database instances to feature the latest DDR5 memory, which provides 50% more memory bandwidth compared to DDR4, enabling high-speed access to data in memory. R7g database instances offer up to 30Gbps enhanced networking bandwidth and up to 20 Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS).
You can launch Graviton3 R7g database instances in the Amazon RDS Management Console or using the AWS CLI. Graviton3 is supported by Aurora MySQL version 3.03.1 and higher, and Aurora PostgreSQL version 13.10 and higher, Aurora PostgreSQL 14.7 and higher, and Aurora PostgreSQL 15.2 and higher. Upgrading a database instance to Graviton3 requires a simple instance type modification. For more details, refer to the Aurora documentation.
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Relational Database Service (Amazon RDS) for PostgreSQL, MySQL, and MariaDB now supports AWS Graviton2-based T4g database instances in the Asia Pacific (Malaysia) Region. T4g database instances provide a baseline level of CPU performance, with the ability to burst CPU usage at any time for as long as required. For complete information on pricing and regional availability, please refer to the Amazon RDS pricing page.
T4g database instances are available on Amazon RDS for all PostgreSQL 17, 16, 15, 14, and 13 versions, and 12.7 and higher 12 versions. T4g database instances are available on Amazon RDS for MySQL versions 8.4 and 8.0, and Amazon RDS for MariaDB versions 11.4, 10.11, 10.6, 10.5, and 10.4. You can upgrade to T4g by modifying the database instance type to T4g using the AWS Management Console or AWS CLI. For more details, refer to the Amazon RDS User Guide.
Amazon Relational Database Service (Amazon RDS) for PostgreSQL, MySQL, and MariaDB now supports M7i database (DB) instances in Asia Pacific (Jakarta), South America (Sao Paulo), AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions. R7i DB instances are now supported in South America (Sao Paulo), AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions. M7i and R7i are the latest Intel-based offerings and are available with a new maximum instance size of 48xlarge, which brings 50% more vCPU and memory than the maximum size of the M6i and R6i instance types.
M7i and R7i DB instances are available for Amazon RDS for PostgreSQL version 17.1 and higher, 16.1 and higher, 15.4 and higher, 14.9 and higher, and 13.11 and higher. M7i and R7i DB instances are also available for Amazon RDS for MySQL version 8.0.32 and higher, and Amazon RDS for MariaDB version 11.4, 10.11, 10.6, 10.5, and 10.4.
For complete information on pricing and regional availability, please refer to the Amazon RDS pricing page. Get started by creating any of these fully managed database instance using the Amazon RDS Management Console. For more details, refer to the Amazon RDS User Guide.
Amazon Aurora with MySQL compatibility and PostgreSQL compatibility now supports R7i database instances in AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions. R7i database instances are powered by custom 4th Generation Intel Xeon Scalable processors. R7i instances offer larger instance sizes, up to 48xlarge, and feature an 8:1 ratio of memory to vCPU and the latest DDR5 memory.
You can launch R7i database instances in the Amazon RDS Management Console or using the AWS CLI. Upgrading a database instance to R7i instance family requires a simple instance type modification. For more details, refer to the Aurora documentation.
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Relational Database Service (RDS) for PostgreSQL, MySQL, and MariaDB now supports AWS Graviton3-based M7g database instances in Asia Pacific (Jakarta), Middle East (UAE), South America (Sao Paulo), Asia Pacific (Osaka), Asia Pacific (Melbourne), Israel (Tel Aviv), Europe (Zurich) and AWS GovCloud (US-East) Regions. R7g is now supported in Middle East (Bahrain), South America (Sao Paulo) and AWS GovCloud (US-West) Regions. Graviton3-based instances provide up to a 30% performance improvement over Graviton2-based instances on RDS for open-source databases depending on database engine, version, and workload.
Graviton3 processors offer several improvements over the second-generation Graviton2 processors. Graviton3-based M7g and R7g are the first AWS database instances to feature the latest DDR5 memory, which provides 50% more memory bandwidth compared to DDR4, enabling high-speed access to data in memory. M7g and R7g database instances offer up to 30Gbps enhanced networking bandwidth and up to 20 Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS). M7g and R7g on Amazon RDS for MySQL and MariaDB will also support Optimized Writes. With Optimized Writes, you can improve write throughput by up to 2x at no additional cost.
M7g and R7g database instances are supported on RDS for MySQL versions 8.0 and 8.4, RDS for PostgreSQL versions 13.4 (and higher), 14.5 (and higher), 15, 16 and 17 and RDS for MariaDB versions 10.4, 10.5, 10.6, 10.11 and 11.4. For complete information on pricing and regional availability, please refer to the Amazon RDS pricing page. Get started using the Amazon RDS Management Console.
The Amazon Web Services (AWS) ODBC Driver for PostgreSQL is now generally available for use with Amazon RDS and Amazon Aurora PostgreSQL-compatible edition database clusters. This database driver provides support for faster switchover and failover times, Aurora Limitless, and authentication with AWS Secrets Manager, AWS Identity and Access Management (IAM), or Federated Identity.
The Amazon Web Services (AWS) ODBC Driver for PostgreSQL is a standalone driver that supports Amazon RDS for PostgreSQL, community PostgreSQL, and Amazon Aurora PostgreSQL. You can install the aws-pgsql-odbc package for Windows, macOS, or Linux by following the Getting Started instructions on GitHub. The driver relies on monitoring the database cluster status and being aware of the cluster topology to determine the new writer. This approach reduces switchover and failover times from tens of seconds to single-digit seconds compared to the open-source driver.
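A hedged sketch of connecting through the driver with pyodbc; the driver name string and connection keywords are assumptions, so confirm them against the aws-pgsql-odbc README on GitHub.

```python
import pyodbc

# Driver name and connection keywords are assumptions for illustration.
conn_str = (
    "DRIVER={AWS ODBC Driver for PostgreSQL};"
    "SERVER=mycluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com;"
    "PORT=5432;"
    "DATABASE=postgres;"
    "UID=myuser;"
    "PWD=mypassword;"
)

conn = pyodbc.connect(conn_str)
cur = conn.cursor()
cur.execute("SELECT version();")
print(cur.fetchone()[0])
conn.close()
```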
The AWS ODBC Driver for PostgreSQL is released as an open-source project under the GNU Lesser General Public License (LGPL).
At Google Cloud, we empower businesses to accelerate their generative AI innovation cycle by providing a path from prototype to production. Palo Alto Networks, a global cybersecurity leader, partnered with Google Cloud to develop an innovative security posture control solution that can answer complex “how-to” questions on demand, provide deep insights into risk with just a few clicks, and guide users through remediation steps.
Using advanced AI services, including Google’s Gemini models and managed Retrieval Augmented Generation (RAG) services such as Google Cloud’s Vertex AI Search, Palo Alto Networks had an ideal foundation for building and deploying gen AI-powered solutions.
The end result was Prisma Cloud Co-pilot, the Palo Alto Networks Prisma Cloud gen AI offering. It helps simplify cloud security management by providing an intuitive, AI-powered interface to help understand and mitigate risks.
Technical challenges and surprises
The Palo Alto Networks Prisma Cloud Co-pilot journey began in 2023 and launched in October 2024. During this time, Palo Alto Networks witnessed Google’s AI models evolve rapidly, from Text Bison (PaLM) to Gemini 1.5 Flash. That rapid pace of innovation meant that each iteration brought new capabilities, necessitating a development process that could quickly adapt to the evolving landscape.
To effectively navigate the dynamic landscape of evolving gen AI models, Palo Alto Networks established robust processes that proved invaluable to their success:
Prompt engineering and management: Palo Alto Networks used Vertex AI to help manage prompt templates and built a diverse prompt library to generate a wide range of responses. To rigorously test each new model’s capabilities, limitations, and performance across various tasks, Palo Alto Networks and Google Cloud team systematically created and updated prompts for each submodule. Additionally, Vertex AI’s Prompt Optimizer helped streamline the tedious trial-and-error process of prompt engineering.
Intent recognition: Palo Alto Networks used the Gemini 1.5 Flash model to develop an intent recognition module, which efficiently routed user queries to the relevant co-pilot component. This approach provided users with many capabilities through a unified and lightweight user experience.
Input guardrails: Palo Alto Networks created guardrails as a first line of defense against unexpected, malicious, or simply incorrect queries that could compromise the functionality and experience of the chatbot. These guardrails maintain the chatbot’s intended functionality by preventing known prompt injection attacks, such as attempts to circumvent system instructions, and by restricting chatbot usage to its intended scope. Guardrails were also created to detect whether user queries fall within the predefined domain of general cloud security, risks, and vulnerabilities to prevent unintended use; any topics outside this scope did not receive a response from the chatbot. Additionally, since the chatbot was designed to generate proprietary code for querying Palo Alto Networks internal systems, requests for general-purpose code generation similarly did not receive a response.
Evaluation dataset curation: A robust and representative evaluation dataset is the foundation for accurately and quickly assessing the performance of gen AI models. The Palo Alto Networks team took great care to choose high-quality evaluation data and keep it relevant by constantly refreshing it with representative questions and expert-validated answers. The evaluation dataset was sourced from and validated directly by Palo Alto Networks subject matter experts to ensure its accuracy and reliability.
Automated evaluation: In collaboration with Google Cloud, Palo Alto Networks developed an automated evaluation pipeline using Vertex AI’s gen AI evaluation service (a minimal sketch of such a pipeline follows this list). This pipeline allowed Palo Alto Networks to rigorously scale their assessment of different gen AI models and benchmark those models using custom evaluation metrics, while focusing on key performance indicators such as accuracy, latency, and consistency of responses.
Human evaluator training and red teaming: Palo Alto Networks invested in training their human evaluation team to identify and analyze specific loss patterns and provide detailed answers on a broad set of custom rubrics. This allowed them to pinpoint where a model’s response was inadequate and provide insightful feedback on model performance, which then guided model selection and refinement.
The team also conducted red teaming exercises focused on key areas, including:
Manipulating the co-pilot: Can the co-pilot be tricked into giving bad advice by feeding it false information?
Extracting sensitive data: Can the co-pilot be manipulated into revealing confidential information or system details?
Bypassing security controls: Can the co-pilot be used to craft attacks that circumvent existing security measures?
Load testing: To ensure the gen AI solutions met real-time demands, Palo Alto Networks actively load tested them, working within the pre-defined QPM (queries per minute) and latency parameters of Gemini models. They simulated user traffic scenarios to find the optimal balance between responsiveness and scalability using provisioned throughput, which helped ensure a smooth user experience even during peak usage.
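As a rough idea of what such an automated evaluation pipeline can look like, here is a minimal sketch using the Vertex AI SDK’s gen AI evaluation module; the metric names, dataset columns, and project values are placeholders and may differ from Palo Alto Networks’ actual pipeline.

```python
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

vertexai.init(project="my-project", location="us-central1")  # placeholders

# Placeholder evaluation dataset: prompts, candidate responses, and
# expert-validated references.
eval_dataset = pd.DataFrame({
    "prompt": ["Which of my S3 buckets are publicly accessible?"],
    "response": ["Buckets 'logs-prod' and 'assets-dev' allow public read access."],
    "reference": ["Two buckets are public: logs-prod and assets-dev."],
})

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "exact_match",
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
    ],
    experiment="copilot-eval",
)

result = eval_task.evaluate()
print(result.summary_metrics)
```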
Operational and business challenges
Operationalizing gen AI can introduce complex challenges across multiple functions, especially for compliance, legal, and information security. Evaluating ROI for gen AI solutions also requires new metrics. To address these challenges, Palo Alto Networks implemented the following techniques and processes:
Data residency and regional ML processing: Since many Palo Alto Networks customers need a regional approach for ML processing capabilities, we prioritized regional machine learning processing to help enable customer compliance with data residency needs and regional regulations, if applicable.
Where Google does not offer an AI data center that matches Prisma Cloud data center locations, customers were able to choose to have their data processed in the U.S. before gaining access to the Prisma Cloud Co-pilot. We implemented strict data governance policies and used Google Cloud’s secure infrastructure to help safeguard sensitive information and uphold user privacy.
Deciding KPIs and measuring success for gen AI apps: The dynamic and nuanced nature of gen AI applications demands a bespoke set of metrics tailored to capture their specific characteristics and comprehensively evaluate their efficacy. There are no standard metrics that work for all use cases. The Prisma Cloud AI Co-pilot team relied on technical and business metrics to measure how well the system was operating.
Technical metrics, such as recall, helped to measure how thoroughly the system fetches relevant URLs when answering questions from documents, and to help increase the accuracy of prompt responses and provide source information for users.
Customer experience metrics, such as measuring helpfulness, relied on explicit feedback and telemetry data analysis. This provided deeper insights into user experience that resulted in increased productivity and cost savings.
Collaborating with security and legal teams: Palo Alto Networks brought in legal, information security, and other critical stakeholders early in the process to identify risks and create guardrails for issues including, but not limited to: information security requirements, elimination of bias in the dataset, appropriate functionality of the tool, and data usage in compliance with applicable law and contractual obligations.
Given customer concerns, enterprises must prioritize clear communication around data usage, storage, and protection. By collaborating with legal and information security teams early on to create transparency in marketing and product communications, Palo Alto Networks was able to build customer trust and help ensure they have a clear understanding of how and when their data is being used.
Ready to get started with Vertex AI?
The future of generative AI is bright, and with careful planning and execution, enterprises can unlock its full potential. Explore your organization’s AI needs through practical pilots in Vertex AI, and rely on Google Cloud Consulting for expert guidance.
Your customers might not all speak the same language. If you operate internationally or serve a diverse customer base, you need your chatbot to meet them where they are – whether they’re searching for something in Spanish or Japanese. If you want to give your customers multilingual support with chatbots, you’ll need to orchestrate multiple AI models to handle diverse languages and technical complexities intelligently and efficiently. Customers expect quick, accurate answers in their language, from simple requests to complex troubleshooting.
To get there, developers need a modern architecture that can leverage specialized AI models – such as Gemma and Gemini – and a standardized communication layer so your LLM models can speak the same language, too. Model Context Protocol, or MCP, is a standardized way for AI systems to interact with external data sources and tools. It allows AI agents to access information and execute actions outside their own models, making them more capable and versatile. Let’s explore how we can build a powerful multilingual chatbot using Google’s Gemma, Translation LLM and Gemini models, orchestrated via MCP.
The challenge: Diverse needs, one interface
Building a truly effective support chatbot might be challenging for a few different reasons:
Language barriers: Support needs to be available in multiple languages, requiring high-quality, low-latency translation.
Query complexity: Questions range from simple FAQs (handled easily by a basic model) to intricate technical problems demanding advanced reasoning.
Efficiency: The chatbot needs to respond quickly without getting bogged down, especially when dealing with complex tasks or translations.
Maintainability: As AI models evolve and business needs change, the system must be easy to update without requiring a complete overhaul.
Trying to build a single, monolithic AI model to handle everything is often inefficient and complex. A better approach? Specialization and smart delegation.
MCP architecture for harnessing different LLMs
The key to making these specialized models work together effectively is MCP. MCP defines how an orchestrator (like our Gemma-powered client) can discover available tools, request specific actions (like translation or complex analysis) from other specialized services, pass necessary information (the “context”), and receive results back. It’s the essential plumbing that allows our “team” of AI models to collaborate. Here’s a framework for how it works with the LLMs:
Gemma: The chatbot uses a versatile LLM like Gemma to manage conversations, understand user requests, handle basic FAQs, and determine when to utilize specialized tools for complex tasks via MCP.
Translation LLM server: A dedicated, lightweight MCP server exposing Google Cloud’s Translation capabilities as a tool. Its sole focus is high-quality, fast translation between languages, callable via MCP.
Gemini: A specialized MCP server uses Gemini Pro or similar LLM for complex technical reasoning and problem-solving when invoked by the orchestrator.
Model Context Protocol: This protocol allows Gemma to discover and invoke the Translation and Gemini “tools” running on their respective servers (a minimal sketch of such a tool server follows).
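To ground this, here is a sketch of what the Translation LLM server could look like using the MCP Python SDK’s FastMCP helper. The Cloud Translation API client is used here as a stand-in for the Translation LLM call, and the tool name and project value are illustrative assumptions, not the repo’s actual code.

```python
from mcp.server.fastmcp import FastMCP
from google.cloud import translate_v3 as translate

mcp = FastMCP("translation-server")

client = translate.TranslationServiceClient()
PARENT = "projects/my-project/locations/global"  # placeholder project

@mcp.tool()
def translate_text(text: str, target_language: str = "en") -> str:
    """Translate user text into the target language."""
    response = client.translate_text(
        parent=PARENT,
        contents=[text],
        target_language_code=target_language,
    )
    return response.translations[0].translated_text

if __name__ == "__main__":
    mcp.run()  # serves the tool over MCP (stdio by default)
```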
How it works
Let’s walk through an example non-English language scenario:
A technical question arrives: A customer types a technical question into the chat window, but it’s in French.
Gemma receives the text: The Gemma-powered client receives the French text. It recognizes the language isn’t English and determines translation is needed.
Gemma calls on Translation LLM: Gemma uses the MCP connection to send the French text to the Translation LLM Server, requesting an English translation.
Text is translated: The Translation LLM Server performs the translation via its MCP-exposed tool and sends the English version back to the client.
This architecture offers broad applicability. For example, imagine a financial institution’s support chatbot where all user input, regardless of the original language, must be preserved in English in real time for fraud detection. Here, Gemma operates as the client, while Translation LLM, Gemini Flash, and Gemini Pro function on the server. In this configuration, the client-side Gemma manages multi-turn conversations for routine inquiries and intelligently directs complex requests to specialized tools. As depicted in the architectural diagram, Gemma manages all user interactions within a multi-turn chat. A tool leveraging Translation LLM can translate user queries and concurrently save them for immediate fraud analysis. Simultaneously, Gemini Flash and Pro models can generate responses based on the user’s requests. For intricate financial inquiries, Gemini Pro can be employed, while Gemini Flash can address less complex questions.
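On the other side of the connection, the orchestrator can discover and invoke that tool through an MCP client session. The sketch below assumes the stdio transport and the translate_text tool from the server sketch above; how Gemma decides when to make this call is elided.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the translation server as a subprocess over stdio (placeholder path).
server_params = StdioServerParameters(command="python", args=["translation_server.py"])

async def translate_via_mcp(text: str) -> str:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "translate_text",
                arguments={"text": text, "target_language": "en"},
            )
            return result.content[0].text

print(asyncio.run(translate_via_mcp("Mon routeur ne se connecte plus au réseau.")))
```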
Let’s look at this sample GitHub repo that illustrates how this architecture works.
Why this is a winning combination
This is a powerful combination because it’s designed for both efficiency and how easily you can adapt it.
The main idea is splitting up the work. The Gemma-based client that users interact with stays light, handling the conversation and sending requests where they need to go. Tougher jobs, like translating or complex reasoning, are sent to separate LLMs built specifically for those tasks. This way, each piece does what it’s best at, making the whole system perform better.
A big plus is how this makes things easier to manage and more flexible. Because the parts connect with a standard interface (the MCP), you can update or swap out one of the specialized LLMs – maybe to use a newer model for translation – without having to change the Gemma client. This makes updates simpler, reduces potential headaches, and lets you try new things more easily. You can use this kind of setup for things like creating highly personalized content, tackling complex data analysis, or automating workflows more intelligently.
Get started
Ready to build your own specialized, orchestrated AI solutions?
Explore the code: Clone the GitHub repository for this project and experiment with the client and server setup.