Azure – Generally available: Zone Redundant Storage for Azure Disks is now available in East Asia
Zone Redundant Storage (ZRS) for Azure Disks is now available on Azure Premium SSD and Standard SSD in East Asia
Read More for the details.
When an incident disrupts a cloud service that you rely on, an effective response starts with identifying the source of that disruption and evaluating the scope of impact. This is crucial to charting a course of action — whether that’s communicating with your stakeholders or deploying a disaster recovery procedure. But when you use a cloud service provider, your ability to mount an effective incident response is dependent on the transparency, timeliness, and actionability of the incident communications provided.
Today, we’re excited to introduce Personalized Service Health, which provides fast, transparent, relevant, and actionable communication about Google Cloud service disruptions. Currently in Preview, you can use Personalized Service Health to receive granular alerts about Google Cloud service disruptions, consult it as a first stop in your incident response, or integrate it with your incident response or monitoring tools.
Today, when Google detects an incident that could potentially impact you, we publish that information openly with Google Cloud Service Health, our highly reliable public dashboard that delivers information on active incidents that require wide distribution — typically those that tend to be larger in scope or severity. Organized by Google Cloud products and the regions they operate in, Google Cloud Service Health displays real-time information about incidents impacting Google Cloud products and provides mechanisms to download service disruption history.
Personalized Service Health takes these benefits a step further, and is the ideal destination for many customers to start their incident response journey. Personalized Service Health provides:
Controls to decide which service disruptions are relevant to you: Google Cloud Service Health posts incidents that affect a broad set of customers and is not an exhaustive list of incidents. If you prefer to see or be alerted about more incidents, earlier or more often — even smaller-scale ones — you can use Personalized Service Health to configure how and when you are alerted about incidents.
Ability to integrate with your incident management workflow: Personalized Service Health offers multiple integration options with your preferred incident management tools and workflows — for example, you can integrate alerts with PagerDuty to alert the appropriate incident responders when a service disruption begins.
Proactive incident discoverability: Personalized Service Health emits logs and can push customizable alerts to make incidents more discoverable in your workflow.
Let’s take a deeper look at these benefits.
Personalized Service Health can fire an alert to an extensive array of destinations when a Google Cloud service disruption is posted or updated. You can choose which incidents you would like to be alerted on and where, and customize the alert content to include critical information about the incident — including the affected Google services and locations, current relevance to your project, observable symptoms, and known mitigations.
You can configure alerts directly in Personalized Service Health, in Cloud Monitoring, or via Terraform. Each alert can be fired to one or more destinations, including email, SMS, Pub/Sub, webhook, or PagerDuty. You can also create multiple alerts for a single project for a higher degree of granularity.
Personalized Service Health is designed to publish information related to disruptions that may affect your projects with varying degrees of relevance. By design, this approach may provide you with more information than is strictly necessary. To strike a balance, you can filter the incidents to only see what you deem relevant, across a variety of integration points:
Dashboard: Filter the incident table by any displayed field and incident recency.
Alerts: You can create a conditional alerting policy with any incident field, including Google Cloud products, locations, or relevance to your project.
API: You can use request filters in your API requests to further filter events programmatically in your application.
Logs: Cloud Logging supports a robust query language to filter logs as they are routed to another destination through a log sink.
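As a sketch of the API-side filtering described above, the request can be assembled in Python. This is a minimal sketch, not a definitive implementation: the endpoint shape follows the public Service Health API (`servicehealth.googleapis.com`, v1), while the project ID and the specific filter fields (`category`, `state`) are illustrative assumptions to verify against the API reference.

```python
def build_events_request(project_id: str, category: str = "INCIDENT",
                         state: str = "ACTIVE") -> tuple[str, dict]:
    """Build a Service Health events list request with a server-side filter.

    The endpoint shape follows the public Service Health API; the project
    ID and filter field names are illustrative assumptions.
    """
    url = (f"https://servicehealth.googleapis.com/v1/"
           f"projects/{project_id}/locations/global/events")
    # The `filter` query parameter narrows results server-side,
    # e.g. to active incidents only, before they reach your application.
    params = {"filter": f"category={category} AND state={state}"}
    return url, params

# Illustrative project ID; send with any authenticated HTTP client.
url, params = build_events_request("my-example-project")
```

Sending the resulting request with an authenticated HTTP client returns only the events matching the filter, which keeps downstream tooling from having to discard irrelevant incidents itself.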
Incident response can span many people, teams, and tools in an organization. Personalized Service Health aims to fit into your existing incident response processes by offering several integration options depending on your preference for programmatic access, proactive versus reactive interactions, and existing tools.
You can use Personalized Service Health as a dashboard directly from the Google Cloud console, or fit it into any existing incident response or monitoring tool in your preferred workflow. The Service Health dashboard provides a list of active incidents relevant to your project, and, for each incident, you can see impact details about the incident or track updates from Google Cloud support. This is quick to set up and easy to maintain.
If you’re integrating Personalized Service Health with an external alerting, monitoring, or incident response tool, the Service Health API offers programmatic access to all incidents relevant to a specific project or for all projects across your organization. The API provides programmatic access to the complete list of all relevant incidents, updates from Google Cloud, and description of impact.
When a service disruption begins, Cloud Logging collects Personalized Service Health logs for all updates to the event. To build up a historic record of events, you can retain logs in a storage bucket. You can also use Log Analytics with BigQuery to analyze past service disruptions.
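A log sink of the kind described above needs a Cloud Logging filter that selects only Service Health entries. The sketch below assembles one; note that the log ID shown and the `jsonPayload.relevance` field are assumptions to verify against the actual log entries in your project before wiring up a sink.

```python
def service_health_log_filter(min_relevance: str = "") -> str:
    """Build a Cloud Logging filter for Service Health event logs.

    The log ID and the `jsonPayload.relevance` field are assumptions;
    confirm both against real log entries in your project.
    """
    # Select only entries written to the Service Health event log.
    clauses = ['log_id("servicehealth.googleapis.com/event_log")']
    if min_relevance:
        # Optionally keep only events at a given relevance level.
        clauses.append(f'jsonPayload.relevance="{min_relevance}"')
    return " AND ".join(clauses)
```

The returned string can be pasted into the Logs Explorer or used as the filter on a log sink that routes matching entries to a storage bucket or BigQuery for historical analysis.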
As of today, we are excited to announce Personalized Service Health is integrated with 50+ Google Cloud products & services – including Compute Engine, Cloud Storage, all Cloud Networking offerings, BigQuery, Google Kubernetes Engine, and many more. If any integrated Google Cloud product detects a disruption that may impact you, Personalized Service Health provides an impact assessment, and shares updates including symptoms, known workarounds, or an ETA for resolution.
Some products may offer more advanced capabilities through Personalized Service Health, including faster initial posting and definitive impact signals, and may post small blast-radius incidents not posted on the public Google Cloud Service Health dashboard. Here is the complete list of integrated products and supported capabilities; we expect the list of supported Google Cloud products and capabilities to expand over time.
“The instinct for cloud providers is to be overly cautious about sharing outages too quickly. I’d rather proactively move a workload and learn there was no issue than the workload go down unknowingly. We’re happy to see Google Cloud make this step to be more transparent with customers and look forward to leveraging PSH.”
– Justin Watts, Director Information Services & Technology Strategy, Telus
“Proactive alerts from Personalized Service Health to responders are critical to any enterprise customer’s incident response process. The PagerDuty and Google Cloud partnership is able to provide our customers an essential platform for modern operations that helps them quickly respond to cloud disruptions and deliver seamless digital experiences.”
– Jonathan Rende, SVP Products, PagerDuty
Reliable infrastructure is essential for workloads in the cloud, and we’re continuously raising the bar on reliability through technology, product, and process innovation. A key component of reliability is the speed and effectiveness of incident response. During a cloud service incident, however unlikely, excellent communications are vital. Personalized Service Health provides the information you need to take your incident response communications to the next level, so you can quickly assess what is happening, take action to minimize impact to your applications, and keep your stakeholders informed. To get started, enable Personalized Service Health for a project or across your organization.
Read More for the details.
Work with JSON-style documents synced across multiple regions
Read More for the details.
Eliminate the need to set up new servers and complete manual upgrades with in-place major version upgrade for Azure Database for PostgreSQL – Flexible Server.
Read More for the details.
Public preview enhancements and updates released for Azure SQL in early-August 2023
Read More for the details.
With intra-account container copy in the Azure Cosmos DB API for MongoDB you can migrate data from one collection to another in an offline manner.
Read More for the details.
General availability enhancements and updates released for Azure SQL in early-August 2023.
Read More for the details.
Azure Database for PostgreSQL – Flexible Server now provides support for PostgreSQL 15 across all Azure regions.
Read More for the details.
Effortlessly scale your storage IOPS to match workload demands with the autoscale IOPS feature in Azure Database for MySQL – Flexible Server.
Read More for the details.
AWS Lake Formation is a service that allows you to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions. AWS Lake Formation is now available in the following two additional AWS Regions:
Asia Pacific (Melbourne)
Asia Pacific (Hyderabad)
Read More for the details.
AWS Backup for Amazon S3 now improves the speed of backups by up to 10x for buckets with more than 300 million objects. This performance improvement enables you to speed up your initial S3 backup workflow and back up buckets with more than 3 billion objects. AWS Backup is a policy-based, fully managed and cost-effective solution that enables you to centralize and automate data protection of Amazon S3 along with other AWS services (spanning compute, storage, and databases) and third-party applications. The performance improvement is automatically enabled at no additional cost in all Regions where AWS Backup support for Amazon S3 is available.
Read More for the details.
AWS Transit Gateway is now available in the Israel (Tel Aviv) AWS Region with AWS Direct Connect. AWS Transit Gateway enables customers to connect thousands of Amazon Virtual Private Clouds (Amazon VPCs) and their on-premises networks using a single gateway.
Read More for the details.
You can now use AWS Resource Access Manager (AWS RAM) in the AWS Israel (Tel Aviv) Region.
Read More for the details.
Amazon Simple Queue Service (SQS) announces an increased quota for high throughput mode for FIFO queues, allowing you to process up to 9,000 transactions per second, per API action, in the US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Ireland), and Europe (Frankfurt) regions. For the Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo) regions, the throughput quota has been increased to 4,500 transactions per second, per API action. For all other regions where SQS is generally available today, the high throughput mode quota has been increased to 2,400 transactions per second.
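High throughput mode is enabled per FIFO queue by setting two queue attributes, `DeduplicationScope` and `FifoThroughputLimit`. Below is a minimal boto3-style sketch of the configuration; the queue name is illustrative, and the actual client call is left commented since it requires AWS credentials.

```python
def high_throughput_fifo_config(queue_name: str) -> dict:
    """Build boto3 `create_queue` arguments for a high throughput FIFO queue.

    The queue name is illustrative. High throughput mode requires
    message-group-scoped deduplication and a per-message-group-ID
    throughput limit, set together via the two attributes below.
    """
    assert queue_name.endswith(".fifo"), "FIFO queue names must end in .fifo"
    return {
        "QueueName": queue_name,
        "Attributes": {
            "FifoQueue": "true",
            # These two attributes together enable high throughput mode.
            "DeduplicationScope": "messageGroup",
            "FifoThroughputLimit": "perMessageGroupId",
        },
    }

# Usage (requires AWS credentials):
# import boto3
# boto3.client("sqs").create_queue(**high_throughput_fifo_config("orders.fifo"))
```

Because the higher quotas apply per API action, spreading messages across many message group IDs is what actually lets a queue approach the regional throughput ceiling.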
Read More for the details.
AWS Database Migration Service (DMS) makes homogeneous migrations simpler with built-in native database tooling. Today, in addition to supporting MySQL and PostgreSQL, this feature now supports MariaDB. Built-in native database tooling with homogeneous data migrations provides simple and performant like-to-like migrations with minimal downtime.
Read More for the details.
Azure Monitor container insights now allows customers to configure custom collection settings for the Perf, Inventory, InsightsMetrics, and KubeEvents tables to reduce their Log Analytics ingestion.
Read More for the details.
AWS Managed Services (AMS) Accelerate Operations Plan is now available in Jakarta region. AMS helps you operate AWS efficiently and securely. It provides proactive, preventative, and detective capabilities that raise the operational bar and help reduce risk without constraining agility, allowing you to focus on innovation. AMS extends your team with operational capabilities including monitoring, incident detection and management, security, patch, backup, and cost optimization.
Read More for the details.
Kakao Mobility is the leading mobility service provider in South Korea, providing taxi services, turn-by-turn directions, public transport, parking space searching, and real-time traffic information to more than 30M users in South Korea. In this blog post, we’re highlighting Kakao Mobility for the DevOps achievements that earned them the ‘Optimizing for speed without sacrificing stability’ award in the 2022 DevOps Awards. If you want to learn more about the winners and how they used DORA metrics and practices to grow their businesses, start here.
Kakao Mobility customers rely heavily on our service, especially during the rush hour commute, so we want to ensure a 100% uptime service level agreement. Incorporating a wide range of services, from real-time traffic monitoring and payment systems to booking drivers, we need cloud infrastructure that can scale on demand if we wish to continually expand. The existing experience for developers was stress-inducing, as their environment lacked the scalability and flexibility of the cloud. When trying to deploy new services and features, scaling continually caused delays and bottlenecks, leading to frustrating app performance for users.
The unexpected nature of city traffic, from fluctuating commute times to unforeseen accidents, meant that we often faced spikes in user traffic which hampered the customer experience. Furthermore, we provide our services via APIs to third-party platforms so they can integrate our wide range of traffic and mobility services within native app experiences. To ensure our offerings remain secure, we also wanted to harden our API resiliency while maintaining our 100% SLA target for customers. Recognizing that any reduction in availability and responsiveness can lead to distrust from our consumer base, and a loss in profits, we needed to provide a fault-tolerant, resilient system that could scale on demand.
To provide users with a highly-available app experience as we expand our market share, countries of operation, and offer new features and services to customers, we needed to focus on reliability. Because reliability is a critical factor in ensuring that our commuting customers arrive at work on time, we are pursuing a multi-cloud strategy to maintain availability even in the worst of scenarios.
There are three main objectives of our migration strategy:
Uncouple and modernize the application to incorporate Kubernetes containers and microservices
Improve the performance speed of deployment while relieving developer burden
Improve the availability, reliability, and performance of the service
Our multi-step migration process began with modernizing our environment to build a multi-cloud, hybrid cloud architecture. Over the course of 2021 and 2022, our teams worked to refactor the application into microservices, migrate workloads to Google Cloud, and adopt Anthos Service Mesh (ASM) as an API orchestration platform. The elasticity and scalability of cloud resources give the team a cost-effective, reliable solution.
Throughout 2022, we successfully migrated our flagship application to Google Cloud, implementing it using Google Kubernetes Engine (GKE) clusters to maintain scalability and reliability. Working closely with Google Cloud, our DevOps team has been modernizing their conventional services to support the new multi-cloud strategy between on-premises and Google Cloud.
Using Anthos Service Mesh, we deploy gateways to control the ingress and egress of traffic throughout the application, and separate the resources across several GKE clusters. APIs play a critical role in our ability to provide a reliable and scalable service to users, including in third-party, offline applications.
For major seasonal events where we expect significant increases in traffic — Korean traditional holidays, Christmas holidays, and New Year’s Eve — Google Cloud’s Event Management Service (EMS) helps ensure reliability and availability. Working closely with the Google Cloud team helps us not only reinforce infrastructure to maintain stability throughout the entire seasonal event, but also run simulation and tabletop exercises to prepare the teams for any eventuality.
Thanks to Anthos Service Mesh, Kakao Mobility has modernized our IT environment and enhanced cloud security, with plans to adopt more Google Cloud services in the future.
Pod scaling time is no longer included in deployment time. The deployment manager does not need to wait for the pod to start, and the deployment is carried out with traffic control alone. The deployment manager can deliver as much traffic as needed to the new version and verify the change quickly. If a hot fix deployment is required, the deployment can be completed by switching traffic within 10 seconds. In one real case where an urgent deployment was required, it took us only about 10 minutes to provision a new version of the pod (including the nodes), and the service processed traffic stably throughout.
Developer teams can also provide security-enhanced access to their APIs while whitelisting traffic with Anthos Service Mesh, so we’re not inundated with latency during commute times. The development team in charge of the entire service was operating two microservices, but has increased that to nine since implementing ASM — a 4.5x increase.
None of this would have been possible without the DORA research that taught us how to focus on continuous improvement within our organization. While this project was not always a linear path to success, failing helped us better understand what to focus on in the future so we can serve our customers more effectively and efficiently.
Stay tuned for the rest of the series highlighting the DevOps Award Winners and read the 2022 State of DevOps report to dive deeper into the DORA research.
Read More for the details.
Configure hub routing preference to influence path selection for the same on-premises destination prefixes learned over site-to-site (S2S) VPN, ExpressRoute, and SD-WAN NVAs.
Read More for the details.
Amazon ElastiCache now supports Graviton3-based M7g and R7g node families. ElastiCache Graviton3 nodes deliver improved price-performance compared to Graviton2. As an example, when running ElastiCache for Redis on an R7g.4xlarge node, you can achieve up to 28% increased throughput (read and write operations per second) and up to 21% improved P99 latency, compared to running on R6g.4xlarge. In addition, these nodes deliver up to 25% higher networking bandwidth.
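Adopting the new node family is a matter of selecting a `cache.r7g.*` (or `cache.m7g.*`) node type when creating or modifying a cluster. Here is a minimal boto3-style sketch; the cluster ID is illustrative, and the client call is left commented since it requires AWS credentials.

```python
def graviton3_redis_cluster_config(cluster_id: str,
                                   size: str = "4xlarge") -> dict:
    """Build boto3 `create_cache_cluster` arguments for a Graviton3 node.

    The cluster ID is illustrative; cache.r7g.* is the Graviton3-based
    memory-optimized node family from the announcement above.
    """
    return {
        "CacheClusterId": cluster_id,
        "Engine": "redis",
        # R7g is the memory-optimized Graviton3 family; M7g is the
        # general-purpose counterpart.
        "CacheNodeType": f"cache.r7g.{size}",
        "NumCacheNodes": 1,
    }

# Usage (requires AWS credentials):
# import boto3
# boto3.client("elasticache").create_cache_cluster(
#     **graviton3_redis_cluster_config("demo-redis"))
```

Existing clusters can move to the new family the same way, by passing the r7g or m7g node type to a modify operation rather than recreating the cluster.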
Read More for the details.