Azure – Generally available: Autoscale Stream Analytics jobs
Read More for the details.
Amazon FSx for NetApp ONTAP, a service that provides fully managed shared storage built on NetApp’s popular ONTAP file system, is announcing two additional monitoring capabilities that enable you to monitor file system events and diagnose network connectivity: you can now access ONTAP Event Management System (EMS) logs and collect packet captures on your file systems.
Read More for the details.
Amazon FSx for NetApp ONTAP, a service that provides fully managed shared storage built on NetApp’s popular ONTAP file system, now supports using the IP security (IPsec) protocol to encrypt data in transit. With this additional option to encrypt your data end-to-end, FSx for ONTAP offers even more flexibility for you to protect your data.
Read More for the details.
Amazon FSx for NetApp ONTAP now supports SnapLock, an ONTAP feature that prevents data from being modified or deleted for a specified retention period by transitioning files to a write once, read many (WORM) state. FSx for ONTAP is the first and only fully managed file storage service in the cloud offering WORM protection. You can use SnapLock to meet regulatory compliance, protect business-critical data from ransomware, and prevent accidental or malicious attempts at alteration and deletion of data.
Read More for the details.
Amazon DocumentDB launches index improvements, enabling faster index builds on collections and the ability to view index build statuses. Amazon DocumentDB index builds can now be sped up by up to 14x when using parallel workers compared to using a single worker. The index creation process now uses two workers by default, and you can configure the number of workers on Amazon DocumentDB 4.0 and 5.0 instance-based clusters.
Read More for the details.
AWS Mainframe Modernization is now available with a range of new capabilities for easier management and operation of the service’s fully-managed runtime environments. The capabilities include role-based access, application-level monitoring, and application deployment updates that enable seamless user authorization control, greater application performance visibility, and faster application deployment, respectively, for modernizing mainframe applications.
Read More for the details.
Amazon QuickSight introduces a new unified coloring experience for your analyses and dashboards. Authors can now assign colors at the field level, which means that different visuals containing the same field will use the same color. This makes it easier for authors to specify a color once at the field level and reuse it across visuals, instead of assigning colors for each visual separately. We are also improving the coloring experience so that visual colors persist and don’t change with visual interactions like sorting, filtering, and actions. More details can be found here.
Read More for the details.
Amazon QuickSight now supports new axis configurations for small multiples and radar charts, empowering users to customize axis settings according to their use case. In small multiples, users now have the option to select either shared or independent axis configuration for both the X and Y axes, specifically for line and bar charts.
Read More for the details.
Here at Google Cloud we are always trying to find new ways to simplify how our customers troubleshoot. We’re excited to announce the introduction of a new troubleshooting experience: recommended interactive playbooks for Google Kubernetes Engine (GKE).
When dealing with issues that may be new to you, but that we’ve seen commonly in the past, these new playbooks can help you more quickly resolve issues and improve your Mean Time to Resolution, or MTTR.
Let’s take a quick look at one of these new example playbooks.
Let’s say we have a GKE cluster and an application requesting more resources than are available to it, such as memory or CPU. In that situation, it’s often the case that a Pod (or Pods) will be marked as ‘unschedulable’.
A Pod being marked as ‘unschedulable’ is a common issue and something we have documented extensively, but let’s see how we can simplify the troubleshooting process.
In the screenshot below we’ve highlighted the notification from the cluster view that Pods are unschedulable.
If we click this notification we see a screen appear offering us a few ways to better understand this issue:
Clicking into the playbook, we can see a lot of information relevant to the issue at hand including relevant logs, metrics, and suggested next steps:
We can see from the logs and metrics that the Pods of the Deployment have requested more memory than is available, but that the node has ample resources available and there are no maximum limits on Pods being set. So to resolve this issue, we’ll need to modify the amount of memory the Pod requests, or increase the size of our cluster.
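As a sketch of the first remediation, lowering a Deployment’s memory request might look like the following manifest fragment; all names and values here are hypothetical, not taken from the playbook itself:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                  # hypothetical Deployment name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: us-docker.pkg.dev/example-project/repo/web-app:v1  # hypothetical image
        resources:
          requests:
            memory: "256Mi"      # lowered so the Pod fits on an available node
            cpu: "250m"
          limits:
            memory: "512Mi"
```

Alternatively, scaling up the cluster (or enabling node auto-provisioning) adds capacity instead of shrinking the request.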
This dashboard is also customizable, so if you’d like you can add or remove components based on what’s most pertinent to you and your organization.
Finally, at the bottom of the playbook, under ‘Future Mitigation Tips’, you can also create an alert policy to look specifically for this issue:
When this alert fires, you’ll be able to acknowledge the incident or click the policy link to jump straight into this dashboard and begin troubleshooting:
This week we’re making two playbooks available: Unschedulable Pods, and a playbook for troubleshooting repeated attempts of a deployment crashing, commonly known as CrashLoopBackOff. We have playbooks for Memory and CPU scaling issues coming soon.
Both will appear as notifications for clusters where issues are present, and we hope this helps you in your troubleshooting journey! As always, if you have any questions or feedback on the product, please let us know by leaving feedback under the question mark icon of the page.
Read More for the details.
According to this year’s State of FinOps report, across cloud providers, organizations employ 4.1 cloud cost-management tools on average. Four point one! We’ve observed a similar trend with our own customers. On average, they report using four tools — native, third-party, and homegrown — to make sense of their cloud bills. Astoundingly, we’ve learned from some of our larger enterprise customers that they may have 10 or more cost-management and operations tools. When companies get into this situation, making decisions is a challenge.
The industry is taking note with efforts to standardize open billing data. And organizations like yours are going to great lengths to demystify their cloud spend. But when it comes to multicloud reporting, enterprise customers still often face the age-old question: buy or build?
To illuminate the topic, we talked to one enterprise customer, UKG, which provides human capital management (HCM) and workforce management solutions. Recently, their Director of Cloud FinOps, Peter Crenshaw, joined us on the Framing Up FinOps podcast to share his thoughts about UKG’s Cloud FinOps journey and in-house cost-management tooling.
About three years ago, when leaders at UKG decided they needed to clarify the organization’s cloud costs, they approached Crenshaw to begin reporting on Google Cloud spend. “We had a massive product deployed but didn’t understand the bills,” Peter explains. “Eighteen months or so ago, we ran into issues finding a third party that could process and manipulate data based on our reseller discounting, then present it in a way that would work.”
Like many enterprises, UKG has a large cloud footprint. They had multiple and disparate advisees, advisors, providers, and tools, each producing data that didn’t match up. Together with teammates, Peter started a small FinOps practice. In his words, they began with “rudimentary Excel and menial efforts.” They needed to decide whether to build their own cost-management solution or to go with a third-party vendor. Ultimately, they partnered with Google Cloud — first, to find a solution to view current costs, then to expand on its use as their team and FinOps practice matured.
Since then, the official team has grown from one to four. And they’ve consolidated their cost-management tooling to align with industry averages. To better understand and report on UKG’s cloud spend, Peter and his team worked with Google Cloud to build an advanced reporting solution.
For cost reconciliation and granular visibility into cloud cost breakdowns, Peter and team elected to build their solution on Looker, a platform designed for insight-powered workflows and applications. Additionally, they leveraged the Cloud Billing export to BigQuery for resource-level billing data, and the Google Cloud billing console for validation.
In assembling UKG’s FinOps team, Peter departed from industry recommendations to establish a cross-functional team with engineers and cloud financial analysts. Instead — leveraging the strengths of his organization — his team partners with the in-house experts of their SRE teams. “We looked at FinOps as an analytics team. We’re not engineers, and we don’t build tooling or software. . . We know and can show the data and partner with teams that have those skill sets to make strides and advance,” he explains.
When we asked Peter about the benefits and outcomes of building tooling on Looker, he underscored the depth of visibility, the ability to alert teams to real-time data, and increased invoicing efficiency. Currently, he and the FinOps team have been evangelizing the advantages of labeling. With real-time reports and demos, they’re showing internal teams the true costs of running various components, services, or products. These data and demos have elicited a measure of FinOps buy-in across the organization and have begun to precipitate a culture shift.
Additionally, UKG’s Finance and FinOps teams are seeing the biggest wins in the reduction of time spent on tasks. PO invoice processing at UKG used to take as much as two weeks in the payment cycle for approvals. But with advanced reporting and custom templates, Peter and his team can process their own data within 30 minutes of receiving an invoice. Organization-wide, UKG has reduced PO invoice processing time from two weeks to one day.
As Peter notes, and as we’ve shared here before, getting visibility into cost drivers is oftentimes the first and most powerful step in the cloud cost-optimization journey. “The data and visibility has been [most] impactful with my team. Before, we were drowning in spreadsheets and watching Excel crash every day,” he recalls. “Our reporting is much easier now with the same inputs, same reports, and same visibility. But also, the entire org is benefiting as we’re peeling back the layers, seeing what’s under the hood. [We’re] able to tell people what they have, what their costs are.”
UKG is transitioning to a value-stream operational model, where a product owner is responsible for their own cloud budget. Previously, they had one cost center across the entire cloud bill, then manually completed journal entries to assign cloud costs. With the new system, which will eventually incorporate labeling and chargebacks directly into POs, individual teams will be responsible for their own cloud costs.
But labeling remains an organizational challenge. As such, the challenge to the FinOps team will be to clearly articulate costs, break them down month by month, and not only enter them but also assign them in the payment system. “The granularity of the reporting isn’t as it could be, but we’re getting there,” Peter says.
Whether your organization is a well-established enterprise or a small business just getting started with a Cloud FinOps practice, determining whether to build, buy, or both is really about understanding your business objectives, then selecting the right tools to meet them.
There are plenty of out-of-the-box Google Cloud billing tools to make sense of your cloud spend. But if, like UKG, your enterprise is considering or working on building an advanced reporting solution, we recommend a rationalization exercise to understand what you have as a first step.
To optimize reporting tools, we’ve found that organizations must often work backward to understand their desired outcomes and business objectives. Some critical questions to consider include:
What are the goals of your FinOps org?
What are you trying to accomplish in the near and long term?
Where is your team’s time best spent?
Do you have the appropriate tool set to augment your time and accomplish your goals?
Hopefully, you’ve taken away some useful insights to help you get more value from your cloud investment. If you need support to determine the right solutions for your company, we’re happy to help. There’s much more to learn, so please remember to join us every other week for our newest episode of Framing up FinOps on Twitter Spaces.
Special thanks to Googlers Sheri Cunningham, Cloud FinOps Team Lead, and Nick Davidson, Technical Account Manager for UKG, as guest speakers on our Framing up FinOps.
Read More for the details.
IT infrastructure is essential to the success of any organization. Unfortunately, burnout is a very real threat to many IT teams’ wellbeing and productivity, compromising their ability to deliver and maintain that infrastructure. We’ve seen that utilizing the public cloud can reduce burnout, but many IT teams still aren’t sure how to make that switch. That’s why we published a new ebook where we follow a day in the life of On-Prem Pete and Cloud Carrie, two infrastructure managers whose workdays play out very differently. By following their paths, it quickly becomes clear how switching to cloud can make life much easier for both IT staff and developers.
As you may know from personal experience, IT and infrastructure teams are often under a lot of stress, some of which might be exacerbated by non-ideal IT landscapes in on-prem data centers. A few of the most common reasons that IT teams may feel stressed include the ever-changing nature of technology, the push for integrating new AI solutions, the need for staff to be constantly available, the responsibility to keep data safe, the pressure to perform, and work-life imbalance.
All of these pressures can take a toll on IT teams, sometimes leading to burnout — a state of physical and emotional exhaustion that can cause a loss of motivation or efficiency. In fact, as you can see in the infographic below, 74% of employees have experienced burnout at some point in their careers, and there was a 500% increase in searches for ‘pandemic burnout symptoms’ from 2020 to 2022. Clearly, burnout is a very real issue.
Making this even more challenging? A lot of people don’t even know when they’re burned out — especially since burnout symptoms can vary from person to person. However, there are some common symptoms that you should look out for:
Exhaustion. This is one of the most common symptoms of burnout. You feel tired all the time, even when you’ve had a good night’s sleep. You may also feel like you are running on empty.
Cynicism. With this common symptom, you may start to feel like nothing you do makes a difference, or that your work is pointless.
Lost motivation. You may start to feel like you don’t care about your job anymore, or that you’re just going through the motions.
What’s worse is that in addition to the human impact, burnout can also impact the business. That’s why businesses should take avoiding burnout among employees seriously! Thankfully, there are many things you can do to prevent burnout, with some of the most important being:
Setting boundaries. It’s important to establish boundaries between your work life and your personal life. This means not checking work emails or messages outside of work hours. It also means taking time for yourself, even when you are busy, and having clear conversations with your management teams about what is and isn’t realistic.
Taking breaks throughout the day. This will help keep you from feeling overwhelmed. You also need to take vacations and use your sick days.
Talking to someone. It’s important to talk to someone about how you are feeling. This could be a friend, family member, therapist, or counselor. Talking to someone can help you feel better and get the support you need.
Making changes. If you are feeling burned out, it may be time to make some changes. This could include changing your job, your work environment, or your lifestyle.
Understanding burnout — especially its symptoms — is the first step you can take to avoiding it. The second step you can take is finding ways to accomplish your work goals as optimally and efficiently as possible. We hope our free ebook and infographic can help you accomplish all of your IT goals — while maintaining your wellbeing and productivity along the way.
Read More for the details.
Large Machine Learning (ML) models – such as large language models, generative AI, and vision models – are dramatically increasing the number of trainable parameters and are achieving state-of-the-art results. Increasing the number of parameters results in models too large to fit on a single VM instance, and thus demands distributed compute to spread the model across multiple nodes. Google Kubernetes Engine (GKE) has built-in support for NCCL Fast Socket, to help improve the time to train large ML models with distributed, multi-node clusters.
Enterprises are looking for faster and cheaper performance to train their ML models. With distributed training, communicating gradients across nodes is a performance bottleneck. Optimizing inter-node latency is critical to reduce training time and costs. Distributed training uses collective communication as a transport layer over the network between the multiple hosts. Collective communication primitives such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, and point-to-point send and receive are used in distributed training in Machine Learning.
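To make these primitives concrete, here is a toy, single-process Python sketch of what all-reduce and all-gather compute. This is purely illustrative: real libraries such as NCCL perform the same combination over the network between GPUs and hosts, which is where the latency optimizations discussed here matter.

```python
# Toy, single-process illustration of collective-communication semantics.
# Each "rank" holds a local gradient shard; the collective combines them
# so that every rank ends up holding the same result.

def all_reduce(per_rank_values):
    """Every rank receives the element-wise sum of all ranks' values."""
    total = [sum(col) for col in zip(*per_rank_values)]
    return [list(total) for _ in per_rank_values]

def all_gather(per_rank_values):
    """Every rank receives the concatenation of all ranks' values."""
    gathered = [v for rank in per_rank_values for v in rank]
    return [list(gathered) for _ in per_rank_values]

# Four ranks, each with a local gradient vector.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(all_reduce(grads)[0])  # [16.0, 20.0] on every rank
print(all_gather(grads)[0])  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

In distributed training, all-reduce is the workhorse: each worker computes gradients on its shard of the data, and the summed (or averaged) gradient is delivered back to every worker before the next optimizer step.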
The NVIDIA Collective Communication Library (NCCL) is commonly used by popular ML frameworks such as TensorFlow and PyTorch. It is a highly optimized implementation for high bandwidth and low latency between NVIDIA GPUs. Google developed a proprietary version of NCCL called NCCL Fast Socket to optimize performance for deep learning on Google Cloud.
NCCL Fast Socket uses a number of techniques to achieve better and more consistent NCCL performance.
Use of multiple network flows to attain maximum throughput. NCCL Fast Socket introduces additional optimizations over NCCL’s built-in multi-stream support, including better overlapping of multiple communication requests.
Dynamic load balancing of multiple network flows. NCCL can adapt to changing network and host conditions. With this optimization, straggler network flows will not significantly slow down the entire NCCL collective operation.
Integration with Google Cloud’s Andromeda virtual network stack. This increases overall network throughput by avoiding contention in virtual machines (VMs).
We tested the performance of NCCL Fast Socket against stock NCCL (using the NVIDIA NCCL tests) on various machine shapes with two-node GKE clusters.
The following chart shows the results. For each machine shape, the NCCL performance without Fast Socket is normalized to 1. In each case, using NCCL Fast Socket demonstrated increased performance in a range of 1.3 to 2.6 times faster internetwork communication speed.
As a built-in feature, GKE users can take advantage of NCCL Fast Socket without changing or recompiling their applications, ML frameworks (such as TensorFlow or PyTorch), or even the NCCL library itself. To start using NCCL Fast Socket, create a node pool that uses the plugin with the --enable-fast-socket and --enable-gvnic flags. You can also update an existing node pool using gcloud container node-pools update.
To achieve better network throughput with NCCL, Google Virtual NIC (gVNIC) must be enabled when creating VM instances. For detailed instructions on how to use gVNIC, please refer to the gVNIC guide.
To verify that NCCL Fast Socket has been enabled, view the kube-system pods:
And the output should be similar to:
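As a sketch, the enablement and verification steps above might look like the following; the cluster, node pool, and zone names are placeholders, and the exact installer pod names in your cluster may differ:

```shell
# Create a node pool with NCCL Fast Socket and gVNIC enabled
# (placeholder cluster/pool/zone names):
gcloud container node-pools create gpu-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --enable-fast-socket \
    --enable-gvnic

# Verify that the Fast Socket installer pods are present and running:
kubectl get pods -n kube-system | grep fast-socket
```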
To learn more visit GKE NCCL Fast Socket documentation. We look forward to hearing how NCCL Fast Socket improves your ML Training experience on GKE.
Read More for the details.
Introducing resize capability for Azure dedicated hosts
Read More for the details.
Microsoft Dev Box is generally available now. Dev Box is a virtualized solution that empowers developers to quickly spin up self-service workstations preconfigured for their tasks while maintaining centralized management to maximize security and compliance.
Read More for the details.
Set BGP community tags on traffic sent from Azure to on-premises over ExpressRoute, enabling a greater variety of hybrid network designs.
Read More for the details.
Are you ready to start your IPv6 journey? Is your cloud provider ready to start it with you? Google Kubernetes Engine (GKE) supports dual-stack Kubernetes clusters to help your journey to IPv6 while ensuring your applications are v6-ready. And to address the operational demands for IPv6 workloads, we’re adding several features to GKE networking to expand protection for both inbound and outbound IPv6 traffic, making them more highly available, secure, and observable.
The following features for dual-stack GKE clusters are now IPv6-aware, making it easier to enable v6 workloads with solutions that use both v6 and v4 Pods:
Load Balancer Services
FQDN Network Policies
Dataplane V2 observability
These new features complement the extensive work we’ve been doing for GKE to support IPv6 at the same level as we do IPv4. For example:
Dual-stack clusters – We’ve supported IPv4 and IPv6 front-ends with Ingress for some time, and our managed Gateway API has supported them since it launched. As of December 22nd, 2022, dual-stack GKE clusters have been available with global unicast addresses (GUA) as well as unique local addresses (ULA) on Google Cloud VPC networks. With GKE’s dual-stack clusters, both nodes and Pods get an IPv4 and IPv6 address to enable communication with both IP address families.
DNS support – GKE supports both IP address families with multiple DNS solutions. From inception, kube-dns supports dual-stack with both A and AAAA records. GKE also provides a more robust, scalable and performant DNS service through Cloud DNS. This Google Cloud-native DNS integration includes in-cluster name resolution with full support of IPv4 and IPv6 records.
Dual-stack Kubernetes Services – For Services, either single-stack IPv4, single-stack IPv6, or dual-stack addresses can be allocated. When we released dual-stack clusters, we supported clusterIP and nodePort Services. These fundamental constructs enable IPv6-capable Kubernetes workloads to be connected in a cluster.
Serving IPv6 to the world – GKE clusters have long been able to expose your workloads in a highly available manner through Kubernetes Ingress services on Google Cloud. By deploying your Gateway and Ingress services on GKE, you get the benefits of Google Networking at the edge to serve and protect with IPv6! Both Kubernetes Gateway API and Ingress on GKE use our tried-and-true Google Cloud Load Balancers, giving you the assurance of proven infrastructure. Additionally, while serving IPv6 to the world, you can protect your applications with Google Cloud Armor security policies. For example, you can reference your Cloud Armor security policy on your Gateway or the Backend Configuration CRD on your Ingress to define allow and deny lists with IPv6 addresses.
Now, let’s take a look at the latest IPv6 features and capabilities we’ve developed for GKE.
We’re excited to announce that the Service type LoadBalancer is now available with dual-stack capabilities. This means you will be able to create Kubernetes LoadBalancer Services and specify their IP families. As a benefit of running GKE, these are deployed as Google Cloud Network Load Balancers, which can be addressed either publicly or privately with the IP address family of your choice (i.e. IPv4-only, IPv6-only, or both).
Here’s an example of a YAML that you can use to create a dual-stack Kubernetes LoadBalancer Service on GKE exposed as a Google Cloud Network Load Balancer:
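A minimal sketch of such a manifest might look like the following; the Service name and selector are hypothetical, and the annotation shown assumes the backend-service-based external Network Load Balancer, so confirm the details against the GKE documentation:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-dual-stack-service        # hypothetical name
  annotations:
    cloud.google.com/l4-rbs: "enabled"   # assumption: backend-service-based external NLB
spec:
  type: LoadBalancer
  ipFamilyPolicy: RequireDualStack   # request both address families
  ipFamilies:
  - IPv4
  - IPv6
  selector:
    app: hello                       # hypothetical selector
  ports:
  - port: 80
    targetPort: 8080
```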
Once you’ve created a dual-stack Kubernetes LoadBalancer Service, you can confirm that both an IPv4 and IPv6 address have been assigned to the Service:
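For example (the Service name here is a placeholder):

```shell
# List the Service; dual-stack external addresses appear in its status:
kubectl get service my-dual-stack-service -o wide

# Or inspect the assigned load-balancer addresses directly:
kubectl get service my-dual-stack-service \
    -o jsonpath='{.status.loadBalancer.ingress}'
```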
You can use the standard Kubernetes API to create dual-stack Load Balancers and apply GKE annotations as you wish.
We are advancing our capabilities for GKE with dual-stack support for fully qualified domain name (FQDN) Network Policies. This exciting feature advances the Network Security posture of workloads deployed on GKE to account for IPv6-capable applications.
By leveraging both A and AAAA records, FQDN Network Policies seamlessly provide advanced network security for both IPv4 and IPv6 address families. FQDN Network Policies enforce egress traffic policies when a workload reaches out to specific destinations outside the GKE cluster that resolve to IPv4 or IPv6 addresses. FQDN Network Policies are additive to any existing endpoints allowed by the egress Network Policy. Once FQDN Network Policies are created and applied as an egress policy, an implicit DENY is applied for all endpoints that are not specified as allowlisted destinations.
These capabilities provide network security consistency across both IPv4 and IPv6 as you bring your IPv6-capable workloads onto GKE.
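As a sketch of the idea, a policy allowing egress only to a single domain might look roughly like the following. Treat this as illustrative only: the CRD group, version, and field names should be confirmed against the current GKE documentation, and the workload label and destination domain are hypothetical.

```yaml
apiVersion: networking.gke.io/v1alpha1   # assumption: confirm against current GKE docs
kind: FQDNNetworkPolicy
metadata:
  name: allow-example-egress
spec:
  podSelector:
    matchLabels:
      app: api-client                    # hypothetical workload label
  egress:
  - matches:
    - name: "api.example.com"            # hypothetical destination; may resolve to A and/or AAAA records
    ports:
    - protocol: TCP
      port: 443
```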
Opening up a world of metrics — our GKE Dataplane V2 observability launch brings visibility into your IPv4/IPv6 workloads. This feature set includes metrics and troubleshooting tools to make your dual-stack GKE clusters operationally ready. The GKE Dataplane V2 observability stack enables you to have dual-stack Pod traffic metrics for the network info you care about. You can use Cloud Monitoring Metrics Explorer to monitor Dataplane V2 metrics for your IPv6 workloads, while our Managed Hubble solution for IPv6 Kubernetes workloads on GKE lets you troubleshoot the environment. The open source Hubble project is an observability platform built on Cilium and eBPF. Built for GKE’s Dataplane V2, our Managed Hubble UI gives you visibility into connection information and Network Policy enforcement in the form of a service map and a Network Policy verdict table. Finally, a CLI for interactive live troubleshooting lets you better understand your dual-stack Kubernetes workloads.
We hear from our users that dual-stack clusters are the stepping stones to an IPv6-only world. Together, this suite of features improves the operational readiness of your Kubernetes workloads for IPv6. Going to production with IPv6 implicitly means showing operational readiness in terms of high-availability, security, and observability. These releases should increase your confidence running dual-stack workloads on GKE.
For further reading, check out these resources on our current dual-stack capabilities.
GKE dual-stack clusters network overview
GKE dual-stack cluster creation on an IPv4/IPv6 network
GKE Gateway configuration for Cloud Armor security policies
Read More for the details.
Kubernetes is an increasingly key part of the application deployment strategies at large organizations, and one of the most recommended options for the teams we work with. An idea brought to life at Google, Kubernetes is now used by organizations throughout the world on-premises, in Google Cloud, and in multi-cloud scenarios, and it has emerged as a leading application deployment platform. And because it’s open source, anyone can pop the hood, so to speak, to see how each component works, creating a trusted, verifiable framework that users can rely on.
Customers begin their Kubernetes and containerization journeys using various offerings in our product portfolio. Some, like Colgate, started their modernization journey with Kubernetes. Colgate is an $18B global consumer products company with ~34,000 diverse and dedicated people serving over 200 countries and territories. Through science-led innovation, they drive growth and reimagine a healthier future for all people, their pets, and our planet.
During ideation, they talked through various considerations: What technology strengths does their organization have? What skills do their teams need? What will this initiative look like a decade from now? They implemented the following Kubernetes-focused architecture:
Google Cloud also helped Colgate break new ground over the years, especially in the areas of cloud-native networking, security, monitoring, pub/sub, managed containerization, and multi-tenant environments.
Over time Colgate began to leverage Google’s managed container portfolio, which includes Cloud Run. Cloud Run lets you run containers on top of a serverless platform, unlocking workload possibilities for public websites, private services, APIs and batch jobs and eliminating a lot of the time spent on infrastructure management. Cloud Run also requires no prior knowledge of Kubernetes or containers.
Many teams have found that they prefer the serverless, hands-off approach that Cloud Run provides, and Colgate now evaluates Google Cloud’s serverless solutions alongside GKE for any applications destined for the cloud. At the same time, Cloud Run lets them continue to leverage their investment in workloads based on the Open Container Initiative.
For example, Cloud Run is designed for Kubernetes compatibility with consistent management capabilities such as the ability to manage resources using kubectl via the config controller, and the ability to browse logs and metrics from both platforms in Cloud Logging and Monitoring. Cloud Run and GKE data planes are also interoperable, allowing Cloud Run and GKE services to be exposed behind a VPC behind private IPs using an internal load balancer. Cloud Run as an option has contributed to faster innovation, allowing Colgate to bring smiles to many more faces globally.
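For teams taking this route, a Cloud Run deployment is typically a single command; the service name, image, and region below are placeholders for illustration:

```shell
gcloud run deploy my-service \
    --image=us-docker.pkg.dev/example-project/repo/my-app:v1 \
    --region=us-central1 \
    --no-allow-unauthenticated   # keep the service private to authorized callers
```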
Google’s managed container offerings provide a composable and comprehensive set of solutions for customers’ applications. At Colgate, container-based managed services are used across the stack: on the front-end, where they use Identity-Aware Proxy to manage authentication and External Load Balancers to handle incoming traffic with high availability and low latency; at the application layer, where they can choose from Cloud Functions, Cloud Run, or GKE, depending on the level of control they need over the application; and at the internal load balancing level where NGINX® controllers serve internal applications. Together, these managed services ensure that Colgate has the flexibility to choose the right toolchain and maximize their goals for each use case.
Colgate wanted to build internal applications on Cloud Run in a way that complied with their organization’s policies while maximizing developer productivity. They were able to use new features like the Cloud Run Identity Aware Proxy GA to build a secure, serverless deployment for their applications.
Colgate and Google Cloud have enjoyed a deep partnership for many years, engaging across many technologies, teams, and design patterns.
They’ve engaged with Product and Engineering across compute, networking, Kubernetes, and serverless as they brought this new way of thinking to their users.
For Colgate and many of our customers looking to address the needs of the modern user, Google’s managed container offerings are a breath of fresh air. Its reliability, scalability, and control offer the flexibility to build applications that meet the demands of both internal and external consumers. In addition, the variety of container offerings available on Google Cloud — GKE and Cloud Run — allow customers to make app deployment decisions based on the amount of Day 2 operations the users are willing to take on. Platform administrators appreciate the reduced management effort, users enjoy reduced downtime, and developers can simply deploy.
Google Cloud contributors: Rex Orioko, Rachel Tsao
Colgate-Palmolive contributors: Matthew Tattoli, Nicholas Farley, and David Wiser
Read More for the details.
As companies modernize their applications and infrastructure, many IT stakeholders are looking to open source technologies to help them do so. One area where open source is gaining traction is in relational database management systems (RDBMS). Open-source RDBMSs such as PostgreSQL are becoming increasingly popular alternatives to commercial RDBMSs such as Oracle and Microsoft SQL Server. There are several reasons for this shift.
First, open-source RDBMSs are often more cost-effective than legacy, proprietary RDBMSs. They are typically free to download and use, and there are no licensing fees.
Second, open-source RDBMSs are often more flexible and scalable than legacy RDBMSs. They can be easily customized to meet the specific needs of an organization, and they can be scaled up or down as needed.
Due to these factors, open source RDBMSs are becoming a popular choice for organizations of all sizes. They offer a number of advantages over legacy RDBMSs, including cost-effectiveness, flexibility and scalability.
Migrating from Oracle/SQL Server to PostgreSQL can be a challenging task. The migration strategy will vary depending on your specific goals, but there are some general strategies that can help to smooth the migration process:
Utilize Database Migration Assessment (DMA): There are a number of assessment tools that provide insights into the level of effort required to modernize an Oracle or SQL Server database to PostgreSQL. To expedite this step, customers can leverage Google Cloud’s first-party tooling, called DMA, to assess Oracle/SQL Server database complexity. It’s Google Cloud’s no-cost database assessment solution, which estimates the migration effort, cost, and anticipated return on investment. We’ll talk about DMA in more depth below.
Leverage code/schema conversion tooling: These tools can help to reduce the risk of human error and improve the overall efficiency of the migration. Leverage Google Cloud tooling to automatically convert your schema and code from Oracle/SQL Server to PostgreSQL.
Migrate data in stages: Migrating all of your data at once can be a risky proposition. It is often better to migrate your data in stages, starting with a small subset of data and then gradually migrating more data over time. Depending on your requirements, we can help you build a data migration strategy. Whether you can tolerate downtime or need real-time replication, we offer data migration solutions to meet your requirements.
Make application changes: Switching a database engine often involves application changes. We can help scan your application repositories and identify the SQL statements that may need to be altered to run on the new database engine.
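As an illustration of this kind of repository scan, the sketch below searches application source trees for a few Oracle-specific SQL constructs that typically need rewriting for PostgreSQL. The pattern list, rewrite hints, and file extensions are assumptions for the example; they are not the actual Google Cloud scanning tooling.

```python
import re
from pathlib import Path

# Oracle-specific constructs that typically need rewriting for PostgreSQL.
# This list and the hints are illustrative, not exhaustive.
ORACLE_PATTERNS = {
    "NVL(":    "use COALESCE()",
    "SYSDATE": "use CURRENT_TIMESTAMP",
    "ROWNUM":  "use LIMIT or row_number()",
    "DECODE(": "use a CASE expression",
    "(+)":     "use ANSI OUTER JOIN syntax",
    "DUAL":    "the FROM clause is optional in PostgreSQL",
}

def _matches(construct: str, line: str) -> bool:
    # Word-boundary match for identifiers; plain substring match for symbols.
    if construct[0].isalpha():
        token = re.escape(construct.rstrip("("))
        return re.search(r"\b" + token + r"\b", line, re.I) is not None
    return construct in line

def scan_repository(repo_root: str, extensions=(".java", ".py", ".sql")) -> list:
    """Return (file, line number, construct, hint) for each suspect statement."""
    findings = []
    for path in Path(repo_root).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            findings += [(str(path), lineno, c, hint)
                         for c, hint in ORACLE_PATTERNS.items() if _matches(c, line)]
    return findings
```

A scan like this only flags candidate statements; each finding still needs review, since dynamic SQL and vendor-specific functions hidden behind ORMs will not surface from simple pattern matching.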
Test the migrated data: Once you’ve migrated your data, it’s important to test it to make sure it is working properly. There are a number of tasks required in this stage: first, validate that all the data is synced between the source and the target; you can leverage our tooling for this task. Second, perform application functional testing to validate that the application behaves as expected. Lastly, we recommend performance testing.
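The first validation task, checking that source and target hold the same rows, can be sketched as a row-count-plus-fingerprint comparison. The demonstration below uses SQLite connections as stand-ins for the Oracle/SQL Server source and PostgreSQL target; production tooling would compute checksums server-side rather than pulling every row to the client.

```python
import sqlite3

def table_fingerprint(conn, table: str, key: str):
    """Row count plus a content fingerprint for one table.
    `conn` is any connection object exposing execute(); table and key
    identifiers are assumed trusted (do not interpolate user input)."""
    rows = [tuple(r) for r in conn.execute(
        f"SELECT * FROM {table} ORDER BY {key}").fetchall()]
    return len(rows), hash(tuple(rows))

def validate_migration(source_conn, target_conn, table: str, key: str) -> bool:
    """True when source and target agree on row count and row contents."""
    return (table_fingerprint(source_conn, table, key)
            == table_fingerprint(target_conn, table, key))
```

Ordering by a stable key before hashing makes the fingerprint deterministic, so any divergence in either row count or content shows up as a mismatch.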
By following these general strategies, you can help to ensure that your database migration from Oracle/SQL Server to PostgreSQL is smooth and successful.
The rest of this blog will focus on our database assessment process, which is the starting point in this cloud journey. A database assessment brings together industry best practices, database experts and data collection tooling to provide you with a migration strategy to tackle your Oracle and SQL Server migrations to PostgreSQL.
The Database Migration Assessment (DMA) is a no-cost customer engagement in which we collect database metadata and deliver a detailed, customized customer readout. The collected data contains detailed metadata spanning database object types, PL/SQL or T-SQL code, database features in use, current resource usage, and workload characteristics. This data is collected using our new first-party DMA tool.
Once the metadata is collected, it is processed and our database expert team delivers a detailed, customized readout containing:
Google Cloud target state database recommendations based on your current on-premises database workload characteristics.
Migration effort: This reflects the level of effort required in hours to convert the Oracle/SQL Server schema and code to an open source database like PostgreSQL.
Right-sizing for the cloud: Current on-premises resource usage information is used to right-size your Google Cloud database, so that you only pay for the resources you actually need.
Complete migration plan: Identify your first-mover databases and create migration waves for subsequent databases.
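The right-sizing step above can be sketched as a simple calculation from observed peaks. The 20% headroom factor and the power-of-two vCPU rounding below are illustrative assumptions for the example, not official Google Cloud sizing rules.

```python
import math

def recommend_shape(peak_cpu_util: float, host_vcpus: int,
                    peak_mem_gb: float, headroom: float = 0.2) -> dict:
    """Translate observed on-premises peaks into a right-sized cloud shape.
    peak_cpu_util is a 0..1 fraction of the on-premises host's CPUs.
    Headroom and rounding policy are illustrative, not official guidance."""
    needed_vcpus = peak_cpu_util * host_vcpus * (1 + headroom)
    # Round up to the next power-of-two machine shape, with a floor of 2 vCPUs.
    vcpus = max(2, 2 ** math.ceil(math.log2(max(needed_vcpus, 1.0))))
    memory_gb = math.ceil(peak_mem_gb * (1 + headroom))
    return {"vcpus": vcpus, "memory_gb": memory_gb}
```

For example, a host with 16 vCPUs peaking at 60% utilization and 48 GB of memory in use would map to a 16-vCPU shape with modest memory headroom, rather than a like-for-like copy of the over-provisioned on-premises server.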
Below is an excerpt of a customer readout, in which we looked at 14 Oracle databases. In this example, our customer wanted to modernize from Oracle to either Cloud SQL for PostgreSQL or AlloyDB for PostgreSQL. Originally, the customer’s perception was that their Oracle databases would be too complex to modernize and that it would require too much effort. Once we ran the migration assessment with DMA, we were able to determine that 89% of their database environment could be automatically converted by using Google Cloud schema/code conversion tooling. We determined this based on the graph below, in which each database is listed and the compatibility between Oracle/SQL Server and PostgreSQL is displayed.
For each database, we categorize database objects and code into three buckets. The green bucket identifies the database object that can be automatically converted using the Google Cloud tooling. The yellow bucket identifies the database objects/code that require some manual work. Lastly, the red bucket identifies the database objects that need to be completely refactored because the Google Cloud schema/code conversion tooling cannot automatically convert these objects. In summary, the more green there is on the chart, the more objects we can automatically convert using Google Cloud conversion tooling, and the less effort is required by the customer.
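A per-database summary like the chart described above can be sketched by classifying objects into the three buckets and computing the automatically convertible share. The type-to-bucket mapping below is a made-up illustration; the real tooling classifies at the level of individual objects and code, not whole object types.

```python
from collections import Counter

# Illustrative mapping of object types to readout buckets
# (green = automatic, yellow = some manual work, red = full refactor).
BUCKET_BY_TYPE = {
    "TABLE": "green", "INDEX": "green", "VIEW": "green", "SEQUENCE": "green",
    "PROCEDURE": "yellow", "FUNCTION": "yellow", "TRIGGER": "yellow",
    "PACKAGE": "red", "MATERIALIZED VIEW": "red",
}

def bucket_summary(object_types: list) -> Counter:
    """Count database objects per bucket; unknown types default to 'red'."""
    return Counter(BUCKET_BY_TYPE.get(t, "red") for t in object_types)

def auto_convert_pct(object_types: list) -> float:
    """Share of objects the conversion tooling could handle automatically."""
    counts = bucket_summary(object_types)
    total = sum(counts.values())
    return round(100.0 * counts["green"] / total, 1) if total else 0.0
```

Summing the green bucket across all databases is what yields a headline figure like the 89% automatic-conversion rate mentioned for this customer.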
For this customer, the results were better than expected. Originally, their overall impression was that the large amount of PL/SQL code in these Oracle databases was making these environments sticky. However, this visualization showed that most of their Oracle objects and code could be automatically converted.
Next, we showed the customer the level of effort required to convert each of their databases. The graph below displays all the databases ordered by the level of effort (shown in hours of time) required to complete a migration. The first database listed will take the least amount of time to migrate, whereas the last database will take the most amount of time to migrate. This data is useful to help you identify the first-mover databases and define your migration waves. In other words, you can define a database group to migrate in phases.
So, how do you start? As a recommendation, we suggest starting with the least complex database first as opposed to the most complex. This will allow you to begin building a migration blueprint with repeatable processes and get you comfortable with the Google Cloud schema/code conversion and data movement tooling. These learnings in combination with a repeatable process will set the stage to expedite migrating your more complex databases.
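Ordering databases by estimated effort and grouping them into waves, as described above, can be sketched as follows. The per-wave effort budget is a hypothetical planning knob for the example; real wave planning also weighs application dependencies and business priorities.

```python
def plan_waves(efforts: dict, wave_budget_hours: float) -> list:
    """Order databases by estimated effort (ascending) and group them into
    migration waves, each capped at wave_budget_hours of total effort.
    Starting with the least complex databases builds a repeatable blueprint."""
    waves, current, used = [], [], 0.0
    for db, hours in sorted(efforts.items(), key=lambda kv: kv[1]):
        if current and used + hours > wave_budget_hours:
            waves.append(current)
            current, used = [], 0.0
        current.append(db)
        used += hours
    if current:
        waves.append(current)
    return waves
```

The first wave deliberately collects the quick wins; the learnings from it then shorten the later, heavier waves.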
This customer was delighted with these results and concluded that the level of effort required to modernize their Oracle databases to AlloyDB offered a high return on investment (ROI), as they were moving away from an expensive licensing/support structure to a cost-effective, fully managed database service.
In this example, we only provided an excerpt of the customer readout that we delivered to the customer. However, a full customer readout provides a wealth of information. We can drill down multiple levels, right down to the individual database objects that make up the hours of effort calculation.
The DMA tool is open source and is available on GitHub. It currently supports Oracle and SQL Server databases:
For Oracle, DMA supports both AWR and non-AWR (statspack) database assessments based on customer licensing policy.
For SQL Server, DMA leverages Perfmon data and collects only metadata.
The table below lists the types of data collected with the DMA tool.
If you are currently using Oracle or SQL Server databases and are looking to modernize them onto an open-source database like PostgreSQL, you can download the DMA tool from our GitHub repository. Or, reach out to your Google Cloud sales representative and schedule an assessment. Our database experts can provide recommendations, best practices, and guidance on modernizing these databases, which in turn will provide a data-driven approach to assessing if modernizing your Oracle or SQL Server database is right for you.
Read More for the details.
Cloud SQL is Google Cloud’s enterprise-ready, fully managed database service for running MySQL, PostgreSQL and SQL Server workloads. It’s used across industries from digital services to banking to retail, with more than 95% of Google Cloud’s top 100 customers using Cloud SQL today. As more demanding workloads move to the cloud, some of you have been asking us for higher performance and availability. In addition, you’ve been asking us for more choice and flexibility to optimize your configurations for your individual workload needs.
Today, we’re announcing the Cloud SQL Enterprise Plus edition for MySQL and PostgreSQL, featuring three major enhancements. First, we have significantly accelerated both read and write performance with a variety of software optimizations, improved machine types and configurations, and an integrated SSD-backed data cache option. With Cloud SQL Enterprise Plus for MySQL, we deliver up to 3x higher read throughput and up to a 2x improvement in write latency compared to the current Cloud SQL offering. Second, we deliver near-zero downtime for planned maintenance operations and a 99.99% availability SLA inclusive of maintenance to satisfy your availability goals for your business-critical applications. Third, we enhanced the data protection capabilities with 35 days of log retention, a must-have for all organizations with strict compliance requirements. Cloud SQL Enterprise Plus edition addresses these needs while building on the already proven capabilities of Cloud SQL.
The existing version of Cloud SQL will continue with no changes to features or pricing, but will now be known as the Cloud SQL Enterprise edition.
Cloud SQL Enterprise Plus for MySQL delivers superior performance through co-optimized hardware and software configurations while maintaining full open-source compatibility. For read-intensive transactional workloads, Enterprise Plus edition for MySQL, with configurable data cache, delivers up to 3x higher read throughput compared to Enterprise edition. Data cache leverages flash memory as a way to transparently extend caches based on DRAM, lowering read latency, improving throughput and scaling to larger data sets. Additionally, software optimizations deliver up to a 2x reduction in transaction commit latency and up to a 2x improvement in write throughput, making Cloud SQL the best destination for your most demanding MySQL workloads.
“We are delighted to join forces with Google Cloud in launching Cloud SQL Enterprise Plus edition for MySQL,” says Jonathan Blais, CTO and Co-founder of Everflow Technologies. “This new edition has enabled us to immediately reap the benefits of enhanced performance, availability, and data protection without making any modifications to our application. In particular, we have observed more than 3x improvement in query performance for clients who run intensive reports on our platform after incorporating the Data Cache. We look forward to partnering with Google Cloud and Cloud SQL on future innovations.”
Availability and reliability are a key focus and Cloud SQL Enterprise Plus edition offers a 99.99% SLA, inclusive of maintenance. During planned maintenance, Enterprise Plus instances use rolling updates behind the scenes and additional hardware resources to reduce downtime to less than 10 seconds.
Enterprise Plus edition also provides a higher level of data protection, with 35 days of log retention for point-in-time recovery, supporting the demanding regulatory and data protection requirements of your business-critical workloads.
“At Workday, it’s important to be able to offer great user experience to our customers and as we moved to the public cloud, it was clear that we needed a high-performance database to support our requirements. We worked closely with the Google Cloud team to invest in Cloud SQL improvements and with the launch of the Cloud SQL Enterprise Plus edition, we now have options for high performance and availability for our most demanding Workday workloads, leveraging the higher read and write performance, higher availability, and minimal planned downtime. We are really excited to co-innovate with Google and benefit from these new capabilities along with Enterprise-grade features of the managed services,” says Chirag Andani, VP, Persistence and LifeCycle Engineering at Workday.
Cloud SQL Enterprise Plus for PostgreSQL delivers up to 2x improved write throughput compared to Enterprise edition. Similar to the MySQL offering, Cloud SQL Enterprise Plus for PostgreSQL co-optimizes hardware and software to deliver this improved performance.
All of the availability, reliability, and data protection enhancements mentioned for MySQL are also available with Cloud SQL Enterprise Plus for PostgreSQL: a 99.99% availability SLA inclusive of maintenance, reduced planned downtime of less than 10 seconds, and a higher level of data protection with 35 days of log retention.
“Partnering with Google Cloud and using Cloud SQL Enterprise Plus edition for PostgreSQL has allowed us to immediately take advantage of increased performance, availability and data protection while maintaining the open-source PostgreSQL compatibility that our applications require. With our very first workload, we saw 2x improvement in sustained performance (TPS and query latency) out of the box,” says Richard Helms, Manager, Database Engineering, Salesloft.
Cloud SQL Enterprise Plus edition is now available in select Google Cloud regions via gcloud and API and will be available in the cloud console by July 17, 2023. Cloud SQL Enterprise Plus edition for SQL Server will follow later.
Getting started is easy: visit the Cloud SQL console and create a database instance with just a few clicks. New Google Cloud customers can create a free trial instance and get $300 in free credits. To learn more about Enterprise Plus edition, see our documentation.
Read More for the details.
The race to increase global mobile broadband access has been on for years. As more people depend daily on fast, reliable internet, telecom companies are looking to continuously improve and expand their services. The challenge for many is: how to accomplish this on a massive scale quickly and cost-effectively?
The answer lies in advanced data analytics.
MTN Group is the leading telecom provider in Africa, offering voice, data, fintech, and enterprise solutions to over 270 million customers across 19 nations. It was one of the first companies on the continent to build a data science platform that would de-silo information across all markets.
Let’s look at how MTN Group uses Google Cloud solutions and partners with Deloitte to achieve its goals and maximize the impacts.
MTN Group launched its Advanced Data Analytics Management (ADAM) Platform to transform how it gathers, analyzes, and uses data across the business.
“The ADAM Platform is complex, but its objective is simple: improve customer experiences by understanding their preferences in near real-time and tailoring experiences accordingly,” says Mohanoe Mokhitli, General Manager, Data and Analytics, Group Business Intelligence Competency Centre, at MTN Group. “Google Cloud was the clear choice as the technology foundation due to its connectivity across Africa, scalability, and analytics tools.”
The company chose to work with partner Deloitte to build a blueprint of the platform on Google Cloud that could be deployed in all 19 markets it operates in. Deloitte provided expertise in Google Cloud solutions and a presence across many African nations to help guide the project from its inception.
“Deloitte brought the skills to make this project work at scale, helping us to cover everything from change management and decision-making to the actual platform tooling,” says Mokhitli. “The partnership just makes sense. We’ve benefited from a lot of knowledge transfer thanks to how closely we’ve worked with Deloitte on the ADAM Platform.”
Deloitte brought essential expertise as MTN looked to adopt a new technology foundation. An important focus was on ensuring the platform was built efficiently and could scale to handle the enormous data quantities associated with 270 million subscribers.
The ADAM Platform relies on BigQuery as its central data warehouse, as well as several other Google Cloud analytics technologies such as Dataproc to accelerate the process of capturing data and gaining insights. Vertex AI has been crucial to the platform’s success, helping make sense of billions of records quickly and efficiently.
With the ADAM Platform up and running, MTN is now processing roughly 4 trillion data records monthly and democratizing insights to derive tangible business value from its data.
“In addition to the sheer power of Vertex AI and BigQuery, we have also benefited from their ease of use,” says Dr Quentin Williams, Associate Director at Deloitte Consulting. “With Google Cloud, we can make sure MTN business users can take advantage of the technologies regardless of their IT experience. This has helped to drive adoption among our business units, which translates to a greater customer impact.”
Furthermore, ADAM enables MTN Group to gain richer, more timely insights into customer needs. For example, sales teams can use insights from the ADAM Platform to decide which services should be packaged together in offers to certain customers.
With access to the platform, data-driven decision making is now more prevalent and streamlined across the MTN Group, as previously siloed initiatives are connected to increase visibility across all markets. Teams benefit because there is less redundant work and they can take quicker action on what’s working well. The result is that each business unit can contribute more to building customer satisfaction and loyalty and reducing churn.
Deloitte and Google Cloud have helped MTN Group manage the technical demands of the project so that MTN Group employees can focus on educating users and increasing adoption.
“If you look back at our business when this project kicked off and compare it to where we are now, we have closed a significant skills gap that held us back from data-driven decision making,” says Mokhitli. “With help from Deloitte and Google Cloud, the ADAM Platform increases our ability to build our business on data-driven insights, which ultimately provides our customers with even better experiences.”
Read more customer stories to learn how Google Cloud and partners like Deloitte can help to transform your business.
Read More for the details.