Azure – General availability: Azure Sphere version 22.02
The 22.02 release includes new and changed features in the OS and the SDK.
Read More for the details.
AWS Backup now supports AWS PrivateLink, providing direct access to AWS Backup via a private endpoint within your virtual private network in a scalable manner. With PrivateLink, you can now simplify your network architecture by connecting to AWS Backup using private IP addresses in your Amazon Virtual Private Cloud (VPC), eliminating the need to use public IPs, firewall rules, or an Internet Gateway.
Read More for the details.
Amazon FinSpace customers can now manage their Amazon FinSpace application users with the AWS SDK and CLI. Using these APIs, customers can integrate Amazon FinSpace into their identity management provisioning process to help Amazon FinSpace meet their organization’s access management rules. For example, when a user joins a quantitative research team that uses FinSpace, they can have a user account in FinSpace automatically created. Similarly, when a user leaves the customer’s organization or switches roles, they can be automatically deactivated in FinSpace as part of their organization-wide application entitlement workflows. This new feature adds to Amazon FinSpace’s existing single-sign-on capability to give customers more effective controls to manage access.
Read More for the details.
Starting today, NAT64 and DNS64 capabilities are available in all AWS Commercial and the AWS GovCloud (US) Regions.
Read More for the details.
The EC2 Hibernation feature is now available in the Asia Pacific (Jakarta) and (Osaka) AWS Regions. Hibernation gives you the ability to launch EC2 instances, set them up as desired, and then pause and resume them again whenever you need to. Your instances and applications will pick up right where they left off instead of rebuilding their memory footprint from a cold boot. Hibernation enables you to maintain a fleet of pre-warmed instances, getting you to a productive state in less time and without modifying your existing applications. Hibernation is available for instances running Linux and Windows OSes.
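For teams that automate this, a minimal sketch with boto3 looks roughly like the following; the AMI ID, instance type, and volume size are placeholders, and hibernation still requires a supported AMI and an encrypted root volume large enough to hold instance memory.

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-3")  # Asia Pacific (Jakarta)

# Launch an instance with hibernation enabled (placeholder AMI and sizing).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",          # placeholder AMI ID
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    HibernationOptions={"Configured": True},
    BlockDeviceMappings=[{
        "DeviceName": "/dev/xvda",
        "Ebs": {"VolumeSize": 30, "Encrypted": True},
    }],
)
instance_id = response["Instances"][0]["InstanceId"]

# Pause the instance: RAM contents are saved to the encrypted EBS root volume.
ec2.stop_instances(InstanceIds=[instance_id], Hibernate=True)

# Resume later: applications pick up right where they left off.
ec2.start_instances(InstanceIds=[instance_id])
```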
Read More for the details.
Keeping business running is essential to maintaining employee productivity. Some IT teams like the added assurance of Chrome support to help tackle unexpected issues and avoid potential downtime. In the event of an issue, our browser experts can help troubleshoot online or over the phone and get your team back up and running, no matter what time of day it is.
With Chrome, you have a variety of support options for your business. We’ve put together a list of these below as well as detailed how to start opening tickets today.
With the many support options available, IT teams can choose the one that’s right for their organization.
For starters, many organizations prefer to manage their Chrome deployment using our readily available online resources. Our Help Center hosts articles on a variety of topics with detailed information and step-by-step instructions for how to deploy and manage Chrome in enterprise environments. There’s also an online community where admins can post questions for other Chrome admins and Google teams to troubleshoot issues or learn best practices. However, some companies want direct access to Chrome experts, and we have you covered.
Many enterprises already have access to Chrome’s support team included in their existing agreements with Google. Customers receive Chrome support for their entire organization if they have any of the following:
100 or more Chrome Enterprise Upgrade licenses across Chrome OS or Chrome OS Flex
100 or more Google Workspace licenses
Google Cloud Platform Enhanced or Premium support
If you’re looking for another option, we also offer standalone Chrome Browser Enterprise Support. This option is priced per user or through a site license and requires a minimum of 1,000 licenses. If you’re interested in learning more about this option, contact us.
Before creating a support ticket, ensure that you have the appropriate role access level required to receive support.
As a Chrome Enterprise Upgrade or Google Workspace customer, you may need to have Super Admin access or have your Super Admin create this role for you. To create the admin role access, have your super admin navigate to Account>Admin Roles and click Create new role. Check the box under the Security section that says “Support” and click the save button.
For Google Cloud Platform customers, you need to have a Tech Support Editor role before creating any support cases.
After you’ve picked a support option that is right for your organization and confirmed you have the right administrative access, you’re ready to start opening support tickets.
If you’re a Chrome Enterprise Upgrade or Google Workspace customer, opening tickets is as simple as:
Open the Google Admin console and browse to the home page
Scroll down and click the Support button to launch the Help Assistant
Click “Contact support”, select your product, and describe your issue or question
You will be presented with some relevant support articles; if those don’t answer your question, click “This didn’t help, continue to Support”
Select your method of contact and work with Support to resolve your issue or question
If you’re a Google Cloud Platform customer follow these steps to create a new support case:
Sign in to the Google Cloud Console Support page as a support user
Select the project for which you’d like to open a support case
Open “Cases”
Click “Create case”
Complete the required fields and submit the form
While IT teams can’t predict all potential issues before they arise, the Chrome team is always here to help troubleshoot them. Having the right support for your business can be a game changer in today’s hybrid work environment. Choose the right option for your business to stay one step ahead.
Read More for the details.
AWS Application Migration Service (AWS MGN) has added support for Windows Server 2003 (32-bit and 64-bit) and Windows Server 2008 (32-bit and 64-bit). You can now use Application Migration Service to rehost applications running on these legacy operating systems. Application Migration Service has also added support for Windows 10 and Windows Server 2022. View a complete list of the service’s supported operating systems.
Read More for the details.
Following the announcement of updates to the PostgreSQL database by the open source community, AWS has updated Amazon Aurora PostgreSQL-Compatible Edition to support PostgreSQL versions 13.5, 12.9, 11.14, and 10.19 in commercial and AWS GovCloud (US) Regions. These releases contain bug fixes and improvements by the PostgreSQL community.
Read More for the details.
A retailer needs to predict product demand or sales, a call center manager wants to predict the call volume to hire more representatives, a hotel chain requires hotel occupancy predictions for next season, and a hospital needs to forecast bed occupancy. Vertex Forecast provides accurate forecasts for these, and many other business forecasting use cases.
Forecasting datasets come in many shapes and sizes.
In univariate data sets, a single variable is observed over a period of time. For example, the famous Airline Passenger dataset (Box and Jenkins (1976): Time Series Analysis: Forecasting and Control, p. 531) is a canonical example of a univariate time series data set. In the graph below, you can see an updated version of this time series that shows clear trend variations and seasonal patterns (source: US Department of Transportation).
Monthly Air Passengers in the US from 1990 to 2020
More often, business forecasters are faced with the challenge of forecasting large groups of related time series at scale using multivariate datasets. A typical retail or supply chain demand planning team has to forecast demand for thousands of products across hundreds of locations or zip codes, leading to millions of individual forecasts. Infrastructure SRE teams have to forecast consumption or traffic for hundreds or thousands of compute instances and load balancing nodes. Similarly, financial planning teams often need to forecast revenue and cash flow from hundreds or thousands of individual customers and lines of business.
The most popular forecasting methods today are statistical models. Auto-Regressive Integrated Moving Average (ARIMA) models, for example, are widely used as a classical method for forecasting; BigQuery ML offers an advanced ARIMA+ model for forecasting use cases.
BigQuery ML's ARIMA+ is perfect for univariate forecasting use cases; see this great tutorial on how to forecast a single time series from the Google Analytics public data set.
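As a rough illustration of the univariate case (ARIMA+ corresponds to the ARIMA_PLUS model type in BigQuery ML), the sketch below trains and forecasts with the BigQuery Python client; the `my_dataset.air_passengers` table and its columns are hypothetical placeholders, not the public dataset referenced above.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a univariate ARIMA_PLUS model on a monthly time series.
# `my_dataset.air_passengers` with columns (month DATE, passengers INT64)
# is an illustrative table name, not a real public dataset.
client.query("""
    CREATE OR REPLACE MODEL `my_dataset.passengers_arima`
    OPTIONS(
      model_type = 'ARIMA_PLUS',
      time_series_timestamp_col = 'month',
      time_series_data_col = 'passengers'
    ) AS
    SELECT month, passengers FROM `my_dataset.air_passengers`
""").result()

# Forecast the next 12 months with 90% prediction intervals.
rows = client.query("""
    SELECT *
    FROM ML.FORECAST(MODEL `my_dataset.passengers_arima`,
                     STRUCT(12 AS horizon, 0.9 AS confidence_level))
""").result()
for row in rows:
    print(row.forecast_timestamp, row.forecast_value)
```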
More recently, deep learning models have been gaining a lot of popularity for forecasting applications. For example the winners of the last M5 competition all used neural networks and ensembles. There is ongoing debate on when to apply which methods, but it’s becoming increasingly clear that neural networks are here to stay for forecasting applications.
Deep learning’s recent success in the forecasting space stems from the fact that these models are Global Forecasting Models (GFMs). Unlike univariate (i.e. local) forecasting models, for which a separate model is trained for each individual time series in a data set, a Deep Learning time series forecasting model can be trained simultaneously across a large data set of 100s or 1000s of unique time series. This allows the model to learn from correlations and metadata across related time series, such as demand for groups of related products or traffic to related websites or apps. While many types of ML models can be used as GFMs, Deep Learning architectures, such as the ones used for Vertex Forecast, are also able to ingest different types of features, such as text data, categorical features, and covariates that are not known in the future. These capabilities make Vertex Forecast ideal for situations where there are very large and varying numbers of time series, and for use cases like short-lifecycle and cold-start forecasts.
You can build forecasting models in Vertex Forecast using advanced AutoML algorithms for neural network architecture search. Vertex Forecast offers automated preprocessing of your time-series data, so instead of fumbling with data types and transformations you can just load your dataset into BigQuery or Vertex and AutoML will automatically apply common transformations and even engineer features required for modeling.
Most importantly it searches through a space of multiple Deep Learning layers and components, such as attention, dilated convolution, gating, and skip connections. It then evaluates hundreds of models in parallel to find the right architecture, or ensemble of architectures, for your particular dataset, using time series specific cross-validation and hyperparameter tuning techniques (generic automl tools are not suitable for time series model search and tuning purposes, because they induce leakage into the model selection process, leading to significant overfitting).
This process requires lots of computational resources, but the trials are run in parallel, dramatically reducing the total time needed to find the model architecture for your specific dataset. In fact, it typically takes less time than setting up traditional methods.
Best of all, by integrating Vertex Forecast with Vertex Workbench and Vertex Pipelines, you can significantly speed up the experimentation and deployment process of GFM forecasting capabilities, reducing the time required from months to just a few weeks, and quickly augmenting your forecasting capabilities from being able to process just basic time series inputs to complex unstructured and multimodal signals.
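As a hedged sketch of what kicking off a Vertex Forecast training run can look like with the Vertex AI Python SDK, the example below uses illustrative project, table, and column names; the parameters you actually need depend on your dataset.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Create a Vertex AI time-series dataset from a BigQuery table (illustrative URI).
dataset = aiplatform.TimeSeriesDataset.create(
    display_name="demand_history",
    bq_source="bq://my-project.sales.demand_history",
)

# Let AutoML search architectures and hyperparameters for this dataset.
job = aiplatform.AutoMLForecastingTrainingJob(
    display_name="demand_forecast",
    optimization_objective="minimize-rmse",
)

model = job.run(
    dataset=dataset,
    target_column="units_sold",
    time_column="date",
    time_series_identifier_column="product_id",
    available_at_forecast_columns=["date"],
    unavailable_at_forecast_columns=["units_sold"],
    forecast_horizon=28,          # predict 28 periods ahead
    data_granularity_unit="day",
    data_granularity_count=1,
    budget_milli_node_hours=1000,
)
```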
For a more in-depth look into Vertex Forecast check out this video.
For more #GCPSketchnote, follow the GitHub repo. For similar cloud content follow me on Twitter @pvergadia and keep an eye out on thecloudgirl.dev
Read More for the details.
Today, Amazon CloudWatch Container Insights adds support for a Helm chart for Amazon EKS on EC2 using AWS Distro for OpenTelemetry (ADOT), enabling customers to easily define, install, and upgrade applications built on EKS.
Read More for the details.
Amazon MQ now provides support for RabbitMQ version 3.8.27, which includes several fixes to version 3.8.26. Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ that makes it easier to set up and operate message brokers on AWS. You can reduce your operational burden by using Amazon MQ to manage the provisioning, setup, and maintenance of message brokers. Because Amazon MQ connects to your current applications with industry-standard APIs and protocols, you can more easily migrate to AWS without having to rewrite code.
Read More for the details.
Amazon Braket, the quantum computing service from AWS, adds support for a new superconducting gate-based quantum processing unit (QPU) from Oxford Quantum Circuits (OQC) named Lucy, located in the UK. With this launch, European customers can now access more types of quantum hardware during the typical EU workday. Furthermore, customers can now run quantum programs on two different superconducting, gate-based devices on Amazon Braket, allowing them to compare and contrast quantum hardware across different providers.
Read More for the details.
Utilize new aggregation pipeline functionality and client-side field encryption with Azure Cosmos DB API for MongoDB version 4.2.
Read More for the details.
Data stewards can now certify assets that meet their organization’s quality standards in the Azure Purview data catalog.
Read More for the details.
The semiconductor industry has long sought to do more in fixed release windows; however, new factors, including the move to seven, five, and three nanometer (nm) process nodes, have driven the scale of compute and data required to a new extreme. Rather than choosing to build and maintain ever-larger data centers to meet that demand, major semiconductor design teams are leveraging Google Cloud. These organizations have discovered that Google Cloud’s elasticity allows them to deliver efficiency in an environment where their compute resources scale to match their needs throughout the design cycle.
The most common way to migrate EDA workflows to the cloud initially is to “lift and shift,” maintaining parity with on-premises storage systems, and plan to modernize the workflow later. Additionally, “bursting” to the cloud enables companies to spin up and down cloud compute resources when additional work needs to be done. This allows semiconductor customers to move applications more quickly without requiring immediate modifications, or to leverage on-prem resources during the migration. NetApp’s partnership with Google Cloud provides a unique storage platform for semiconductor customers to lift and shift easily or burst to the cloud, and to modernize and extract insights from that data in the future. First, customers can easily migrate their data with enterprise-class cloud storage solutions that tightly integrate with on-premises NetApp systems and are optimized and validated for Google Cloud. Then, they are able to modernize their data management with application-driven NetApp storage and intelligent data services, all deployed and managed for them in the cloud.
At a high level, there are two NetApp Cloud Volumes offerings in Google Cloud: NetApp Cloud Volumes Service (CVS), a fully managed storage service, and NetApp Cloud Volumes ONTAP (CVO), a self-managed storage service. Unlike CVS, CVO enables customers to leverage on-premises NetApp deployments and utilize SnapMirror and/or FlexCache to seamlessly move data from on-prem to CVO in Google Cloud. With CVO for Google Cloud, you can unlock more savings and performance from Google Cloud while boosting data protection, security, and compliance. In addition, existing NetApp customers will continue to have all the features they know and love, and their investments in on-premises training and automation continue to apply.
The key benefits of using CVO:
Fast Deployment and Easy Configuration
The entire CVO solution can be deployed and deleted in minutes using either NetApp Cloud Manager or Terraform scripts
Configuration Flexibility
NetApp CVO deployments are supported in all 29 Google Cloud regions
Deployments allow flexibility in selecting compute and storage resources to match customers’ specific needs throughout the design cycle
In addition to single node CVO deployment options, customers can deploy highly available storage with a CVO HA configuration. This can be done within a single zone or across multiple zones. The CVO HA configuration provides nondisruptive operations and fault tolerance while data is synchronously mirrored between the two CVO compute nodes.
Monitoring and Logging
In addition to NetApp’s tools, Google Cloud Monitoring & Logging integration allows quick diagnosis and resolution of performance bottlenecks and configuration issues
Advanced data caching and synchronization technologies
Using NetApp data management applications, customers can mirror data between on-premises systems and the Google Cloud, cache data in either location or migrate data to the cloud
Faster Migrations
NetApp CVO solutions maintain parity with on-premises storage systems and solution designs, which allows customers to migrate their applications more quickly and without requiring modifications to their workflows
Although on-premises computation is still the norm for the majority of EDA customers, increasingly the existing on-premises infrastructure is not able to meet the growing and elastic demands of today’s design requirements. Companies are finding that the elasticity provided by Google Cloud enables designers to get more done within a fixed release window and accommodates ever growing data capacities. Utilizing cloud allows design teams to dramatically improve time to market and eliminate the uncertainty of whether a fixed amount of, potentially aging, on-premises storage and compute infrastructure will be able to satisfy the EDA workload demands.
Whether building a hybrid cloud or migrating completely to the Google Cloud, the scale and gravity of data requires special handling. NetApp advanced data caching, FlexCache, and synchronization technologies can provide customers with the ability to mirror data between on-premises systems and Google Cloud, to cache data in either location, or to migrate data to the cloud while preserving existing workflows.
With NetApp FlexCache, customers can seamlessly connect their on-premises data to Google Cloud. NetApp CVO can be configured to act as both high-performance scalable storage for EDA workloads in the cloud and a cache of on-premises tools and libraries. To an EDA workload in the cloud, these tools and libraries appear to be local. And not only does it eliminate the need to mirror all of the tools and libraries to the cloud, there is no need to actively manage a separate collection of versioned tools and libraries in the cloud. Developers can get just the data they need, where and when they need it. In addition, some customers may choose to provision a FlexCache on-premises so that the results of jobs run in the cloud appear local to debugging tools running on-premises.
EDA workloads present a unique set of challenges to storage systems, where I/O profiles can vary with the software tools in use and the workflow design stage.
When trying to generalize and simulate EDA workloads, we make the following assumptions:
During logical design phases, when a large number of jobs run in parallel, I/O patterns tend to be mostly random, very metadata intensive and access a large number of small files
During physical design phases, the I/O patterns become more sequential, with fewer read/write jobs running simultaneously and accessing much larger files
We evaluated NetApp CVO for GCP performance using a synthetic EDA workload benchmark suite. The benchmark was developed to simulate real application behavior and real data processing file system environments, where each workload has a set of pass/fail criteria that must be met in order to successfully pass the benchmark.
For best performance, CVO should be run in high write speed mode. In this mode, data is buffered in memory before it is written to disk. If CVO High Availability is desired, then the active/active configuration is recommended. In both configurations, CVO delivers the rich feature set of ONTAP and the scale of Google Cloud while preserving the customer’s workflow.
Try the Google Cloud and NetApp CVO Solution. You can also learn more about NetApp Cloud Volumes ONTAP for EDA and learn how to Get started with Cloud Volumes ONTAP for Google Cloud.
In addition, if you are interested to learn more about silicon design on Google Cloud, you can read “Never miss a tapeout: Faster chip design with Google Cloud publication” and “Using Google Cloud to accelerate your chip design process”.
A special thanks to Guy Rinkevich and Sean Derrington from Google Cloud and Jim Holl and Michael Johnson from NetApp for their contributions.
Read More for the details.
Customers are using Kubernetes as the core technology for the transformation cloud era and for many, cloud-native has become synonymous with container-native. Kubernetes builds on 15 years of running Google’s containerized workloads and the important contributions from the open source community. Inspired by Google’s internal cluster management system, Borg, Kubernetes makes everything associated with deploying and managing your application easier.
Recently honeypot.io launched the Kubernetes documentary film that captures the story directly from the people who made it possible, outlining how Google engineers set to work on the container orchestrator that would come to be known as Kubernetes. We have come a long way since Google open-sourced Kubernetes and later made it available to everyone through CNCF.
Our deep belief in open source led not only to Kubernetes, but more broadly to our position that open source is worth investment and that leadership is based on action: contributing back, driving positive change, and working with partners and foundations. In turn, our customers benefit from our “open cloud” leadership and our investment in open source ecosystems and communities.
Kubernetes is not just a technology, it’s a model for creating value for your business, a way of developing apps and services and a means to secure and develop cloud-native IT capabilities for innovation. Google delivers the best infrastructure to run key, cloud native, open source projects — all based on Kubernetes at its core. Google Kubernetes Engine (GKE) makes it easy to recognize the benefits of innovation initiatives without getting bogged down troubleshooting infrastructure issues and managing day-to-day operations related to enterprise-scale container deployment.
The year 2021 saw record adoption of containers and Kubernetes among organizations worldwide. At the same time, built on open source Kubernetes, GKE has become the fully automated, most scalable and cost optimized Kubernetes service in the market. Only GKE provides fully automated cluster lifecycle management, including upgrades and backup/restore. The new revolutionary Autopilot mode automatically applies industry best practices and can eliminate all node management operations. We believe new autoscaling in GKE, which allows customers to run 15,000-node clusters, outscales the competition by up to 10 times. GKE provides industry-first cost optimization capabilities. Other unique features include pre-integrated cluster, node, and container logging and monitoring for system data, all available in one consolidated dashboard.
Latest analyst ratings and feedback confirm GKE’s leadership in the market. In the Gartner Public Cloud Kubernetes Scorecard, GKE’s overall solution score was 92 out of 100, where AWS EKS and Azure AKS scored 87 and 82, respectively. Likewise, in the recent Forrester Wave™: Container Platforms, Q1 2022, Google Cloud ranked well ahead of the competition because of its leadership in container management.
Register here for the on-demand training event to learn how to build, optimize, and secure your applications with the latest Kubernetes Innovations.
Interested in accelerating your learning and growth on Google Cloud? Sign up here to join Google Cloud Innovators to connect with other members of the technical community and access special events, including technical deep dives and live discussions with Google leaders and experts.
Read More for the details.
Testing the theory that serverless platforms accelerate bringing new ideas to market, we hosted the inaugural Google Cloud Easy as Pie Serverless Hackathon, backed by a $20,000 prize pool. The response was overwhelming: More than 1,500 developers from 70 countries signed up and submitted nearly 100 projects solving issues in health, education, gaming, data, finance, agriculture, and technology industries.
The developers we spoke with ranged from experienced programmers to relative novices. Many of them were using Google Cloud Platform for the first time; for others, the hackathon was an introduction to fully managed serverless products like Cloud Run, Cloud Functions, Eventarc, and Workflows. Advisors from product and engineering hosted introductory sessions and assisted participants with 1:1 mentoring sessions. Of the nearly 100 submissions, our panel of judges from product and engineering selected the top submissions as winners. We are excited to share the top entries:
Combo (grand prize winner): Helps find COVID-19 vaccination availability based on location and vaccination status. This entry presents a virtual assistant chatbot to help find appointments. This project runs a periodic background search process leveraging Cloud Scheduler, Workflows, Eventarc triggers, running in Cloud Functions and Cloud Run. Users can then ask for available vaccination appointments by sending a message to a chat bot created with Dialogflow.
MLess at Scale: Machine Learning operations are complicated by the many tools and stacks used during model development. This is particularly true when the data scientists who develop prediction models are also tasked with deploying those models into production. This solution was built (in just 10 days!) to help companies quickly deploy ML models into production. This project uses BigQuery ML to create and train a model, then exports it to Google Cloud Storage, and finally deploys it to Cloud Run using Cloud Build.
Venture Capitol: Listing a German startup in the public market can be overwhelming. This entry helps German founders easily navigate the complex, multi-step process. This project leverages Cloud Run, Firebase Auth, Cloud Storage, and Cloud Load Balancer to quickly build and deploy multiple services without having to worry about security patches, infrastructure uptime, resource constraints or server management. Our judges found the use of staging environments and dev processes very interesting.
CloudPress: It is challenging to self-host WordPress at the speed of cloud without breaking the bank. This entry demonstrates how to run WordPress on Cloud Run. This project is a great use-case where developers with elastic load that may often scale to zero can realize great benefits from Cloud Run.
Dank League of Memeing Battlegrounds: A game to free you from boredom. This solution integrates Scheduler, Workflows, Eventarc, Cloud Functions, and Cloud Run. The game uses Cloud Functions to make sure referenced images are still available on the internet and regularly optimizes usage of storage for intermediate training images, and Cloud Run to host the web app. This participant wanted to gain experience with GCP serverless, and they got an A+ from our judges for execution.
Tamil Aadal தமிழாடல்: A game which teaches Tamil words to kids (and adults). With the spiky nature of traffic in gaming, the makers of this game took full advantage of the elasticity offered by Cloud Run to auto-scale very quickly. With maximum and minimum instances, developers with sharp spikes on traffic can save money by balancing performance and limits with the number of instances an application may use.
Serverless technologies help foster a modern low-ops vision, opening up opportunities to developers who don’t need complex infrastructure knowledge. Developing with serverless approaches therefore enables developers to focus on building ideas instead of setting up and managing servers. But, as Kelsey Hightower concluded on Twitter last month, there is still a learning curve in understanding the compute, orchestration, database, and storage systems that are available. We are excited to show through these entries how that curve proved to be shallower and shorter for our participants, empowering more developers to bring new ideas to market quickly. Get started by trying one of our Cloud Functions, Cloud Run, BigQuery, or Workflows Qwiklabs today.
Read More for the details.
Delivering an automated solution capable of going from images of vehicle damage to actual car part repair/replace predictions has been an elusive dream for many auto insurance carriers. Such a solution could not only streamline the repair process, but also increase auditability of repairs and improve overall cost efficiency. In keeping with its reputation for being an innovative leader in the insurance industry, USAA decided it was time for things to change and teamed up with Google to make the dream of touchless claims a reality.
Google and USAA previously worked together on an effort to identify damaged regions/parts of a vehicle from a picture. USAA’s vision went far beyond simple identification, however. USAA realized that by combining their data and superb customer service expertise with Google’s AI technology and industry expertise, they could create a service that could map a photo of a damaged vehicle to a list of parts, and in turn identify if those parts should be repaired or replaced. If a repair was needed, the service could also predict how long it would take to do so, taking into account local labor rates.
This provided an opportunity to streamline operations for USAA and improve the claims processing workflow, ultimately leading to a smoother customer and appraiser experience. Through our 16-month collaboration, we achieved a peak machine learning (ML) performance improvement of 28% when compared to baseline models, created a modern ML operations infrastructure offering real time and batch prediction capabilities, developed a complete retraining pipeline, and set USAA up for continued success.
To understand how this all came together, our delivery team will explore the approach, architecture design, and underlying AI methodologies that helped USAA once again demonstrate why they’re an industry leader.
Recognizing a key piece of USAA’s vision was for the solution to be customer-centric and in alignment with USAA’s service-first values, Google Cloud’s AI Industry Solutions Services team broke down the problem into several, often parallel, workstreams that focused on both developing the solution and enabling its adoption:
1. Discover and Assess Available Data
2. Explore Different Modeling Approaches and Engineered Features
3. Create Model Serving Infrastructure
4. Implement a Sophisticated Model Retraining Pipeline
5. Support USAA’s Model Validation Process
6. Provide Engagement Oversight & Ongoing Advisory
“The partnership between Google and USAA has allowed us to learn through technology avenues to better support our members and continue to push forward the best experience possible,” said Jennifer Nance, lead experience owner at USAA.
For this work to be successful, it was critical to bring together a team of experts who could not only build a custom AI solution for USAA, but could also understand the business process and what would be required to pass regulatory scrutiny.
USAA and Google had previously developed a service, provided as a REST endpoint hosted on Google Cloud, that takes a photo of a damaged vehicle and returns details about the damaged parts. This system needed to be extended to provide additional repair/replace and repair labor hour estimates.
While the output of the computer vision service was very important, it wasn’t sufficient to make repair/replace decisions and labor hours estimates. Additional structured data about the vehicle and insurance claim, such as the vehicle model, year, primary point of impact, zip code, and more could be useful signals for prediction. As an example, some makes and models of vehicles contain parts that are not easy to acquire, making “repair” a better choice than “replace.” USAA and Google leveraged industry knowledge and familiarity with the problem to explore available data to arrive at a starting point for model development.
Once additional datasets were identified, both the structured and unstructured data had to be prepared for machine learning.
The unstructured data included millions of images, each of which needed to be scored by the existing USAA/Google computer vision API. Results of that process could then be stored in Google Cloud BigQuery to be used as features in the repair/replace and labor hours models. Of course, sending millions of images through a model scoring API is no small feat. We used Google Cloud Dataflow, a serverless Apache Beam runner, to process these images. Dataflow allowed us to score many images in parallel, while respecting the quota of the vision API.
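A schematic version of that kind of Beam pipeline is sketched below; the input file, output table, and the `score_image` function standing in for the USAA/Google damage-detection API are illustrative placeholders rather than the production implementation.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def score_image(gcs_uri):
    """Placeholder for a call to the custom damage-detection API; the real
    endpoint, authentication, and response schema are not shown here."""
    # response = requests.post(DAMAGE_API_URL, json={"image_uri": gcs_uri}, ...)
    return {"image_uri": gcs_uri, "damaged_parts": ""}

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadImageUris" >> beam.io.ReadFromText("gs://my-bucket/image_uris.txt")
        | "ScoreImages" >> beam.Map(score_image)   # Dataflow fans this out across workers
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:claims.image_features",
            schema="image_uri:STRING,damaged_parts:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```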
The structured data consisted of information about the collision reported by the customer (e.g. point of impact, drivability of the vehicle, etc.) as well as information about the vehicle itself (e.g. make, model, year, options, etc.). This data was all readily available in USAA’s BigQuery based data lake. The team cleaned, regularized, and transformed this data into a single common table for model training.
“Having a single table as the input for AutoML on Vertex AI is useful for multiple reasons: 1) You can reference the input data in one place instead of searching multiple databases and sources of all the different data sources 2) It becomes easy to compare metrics and distributions of the data when it comes times to retraining the data 3) It’s an easy snapshot of the data for regulatory purposes.” noted USAA Data Science Leader Lydia Chen.
Once the relevant data sources were identified, cleaned, and assembled, initial models for both the repair/replace decision and the labor hours estimation were created to establish a baseline using AutoML for structured data on Vertex AI. Area under the curve for receiver operating characteristic (AUC ROC) was selected as the metric of choice for the repair/replace decision model since we cared equally about both classes, while root-mean-square error (RMSE) was selected as the metric of choice for the repair labor hours estimation model since it was a well understood number. See more on evaluation metrics for AutoML models on Vertex AI here.
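For readers who want to see what such a baseline looks like in practice, here is a minimal sketch using the Vertex AI Python SDK with the optimization objectives named above; the project, table, and column names are illustrative, not USAA's actual schema.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="claims_training_data",
    bq_source="bq://my-project.claims.training_table",   # the single common table
)

# Repair/replace baseline: binary classification optimized for AUC ROC.
classification_job = aiplatform.AutoMLTabularTrainingJob(
    display_name="repair_replace_baseline",
    optimization_prediction_type="classification",
    optimization_objective="maximize-au-roc",
)
repair_replace_model = classification_job.run(
    dataset=dataset,
    target_column="repair_or_replace",
    budget_milli_node_hours=1000,
)

# Labor hours baseline: regression optimized for RMSE.
regression_job = aiplatform.AutoMLTabularTrainingJob(
    display_name="labor_hours_baseline",
    optimization_prediction_type="regression",
    optimization_objective="minimize-rmse",
)
labor_hours_model = regression_job.run(
    dataset=dataset,
    target_column="labor_hours",
    budget_milli_node_hours=1000,
)
```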
These models required a few iterations to identify the correct way to connect both the image data and the structured data into a single system. Ultimately, these steps proved that the solution to this problem was viable and could be improved with iteration.
Feature engineering is always an iterative process where the existing data is transformed to create new model inputs through various techniques aimed at extracting as much predictive power as possible from the source dataset. We started with the existing data provided, created multiple hypotheses based on current model performance, built new features which supported those hypotheses, and tested to see if these new features improved the models.
This iteration required many models to be trained and for each model to produce global feature importances. As you can imagine, this process can take a significant amount of time, especially without the right set of tools. To rapidly and efficiently create models that could screen new features and meet USAA’s deadlines, we continued using AutoML for structured data on Vertex AI, which allowed us to quickly prototype new feature sets without modifying any code. This fast and easy iteration was key to our success.
Domain expertise was another important part of the feature engineering process. Google Cloud’s AI Industry Solutions Services team brought in multiple engineers and consultants with deep insurance expertise and machine learning experience. Combined with USAA’s firsthand knowledge, several ‘golden’ features were created. As a simple example, when asking a model if a part should be repaired or replaced, the model was also given information about other parts that were damaged on the same car in the same collision.
Ultimately, the feature engineering process proved to be the best source for model performance improvement.
We experimented with three separate modeling approaches during this collaboration, with a focus on identifying the best models to productionize in later workstreams. These approaches included leveraging an AutoML model for structured data on Vertex AI, a K-Nearest Neighbors (K-NN) model, and a TensorFlow model created in TensorFlow Extended (TFX) for rapid experimentation. These models needed to cover 6 body styles (coupe, hatchback, minivan, sedan, SUV, and truck). Our selection criteria included model performance, improvement potential, maintenance, latency, and ease of productionization.
Since AutoML for structured data on Vertex AI uses automated architecture and parameter search to discover and train an optimal model with little to no code and provides one click deployment as a Vertex AI Prediction endpoint, productionization and maintenance of the model would be fairly straightforward. On average, AutoML models provided a 6.78% performance improvement over baseline models for the Repair/Replace decision and a 16.7% performance improvement over baseline models for Repair Labor Hours estimation.
The team also explored building and testing a K-NN model to experiment with the idea of “Damage Like Yours.” Since USAA provided a rich dataset containing similar auto claims, the hypothesis was that similar damage should take a similar amount of time to repair. We implemented approximation search to strike a balance between latency (K-NN models required longer inference times) and accuracy using state-of-the-art algorithms such as FAISS. On average, K-NN models showed an 8.51% performance decline relative to baseline models for the Repair/Replace decision and a 4.03% performance decline for Repair Labor Hours estimation.
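To illustrate the latency/recall trade-off that approximate search manages (a generic sketch, not USAA's actual implementation), a FAISS IVF index probes only a subset of clusters per query:

```python
import numpy as np
import faiss

d = 128                       # illustrative embedding dimension for a claim
claims = np.random.rand(100_000, d).astype("float32")   # stand-in for historical claim features

# Exact search baseline: accurate, but slower at inference time.
flat_index = faiss.IndexFlatL2(d)
flat_index.add(claims)

# Approximate search: partition the space into clusters and probe only a few,
# trading a little accuracy for much lower latency.
nlist = 1024
quantizer = faiss.IndexFlatL2(d)
ivf_index = faiss.IndexIVFFlat(quantizer, d, nlist)
ivf_index.train(claims)
ivf_index.add(claims)
ivf_index.nprobe = 16         # clusters probed per query; tune for latency vs. recall

query = np.random.rand(1, d).astype("float32")
distances, neighbor_ids = ivf_index.search(query, 10)   # 10 most similar claims
```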
Finally, we explored using TensorFlow and TFX for a single model across all body styles to give the model builder complete control over the model architecture and hyperparameter tuning process. It would also be easy to use Vertex AI Vizier for hyperparameter optimization and to deploy the trained TensorFlow model as a Vertex AI Prediction endpoint, making deployment relatively straightforward as well. On average, TensorFlow models provided a 6.45% performance improvement over baseline models for the Repair/Replace decision and a 16.1% performance improvement over baseline models for Repair Labor Hours estimation.
By the end of the workstream, we had processed 4GB of data, written over 5000 lines of code, engineered 20 new features, and created over 120 models in search of the best model to productionize. Both AutoML on Vertex AI and TFX modeling approaches performed very well and had surprisingly similar results. Each model would outperform the other on various slices of the data. Ultimately, USAA made the choice to use AutoML on Vertex AI going forward because of its superior performance for the most common body styles along with the low administrative burden of managing, maintaining, and improving an AutoML on Vertex AI based model over time. In addition to being a great way to create a baseline, AutoML on Vertex AI also provided the best models for productionization.
“The same features were used as input for the TFX, KNN, and AutoML on Vertex AI models evaluated for comparison. Specific model performance metrics were then used for optimization and evaluation of these models, which were based on USAA’s business priorities. Ultimately, AutoML on Vertex AI was chosen based on its performance on those evaluation metrics, including AUC ROC and RMSE. AutoML on Vertex AI optimizes metrics by automatically iterating through different ML algorithms, model architectures, hyperparameters, ensembling techniques, and engineered features. Effectively you are creating thousands of challenger models simultaneously. Because of AutoML on Vertex AI you can focus more on feature engineering than manually building models one by one,” said Lydia Chen.
After selecting the AutoML on Vertex AI based approach, we created production-capable model serving infrastructure leveraging Vertex AI that could be incorporated into the retraining pipeline. This included defining API contracts and developing APIs that covered all vehicle types and enabled both online and batch classification of “Repair” vs. “Replace” at a configurable percentage threshold. If the classification decision was “Repair,” the API would also provide a prediction of the number of Repair Labor Hours associated with the subparts at the claim estimate level. We incorporated explainability leveraging out-of-the-box capabilities within Vertex Explainable AI to arrive at feature importance. After writing test cases and over 1200 lines of code, we confirmed functionality and advised on the automated deployment of the APIs to Cloud Functions using Terraform modules.
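A simplified sketch of what the online-serving path for such an API can look like is shown below; the endpoint ID, feature names, and thresholding logic are illustrative assumptions and do not represent the actual API contract.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# The endpoint resource name below is a placeholder.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

def classify_part(features: dict, threshold: float = 0.5) -> dict:
    """Return a repair/replace decision at a configurable probability threshold."""
    prediction = endpoint.predict(instances=[features]).predictions[0]
    # AutoML tabular classification returns parallel 'classes' and 'scores' lists.
    scores = dict(zip(prediction["classes"], prediction["scores"]))
    decision = "repair" if scores.get("repair", 0.0) >= threshold else "replace"
    return {"decision": decision, "scores": scores}

# Illustrative feature payload; the real schema is defined by the API contract.
result = classify_part({
    "make": "example_make",
    "model": "example_model",
    "year": "2018",
    "primary_point_of_impact": "front",
})
```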
Once in production, it’s important to retrain your models based on new data. Since USAA processes millions of new claims a year, new vehicles with new parts and new damage present a constant challenge to models like these. So it was important to learn from and adapt to this new data over time through a proper model retraining process. Model retraining and promotion, however, come with inherent change management risks. We chose to use Vertex AI Pipelines and Vertex ML Metadata to establish an automated and auditable process for retraining all models on newly-incorporated data, tracking the lineage of all associated model artifacts (data, models, metrics, etc.), and promoting the models to the production serving environment.
We created two generic Vertex AI Pipeline definitions: one capable of retraining the six Repair/Replace classification models (one per body style), and the other capable of retraining the six Repair Labor Hours regression models (one per body style). Both pipeline definitions enforced MLOps best practices across the 12 models by orchestrating the following steps:
1. Vertex AI managed dataset creation
2. Vertex AI Training of AutoML models
3. Vertex AI Model Evaluation (classification- or regression-specific)
4. Model deployment to a Vertex AI Prediction endpoint
In addition to the two Vertex AI Pipeline definitions, we leveraged 12 “model configuration” JSON files to define model-specific details like the input training data location, the optimization function, the training budget, etc. Actual Vertex AI Pipeline executions would be created by combining a Vertex AI Pipeline definition and a model configuration at runtime.
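A stripped-down sketch of one such pipeline definition is shown below, assuming the prebuilt `google_cloud_pipeline_components` operators (module paths and signatures vary by library version); the evaluation and endpoint-deployment steps are omitted for brevity.

```python
from kfp.v2 import dsl, compiler
from google_cloud_pipeline_components import aiplatform as gcc_aip

@dsl.pipeline(name="repair-replace-retraining")
def retraining_pipeline(
    project: str,
    bq_source: str,             # supplied from the per-model configuration JSON
    target_column: str,
    optimization_objective: str,
    budget_milli_node_hours: int,
):
    # 1. Vertex AI managed dataset creation
    dataset_op = gcc_aip.TabularDatasetCreateOp(
        project=project,
        display_name="retraining-dataset",
        bq_source=bq_source,
    )

    # 2. Vertex AI Training of an AutoML model (model evaluation and endpoint
    #    deployment steps from the list above are not shown in this sketch).
    gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project,
        display_name="retraining-job",
        optimization_prediction_type="classification",
        optimization_objective=optimization_objective,
        dataset=dataset_op.outputs["dataset"],
        target_column=target_column,
        budget_milli_node_hours=budget_milli_node_hours,
    )

compiler.Compiler().compile(
    pipeline_func=retraining_pipeline,
    package_path="repair_replace_pipeline.json",
)
```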
We deployed a combination of Cloud Scheduler, Cloud Pub/Sub, and Cloud Functions in order to trigger these retraining pipelines at the appropriate cadence. This architecture enables the three triggering methods required by USAA: scheduled, event-driven, and manual. The scheduled trigger can be easily configured depending on USAA’s data velocity. The event-driven trigger enables upstream processes to trigger retraining, such as when valuable new data becomes available for training. Finally, the manual trigger allows for ad-hoc human intervention when needed.
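For example, a Pub/Sub-triggered Cloud Function along the following lines (with illustrative project, bucket, and parameter names) could submit the compiled pipeline as a Vertex AI Pipeline run whenever any of the three triggers fires.

```python
# main.py for a Pub/Sub-triggered Cloud Function (names and paths are illustrative).
from google.cloud import aiplatform

def trigger_retraining(event, context):
    """Submit a Vertex AI Pipeline run when a retraining message arrives.

    Cloud Scheduler publishes on a schedule, upstream jobs publish when new
    data lands, and an operator can publish manually for ad-hoc runs.
    """
    aiplatform.init(project="my-project", location="us-central1")

    job = aiplatform.PipelineJob(
        display_name="repair-replace-retraining",
        template_path="gs://my-bucket/pipelines/repair_replace_pipeline.json",
        parameter_values={
            "project": "my-project",
            "bq_source": "bq://my-project.claims.training_table",
            "target_column": "repair_or_replace",
            "optimization_objective": "maximize-au-roc",
            "budget_milli_node_hours": 1000,
        },
    )
    job.submit()   # non-blocking; the pipeline runs asynchronously
```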
We also created a CI/CD pipeline to automate the testing, compilation, and deployment of both Vertex Pipeline definitions and the triggering architecture described above.
In production, the result of a model-specific Vertex AI Pipeline execution is a newly-trained AutoML model on Vertex AI that is served by a newly-deployed Vertex AI Prediction endpoint. The newly-trained model performance metrics can then easily be compared to the existing “champion” model (currently being served) using the Vertex AI Pipelines user interface. In order to promote the new model to the production serving infrastructure, a USAA model manager must manually update a version-controlled config file within the Model Serving repository with the new Vertex AI model and endpoint IDs. This action triggers a CI/CD pipeline that finally promotes the new model to the user-facing API, as described in the previous section. The human-in-the-loop promotion process allows USAA to adhere to their model governance policies.
“Working with the Google PSO team, we’ve developed a customized approach that works efficiently for us to productionize our models, integrate with the data source, and reproduce our training pipeline. To do this, we leveraged the Vertex AI platform, where we componentized each stage of our pipeline and as a result were able to quickly test and validate our operations. Once each stage of our operation was fine-tuned and working efficiently, it was straightforward for us to integrate implementation into our CI/CD system. As a financial services company, we are even more accountable to model governance and risk that might be introduced into the environment. Having a clear picture of the training pipeline helps demonstrate clarity to partners.” noted USAA Strategic Innovation Director Heather Hernandez.
Historically, the use of algorithms in the insurance industry has required approval by state level regulators. Since 2008, however, some in the industry have faced additional scrutiny at the federal level under SR 11-07 and are now required to understand the risk a machine learning model could potentially pose to their business and available mitigations. While this adds a step in the process to develop and implement machine learning models, it also puts the industry ahead of others in both understanding and managing the risk/impact that comes from leveraging machine learning.
Google Cloud’s AI Industry Solutions Services team utilized its extensive experience in financial services model risk management to document and quantify the new system, while integrating with and conforming to USAA’s model risk management services. This use case also underwent ethical analysis early on in the partnership, as part of Google Cloud’s Responsible AI governance process, to assess potential risks, impacts and opportunities and drive alignment with Google’s AI Principles. Together, Google and USAA created nearly 300 pages of model risk documentation completely documenting and quantifying the acceptable use of the system.
Knowing that true success goes beyond solution development, the Google team provided an additional 6 months’ worth of office hours and support to answer any questions that the USAA team had regarding the solution to best support its use and adoption.
Google Cloud AI Industry Solutions Services leads the transformation of enterprise customers and industries with cloud solutions. To try Vertex AI for free, please visit here and to learn more about how we can help transform your business, please get in touch with Google Cloud’s sales team.
Acknowledgements
This post was written by the authors listed and the rest of the delivery team (Jeff Myers, Alex Ottenwess, Elvin Zhu, Rahul Gupta, and Rostam Dinyari). Each of them contributed significantly to this engagement’s success.
We would also like to thank our counterparts at USAA for their close collaboration throughout this effort (Heather Hernandez, Brian Katz, Lydia Chen, Jennifer Nance, Pietro Spitzer, Mark Prestriedge, Kyle Sterneckert, Kyle Talbott, Bertha Cortez, Jim Gage, Erik Graf, and Patrick Burnett). We would also like to recognize our account team members (Cindy Spess, Charles Lambert, Joe Timmons, Wil Rivera, Michael Peter, Siraj Mohammad), leadership (Andrey Evtimov, Marcello Pedersen), and our partner team members from Deloitte and Quantiphi who made this program possible.
Read More for the details.
It’s a short month, but there are still many important security updates to discuss. Below, I’ll recap the latest efforts from the Google Cybersecurity Action Team, such as our second Threat Horizons Report, and highlight new capabilities from our cloud security product teams, who have been working to deliver new controls, security solutions and more to earn the trust of our customers globally.
Earlier this month, I joined a panel at the Munich Cyber Security Conference (Digital Edition) to discuss supply chain risks and cyber resiliency. It was great to see a packed agenda featuring diverse voices from the security industry along with government leaders and policymakers coming together to discuss the challenges we’re working to collectively solve in cybersecurity. One area of particular focus is securing the software supply chain. During the panel, we talked about Google’s approach to building our own internal software and incorporating open source code in a secure way. This has been the foundation of our BeyondProd approach. We implement multiple layers of safeguards like multi-party change controls and a hardened build process that produces digitally signed software that our infrastructure explicitly validates before executing. We’ve since turned this into an open framework that all organizations can use to assess themselves and their supply chains: SLSA. How we collectively as an industry secure the software supply chain and prevent vulnerabilities in open source software will continue to be critical for cloud and SaaS providers, governments and maintainers throughout 2022.
On March 9, we’ll host our first Cloud Security Talks of 2022, which will focus on how enterprises can modernize their approach to threat detection and response with Google Cloud. Sessions will highlight how SecOps teams can leverage our threat detection, investigation and response capabilities across on-premises, cloud, and hybrid environments, including new SOAR capabilities from our recent acquisition of Siemplify. Register here.
Here are the latest updates, products, services and resources from our cloud security teams this month:
FIDO security key support for GCE VMs: Physical security keys can now be used to authenticate to Google Compute Engine virtual machine (VM) instances that use our OS Login service for SSH management. Security keys offer some of the strongest protection against phishing and account takeovers and are strongly recommended in administrative workflows like this.
IAM Conditions and Tags support in Cloud SQL: We introduced IAM Conditions and Tags in Cloud SQL which bring powerful new capabilities for finer-grained administrative and connection access control for Cloud SQL instances.
Achieving Autonomic Security Operations: Anton Chuvakin and Iman Ghanizada from the Cybersecurity Action Team shared their latest blog post on how organizations can achieve Autonomic Security Operations by leveraging key learnings from SRE principles. The post highlights multiple ways automation can serve as a force multiplier to achieve better outcomes in your SOC.
Certificate Manager integration with External HTTPS Load Balancing: We released the public preview of our Certificate Manager service and integration with External HTTPS Load Balancing to help simplify the way you deploy HTTPS services for your customers. You can bring your own TLS certificates and keys if you have an existing certificate lifecycle management solution or use Google Cloud’s fully managed TLS offerings. Another helpful feature of this release is integration of alerts on certificate expiry into Cloud Logging.
Virtual Machine Threat Detection: The cloud is impacted by unique threat vectors but also offers novel opportunities to build effective detection into the platform natively. This dynamic underpins our latest Security Command Center Premium capability: Virtual Machine Threat Detection (VMTD). VMTD helps ensure strong protection for VM-based workloads by providing agentless memory scanning that can detect threats like cryptomining malware inside your Google Compute Engine VMs.
Chrome Browser Cloud Management: A large part of enterprise security is protecting the endpoints that access the web, and a big part of that is not only using a secure browser like Chrome, but also how you manage and support it. We have a lot of these capabilities in Chrome Browser Cloud Management, along with our overall zero trust approach. We also recently extended CIS benchmark coverage to include Chrome.
Google Cloud architecture diagramming tool: We recently launched the brand new Google Cloud Architecture Diagramming Tool. This is an awesome tool for cloud architects, developers and security teams alike, and it’s another opportunity for us to be helpful in providing pre-baked reference architectures into the tools. Watch out for more on this as we build in more security patterns.
Some of the Best Security Tools Might Not be “Security Tools”: Remember, there are many problems in risk management, security and compliance that don’t need specialist security tools. In fact some of the best tools might be from our data analysis and AI stacks such as our Vertex AI capability. Check out these new training features from the team.
Stopping website attacks with reCAPTCHA Enterprise: reCAPTCHA Enterprise is a great solution that mitigates many of the issues in the OWASP Automated Threat Handbook and can be deployed seamlessly for your website.
Open source software security: Just a few weeks after technology companies (including Google) and industry foundations convened at the White House summit on open source security, the OpenSSF announced the Alpha-Omega project. The project aims to help improve software supply chain security for 10,000 OSS projects through direct engagement of software security experts and automated testing. Microsoft and Google are supporting the Alpha-Omega Project with an initial investment of $5 million.
Building cybersecurity resilience in healthcare: Taylor Lehmann and Seth Rosenblatt from Google’s Cybersecurity Action team recently outlined best practices healthcare leaders can adopt to build resilience for IT systems, overcome attacks to improve both security and business outcomes, and above all, protect patient care and data.
Threat Horizons Report Issue 2: Providing timely, actionable cloud threat intelligence to our customers so they can take action to protect their environments is critical and this is the aim of our Threat Horizons report series. Customers benefit from guidance on how to securely use and configure the cloud, which is why we operate within a “shared fate” model that exemplifies a true partnership with our customers regarding their security outcomes. In the latest Google Cybersecurity Action Team Threat Horizons Report, we observed vulnerable instances of Apache Log4j are still being sought by attackers, which requires continued vigilance by customers and cloud providers alike in ensuring patching is effective. Additionally, Google Cloud Threat Intelligence has observed that the Sliver framework is being used by adversaries post initial compromise in attempts to ensure they maintain access to networks. Check out the full report for this month’s findings and best practices you can adopt to stay protected against these and other evolving threats.
Assured Workloads for EU: Organizations around the world need confidence they can meet their unique and evolving needs for security, privacy, and digital sovereignty as they use cloud services. Assured Workloads for EU, now GA, allows GCP customers to create and maintain workloads with data residency in their choice of EU Google Cloud regions, personnel access and customer support restricted to EU persons located in the EU, and cryptographic control over data access using encryption keys stored outside Google Cloud infrastructure.
Client Authorization for gRPC Services with Traffic Director: One way developers use the open source gRPC framework is for backend service-to-service communications. The latest release of Traffic Director now supports client authorization by proxyless gRPC services. This release, in conjunction with Traffic Director’s capability for managing mTLS credentials for Google Kubernetes Engine (GKE), enables customers to centrally manage access between workloads using Traffic Director.
Don’t forget to sign-up for our newsletter if you’d like to have our Cloud CISO Perspectives post delivered every month to your inbox. We’ll be back next month with more updates and security-related news.
Read More for the details.
Are you familiar with the four golden signals of Site Reliability Engineering (SRE): latency, traffic, errors, and saturation? Whether you’re a developer or an operator, you’ve likely been responsible for collecting, storing, or analyzing the data associated with these concepts. Much of this data is captured in application and infrastructure logs, which provide a rich history of what is happening behind the scenes in your workloads.
Getting insights from your logs to track those four golden signals can become unwieldy very quickly as an application scales up, hindering the ability of your developers and operations teams to identify when and where errors are occurring. If you fail to set up your monitoring and logging systems correctly, your Mean Time to Recovery (MTTR) from service-impacting events can suffer.
Google Cloud provides guidance on what to think about when deciding how to set up your logging, monitoring, and alert systems in the operational excellence section of the Cloud Architecture Framework. Google Cloud also provides managed services as part of the operations suite to automate collection, storage and analysis of the four golden signals. Cloud Error Reporting is one such service.
Error Reporting automatically captures exceptions found in logs ingested by Cloud Logging for the following languages: Go, Java, Node.js, PHP, Python, Ruby, and .NET; aggregates them; and notifies you when they occur. The service intelligently groups the errors it finds and makes them available in a dedicated dashboard. The dashboard displays the details of each exception, including a histogram of occurrences, the list of affected versions, the request URL, and links to the request log, meaning you can get to the affected resource immediately, with just one click!
How can Error Reporting help your organization today?
Error Reporting helps focus your most valuable resource (i.e., developer attention) on the potential sources of exceptions that are impacting your workloads. With the notifications and embedded links, exceptions can quickly be resolved before they impact your customers and bottom line.
What do you have to do to enable Error Reporting?
Error Reporting is automatically enabled as soon as logs that contain error events like stack traces are ingested into Cloud Logging, or when you use the API to self-configure a service to capture exceptions.
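For the self-configuration route, a minimal sketch with the Python client library looks like the following; the service name and version labels are placeholders.

```python
from google.cloud import error_reporting

client = error_reporting.Client(service="my-service", version="1.0.0")

def risky_operation():
    raise ValueError("Simulated application failure")   # stand-in for real logic

try:
    risky_operation()
except Exception:
    # Sends the current stack trace to Error Reporting, where it is grouped
    # with similar exceptions and surfaced in the dashboard.
    client.report_exception()

# A plain message can also be reported without an active exception:
client.report("Deliberate test event for Error Reporting")
```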
When you use Google Kubernetes Engine and our serverless offerings, application logs written to stdout or stderr will appear automatically in Cloud Logging, and therefore Error Reporting will automatically start analyzing them. To capture logs from applications running on VMs in Google Compute Engine, you will need to install the Ops Agent. From there, app logs will be captured in Cloud Logging and exceptions will flow through to Error Reporting.
To view available error events, visit the Error Reporting page in the Google Cloud Console. You can find it in the left navigation panel or by searching in the search bar at the top of the console.
If you have any questions or want to start a discussion with other Error Reporting users, visit the Cloud Operations section of the Google Cloud Community and post a discussion topic.
Read More for the details.