Google Cloud

2023 01 19

GCP – A journey from App Engine to Cloud Run: Adopting containers and reducing infrastructure costs by 70 percent

Editor’s note: In today’s guest post we hear from SAP about their application modernization journey towards a serverless application architecture with Cloud Run. SAP developers partnered with Google Cloud Professional Services to migrate App Engine workloads to Cloud Run in just three weeks by focusing on small but efficient changes to the application, achieving remarkable cost savings.

SAP is a global market leader in enterprise application software with nearly 110,000 employees. As a tech and data driven company, we use Google Cloud to run a variety of our internal applications. Our application stores and processes approximately 250,000 files per day, which are produced and uploaded by SAPs CI/CD pipelines. Core services such as Google Cloud Storage and Google Pub/Sub helped us achieve this with minimum effort on our infrastructure. We started providing our microservices using Google App Engine in 2019. What’s more, we especially appreciated the beauty, efficiency, and simplicity of App Engine, which allowed us to focus on our code and business logic instead of managing servers.

As part of the migration initiative to consolidate Google Cloud workloads, projects and resources to specific regions, we re-evaluated App Engine as our platform for microservices-based applications. Since App Engine is tied to the Google Cloud project and region, we thought about making a strategic move to Google Cloud Run. This had the advantage of avoiding creating a whole new Google Cloud project just to use App Engine in a new region. In addition, Cloud Run retains the serverless value proposition of App Engine, with the added flexibility of containers plus our event-driven architecture, which perfectly fit together with this serverless approach. Our application traffic varies and can have high peaks but can also be near zero. The flexible scalability of Cloud Run helped us serve the traffic while keeping costs low, since it is able to scale down to complete idle.

With Cloud Run’s source-based deployment approach using Google Cloud’s buildpacks, we did not have to define container manifests or build containers ourselves. The buildpacks can automatically detect the language of our source code and transform it into an executable container image that can run on any container platform. The main changes to get our microservices up and running on Cloud Run, involved adjusting the deployment steps in our pipelines — code changes were not needed.

For a smooth migration, we registered a custom domain and leveraged a Google Cloud HTTP load balancer, with a Google-managed SSL certificate. The load balancer initially was pointing to App Engine services and we could easily switch them one by one to Cloud Run, by updating theURL maps without any impact to our API consumers and end-users.

All this has brought us to the point where today we are completely built on open standards.

Each of our eight microservices faced different infrastructure and scaling requirements and Cloud Run allows us to set suitable configurations for each service individually. It was easy to move the application, especially as we were able to leverage our existing pull integrations with Cloud Pub/Sub for the time being.

For all the above reasons, our small team was able to implement this and bring it to production in just four to five weeks. Cloud Run’smore granular billing model helped SAP to save 70% on compute costs for our application. This was directly visible after migration (see visualization above).

With the support of Google Cloud Professional Services, we were able to achieve this without additional resources while all our services continued to work smoothly.

Adopting Cloud Run and moving away from App Engine was a successful step in our journey of maintaining a state-of-the-art event-driven architecture. Cloud Run’s simplicity and source-based deployment option made this transition remarkably smooth, while the cost savings achieved made it an even greater success.

Learn more in this guide on Cloud Run for App Engine customers and get started using serverless on Google Cloud today.

Read More for the details.

2023 01 19

GCP – Accelerating the grocery pickup experience with Google Maps Platform

Cloud, Google Cloud gcp

Editor’s note: Today’s guest post comes from Craig Hutler, Digital Product Manager, Kroger. To better serve its customers, Kroger began working with Google Maps Platform to create the On My Way initiative, a process to streamline the curbside pickup experience and put food into customers’ hands faster.

With an increase in customer demand for pickup, we are constantly striving to increase capacity and shorten wait times for our customers. By using Google Maps Platform to help power our On My Way initiative, we are able to implement process efficiencies and help lower our cost-to-serve.

Improving the customer experience

When we initially launched the On My Way initiative, we reduced customer wait times and improved our cost-to-serve, a key metric that measures our ability to serve customers more efficiently. It began as a simple button in the Kroger app. When customers were ready to pick up their groceries, they could tap the button to signal their estimated time of arrival (ETA) to their local store.

In the first iteration, we relied on our customers to notify us by hitting the button when they were 15 minutes away, and their ETA would show up on a dashboard in-store. Then our associates would prepare the already-picked order. We realized that we were relying on our customers for accuracy, which meant they could actually be five minutes away (not the requested 15) when our associates are alerted. That means when the customer arrived, their groceries wouldn’t be ready.

To help solve for this, we knew we wanted to use a map that our shoppers are familiar with. We initially ran a pilot for shoppers to place pickup orders through the Google Maps app. While some shoppers liked the experience, it was hard to reach the majority of our customers.

Fulfilling orders faster

We realized that to give customers the easiest, fastest pickup experience, we needed an integrated map within the Kroger app that allowed for a seamless shopping journey, from order through pickup. And Google Maps Platform delivered.

Now, we have an enhanced version of the pickup experience called Geo On My Way, which uses the Maps SDK for Android and Directions API. Without having to leave the Kroger app, customers can voluntarily send their device location directly to their local store at the touch of a button, giving associates an accurate ETA based on route and traffic. Ultimately, this helps optimize our in-store operations, reduce customer wait times, increase customer satisfaction, and deliver on our cost-to-serve metric.

Since implementing Directions API, we reduced our median ETA error by almost half for users who opt in to give us location permissions and engage with Geo On My Way. It also took a full minute off the time the customers wait for their groceries to be loaded into their vehicles. This functionality is now successfully deployed across every pickup location, helping us meet the needs of our customers everywhere.

Future-proofing pickup experiences

Geo On My Way has improved significantly since its early days, and we’re still iterating and optimizing to continue reducing ETA errors. In the future, we plan to use geofencing to help us gauge when customers, who have opted in to share their location with us, enter the parking lot. This should provide even more meaningful insights to our associates and improve the customer experience.

Kroger has always been about providing positive, uplifting experiences. With Geo On My Way, we’ve been able to hone in on the last mile for our customers. This initiative is about more than data and metrics. It’s about helping millions of daily customers have easy access to food, and closing the gap between our doors and their tables.

For more information on Google Maps Platform, visit our website.

Read More for the details.

2023 01 19

GCP – Canary deployments using Kubernetes Gateway API, Flagger and Google Cloud Deploy

Cloud, Google Cloud gcp

Canary deployment is an advanced technique used to test changes in a production environment by gradually rolling out the changes to a small subset of users before fully deploying them to the entire user base. This allows for real-world testing of the changes, and the ability to quickly roll back the changes in the event of any issues. Canary deployments are particularly useful for testing changes to critical parts of an application, such as new features or updates to the database schema. By using canary deployments, you can ensure that any new changes do not negatively impact the user experience, and can fix issues before they affect the entire user base.

The new Kubernetes Gateway API gives you a great new tool for managing traffic to applications running on your Google Kubernetes Engine clusters. Together with Google Cloud Deploy you can leverage this new capability to enable faster releases to production for your applications. At the end of this post, you are going to have a Continuous Deployment pipeline that is using an iterative traffic shift pattern to release your application to production, allowing you to do fast zero downtime deployments of your applications.

Flagger is a OSS tool that allows you to do canary releases or A/B testing in a declarative fashion using your K8S cluster. It monitors configurable metrics from your application to determine the health of your release and controls the release process based on those metrics. It supports metrics from various sources like Prometheus or Google Cloud Monitoring. In this post, I’m using Google Managed Prometheus as a metrics source.

Flagger is often used with service meshes likeIstio orAnthos Service Mesh, but since recently it also supports the new Kubernetes Gateway API for traffic management, that we are using in this blog post. I updated the implementation of Gateway API in Flagger to support the latest version v1beta1, and decided to put together this blog post.

High Level Design

Here is a small architecture diagram of how the components in this blog post connect with each other:

We are going to need several resources in our Google Cloud setup. We are using Artifact Registry to store the container image. Cloud Load Balancing is used for routing traffic to the application. Cloud Deploy is providing us with a managed continuous delivery pipeline that deploys the application to the various environments. Google Managed Prometheus is providing us with observability of the application so that the canary strategy can be data driven.

On the GKE cluster, we are using a 2 namespace setup with a dev namespace for the development environment that is directly deployed from Cloud Deploy and a prod namespace where the K8S deployment is done with a gradual traffic shift using Flagger. For the prod namespace we are also going to deploy a Google-managed Prometheus (GMP) query interface.

Since we are using an internal Cloud Load Balancer, we are going to need a jump host VM on Compute Engine to actually access the application.

Environment setup

Let’s start with setting up our environment. In order to follow this post, you are going to needkubectl,gcloud,jq andskaffold installed on your machine, or you can use Cloud Shell since it has all of them installed. We are also going to set a few variables that will help us in the next steps.

code_block[StructValue([(u’code’, u’export GOOGLE_CLOUD_PROJECT_ID=<your_project_on_google_cloud>rnexport GOOGLE_CLOUD_REGION=<your_google_cloud_region>’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d79933450>)])]

Also we need to enable a few APIs upfront.

code_block[StructValue([(u’code’, u’gcloud services enable compute.googleapis.com –project $GOOGLE_CLOUD_PROJECT_IDrngcloud services enable container.googleapis.com –project $GOOGLE_CLOUD_PROJECT_IDrngcloud services enable clouddeploy.googleapis.com –project $GOOGLE_CLOUD_PROJECT_IDrngcloud services enable artifactregistry.googleapis.com –project $GOOGLE_CLOUD_PROJECT_ID’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d7992d0d0>)])]

In order to set-up Artifact Registry and configure your environment run the following commands.

code_block[StructValue([(u’code’, u’gcloud artifacts repositories create canary-repo –repository-format=docker \rn–location=$GOOGLE_CLOUD_REGION –project $GOOGLE_CLOUD_PROJECT_ID –description=”Docker repository for canary blog”rngcloud auth configure-docker $GOOGLE_CLOUD_REGION-docker.pkg.devrnexport SKAFFOLD_DEFAULT_REPO=$GOOGLE_CLOUD_REGION-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT_ID/canary-repo’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6a8ced10>)])]

We are going to need a proxy-only subnet in our VPC for the Load Balancer. If you don’t already have one, create one with the following command. You might need to change up the IP range to a free range in your network. This example uses the default VPC, but feel free to choose whichever VPC you prefer.

code_block[StructValue([(u’code’, u’gcloud compute networks subnets create proxy \rn–purpose=REGIONAL_MANAGED_PROXY \rn–role=ACTIVE \rn–region $GOOGLE_CLOUD_REGION –project $GOOGLE_CLOUD_PROJECT_ID \rn–network=default \rn–range=10.103.0.0/23′), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6a8ce6d0>)])]

We need aGKE cluster withGateway API,Horizontal Pod Autoscaling andWorkload Identity enabled, with a newer GKE version higher or equal to 1.24, so let’s create that one next:

code_block[StructValue([(u’code’, u’gcloud container clusters create “example-cluster” \rn–region $GOOGLE_CLOUD_REGION \rn–project $GOOGLE_CLOUD_PROJECT_ID \rn–cluster-version “1.24.5-gke.600” \rn–machine-type “e2-medium” \rn–num-nodes “1” \rn–max-pods-per-node “30” \rn–enable-autoscaling \rn–min-nodes “0” \rn–max-nodes “3” \rn–enable-managed-prometheus \rn–workload-pool “$GOOGLE_CLOUD_PROJECT_ID.svc.id.goog” \rn–enable-shielded-nodes \rn–gateway-api=standard \rn–enable-ip-alias’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d7992d150>)])]

After the creation is complete, we connect our local machine to the cluster:

code_block[StructValue([(u’code’, u’gcloud container clusters get-credentials example-cluster \rn –region $GOOGLE_CLOUD_REGION –project $GOOGLE_CLOUD_PROJECT_ID’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d7992d790>)])]

Lastly, we are going to need an example app. I created a small example app in Golang that you can check out.

code_block[StructValue([(u’code’, u’git clone https://github.com/cgrotz/blog-examples.gitrncd ./blog-examples/cd-flagger-gateway-api’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6a57aa10>)])]

Now everything is ready.

Deploy Google Managed Prometheus (GMP) Query Interface in the cluster

Flagger is using telemetry data to get the status of the deployment. The example Golang application we are deploying for this demo provides Prometheus metrics. We already enabled managed collection on the cluster, so the metrics from the app should be available in Cloud Monitoring already. For this demo, we use a GMP query interface inside the cluster so that Flagger can check the deployment health. Flagger can also query Google Cloud Operations directly but we found it easier to calculate success rates with PromQL.

code_block[StructValue([(u’code’, u’kubectl create namespace prodrnkubectl create namespace devrnrnkubectl create serviceaccount gmp -n prodrnrngcloud iam service-accounts create gmp-sa –project=$GOOGLE_CLOUD_PROJECT_IDrnrngcloud iam service-accounts add-iam-policy-binding gmp-sa@$GOOGLE_CLOUD_PROJECT_ID.iam.gserviceaccount.com \rn–role roles/iam.workloadIdentityUser –project=$GOOGLE_CLOUD_PROJECT_ID \rn–member “serviceAccount:$GOOGLE_CLOUD_PROJECT_ID.svc.id.goog[prod/gmp]”rnrngcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT_ID \rn–member=serviceAccount:gmp-sa@$GOOGLE_CLOUD_PROJECT_ID.iam.gserviceaccount.com \rn–role=roles/monitoring.viewerrnrnkubectl annotate serviceaccount gmp \rn–namespace prod \rniam.gke.io/gcp-service-account=gmp-sa@$GOOGLE_CLOUD_PROJECT_ID.iam.gserviceaccount.comrnrnsed -i “s/GOOGLE_CLOUD_PROJECT_ID/$GOOGLE_CLOUD_PROJECT_ID/g” gmp-frontend.yamlrnrnkubectl apply -n prod -f gmp-frontend.yaml’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6bee7990>)])]

Install Flagger in the Kubernetes cluster

In order to start with canary deployments, we need to install Flagger with Gateway API enabled in our cluster. You can simply do that by running:

code_block[StructValue([(u’code’, u’kubectl apply -k github.com/fluxcd/flagger//kustomize/gatewayapi’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d8000c910>)])]

It will install the flagger components and CRDs into the flagger-system namespace.

Bootstrap the environment

Next we will bootstrap the environment with:

a K8S gateway for dev and prod (Using an Internal L7 LB)
a Metric Template for querying the Success Rate (we are going to take a look at this later)
and a Canary Release object for Flagger (will be explained further down as well)

code_block[StructValue([(u’code’, u’kubectl apply -f bootstrap.yaml’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6b0a5c10>)])]

We need to fetch the IP address of the gateway for DNS setup (it might take a few minutes for it to show up):

code_block[StructValue([(u’code’, u’kubectl get gateways.gateway.networking.k8s.io app -n dev \rn -o=jsonpath=”{.status.addresses[0].value}”‘), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d7992dd90>)])]

If you like you can now go ahead and try to deploy directly from Skaffold:

code_block[StructValue([(u’code’, u’skaffold run’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d8000ca50>)])]

You can now call the service from a VM inside the same VPC:

code_block[StructValue([(u’code’, u’curl -H “Host: app.dev.example.com” http://<DEV_IP>’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d7992d710>)])]

The response should show “Hello World!”. It might take a couple of seconds for the state to reconcile and the backend to become healthy:

Create a Cloud Deploy Pipeline

First, set permissions for Cloud Deploy and apply the pipeline. This example uses a simplified IAM configuration using the default compute service account to reduce complexity. To improve security you should use a custom service account when you set this up for production usage.

code_block[StructValue([(u’code’, u’gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT_ID \rn–member=serviceAccount:$(gcloud projects describe $GOOGLE_CLOUD_PROJECT_ID \rn–format=”value(projectNumber)”)-compute@developer.gserviceaccount.com \rn–role=”roles/clouddeploy.jobRunner”rnrngcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT_ID \rn–member=serviceAccount:$(gcloud projects describe $GOOGLE_CLOUD_PROJECT_ID \rn–format=”value(projectNumber)”)-compute@developer.gserviceaccount.com \rn–role=”roles/container.developer”rnrnsed -i “s/GOOGLE_CLOUD_PROJECT_ID/$GOOGLE_CLOUD_PROJECT_ID/g” clouddeploy.yamlrnrnsed -i “s/GOOGLE_CLOUD_REGION/$GOOGLE_CLOUD_REGION/g” clouddeploy.yamlrnrngcloud deploy apply –file clouddeploy.yaml \rn –region=$GOOGLE_CLOUD_REGION –project=$GOOGLE_CLOUD_PROJECT_ID’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6a8baed0>)])]

Next, create a new release for deployment to prod with Cloud Deploy:

code_block[StructValue([(u’code’, u’skaffold build -p prodrngcloud deploy releases create release-001 \rn –project=$GOOGLE_CLOUD_PROJECT_ID –region=$GOOGLE_CLOUD_REGION \rn –delivery-pipeline=canary \rn –images=skaffold-kustomize=$(skaffold build -q | jq -r “.builds[].tag”)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6a8bae90>)])]

Next, we promote the release to prod, this step can also take some time to complete:

code_block[StructValue([(u’code’, u’gcloud deploy releases promote –release=release-001 \rn –project=$GOOGLE_CLOUD_PROJECT_ID –region=$GOOGLE_CLOUD_REGION \rn –delivery-pipeline=canary –to-target=prod’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6a8ce5d0>)])]

Let’s fetch the IP for the prod gateway:

code_block[StructValue([(u’code’, u’kubectl get gateways.gateway.networking.k8s.io app -n prod \rn -o=jsonpath=u201d{.status.addresses[0].value}u201d’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d807a8a90>)])]

And curl the prod gateway from a VM inside the clusters VPC:

code_block[StructValue([(u’code’, u”curl -H ‘Host: app.prod.example.com’ http://<PROD_IP>”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d802ddf90>)])]

Once the deployment is finished you should see “Hello World!” again. Since there wasn’t a version of the prod deployment already running, it “skipped” the canary step.

Canary Deployment

So let’s try the canary functionality. Let’s make a small change in the “app/main.go” file. For example, let’s add your name to the output string in line 27 and then deploy the new version directly to prod skipping the dev stage (which you shouldn’t do in a real production scenario of course).

code_block[StructValue([(u’code’, u’skaffold build -p prodrngcloud deploy releases create release-002 \rn –project=$GOOGLE_CLOUD_PROJECT_ID –region=$GOOGLE_CLOUD_REGION \rn –delivery-pipeline=canary –to-target=prod \rn –images=skaffold-kustomize=$(skaffold build -q | jq -r “.builds[].tag”)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d7af3fcd0>)])]

You can observe the canary process using:

code_block[StructValue([(u’code’, u’kubectl -n prod describe canary/app’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d78365d10>)])]

Now when you curl the prod gateway again, you should be able to see a mixture of messages, with the ratio shifting on how far the release has already progressed.

You can also check the traffic split directly on the GCLB Url Map:

code_block[StructValue([(u’code’, u’# First fetch the url-map name (since the name is generated), we need the part after the last ‘/’rnkubectl get gateways.gateway.networking.k8s.io app -n prod \rn-o=jsonpath=”{.metadata.annotations.networking\.gke\.io/url-maps}”rn# thenrngcloud compute url-maps export <URL_MAP> \rn–region=$GOOGLE_CLOUD_REGION –project=$GOOGLE_CLOUD_PROJECT_ID’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6bc29190>)])]

The result should look something like this:

The application also contains a failed endpoint that will cause a 500 – Internal Server Error response. How about you make another small change to the app/main.go to trigger a new deployment and observe how Flagger will stop the rollout of the new version due to lower request success rate?

How it works

Inside the bootstrap.yaml, we defined a simple PromQL query for the success rate of the app; requests without a 200 status code are taken as failed. In the Flagger canary object we define a target success rate of 60%

code_block[StructValue([(u’code’, u’1 – (sum(rn rate(rn promhttp_metric_handler_requests_total{rn namespace=”{{ namespace }}”,rn pod=~”{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)”,rn code!=”200″rn }[{{ interval }}]rn )rn )rn /rn sum(rn rate(rn promhttp_metric_handler_requests_total{rn namespace=”{{ namespace }}”,rn pod=~”{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)”rn }[{{ interval }}]rn )rn ))’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d6bc29a50>)])]

We also defined a Flagger Canary object where this success rate query is referenced. The canary object observes new app deployments, and intercepts the routing configuration to gradually shift the traffic to the new version instead of shift the whole traffic all at once:

code_block[StructValue([(u’code’, u’apiVersion: flagger.app/v1beta1rnkind: Canaryrnmetadata:rn name: apprn namespace: prodrnspec:rn # deployment referencern targetRef:rn apiVersion: apps/v1rn kind: Deploymentrn name: app-prodrn # the maximum time in seconds for the canary deploymentrn # to make progress before it is rolled back (default 600s)rn progressDeadlineSeconds: 60rn service:rn # service port numberrn port: 8080rn # container port number or name (optional)rn targetPort: 8080rn # Gateway API HTTPRoute host namesrn hosts:rn – app.prod.example.comrn # Reference to the Gateway that the generated HTTPRoute would attach to.rn gatewayRefs:rn – name: apprn namespace: prodrn analysis:rn # schedule interval (default 60s)rn interval: 60srn # max number of failed metric checks before rollbackrn threshold: 5rn # max traffic percentage routed to canaryrn # percentage (0-100)rn maxWeight: 50rn # canary increment steprn # percentage (0-100)rn stepWeight: 10rn metrics:rn – name: success-ratern templateRef:rn name: success-ratern namespace: prodrn thresholdRange:rn min: 0.9rn interval: 1m’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e2d7a3c9550>)])]

So that’s it, now you have a nice little canary Continuous Delivery pipeline up and running on Google Cloud with the new Gateway API and Google Cloud Deploy.

Next Steps

If you would like to learn more about CI/CD on Google Cloud I would recommend the following articles:

Building a secure CI/CD pipeline using Google Cloud built-in services contains a great end-to-end example for CI/CD on Google CloudIntroducing Software Delivery Shield for end-to-end software supply chain security shows how Google helps you protect your software deliveryThe evolution of Kubernetes networking with the GKE Gateway controller gives an overview of GKE Gateway API and its capabilities.

Read More for the details.

2023 01 19

GCP – [Infographic] Navigating secure digital transformation in financial services

Cloud, Google Cloud gcp

Adopting cloud computing technologies and services presents financial services institutions with opportunities to address many forms of security risks in new, innovative, and more effective ways.

However, firms often lack the tools required to map out their digital transformation journey in the context of security and risk governance.

That’s why it is important for Chief Information Security Officers, Chief Risk Officers, Chief Compliance Officers, Heads of Internal Audit, and their teams to have a cloud security transformation roadmap.

We suggest that the following principles, adopted in four core stages, should be your guide and reference when navigating the journey.

A successful digital transformation requires the orchestration of organizational, cultural, technical, and procedural changes and a risk and control approach that matures as you go.

Once there is an establishedoverall transformation program oversight path, ensure technology, operational, and security risk governance acts as a “check and balance” to the overall program.

aside_block[StructValue([(u’title’, u’CASE STUDY’), (u’body’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ec8d434a150>), (u’btn_text’, u”), (u’href’, u”), (u’image’, None)])]

aside_block[StructValue([(u’title’, u’CASE STUDIES’), (u’body’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ec8f484fe10>), (u’btn_text’, u”), (u’href’, u”), (u’image’, None)])]

aside_block[StructValue([(u’title’, u’CASE STUDY’), (u’body’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ec8c7bc3850>), (u’btn_text’, u”), (u’href’, u”), (u’image’, None)])]

Get the help you need at every stage of the journey

Google Cloud provides a secure-by-design foundation — core infrastructure that is designed, built, and operated with security in mind. Google Cloud can guide CISOs and their organizational stakeholders through the entire cycle of security transformation to adapt to an ever-evolving threat landscape.

GCP – Self-service analytics finally gets real for the Connected Sheet

Cloud, Google Cloud gcp

Today’s workplace is getting more and more data intensive. This is why, even in a modern data stack world full of incredible business intelligence tools, “good ol’ reliable” spreadsheets are still commonly used to wrangle data problems big and small.

Data is everywhere, so why aren’t insights?

While companies recognize the value data can provide for their business, tapping into it isn’t always easy. Spreadsheets are a fantastic tool for users to start turning raw data into something meaningful, which is why knowing how to use spreadsheets for data analysis remains critically important for users across industries — it’s the top use case cited by 63% of users when it comes to their sheets. And yet, more than half of users say they still struggle to make sense of their data in sheets. For every power user or data analyst who knows how to make the most of spreadsheets, there are countless business users, each with their own specific questions.

The increasing demands on today’s knowledge workers, the new decision makers, are only exacerbating these challenges. Not only do you need the skills to analyze and act on data quickly, but you need live access to this data so you’re making decisions based on up-to-date information in the moment, not what was happening earlier today, yesterday, or last week.

With the industry adopting cloud data warehouses and the rise of the data cloud, we are witnessing the rise of the connected sheet. Google Cloud BigQuery makes it easy for any information worker to tap into Google’s data cloud, leveraging the ease and use of a Connected Sheets experience.

But let’s face it, spreadsheet users come in all shapes and sizes. Not everyone has the skills of an analytics engineer. You’ve got everyone from spreadsheet experts who can perform VLOOKUPS in the blink of an eye, to the humble beginner just mastering how to write a simple SUM() formula. The truth is spreadsheets are everywhere, but aren’t for everyone. Simply bringing them to the cloud isn’t going to fix that.

As one of our ThoughtSpot customers said, “Sheets and Excel are pretty powerful, but not easy for everyone to use. It’s one thing to get the data into sheets, but once it’s there it’s difficult to drill, ask ad-hoc questions, or simply build charts and visualizations.”

Simply put: Spreadsheets can be hard. Creating a chart or data visualization from spreadsheets can be even more challenging. Drilling into spreadsheet data can be complicated, even for power users.

But it doesn’t have to be. We teamed up with Google Cloud to invent something new to bring the power of Connected Sheets to everyone. Introducing ThoughtSpot for Google Sheets.

Unleashing insights from data in sheets

ThoughtSpot for Sheets is a standalone native app plugin for Google Sheets that brings a true self-service analytics experience to your sheets data. With ThoughtSpot for Sheets, creating charts and visualizations is as simple as searching. All you need is some data in Sheets and a curious mind.

Let’s take a look.

Bringing data to Connected Sheets

There’s a plethora of ways to get data into Sheets. Many savvy users are turning to popular data integration tools like Supermetrics and Coefficient.io to connect live data from their business apps to Google Sheets for ad-hoc analysis. Other people like a quick and dirty copy and paste. However, with so much business data moving to data clouds, you can now query the trove of customer data in BigQuery through Sheets.

Getting started

ThoughtSpot for Sheets is simple to launch. Do a quick search on Google Workspace Marketplace or find it here. A simple click and you’ve installed the app. Depending on your organization, you might have to send a brief friendly message to authorize the plugin from your Google Workspace Admin.

With ThoughtSpot installed, you’re ready to connect to some data — just go to the Data tab and select “Connect to BigQuery”. Now you can start to explore tables in your business data set or connect to some of the public data sets available from Google Cloud’s Analytics Hub.

Once your Sheet is connected, the fun really begins.

No data modeling required. That’s right. No star schemas or snowflake schemas. Just insights.

With one click, you’ll be able to leap over the biggest hurdle people face with data. Simply select ThoughtSpot from your Extensions menu, and the app handles the rest. Depending on the data size, in just a few moments the app automatically reads, understands, and categorizes your data into different types (DATES, MEASURES, ATTRIBUTES) all in a easy to use modal that sits right on top of your Sheet for easy exploration.

Create content with search

Once data is in Sheets, ThoughtSpot makes it easy to ask questions of your data. Using the search bar or selecting from the side panel, you can start to create new visuals automatically. The user experience was designed for the data explorer, and the data explorer in all of us just waiting to be unleashed. The search engine acts as a powerful compiler and gives you access to unlimited combinations of searches to uncover opportunities hidden in your Sheets. With search you can call up endless combinations of measures, attributes, and even filters in your data, so no stone is left unturned.

We even built in powerful keywords to take your analysis further, like “top” which generates the top n items from a sorted result. If your search has more than one measure, ThoughtSpot generates the top n items from the first measure in the search.

With search you can even travel through time with ease. The search engine lets you easily aggregate your data daily, weekly, monthly or get even more granular with your time slices like [last 2 weeks], [last 35 days], etc.

Drill anywhere

True self-service BI should be limitless, without making drilling into the data difficult. So we put the data search engine to work again to let you drill anywhere in your data with a single click. Say “so long” to VLOOKUPS.

Don’t worry, you won’t get lost. We built in easy internal navigation that lets you easily undo/redo your query so you never lose track of that aha-moment.

Data storytelling with Slides & Sheets

For almost every user we surveyed, making sensible charts and visualizations from their spreadsheet data is a requirement, especially for data storytelling, yet more than half say doing so is challenging. In addition to being a valuable companion for Google Sheets, the app also integrates with Google Slides to make sharing, collaboration, and presenting data a breeze. When you are happy with the chart or visualization you created, you can Pin to Slides which then lets you create a new presentation or add to an existing deck.

Putting ThoughtSpot for Sheets to work

So in theory, this all sounds great. But it’s even more exciting when applied to the real world.

Sales

Whether Salesforce, Gong, Outreach, Qualified, or Drift, there are so many data sources to help give sales professionals the edge.

With ThoughtSpot for Sheets, you have easy to use sales analytics to manage pipeline health, go deeper with prospecting, or make sure your next QBR with your customer or manager is grounded in actionable data.

Marketing

Google Sheets has become the workhorse of any marketer. Data from Google Ads, web analytics, HubSpot, and advertising spend all find their way into Sheets, begging for marketing analytics so users can understand customer and user trends, optimize programs, and uncover the next winning campaign.

Research

Universities all over the world leverage Google Workspace as their productivity suite of choice. Students and researchers now have access to a powerful self-service experience, conveniently right in the Google Sheets they already use.

Explore public datasets available from BigQuery and Analytics Hub from crypto and census data to geopolitical ad spend to boost data fluency and research projects.

Spreadsheets 2.0 and the dawn of the Connected Sheet is upon us. Time to get more with your new best friend ThoughtSpot. Install the app free, and let us know what you think.

Happy data exploring.

Read More for the details.

2023 01 18

GCP – How to deploy Tink for BigQuery encryption on-prem and in the cloud

Cloud, Google Cloud gcp

Data security is a key focus for organizations moving their data warehouses from on-premises to cloud-first systems, such as BigQuery. In addition to storage-level encryption, whether using Google-managed or customer-managed keys, BigQuery also provides column-level encryption. Using BigQuery’s SQL AEAD functions, organizations can enforce a more granular level of encryption to help protect sensitive customer data, such as government identity or credit card numbers, and help comply with security requirements.

While BigQuery provides column-level encryption in the cloud, many organizations operate in hybrid-cloud environments. To prevent a scenario where data needs to be decrypted and re-encrypted each time it moves between locations, Google Cloud offers a consistent and interoperable encryption mechanism. This enables deterministically-encrypted data (which maintains referential integrity) to be immediately joined with on-prem tables for anonymized analytics.

To achieve a BigQuery-compatible encryption on-prem, customers can use Tink, a Google-developed open-source cryptography library. BigQuery uses Tink to implement its SQL AEAD functions. We can use the Tink library directly to encrypt data on-prem in a way that can later be decrypted using BigQuery SQL in the cloud, and decrypt BigQuery’s column level-encrypted data outside of BigQuery.

For our customers who want to use Tink with BigQuery, we have put together a few helpful Python utilities and samples in the BigQuery Tink Toolkit GitHub repo. Let’s first walk through an example of how to use Tink directly to encrypt or decrypt on-prem data using the same keyset used for BigQuery, followed by how the BigQuery Tink Toolkit can help simplify working with Tink.

To start, we need to retrieve the Tink keyset. We’ll assume that KMS-wrapped keysets are being used. These keysets need to be stored in BigQuery to use with BigQuery SQL. If needed, they can also be replicated to a secondary store on-prem.

code_block[StructValue([(u’code’, u’from google.cloud import bigquery, kmsrnrnbq_client = bigquery.Client()rnquery_job = bq_client.query(“””SELECT kms_resource_path, wrapped_keyset, associated_data FROM `my-keysets-table` WHERE column_name = “my-pii-column”;”””)rnresult_row = query_job.result()’), (u’language’, u’lang-py’), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eaab3596410>)])]

Now that we have the encrypted keyset, we need to unwrap it to retrieve the usable Tink keyset. If Cloud KMS is not accessible from on-prem, the unwrapped keyset will need to be maintained in a secure keystore on-prem.

code_block[StructValue([(u’code’, u’kms_client = kms.KeyManagementServiceClient()rndecrypted_keyset_obj = kms_client.decrypt(rn {rn “name”: result_row.kms_resource_path.split(“gcp-kms://”)[1],rn “ciphertext”: result_row.wrapped_keyset,rn }rn)rnkeyset = decrypted_keyset_obj.plaintext’), (u’language’, u’lang-py’), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eaab2c9c950>)])]

We can now use the keyset to generate a Tink primitive. This can be used to encrypt or decrypt data with the associated keyset. Note that different primitives should be used depending on whether the keyset is for a deterministic or nondeterministic key.

code_block[StructValue([(u’code’, u’import tinkrnfrom tink import aead, cleartext_keyset_handle, daeadrnrnbinary_keyset_reader = tink.BinaryKeysetReader(keyset)rnkeyset_handle = cleartext_keyset_handle.read(binary_keyset_reader)rnrn# If using a determinisitic keyset:rndaead.register()rncipher = keyset_handle.primitive(daead.DeterministicAead)rnrn# If using a nondeterministic keyset instead:rnaead.register()rncipher = keyset_handle.primitive(aead.Aead)’), (u’language’, u’lang-py’), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eaab36bea90>)])]

Once we have our cipher, we can use it to encrypt or decrypt data as needed.

code_block[StructValue([(u’code’, u’plaintext = “Hello world!”rnassociated_data = result_row.associated_datarn# To encrpyt:rn# If using a determinisitic keysetrnciphertext = cipher.encrypt_deterministically(rn plaintext.encode(), associated_data.encode()rn )rnrn# If using a nondeterministic keyset insteadrnciphertext = cipher.encrypt(plaintext.encode(), associated_data.encode())rnrnrn# To decrypt:rn# If using a determinisitic keysetrnplaintext = cipher.decrypt_deterministically(rn ciphertext, associated_data.encode()rn )rnrn# If using a nondeterministic keyset insteadrnplaintext = cipher.decrypt(rn ciphertext, associated_data.encode()rn )’), (u’language’, u’lang-py’), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eaac119d4d0>)])]

We have provided the CipherManager class to help simplify this process, which handles four actions:

Retrieving the required keysets from a BigQuery table

Unwrapping those keysets

Creating a Tink cipher for each column

Providing a consistent interface to call encrypt and decrypt.

We have also included a sample Spark job that shows how to use CipherManager to encrypt or decrypt columns for a given table. We hope these come in handy – happy Tinkering.

Read More for the details.

2023 01 18

GCP – Using IPv6 Unique Local Addresses for private connectivity in Google Cloud

Cloud, Google Cloud gcp

Editor’s note: Google Cloud supports a wide range of IPv6 capabilities. For an overview, check out this blog.

When people talk about the IPv4 Address exhaustion problem, it’s usually in the context of the public IPv4 space. When IPv6 was introduced, the primary goal was to fix the address exhaustion problem by allocating internet-accessible addresses to every device. However, IP exhaustion issues in large enterprises are often related to the private RFC1918 address space. Large enterprises use the RFC1918 address space for their internal networks. And lately, technological advancements such as 5G, internet of things, mobile applications, connected devices, serverless and container-based services have expedited the exhaustion of the IPv4 private address space.

One particularly interesting IPv6 capability you’ll find in Google Cloud is support for ULA (Unique Local Addresses), an IPv6 address space defined in RFC4193 that is analogous to the private IP space used in IPv4 defined in RFC 1918.

Even though the IPv6 address space is gigantic compared to the IPv4 address space, enterprises require a private network that is not exposed to threats that you might find on the public internet. Like IPv4, IPv6 retains the notion of private addressing as a separate private address space that will not conflict with the public address space. IPv6 ULA addresses are routable within the scope of private networks, but not publicly routable on the global IPv6 internet, thus providing isolation for private workloads from the internet and other cloud customers. Further, you can allocate and use these addresses without arbitration by a central registration authority.

Google Cloud allows you to create internal IPv6 ULA subnets for private communication within a VPC. Any workloads with IPv6 ULA addresses allocated from these subnets are meant to be used for private communication within your network. Any workload that is meant to be publicly available can leverage external IPv6 with GUA (Globally Unique Addresses). Additionally, multi-nic VM instances may be dual-homed with both ULA (internal) and GUA (external) addresses.

Using ULA addressing for private communication in Google Cloud provides the following advantages over the traditional IPv4 RFC 1918 address space.

1. VPCs with billions and billions of IPv6 addresses

A /48 ULA range is allocated to each VPC and each subnet in the VPC will be allocated a /64 IPv6 address range from this /48 ULA range. Each VPC with a /48 ULA range can accommodate 65,536 /64 subnets. A dual-stack subnet is assigned both IPv4 and IPv6 address ranges. Each /64 subnet can accommodate 4 billion unique VM interfaces. Each IPv6-enabled VM will be assigned a /96 address range from the subnet, which provides you with 4 billion unique IPv6 addresses for each VM interface.

2. Non-overlapping private IPv6 address space

When you create a ULA-enabled VPC, you are assigned a /48 range for your VPC, which can be used across all Google Cloud regions. You can use this aggregate range to simplify specification of ACLs, firewalls, and access controls in your on-prem or cross-cloud deployments.

Unlike RFC1918 addresses, which are meant to be reused across multiple networks, ULA addresses are meant to be unique. RFC4193 includes a description for a pseudo-random IP generator to help avoid overlaps. Google Cloud assigns all ULA addresses from the fd20::/20 range and ensures that each VPC network is assigned a unique /48 ULA prefix to avoid overlaps with other VPCs. This is beneficial when using VPC peering, which requires non-overlapping IPv6 address ranges for subnets in the peering VPCs. Ensuring uniqueness also eliminates the need for NAT to communicate between private networks. Google Cloud provides you the flexibility to choose a ULA range for your VPC that does not overlap with your on-prem/cross-cloud ULA ranges.

Support for IPv6 ULA addressing opens up a world of possibilities for enterprises with large, complex systems, like those based on containers and microservices. To learn more about how to get started with IPv6 in Google Cloud, check out the documentation. We can’t wait for all the interesting things you will build using IPv6 ULA address space.

Read More for the details.

2023 01 18

GCP – A reference architecture for transforming insurance claims processing with Google Cloud

Cloud, Google Cloud gcp

Transforming the customer experience of insurance claims processing is a key driver for transforming the insurance claims process itself. This needs a holistic approach to addressing different aspects of the claims processing workflow and modernizing the data that claims processing heavily relies on. Transforming claims processing can help reduce the manual touches from the process and lower processing time and costs, resulting in reduced claims’ payouts and higher policyholder satisfaction.

Studies by Verisk and Coalition Against Insurance Fraud show that auto insurers lose 14% of premiums a year due to claims leakage (about $29B). The annual cost of insurance fraud was as high as $60B and fraud occurs in about 10% of property-casualty insurance losses. Another study by PwC shows that 41% of customers are likely or more likely to switch providers due to a lack of digital capabilities. Inefficiencies in claims processing have a direct impact on the business and customer experience.

We describe how Google Cloud technologies working in unison through implementing the defined reference architecture can help organizations transform their Insurance claims processing to reap measurable business benefits and enhance the policyholder experience.

Introducing the Google Cloud insurance claims reference architecture

Google Cloud’s reference architecture for transforming insurance claims processing is available through the Google Cloud architecture diagramming tool (you can find it under Reference Architectures – Financial Services – Insurance Claims).

The reference architecture uses Google Cloud’s AI and data analytics platforms together with other Google Cloud technologies and supports the following use cases:

Claims submission / FNOL: Report claims online within minutes and enabling policyholders to upload claims documents at scale to the insurer’s app receiving the FNOL.

Claims assignment and analysis: Classify submitted documents, analyze images, perform claims segmentation.

Claims decision, repair analysis: Self-service damage assessment, real-time claims status, automated or quick settlement.

Claims leakage analytics: Analyze and report on claims data to assess and identify potential and actual claims leak.

Reporting and insights: Fraud monitoring, real-time insights into claims workflows, use claims insights in marketing and sales.

How to use the reference architecture

The reference architecture provides a framework for using Google Cloud to transform the insurance claims process holistically. The architecture can be implemented as-is in its entirety or addressed one function at a time.

Every insurance provider is in a different stage of their claims transformation journey, with a differing appetite to risk and unique budgetary needs. That’s why the reference architecture can be adopted in one of the following modes – Tactical (start small with prioritized functions), Strategic (identify multiple functional concerns and address them together for a measurable impact), Transformational (go big).

What can you achieve with the reference architecture?

The Google Cloud insurance claims reference architecture provides the following features:

Claims submission

Consume FNOL submission via any channel – forms, paper PDFs, scanned images.

Ingest the documents at scale for any form of claims data submission.

Perform inline data transformation to store data for near and long term.

Claims analysis

Advanced analytics to intelligently accelerate claims processing.

Parse form fields and tables from ingested documents.

Extract text from PDFs and images.

Use AI to classify claims documents, perform claims segmentation and run fraud detection models.

Asses damage in real time in claims scenarios requiring assessment of damage to property or vehicles.

Claims decisions

Make a decision on claims either automatically or significantly faster than today.

Validate claims, perform adverse selection modeling scenarios.

Estimate the cost of repair in damage assessment scenarios.

Predict payments and automatically settle claims in case of STP (straight through processing).

In case of non-STP, provide all necessary intelligence to associates to make a quick decision.

Claims leakage analytics

Analyze and report on claims data to assess and identify potential and actual claims leak.

Create models and visualizations to gain insights into several parameters like average settlement cost trends, number of reserve changes per claim, average time of first contact of claimant, subrogation recovery trends and more such analytics.

Reporting and data leverage

Store and unify content at scale, so it’s available when needed where needed and to whoever needs it.

Archive content for regulatory compliance at a fraction of the cost of primary storage.

Unify select claims insights and share them via APIs with internal teams, such as marketing or sales for more personalized offerings.

Transforming the claims processing workflow can provide multiple benefits, for example:

Faster multi-channel claims submission process and responsiveness can greatly increase customer satisfaction.

Data analytics and AI capabilities can create efficiencies that reduce the costs and increase the speed of claims adjustment.

Effective claims validation and management validation can improve the accuracy of claims’ payments.

Real-time insights can help cross-sell and up-sell insurance products based on policy holders’ needs.

Why transform insurance with Google Cloud

Further to the reference architecture we presented here, Google Cloud is uniquely positioned to transform an insurance business holistically. Here are a few technology enablers from Google Cloud that can aid in such a holistic transformation.

Google BigLake allows you to unify data from claims, underwriting and other insurance functions at scale, so it’s available to all lines of business including marketing and contact center.

Holistic data ecosystem that lets you build a datamesh to leverage data across various business lines like claims, underwriting and marketing with BigQuery for analytics, DataPlex to build a datamesh and Vertex AI for advanced AI and MLOps.

Google Earth Engine, a geospatial processing service, helps perform geospatial processing at scale, powered by BigQuery for property risk exposure assessment and portfolio management. For claims adjusters and underwriters, it provides an interactive platform for geospatial driven analytics.

Public datasets support claims analysts and actuaries for several scenarios including dynamic insurance pricing model using this dataset.

Read More for the details.

2023 01 18

GCP – Log Analytics in Cloud Logging is now GA

Cloud, Google Cloud gcp

Solving big problems usually takes a combination of the right people and the right tools. SRE, DevOps, and IT operations teams in organizations both big and small have used Google Cloud’s built-in logging service, Cloud Logging, to troubleshoot faster, recognize trends easier, and scale operations more effectively. Additionally, customers have been building homegrown solutions that combine the powers of BigQuery and Cloud Logging to help them address operational and security challenges at massive scale. Last year we introduced Log Analytics, powered by BigQuery, so more customers can bring logs and advanced analysis together without having to build the connection themselves.

Today, we are announcing the general availability of Cloud Logging’s Log Analytics (powered by BigQuery), a capability that allows you to search, aggregate and transform all log data types including application, network and audit log data at no additional cost for existing Cloud Logging customers. We are also launching three new Log Analytics capabilities.

Multi-region support for US and EU region

Improved query experience to save and share queries

Support for custom retention up to 10 years

To get started, upgrade your existing log buckets to Log Analytics supported buckets.

Same logs, same cost, more value with Log Analytics

Log Analytics brings entirely new capabilities to search, aggregate, or transform logs at query time directly into Cloud Logging with a new user experience that’s optimized for analyzing logs data.

Centralized logging – By collecting and centrally storing the log data in a dedicated Log Bucket, it allows multiple stakeholders to manipulate their data from the same datasource. You don’t need to make duplicate copies of the data.

Reduced cost and complexity – Log Analytics allows reuse of data across the organization, effectively saving cost and reducing complexities.

Ad hoc log analysis – It allows for ad-hoc query-time log analysis without requiring complex pre-processing.

Scalable platform – Log Analytics can scale for observability using the serverless BQ platform and perform aggregation at petabyte scale efficiently

By leveraging BigQuery, Log Analytics breaks down data silos helping security, networking, developer and even business teams collaborate using a single copy of data.

New features in this release

1. Multi-region support for Log Analytics buckets

In addition to GA, we are also announcing multi-region support for Log Analytics with log buckets in the US and EU. These new multi-regions are available for log buckets that use Log Analytics and for those that don’t. This means that you can now store and analyze your logs in the region that is most convenient for you, improving performance and reducing latency.

2. Improved query experience

We are also improving the query experience by allowing users to save, share and re-use recent queries. This means that you can easily reuse and share your most important queries, saving time and making it easier to get the insights you need from your logs.

Log Analytics feature: Save & Share Query

3. Retain logs up to 10 years in a Log Analytics bucket

We are rolling out the ability to support custom log retention. You can now store logs in the Log Analytics supported bucket for beyond 30 days. Standard custom log retention pricingwill apply.

Get started today

Now that Log Analytics is Generally Available, you can upgrade your log buckets to use Log Analytics and know that it’s covered under the Cloud Logging SLA. Upgrade your log bucket today to start taking advantage of Log Analytics.

If this is the first time you’re hearing about Log Analytics, we’ve got you covered with some materials to get you up to speed. Take a look at our blog the top 10 reasons to get started with Log Analytics, watch a recent on-demand information session we did aimed at developers, and learn more about the overall challenges we’re helping you solve in this video: Streamline software development with better insights and diagnostics.

Read More for the details.

2023 01 18

GCP – Get migrating in 2023: join our upcoming fireside chat featuring Forrester

Cloud, Google Cloud gcp

Editor’s note: just looking to register for the fireside chat on January 31st? Here you go!

You probably already know that cloud migration continues to be a huge priority for organizations across the globe. But what may surprise you is all the ways priorities, projects, and outcomes have changed in the last few years. For example, in 2020, the thing that was most top of mind was identifying the best-fit workloads to run in the public cloud. But what about now?

To answer that question (and more) we’re hosting a free ‘fireside chat’ on January 31st at 9:00AM PST where our guest Bill Martorelli, Principal Analyst at Forrester, and Priyanka Vergadia, Staff Developer Advocate at Google Cloud will do a deeper dive on the findings of our “State of Public Cloud Migration” study for 2022.

During the chat, Bill and Priyanka will discuss key findings from this Google-commissioned Forrester Consulting study, which includes insights from 300+ decision makers and answers questions like:

Why are more global organizations accelerating their cloud migrations?

What factors do global organizations prioritize when selecting cloud vendors?

What are the IT and business benefits that global organizations gain after migration?

For instance, circling back to our original example, it turns out that identifying best-fit workloads to run in the public cloud is not the top priority anymore. In fact, many organizations have already moved past that phase of their migrations, with their top priorities now focusing on post-migration performance and cost optimization strategies.

It’s sure to be a great time, and a fantastic way to kick off your new year, especially if cloud migration is one of your New Year’s resolutions. Hope to see you there!

Read More for the details.

2023 01 17

GCP – Built with BigQuery: How to accelerate data-centric AI development with Google Cloud and Snorkel AI

Cloud, Google Cloud gcp

In Deloitte’s annual “State of AI in the Enterprise” survey, 94% of business leaders identified AI as critical to their organizations’ success over the next five years. That survey also uncovered a 29% increase in the number of organizations struggling to achieve meaningful AI-driven business outcomes. Part of this challenge lies in the ability to capitalize on existing data, in its various formats spread throughout the organization. For example, up to 80% of enterprise information assets are scattered across the organization in text, PDFs, emails, web pages, and other unstructured formats. This includes a wealth of valuable insights embedded within contracts, buried within patient files, recorded in chat transcripts, noted in EHR/CRM text fields, and present in other formats. This wealth of unstructured data is often untapped, as some business leaders may be unaware of the value or unsure how to leverage it.

Challenges: The need to put unstructured data to use more rapidly

Accessing data across various locations and file types and then operationalizing that data for AI usage is usually a cumbersome, manual, time-consuming, and costly process. Individually labeling files to build an adequate dataset to train a machine learning (ML) model is notoriously slow, while human errors and inconsistencies also tend to degrade data quality and negatively impact ML model performance.

Often, analyzing enterprise data requires the expertise of analysts, clinicians, lawyers or other domain specific experts. In highly-regulated industries such as financial services and healthcare, privacy regulations, standards, and other access restrictions make the challenges posed in using unstructured data proportionally higher.

Solution approach

Snorkel AI has teamed with Google Cloud to help organizations transform raw, unstructured data into a format that can be used to train actionable AI-powered models for insights and decision making. By combining Google Cloud services such as BigQuery and Vertex AI with Snorkel AI’s data-centric AI platform for programmatic data curation and preparation, organizations can accelerate AI development 10-100x [1]. Tapping into the value of unstructured data stored in BigQuery and making that data ready for ML training empowers enterprises to incorporate all types of data for training AI models.

Snorkel AI’s data-centric approach unlocks new ways of preparing ML training workloads

Snorkel AI addresses one of the biggest blockers to preparing data for AI development: the massive hand-labeled training datasets needed to prepare data for supervised training of ML models. Snorkel AI overcomes this bottleneck through using a programmatic labeling approach implemented in Snorkel Flow, a novel data-centric AI platform.

Leveraging business logic, and using foundation models as a means of generating labels, data science and ML teams can use Snorkel Flow’s labeling functions to programmatically label data using various sources, including previously-labeled datasets that may have been poorly labeled while encoding knowledge or heuristics from subject matter experts. Snorkel Flow can leverage these multiple data and knowledge sources to label large quantities of unstructured data at scale.

In addition to data scientists, other users in the ML lifecycle, such as ML engineers, can leverage Snorkel Flow to rapidly improve training data quality and model performance using integrated error analysis and model-guided feedback mechanisms to develop more accurate AI applications.

The data-centric AI workflow within Snorkel Flow operates as follows:

Data scientists, ML engineers, and subject matter experts programmatically label large amounts of data in minutes to hours by creating labeling functions.

Upon creating labeling functions, Snorkel Flow generates a probabilistic labeled dataset that is used to train a model within the platform.

Next, data scientists use guided error analysis to analyze the model’s performance deficits. They look for the gaps that facilitate creation of more targeted and relevant labeling assignments. In other words, data scientists and other users specifically work on places where the model is most wrong, or on particular high-value examples, or on commonly confused classes of data.

Next, users collaboratively iterate on these gaps with internal experts, refining or adding labeling functions as needed to label even more data with which they can again feed into the model for analysis.

Users repeat this iteration even after deploying a model and monitoring a slice of production data.

As a result of this loop, the metrics improvements in an AI application are often orders-of-magnitude greater than what can be achieved with model-centric AI and hand-labeled data.

Solution details

Unified access to data stored on Google Cloud

With training data curation and preparation unblocked via programmatic labeling of unstructured data, data scientists can harness the full power of Google’s end-to-end BigQuery ML and/or Vertex AI platforms to fast-track the development of analytics and AI applications. Google Cloud customers can easily deploy Snorkel Flow on their Google Cloud infrastructure using Google Kubernetes Engine (GKE), then consume unstructured, semi-structured or structured data from Google Cloud data services such as BigQuery and Google Cloud Storage (GCS). See the below figure for data sources and integrations.

BigQuery is a serverless, cost-effective, and cross-cloud analytics data warehouse built to address the needs of data-driven organizations. BigQuery breaks down silos across clouds, allowing enterprises to centralize all of their data – structured, semi-structured, and unstructured – in a single secure repository. BigQuery support for unstructured data management includes built-in capabilities to secure, govern, and share unstructured data.

Snorkel Flow + Google Cloud BigQuery

The Snorkel Flow platform integrates natively with BigQuery to streamline and simplify AI development:

With a few clicks, data scientists can immediately pull relevant data from BigQuery directly into Snorkel Flow using the integrated BigQuery connector.

Data can then be labeled programmatically using a data-centric AI workflow in Snorkel Flow to quickly generate high-quality training sets over complex, highly variable data. Snorkel Flow includes templates to classify and extract information from unstructured text, native PDFs, richly formatted documents, HTML data, conversational text, and more.

Newly labeled datasets can then be used to either train custom ML models or fine-tune pre-built models.

Labeled data can be loaded back into the BigQuery environment as structured data.

Real-world impact

Top U.S. banks, healthcare, insurance, and other Fortune 500 organizations have used Snorkel Flow to extract information from complex documents such as 10-K reports, clinical trial protocols, technical manuals, rent rolls, legal contracts, and more. One Fortune 500 telecom provider and long-time Google Cloud customer, for example, uses Snorkel Flow to classify encrypted network data flows into key application categories. Using Snorkel Flow’s comprehensive data exploration and error analysis tools, the telco successfully trained 200,000 labels in a matter of hours, achieving 25% better accuracy compared to an internal ground truth baseline.

Google and Snorkel AI have collaborated on a Snorkel research project for Google’s internal use. Google used early versions of Snorkel’s core technology to tackle data labeling for content, product, and event classification problems that were not amenable to manual labeling due to the rapid variations in the labels. Using Snorkel, Google condensed a six month process involving thousands of hand-labeled examples into just 30 minutes and built content classification models that achieved an average performance improvement of 52% [2, 3, 4].

Better together: Snorkel AI + Google Cloud

Together, Google Cloud and Snorkel AI enable Fortune 500 enterprises, federal agencies, and other AI innovators to operationalize unstructured data to build and and accelerate AI applications to solve their most critical challenges

To learn more, schedule a custom demo tailored to your use case with Snorkel AI ML experts or watch one of the below recent presentations:

Accelerate AI development by eliminating the pain of manual labeling, delivered by Snorkel AI co-founder Henry Ehernberg as part of a Google Cloud BigQuery Innovation event

Promises and Compromises of Responsible Generative AI Model Adoption in the Enterprise, delivered by Google Director, Cloud Partner Engineering, Dr. Ali Arsanjani at Snorkel’s Foundation Model Summit

[1] Snorkel AI documented customer results reflect 45x, 52%, 98% and similar improvements vs hand-labeling https://snorkel.ai/case-studies/
[2] Case study on Google’s use of Snorkel’s core technology: https://snorkel.ai/google-content-classification-models-case-study/
[3] Harnessing Organizational Knowledge for Machine Learning: https://ai.googleblog.com/2019/03/harnessing-organizational-knowledge-for.html
[4] Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale: https://arxiv.org/abs/1812.00417

Read More for the details.

2023 01 17

GCP – The search bar got a workout on Black Friday 2022

Cloud, Google Cloud gcp

Editor’s note: To kick off the new year and in preparation for NRF The Big Show, we invited partners from across our retail ecosystem to share stories, best practices, and tips and tricks on how they are helping retailers transform during a time that continues to see tremendous change. Lucidworks had previously published “4 lessons learned from 2022’s Cyber 5.” Please enjoy this updated entry from our partner.

Ecommerce retailers that got customers in and buying on Black Friday won an early piece of wallet share. According to Adobe Analytics, shoppers spent more than $9B dollars this Black Friday, a 2% increase from 2021.

For many retailers, Cyber Monday outperformed Black Friday sales. However, Lucidworks saw several of their retail customers buck the Cyber Monday lore and watched website activity peak on Black Friday. For one athletic apparel retailer, search query volume was 60% higher on Black Friday compared to Cyber Monday. Meanwhile, customers on other retail sites stuck to tradition and saw a 96% uptick from Black Friday to Cyber Monday.

Here are four trends Lucidworks, a Google Cloud partner and search solution provider for some of the world’s biggest ecommerce brands, saw from a top athletic apparel retailer as well as insights from Lucidworks leaders.

1. Mobile Wins (Again)

Nearly 75% of unique individual shopper sessions took place on mobile devices over the course of the Cyber 5, the five days from Thanksgiving to Cyber Monday. On Black Friday, 92% of all unique shopper sessions took place on a mobile device. However, on Cyber Monday, mobile shopping was only 66%. Conversion rates on mobile still lag those on desktop. Statista notes conversion rates were approximately 2.3 percent lower than desktop.

As one could imagine, travel and mobile commerce are strongly tied. As brands get better at connecting every channel, brand engagement and purchase behavior across devices increase. The increased connective tissue with the shopper, particularly through mobile apps, increases the accuracy of targeted offers and products as well.

2. Customers Love a Morning Browse on Black Friday

Search queries for one of Lucidworks’ apparel retail customers topped out at a cool 59 million on Black Friday, with the most searches happening from 10am-11am ET. The number of searches at the same hour on Cyber Monday were nearly three times lower. Unique shopper sessions also peaked on Black Friday, with Thanksgiving day in a close second. Black Friday sessions eclipsed Cyber Monday by about 35%. Search and discovery solutions like Lucidworks, hosted on Google Cloud, are able to handle the scale. Lucidworks leverages Google Kubernetes Engine (GKE) to help manage the Lucidworks search engine at scale through containerized deployments. This enables Lucidworks to develop, test, and release new features quickly and to isolate its workloads. Further, the accessibility, stability, and strong foundation of security provided by Google Cloud are essential during the peak shopping season.

“Promotions across the web, on email, social, and on the site can play a big role in how early shoppers start their browsing and buying,” says Jenny Gomez, Lucidworks VP of Marketing. “Many retailers opt to keep the promotions rolling from Thanksgiving Day through Cyber Monday. Facets, landing pages, and recommendations are powerful tools for the customer experience. But after this year’s Cyber 5, we saw that search still rules the roost. The core to a great digital experience starts with the search bar, the place where shoppers act with the most intent; better search, happier customers, bigger Cyber 5 conversion haul.”

The ability for retailers to accelerate relevance and reduce search abandonment will continue to be a priority. Recently, Lucidworks announced that they will integrate their Fusion technology with Google Cloud’s Retail Search solution to deliver an elevated online product discovery experience. When combined with Google Cloud Retail Search, Lucidworks Fusion orchestrates Google’s search and shopper intelligence trained on queries and events to enable instant sharper relevance across the shopper journey.

4. Virtual Shopping Carts Flooded the Site

The number of total customer carts peaked on Black Friday. There were six times the number of carts as compared to the Tuesday of that week. Most customers had somewhere between 1-5 items in the cart over the course of Cyber 5. Regardless of items per cart, the number of products across all carts peaked on Black Friday.

Increased cart rate and cart size relative to theincrease of traffic are good indications that shoppers are finding relevant items. Shopping carts are also frequently used by gifting shoppers as a tool to get confirmation on product availability and the final cost after total fees and discounts. Shopping cart sharing can also help purchase rates as it allows gifters to collaborate with others, removing the guesswork out of the purchasing as well as enabling group purchases.

5. Shoppers Relied on Recommendations

Shoppers relied on recommendations on product detail pages over the course of Cyber 5. Based on what customers were looking at, they were served relevant “You may also like” recommendations. The number of total clicks on recommended items peaked on Black Friday, with Thanksgiving in a close second. In addition, shoppers clicked on and then bought recommended items on Black Friday more than 55% more often than on Cyber Monday.

Applying a recommender strategy based on touchpoints and context is key. As new visitors may lack familiarity with a retailer’s catalog, recommenders for “you may also like”, “similar items”, and “complete the look” that are placed on a product detail page or the shopping cart page are particularly effective in driving purchases for these new shoppers. For known or repeat shoppers, personalized recommendations such as “may we suggest” or “just for you” based on the customer’s engagement—past and present—are a great way to drive additional purchases.

Cyber 2023

Is it already time to start preparing for Cyber 2023? Not until you have an understanding of how your website performed during this year’s holiday shopping season. Look through the data to understand the customer journey. Did they get stuck with dead-end searches? Were they clicking on recommendations on the product detail page? Did they journey from mobile to desktop and back? Once you can answer questions like these you can optimize everything from the search bar, to suggestions, to landing pages, and more. Now pat yourself on the back for making it through Cyber 5 2022.

Need a hand optimizing your site so shoppers can find what they need easily? Drop us a line and check out the Lucidworks data discovery and search solution available on the Google Cloud Marketplace.

Read More for the details.

2023 01 17

GCP – DORA’s implementation period starts now. What we’re doing to prepare for the new law

Cloud, Google Cloud gcp

Today is the start of the two year implementation period for the EU Digital Operational Resilience Act (DORA). Financial entities in the European Union (EU) and their critical ICT providers must be ready to comply with DORA by January 17, 2025. At Google Cloud, we firmly believe that DORA will be vital to accelerating digital innovation in the European financial services sector. We have been engaging with policymakers on DORA since September 2020. We are now excited to collaborate with customers and regulators to operationalize the new DORA requirements ahead of the deadline.

As we approach the 2025 deadline, we intend to continue to support our customers with new resources and updates to our Compliance Resource Center. The first of these resources is our new DORA Customer Guide, which contains helpful information about how our customers can navigate the DORA regulations.

What DORA does for the European financial sector

DORA standardizes how financial entities report cybersecurity incidents, test their digital operational resilience, and manage Information and Communications Technology (ICT) third-party risk across the financial services sector and EU member states. In addition to establishing clear expectations for the role of ICT providers, DORA will also allow EU financial regulators to directly oversee critical ICT providers. Where the criteria are met, this would apply to cloud service providers like Google Cloud.

How Google Cloud is preparing for DORA

Over the last two years, our team has been engaging with policymakers and regulators to understand their perspectives on how the new law could improve digital operational resilience in the European financial sector.

Now that DORA is finalized, a cross-functional team at Google Cloud (including subject matter experts from Risk and Compliance, Security, Legal, Government Affairs, and Product) is reviewing the details and preparing compliance plans where needed. These plans build upon our strong foundation in areas like security, resilience, and third-party risk management that already enable our EU financial services customers to address their rigorous regulatory expectations.

We plan to use the implementation period to further enhance our capabilities in each of the DORA focus areas, including:

Oversight: We’re preparing for potential designation as a critical ICT provider and the annual engagements that will follow, including oversight plans, inspections, and recommendations. We’re confident that this structured dialogue will help to improve risk management and resilience both for our customers and across the sector. We will approach a relationship with our lead overseer with the same commitment to ongoing transparency, collaboration, and assurance that we approach our customers and their regulators with today.

Incident reporting: We’re very focused on how we can support customers with the incident reporting requirements under DORA. In particular, we’re looking at ways that our industry-leading information security operation and sophisticated security monitoring tools and solutions could be even more helpful to customers. With the addition of Mandiant to the Google Cloud family, we now also offer proven global expertise in comprehensive incident response and technical assurance to help organizations mitigate threats and reduce business risk before, during and after an incident. We are excited about how these capabilities can help our customers with DORA compliance.

Digital operational resilience testing: We firmly believe that cyber resilience must be tested. If done well, activities like threat led penetration testing can be powerful tools. Given the clear benefits of pooled testing in the public cloud context, this is something we’re very interested in exploring. Our customers have had continued success with pooled audits of Google Cloud. This gives us confidence that a similar collaborative and scalable approach can enable robust and effective testing.

Third-party risk management: Google Cloud’s contracts for financial entities in the EU already address the contractual requirements in the EBA outsourcing guidelines, the EIOPA cloud outsourcing guidelines, the ESMA cloud outsourcing guidelines, and additional member state requirements. We recognize that DORA also contains requirements for contracts with ICT providers. We are reviewing these closely to understand how they may impact our contracts for financial entities in future.

Looking ahead

Like our customers, Google Cloud is thinking about the key DORA issues now. However, we understand that the details in these areas still need to be fully defined in forthcoming regulatory and implementing technical standards. We are committed to engaging in the discussions about these standards in the same transparent and constructive way that we participated in the DORA dialogue.

Our goal is to make Google Cloud the best possible service for sustainable, digital transformation for European organizations on their terms — and there is much more to come.

Read More for the details.

2023 01 17

GCP – Reliability and SRE in the 2022 State of DevOps Report

Cloud, Google Cloud gcp

When a software change is deployed — after being designed, coded, tested, packaged, and tested some more — a journey comes to an end. At the same time, a new journey begins: your customer’s relationship with your service. It’s here, in the domain of operations, that abstract risks like launch schedule slippage give way to tangible risks like lost revenue, degraded trust, and tarnished reputation. Only when it’s available to users can software contribute to (or threaten!) the success of your organization. And so, throughout the past several years, the DevOps Research and Assessment (DORA) project has incrementally deepened our research into the reliability of services, through and beyond deployment, into ongoing operation.

Reliability is a broadly defined term, which refers to a team’s ability to meet their users’ expectations — for software services, it may encompass aspects of availability, latency, correctness, or other characteristics that influence the consistency and quality of user experience. Google’s practice of Site Reliability Engineering (SRE), which has been embraced and extended by a global community of reliability engineering practitioners, is an approach to operations that prioritizes user-oriented measurement, shared responsibility, and collaborative, blameless learning. Starting with the 2021 Accelerate State of DevOps Report, we began asking survey respondents detailed questions about reliability engineering in their organizations. We continued and expanded our investigation in 2022, and found further evidence that modern reliability engineering is widespread: a majority of respondents report that they employ SRE-style practices. With this extensive body of data to draw from, this year we pushed further into analyses of the impact of reliability and its interaction with other dynamics present in our model of technology’s influence on organizational success.

Reliability matters

When reliability is poor, improvements to software delivery have no effect — or even a negative effect — on organizational outcomes

Reliability is more than beneficial: it’s essential. As in prior studies, we find that software delivery performance (as measured by the “four key metrics” of change lead time, deploy frequency, change failure rate, and failure recovery time) is predictive of organizational performance. However, this year’s analysis revealed a previously unseen nuance: the influence of software delivery on organizational performance is predicated on reliability. When reliability is high, high-performance software delivery predicts better outcomes for the organization. But when reliability is poor, improvements to software delivery have no effect — or even a negative effect — on organizational outcomes. This affirms a long-held belief among reliability engineers: “reliability is the most important feature of any system.” If a service or product doesn’t meet its users’ reliability expectations, it’s counter-productive to rapidly ship flashy new features, because users can’t properly experience them. Software delivery relies on a foundation of reliability to create value.

Reliability is a journey

Any experienced leader will tell you that progress is rarely linear: even with a discipline like SRE, widely practiced and with demonstrable benefits, the path to success is unlikely to follow a straight line. DORA describes the “J-Curve” of organizational transformation, a phenomenon in which durable success comes only after setbacks and lessons learned. This year, we compared the depth of teams’ reliability engineering practices to their impact on the services they provide: will an investment in SRE produce greater reliability? The answer is yes, but with a significant caveat: not at first. Comparing reliability outcomes across a range of levels of SRE adoption, the J-Curve is plainly visible. A team which practices SRE only lightly — at the beginning of their SRE journey, perhaps — is likely not only to not benefit, but to regress in terms of the reliability experienced by their users. However, after these practices have more deeply permeated, an inflection point is reached and we see strong reliability benefits from continuing to grow the reliability engineering capability.

Knowing that it will likely take time to realize the benefits of adopting SRE, it may be tempting to start the process as soon, and as broadly, as possible. But we offer a note of caution here: organization-wide cultural transformation initiatives typically fail from overreach. We studied this and reported findings in a previous report. And even if you manage to beat the odds and fully adopt SRE across multiple teams simultaneously, the cost may be unacceptable: the setbacks in reliability that you are likely to experience early on, amplified across an entire organization all at once, could have catastrophic consequences. Therefore the SRE principle of gradual change should also be applied to the adoption of SRE itself.

Reliability is about people

Reflecting back on over a decade of SRE practice and theory, the Enterprise Roadmap to SRE underlines the importance of culture, suggesting that Site Reliability Engineering is in fact emergent from culture. Tools and frameworks are important; language is essential. But only a trustful, psychologically safe culture can support the environment of continuous learning which enables SRE to manage today’s complex, dynamic technology environments. DORA’s research in 2022 demonstrates the interplay between culture and reliability: we found that “generative” culture, as defined by the Westrum model, is predictive of higher reliability outcomes. And reliability has benefits not only for a system’s users, but for its makers as well: teams whose services are highly reliable are 1.6 times less likely to suffer from burnout.

Got a story to share about your DevOps journey? Submit it to Google Cloud’s 2022 DevOps Awards by January 31, 2023!

Read More for the details.

2023 01 17

GCP – Run data science workloads without creating more data silos

Cloud, Google Cloud gcp

For organizations, it is important to build a data lake solution that offers flexible governance and the ability to break data silos while maintaining a simple and manageable data infrastructure that does not require multiple copies of the same data. This is particularly true for organizations trying to empower multiple data science teams to run workloads like demand forecasting or anomaly detection on the data lake.

A data lake is a centralized repository designed to store, process, and secure large amounts of structured, semistructured, and unstructured data. It can store data in its native format and process any variety of it, ignoring size limits. For example, many companies have matrix structures, with specific teams responsible for some geographic regions while other teams are responsible for global coverage but only for their limited functional areas. This leads to data duplication and the creation of new data silos.

Managing distributed data at scale is incredibly complex. Distributed teams need to be able to own their data without creating silos, duplication, and inconsistencies. Dataplex allows organizations to scale their governance and introduce access policies that enable teams to operate on the portion of the data that is relevant to them.

Google Cloud can support your data lake modernization journey no matter where you are with people, processes, and technology. BigLake allows Google customers to unify their data warehouses and data lakes. Dataproc empowers distributed data science teams in complex organizations to run workloads in Apache Spark and other engines directly on the data lake while respecting policies and access rules.

This blog will show how Dataproc, Dataplex, and BigLake can empower data teams in a complex organizational setting, following the example of a global consumer goods company that has finance teams organized geographically. At the same time, other functions, such as marketing, are global.

Organizations are complex, but your data architecture doesn’t need to be

Our global consumer goods company has centralized their data in a data lake, and access policies ensure that each of their regional finance team has access only to the data that pertains to the appropriate location. While having access to global data, the marketing team does not have access to sensitive financial information stored in specific columns.

Dataproc with personal authentication enables these distributed teams to run data science and data engineering workloads on a centralized BigLake architecture with governance and policies defined in Dataplex.

BigLake creates a unified storage layer for all of the data and extends the BigQuery security model to file-based data in several different formats on Google Cloud and even on other clouds. Thanks to Dataproc, you can process this data in open-source engines such as Apache Spark and others.

In this example, our global consumer goods company has a centralized file-based repository of sales data for each product. Thanks to BigLake, this company can map these files in their data lake to tables, apply row and column level security and, with Dataplex, manage data governance at scale. For the sake of simplicity, let’s create a BigLake table based on a file stored in Cloud Storage containing global ice cream sales data.

As seen in the architecture diagram above, BigLake is not creating a copy of the data in the BigQuery storage layer. Data remains in Cloud Storage, but BigLake allows us to map it to the BigQuery security model and apply governance through Dataplex.

To satisfy our business requirement to control access to the data on a geographical basis, we can leverage row-level access policies. Members of the US Finance team will only have access to US data, while members of the Australia Finance team will only have access to Australian data.

Dataplex allows us to create policy tags to prevent access to specific columns. In this case, a policy tag called “Business Critical: Financial Data” is associated with discount and net revenue so that only finance teams can access this information.

Data Science with Dataproc on BigLake data

Dataproc allows customers to run workloads in several open-source engines, including Apache Spark. We will see in the rest of this blog how users can leverage Dataproc personal authentication to run data science workloads on Jupyter notebooks directly on the data lake, leveraging the governance and security features provided by BigLake and Dataplex.

For example, a member of the Australia finance team can only access data in their geographical area based on the row-level access policies defined on the BigLake table. Below, you can see the output of a simple operation reading the data from a Jupyter notebook running Spark on a Dataproc cluster with personal authentication:

As a reminder, even if we use the BigQuery connector to access the data via Spark, the data itself is still in the original file format on Cloud Storage. BigLake is creating a layer of abstraction that allows Dataproc to access the data while respecting all the governance rules defined on the data lake.

This member of the Australia finance team can leverage Spark to build a sales forecasting model, predicting sales of ice cream in the next six months:

Now, suppose a different user who is a member of the US Finance team tries to run a similar forecasting of ice cream sales based on the data she has access to, given the policies defined in BigLake and Dataplex. In that case, she will get very different results:

Sales of ice cream in the United States are expected to decline, while sales of ice cream in Australia will increase, all due to the different seasonal patterns in the Northern and Southern hemispheres. More importantly, each local team can independently operate on their regional data stored in a unified data lake, thanks to Dataplex on BigLake tables’ policies and Dataproc’s ability to run workloads with personal authentication.

Finally, users in the Marketing department will also be able to run Spark on Jupyter notebooks on Dataproc. Thanks to policy tags protecting financial data, they can only leverage the columns they have the right to access. For example, despite not having access to discount and revenue data, a marketing team member could leverage unit sales information to build a segmentation model using k-means clustering in Apache Spark on Dataproc.

Learn More

In this blog, we saw how Dataproc, BigLake, and Dataplex empower distributed data science teams with fine-grained access policies, governance, and the power of open-source data processing frameworks such as Apache Spark. To learn more about open-source data workloads on Google Cloud and governance at scale, please visit:

Create a lake in Dataplex

Create and manage BigLake tables

Dataproc Serverless Spark

Dataproc personal cluster authentication

Use policy tags to control column access in BigLake

Read More for the details.

2023 01 17

GCP – Managing Dialogflow CX Agents with Terraform

Cloud, Google Cloud gcp

Dialogflow CX is a powerful tool in Google Cloud that you can use to design conversational agents powered by Natural Language Understanding (NLU) to transform user requests into actionable data. You can integrate voice and/or chat agents in your app, website, or customer support systems to determine user intent and interact with users.

If you’ve ever wanted to get started with Dialogflow CX, you might have seen or ran through the quickstart steps to build a shirt ordering agent that you can ask for the store location, get store hours, or make a shirt order.

While going through the quickstart steps, you might find yourself wanting to codify all of the Dialogflow CX components and settings, which would help you quickly spin up agents and manage their configuration programmatically. In fact, you might already be using infrastructure as code tooling and best practices to manage virtual machines in Compute Engine, Kubernetes clusters in GKE, or topics and subscriptions in Pub/Sub. You can also use the same infrastructure as code approach with your Dialogflow CX agents: Terraform and Google Cloud to the rescue!

You can use the Terraform modules for Dialogflow CX along with the sample Terraform + Dialogflow CX configuration files to reproduce the chatbot/agent described in the “build a shirt ordering agent” quickstart. Try them out and spin up a Dialogflow CX agent with a single command in your own Google Cloud account!

Setup

There are a few things that you’ll need to set up before you run the sample Terraform configuration files for Dialogflow CX.

Enable the Dialogflow CX API.

Install and initialize the Google Cloud CLI.

Install Terraform.

Usage

Once you’ve completed the setup on your local machine, you’re ready to spin up your own fully-configured Dialogflow CX agent in seconds:

Clone the CCAI samples repository and cd into the dialogflow-cx/shirt-order-agent/ directory.

Edit the values in variables.tf to specify your Google Cloud project ID along with your desired region and zone.

Run terraform init to initialize the directory that contains the Terraform configuration files.

Run terraform apply, the command that spins everything up!

Once you run terraform apply and confirm the proposed plan, you’ll see messages about all of the components that were provisioned, including the agent, pages, intents, flows, and more:

code_block[StructValue([(u’code’, u’google_dialogflow_cx_agent.agent: Creating…rngoogle_dialogflow_cx_agent.agent: Creation complete after 2srngoogle_dialogflow_cx_entity_type.size: Creating…rngoogle_dialogflow_cx_page.store_location: Creating…rngoogle_dialogflow_cx_intent.store_hours: Creating…rngoogle_dialogflow_cx_page.store_hours: Creating…rngoogle_dialogflow_cx_page.order_confirmation: Creating…rngoogle_dialogflow_cx_intent.store_location: Creating…rngoogle_dialogflow_cx_intent.store_hours: Creation complete after 1srngoogle_dialogflow_cx_page.store_location: Creation complete after 1srngoogle_dialogflow_cx_page.order_confirmation: Creation complete after 1srngoogle_dialogflow_cx_page.store_hours: Creation complete after 1srngoogle_dialogflow_cx_intent.store_location: Creation complete after 1srngoogle_dialogflow_cx_entity_type.size: Creation complete after 1srngoogle_dialogflow_cx_page.new_order: Creating…rngoogle_dialogflow_cx_intent.order_new: Creating…rngoogle_dialogflow_cx_intent.order_new: Creation complete after 0srngoogle_dialogflow_cx_page.new_order: Creation complete after 0s’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e01680b1a50>)])]

Now that you’ve provisioned your agent in Dialogflow CX, you’re ready to view and test your agent in the Dialogflow CX Console!

How it works

We’re using the Terraform modules for Dialogflow CX to define a conversational agent and all of its components. We’ve reproduced the agent described in the build a shirt ordering agent quickstart.

All of the agent’s associated entity types, flows, intents, and pages are created and managed with Terraform, so you can edit your Terraform configuration files to change certain parameters, run terraform apply, and see your changes instantly reflected in the Dialogflow CX console.

You might notice that the flows.tf file actually uses a local-exec command within a null_resource block to make a REST API call instead of using a Terraform resource for Dialogflow CX to define the flow. This approach was used since Dialogflow CX creates a default start flow when the agent is created rather than being created and managed by Terraform. As a result, we can use a REST API call to PATCH the default start flow and then modify its messages and routes. We can still use Terraform to templatize and trigger the REST API command, which means that you can manage any setting that is also available in the Dialogflow CX REST API, or even add custom callbacks to other Google Cloud services if needed.

Summary

It’s convenient to be able to manage conversational agents as code using Terraform in Google Cloud. We get all of the benefits of Dialogflow CX with the convenience of Terraform to manage everything in a stateful and version-control friendly way.

Now that you’ve captured all of your Dialogflow CX agent settings and configuration in Terraform, you are ready to check your Terraform scripts into version control, spin up and destroy agents as you please using terraform apply and terraform destroy, or even store remote Terraform state in Google Cloud using the GCS backend.

Take a look at the Terraform + Dialogflow CX sample code along with the Terraform modules for Dialogflow CX so you can spin up your own Dialogflow CX agents with a single command. If you found this Terraform code sample useful, be sure to star, watch, or ask questions in our CCAI samples repository on GitHub!

aside_block[StructValue([(u’title’, u’Terraform + Dialogflow CX sample code’), (u’body’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e01680bd690>), (u’btn_text’, u’TRY IT OUT!’), (u’href’, u’https://github.com/GoogleCloudPlatform/contact-center-ai-samples/tree/main/dialogflow-cx/shirt-order-agent’), (u’image’, None)])]

Read More for the details.

2023 01 13

GCP – Built with BigQuery: Solving scale and complexity in retail pricing with BigQuery, Looker, Analytics Hub and more

Cloud, Google Cloud gcp

Context and Background

Maintaining optimal pricing in the retail industry can be challenging when relying on manual processes or platforms that are not equipped to handle the complexity and scale of the task. The ability to quickly adapt to factors affecting pricing has also become a critical success factor for retailers when pricing their products.

As a whole, the retail industry is working to absorb and respond to changes in the way customers buy and receive products, and to how pricing affects competitive advantage. For example, within some vertical domains, customers’ expectations for the price of products differ when buying online as compared to in-store, but are expected to align in others. Additionally, because of the ease with which shoppers can assess the prices across competitors, retailers are looking for ways to retain their most valuable customers; loyalty pricing, private label strategies, and bespoke promotional offers are seen as key aspects of possible solutions in this regard.

Whether in an everyday, promotional or clearance sense, the need to maintain optimal pricing requires a forward-looking mechanism that employs AI/ML capabilities based on multiple sources of input to provide prescriptive decision-making. Such an AI/ML platform can encourage specific buying behaviors aligned to a retailer’s strategy, for example, to rebalance inventory, expand basket sizes or increase private label brand sales. A pricing process can be thought of as having four stages:

Transfer and processing of information about a retailer’s operational structure, customer behavior related to its products, and other elements impacting supply and demand

Synthesis of information that represents the relationships between prices for products available across a retailers’ sales channels and business outcomes vis-a-vis financial metrics

Decision making, about price-related activities such as price increases / decreases or promotional offers, driven by human or software systems

Actions to actualize pricing decisions and to inform stakeholders affected by changes, driven by human or software systems

In the Solution Architecture section we will examine these stages in detail.

Use-cases: Challenges and Problems Resolved

The complexity of pricing in the retail industry, particularly for Fast Moving Consumer Goods (FMCG) retailers, can be significant. These retailers often have over 100,000 items in their assortments being sold at thousands of stores, and must also consider the impact of online shopping and customer segmentation on pricing decisions. Different buying behaviors across these dimensions can affect the recommendations of a pricing system, and it is important to take them into account in order to make accurate and effective pricing recommendations. An AI/ML-driven platform can provide greater agility and manage complexity to make more informed pricing decisions.

Speed is a critical factor in the retail industry, particularly for retailers selling Specialty Goods who face intense competitive pricing pressure in certain key products. In this environment, the ability to respond quickly to changes in the market and customer demand can be the difference between staying relevant and losing business to competitors. Automation using AI/ML, enabling real-time, on-demand price changes and promotions is a key factor in the evolving retail industry, particularly in the context of ecommerce and digital in-store systems like Electronic Shelf Labels (ESLs). These systems provide on-demand price changes and promotions that can positively alter customer behavior by increasing the basket size during a session.

To make this possible, the decision-making and delivery mechanisms behind these systems need to be driven by a flexible, programmatically accessible AI/ML engines that learn and adapts over time.

High level Architecture of Revionics Using GCP

Revionics’ product, Platform Built for Change, is a new platform that aims to address the significant changes occurring in the retail market by providing a flexible, scalable, intelligent and extensible solution for managing pricing processes. A foundational design principle for the platform is that it can be easily adapted, through configuration rather than code changes, to support a wide range of approaches and states of maturity in pricing practices. By externalizing dependencies of changes from the underlying code, the platform allows retailers to make changes more easily and quickly adapt to new requirements.

Figure 1: Revionics Solution on Google Cloud

The above diagram shows Revionics solution where we can see GCP serverless technologies used across all different layers, from ingestion to export. The key services used are:

Data Storage: Google BigQuery, GCS (blob storage), MongoDB Atlas

Data Processing: Google BigQuery, Google DataProc, Google Dataflow

Data Streaming: Kafka, Google PubSub

Orchestration: Cloud Composer (Airflow), Google Cloud Functions

Containerization & Infra Automation: GKE (Kubernetes)

Analytics: Google Looker

Data Sharing: Google Analytics Hub

Observability: Google Cloud Logger, Prometheus, Grafana

Solution Architecture

We will discuss the key problems, challenges to solve and how the various stages of the solution; Ingest, Process, Sync and Export have enabled Revionics to address the need for speed, scale, and automation, all while solving increasing complexity and evolving challenges in Retail pricing.

1. Transfer and processing of information

As Revionics is a SaaS provider, supporting retailers at the first stage of the pricing process – the transfer and processing of information – essentially boils down to overcoming one major challenge: wide variability. In our domain, variability comes from several data sources shown in Figure 2 below:

Each retailer’s pricing practice and technical environment exist in various states of maturity and sophistication

Entire sources of data may be included in some cases, but excluded in others

API usage and streaming may be plausible with some customers, whereas SFTP transfers are the only available means for others

Data quality, completeness and correctness vary by retailer according their upstream processes

Both large batch and near real-time use cases need to be supported

The Data transformation logic that feeds into science modeling will differ according to a combination of grouping and configuration choices based on the retailer’s operations and objectives. Essentially, there is no single “golden data pipeline”.

Figure 2: The workflow of aggregating multiple data feeds for pricing

In order to describe how the variability challenges are addressed, let’s drill into the Ingest and Processing portions of the architecture in the above diagram of a Customer Workflow example. There are three primary concepts:

a feed is a representation of a streamed or scheduled batch data source which has a number of methods for handling file formats and for plugging into various data technologies

a pipeline represents a combination of transformations taking in feeds and other pipelines

a DAG (directed acyclic graph) is generated by the configuration or wiring together of feeds and pipelines as well as supporting methods that execute validation and observability tasks. The generated DAG represents the full workflow for ingesting and processing information in preparation for Revionics’ Science platform

Let’s explore the benefits of the solution that leverages GCP, as depicted in the DAG Generation flow diagram in Figure 3, below:

The logical flow is a combination of Templates, Configuration and a Library of modular processing methods to generate workflows for ingesting, combining, transforming and composing data in a variety of ways. These are abstracted in a set of human readable configurations, simplifying the setup and support accessible to non-engineers. The DAG Generator outputs a JSON file for the entire workflow that can easily be understood.

The Data Platform natively delivers support capabilities such as validations and observability. Configurable validation checks are insertable at various levels for inspecting schema, looking for anomalies, and running statistical checks. Similarly, event logs, metrics and traces are collected by Cloud Logger and consolidation within BigQuery, to easily explore building dashboards or building ML models.

From an architectural standpoint, there is very minimal intervention needed to scale, operate and manage infrastructure. The workflow logic is represented in an execution agnostic fashion by a DAG JSON file that orchestrates method calls and artifact creation such as Tables, Views, Stored Procedures etc. The DAGs become instruction sets for Airflow, ultimately executed on a Composer Cluster (serverless). DataProc reads from GCS or SQL procedures on BigQuery to do all the heavy lifting in terms of combining data, aggregating, ML feature prep, etc.

Figure 3: Logical DAG Generation Flow

The bulk of event-driven processing is achieved using Cloud Functions or DataFlow, PubSub or Kafka and Dataflow. The architecture provides a high level of reliability and zero degradation in performance as requirements for scale and speed vary. The platform thus provides the ability for the team to focus on building novel approaches to challenging problems and being cost conscious on infrastructure spend.

2. Synthesis of information (balancing scale and skill)

At the heart of Revionics’ pricing process is an AI/ML engine used to synthesize the combined data signals that express the sources and contexts for demand of retail goods over zones, stores and customer segments.

Trained models learn to explain the influence of a multitude of features, such as seasonal effects, inflation trends, cannibalization across products, and competitive pricing on the quantity of products sold at a particular location and/or to a particular customer segment. These models are then used to forecast the impact of changes in price or in the structure of promotional offers given all known contexts prevalent during the time interval of interest.

The forecasts drive optimization processes that balance one or more business objectives (e.g. profit, margin, revenue or total units) while adhering to constraints based on the retailer’s operations and desired outcomes.

For example, in Figure 4, we are showing the data science modeling aspects employed by Revionics, leveraging the benefits of the GCP platform:

A retailer may want to optimize profit on its private label brands, while maintaining a minimum and maximum relative price differential to certain premium brands in a product category.

A retailer may look to set a markdown or clearance schedule while capturing as much revenue as possible using only discount multiples of 10%.

A retailer may want to optimize the discount level of a brand of flat screen TVs to maximize profitability over the whole category along with its related products.

Figure 4: Science Modeling & Price Recommendations

The primary technical challenge within the AI/ML domain for pricing is in balancing scalability with predictive skill during modeling, forecasting and optimization. Keep in mind that our median customer has 500,000 independent models (see Figure 5) that need to be trained per the histogram below; each training run is inherently iterative and computationally intensive. While the details of the science are beyond the scope of this blog, from a system perspective, Revionics architecture combines:

Two proprietary AI/ML frameworks, Probabilistic Programming Language (PPL) and the Grid, for expressing pricing domain behavior, and dynamically provisioning infrastructure to orchestrate separable modeling jobs based on statistical dependencies. PPL for expressing AI/ML models of pricing domain behavior; and the Grid infrastructure for orchestrating millions of parallel, separable modeling jobs.

Google-led open source platforms: TensorFlow for its rich machine learning library and framework and TensorFlow Probability for probabilistic modeling methods

Several of Google Cloud serverless services for data storage, compute, messaging, containerization & logging – GCS, BigQuery, PubSub, GKE & Logs Explorer.

Again, what is noteworthy about this solution is the breadth of challenges and capabilities that don’t have to be solved by Revionics because the services sit on infrastructure where sizing, configuration, monitoring, deployment and scaling are managed as a platform service.

Figure 5: Multiple ML Models Per Organization

Revionics’ modeling framework leverages a Hierarchical Bayesian methodology that can optimize each individual retailer’s product, store and customer relationships. This is a key aspect of differentiation from a predictive skill perspective, as complex relationships between entities can be preserved with a learned reduction of an otherwise intractable problem space.

3. Decision making and Action

The DAG generation-based architectural pattern used in the last two steps of the pricing process, handles the need for Automated Intelligence described earlier. Note, the purpose of the final steps in the pricing process are:

To expose outputs from the data science modeling and price recommendations to teams or systems for decision making about price-related activities such as increases / decreases or promotional offers

To take actions, or to actualize pricing decisions, as well as to inform stakeholders affected by the changes being made

The Automated Intelligence enabled by the Google Cloud Platform enables us to scale these steps by providing well-designed APIs that allow users to integrate with and build on a platform’s output. However, AI/ML -based applications have specific challenges that need to be overcome in pricing.

One of these challenges is to build trust through explainability and greater transparency of the decisions behind the models. Because AI/ML software is non-deterministic, poorly understood by non-practitioners, and often replaces human processes, confidence isn’t easy to engender.

Visibility is the best asset for creating trust, which is why Revionics uses BigQuery and Looker at the center of our approach. With these technologies, reports and visualizations are interwoven into all aspects of the Revionics solution, creating a clear line of sight from data to decisions – this gives users visibility:

Forecasted business results from recommendations – for example, by making it easy for a user to understand the weighting of profit and revenue in a multi-objective optimization resulting in a particular price recommendation

Statistical confidence around decision variables – by showing visualizations and metrics such as 95% credibility intervals around price elasticity, for example

AI/ML model output analytics over time – including histograms, statistical metrics, outlier detection and the like that express the health of the models

Figure 6: Analytics Embedded in Solution

Automated Intelligence. The Looker + BigQuery combination is particularly effective because:

Performance in high-dimensional analytics. Its ability to maintain performance at scale in the context of high-dimensional analytics without the need for manual intervention.

Personalization. Its capacity for personalization and business relevance views and reports that reflect the specific conditions and metrics representative of the business.

Collaboration. In order to gain the confidence of people whose jobs are not pricing (e.g. category managers, merchandisers, executives), users have to be able to engage with and share analytical content in a completely frictionless way. In the Revionics solution, people within the organization who have no user login nor experience with the tool can view any analytical asset as well as engage with in-app comment threads.

Triggers. Additionally, analytical assets, such as reports and datasets, can also be scheduled or triggered for export.

Enhancing Performance and Managing Cost. Due to the scale of the data that is needed to drive downstream processes, APIs or even distributed streaming-based egress approaches are not always ideal. To resolve this, Revionics is exploring the use of Google’s Analytics Hub. The service gives an ability to create and securely share BigQuery datasets, tables and views that are simply available within the customer’s environment, which is an incredibly powerful tool for increasing the impact of Automated Intelligence. The benefit of exchanging data via the Analytics Hub is that we can preserve flexibility for users at scale in the system they are already likely to be using to drive their analytics, stream, and execute large transformations. In addition, Analytics Hub provides the levers to create exchanges and listings in Revionics’ SaaS solution without having to move data, thereby being incredibly cost optimized.

In particular, use cases that require very granular data from Revionics’ pricing system to be combined with a retailer’s other source data are very well served here. In a more tangible sense, we foresee users automating eCommerce-related capabilities, merchandising processes and digital marketing systems, in addition to any number of operational use cases that we have yet to conceive.

Outcomes

By building on the Google Cloud Platform and data cloud, Revionics has built and hosted solutions that have yielded numerous benefits. Some of the notable outcomes are:

Enhanced speed and agility: By replacing customer-specific stored procedures and scripts with human readable configurations, the solution has lowered the barrier for variable data transformation logic and better validation logic. All of these have made the solution easier to configure and more agile.

Improved stability: As the reliance on data quality is fairly high, several constructs in the solution have ensured data hygiene has improved leading to meeting SLAs and reduced downtime. Collectively, the customer support issues have lowered.

Rapid Data processing: Below graphic shows the multi-fold improvement in various parts of the data processing pipeline with progressively increasing volumes.

Faster Technical implementations: The time to value has been quicker enabled by design and performance. For instance: Test cycles and customer feedback sped up leading to quality standards being achieved faster. Historical loads have run significantly faster.

Increased accuracy: The forecast accuracy grew by a greater percent while maintaining training and optimization over the first phase of customers migrating to the new platform.

Decision making: The rising statical and decision confidence led to higher-impact results and better SUS (system usability scores).

Click here to learn more about Revionics.

The Built with BigQuery advantage for ISVs

Google is helping tech companies like Revionics build innovative applications on Google’s data cloud with simplified access to technology, helpful and dedicated engineering support, and joint go-to-market programs through the Built with BigQuery initiative, launched in April as part of the Google Data Cloud Summit. Participating companies can:

Get started fast with a Google-funded, pre-configured sandbox.

Accelerate product design and architecture through access to designated experts from the ISV Center of Excellence who can provide insight into key use cases, architectural patterns, and best practices.

Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.

BigQuery gives ISVs the advantage of a powerful, highly scalable data warehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. And with a huge partner ecosystem and support for multi-cloud, open source tools and APIs, Google provides technology companies the portability and extensibility they need to avoid data lock-in.

Click here to learn more about Built with BigQuery.

We thank the Google Cloud and Revionics team members who co-authored the blog: Revionics: Aakriti Bhargava, Director Platform Engineering. Google: Sujit Khasnis, Cloud Partner Engineering

Read More for the details.

2023 01 13

GCP – Cultural drivers of DevOps success

Cloud, Google Cloud gcp

While there are as many ways of defining DevOps as there are people, most would agree that at the most basic level, DevOps is about tools, practices, and how people work together to deliver software quickly, reliably, and safely. A big part of how people work together has to do with their organization’s culture.

Research conducted by the DORA team over the past eight years has consistently shown that culture is foundational to an organization’s success and the well-being of its employees. Our data has shown that high performing organizations, those that meet their performance and profitability goals, are more likely to have a generative culture – the 2022 State of DevOps report’s findings continue to support this claim.

This year, we examined three main areas related to the impact of culture on DevOps:

If the shift in work arrangements that has occurred since the start of the Covid-19 pandemic has had an impact on organizational performance.

Whether higher team churn impacts organizational performance.

The impact of employee burnout.

Work arrangement shifts

Flexible work arrangements have now become the norm at many organizations. Employees across many industries have the opportunity to choose between in-person, hybrid or fully-remote work arrangements that best fit their unique needs. Our research indicates that this shift has been a good one. Specifically, we find that flexible work arrangements are associated with higher organizational performance compared to organizations with more rigid work arrangements. This is good news! These findings demonstrate that being employee centric can have tangible and direct benefits to organizations.

Employee churn

Constant churn can impact productivity and morale as new team members need time to onboard. And those who stay might need to adapt to changes in their workload and team dynamics. We found that stable teams — teams whose composition hadn’t changed much over the last 12 months, were more likely to exist within high-performing organizations. Our research also showed that stable teams were more likely to report producing quality documentation compared to teams that experienced more churn. A team that is constantly dealing with change may have a harder time keeping up with practices that lead to quality documentation.

Employee burnout

Lastly, we focused again on burnout – the feeling of dread, apathy, and cynicism surrounding work. Experiencing burnout can lead people to have lower levels of job satisfaction and increases in turnover – not to mention the increased risk of poor mental and physical health outcomes such as increased risk for depression and anxiety, heart disease, and suicidal thoughts1. Our findings from this year showed that rigid work arrangements increase the likelihood of employee burnout by 30%.

Taken together, these three findings underscore the importance of creating a healthy and inclusive culture for employees both at the organizational and team level.

While we continue to emphasize the importance of culture, we acknowledge that changing or even improving an organization’s culture is no easy task. We recommend that organizations seek to first understand their employees’ experiences and subsequently invest resources in addressing culture-related issues as part of DevOps transformation efforts.

To learn more about how culture impacts organizational performance check out the 2022 State of DevOps Report. And if you have a story about how you are implementing DORA devops practices, don’t forget to share your transformation journey by applying to the2022 DevOps Awards by January 31, 2023!

Maslach C, Leiter MP. Understanding the burnout experience: recent research and its implications for psychiatry. World Psychiatry. 2016 Jun;15(2):103-11. doi: 10.1002/wps.20311. PMID: 27265691; PMCID: PMC4911781.

Read More for the details.

2023 01 13

GCP – Connecting Google Kubernetes Engine to Cloud SQL using the Auth Proxy Operator

Cloud, Google Cloud gcp

We are constantly looking for ways to simplify the developer experience on Google Cloud.

Google Kubernetes Engine (GKE) is a simple way to automatically deploy, scale, and manage Kubernetes. Cloud SQL is a fully managed relational database service for MySQL, PostgreSQL, and SQL Server. Developers often deploy their applications to GKE and store their data in Cloud SQL, so connecting GKE to Cloud SQL is typically one of the first big steps in deploying a full stack application. The Kubernetes operator simplifies that process.

How to connect from GKE to Cloud SQL

Generally, the easiest way to connect to Cloud SQL is with a language-specific Cloud SQL connector. There are Cloud SQL connectors for Java, Python, and Go — with more to come in the future. If your application is written in one of those languages, we recommend starting with a connector. Otherwise, the Cloud SQL Auth proxy is likely the right choice for your applications running on Google Kubernetes Engine. If you’re willing to join us on the leading edge, the Kubernetes operator is now in Public Preview.

Switching to the Cloud SQL Auth Proxy Kubernetes Operator

The Cloud SQL Proxy Operator is currently in Public Preview. Here are a few exciting benefits for those ready to make the switch:

Configure a Cloud SQL Auth Proxy in 8 lines of YAML — saving you about 40 lines of YAML configuration (or thousands for large clusters)

Simple configuration of a single Cloud SQL Proxy specific resource — allowing multiple Kubernetes applications to share the same proxy

Best practices by default — we maintain the operator and update it to the latest recommendations

Automatic deployment when the proxy configuration changes (coming in the GA release)

Here’s an example of what configuration might look like before the operator. Note how much simpler and more elegant the new operator makes deployment.

code_block[StructValue([(u’code’, u’apiVersion: v1rnkind: Deploymentrnspec:rn template:rn spec:rn containers:rn – name: cloud-sql-proxyrn args: rn – –http-port=9801rn – –http-address=0.0.0.0rn – –health-checkrn – –structured-logsrn – my-project:us-central1:one?unix-socket=/csql/pgrn env:rn – name: DB_SOCKET_PATHrn value: /csql/pgrn image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.0.0-preview.2rn imagePullPolicy: IfNotPresentrn livenessProbe:rn failureThreshold: 3rn httpGet:rn path: /livenessrn port: 9801rn scheme: HTTPrn periodSeconds: 30rn successThreshold: 1rn timeoutSeconds: 1rnrnu2193 40 more lines of YAML u2193′), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e129d8c3350>)])]

And here is what the configuration looks like for the same project after adding the operator:

code_block[StructValue([(u’code’, u’apiVersion: cloudsql.cloud.google.com/v1alpha1rnkind: AuthProxyWorkloadrnmetadata:rn name: authproxyworkload-samplernspec:rn workloadSelector:rn kind: “Deployment”rn name: “gke-cloud-sql-app”rn instances:rn – connectionString: “my-project:us-central1:one”rn unixSocketPathEnvName: “DB_SOCKET_PATH”rn socketType: “unix”rn unixSocketPath: “/csql/pg”‘), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e129d8c3890>)])]

We want your feedback

While the Cloud SQL Proxy Kubernetes operator is in Public Preview, we want to hear what could make it even better for you. We are working on this project in our public GitHub Repository. You can find the code, quickstart, and contribution guidelines there. We’d love to accept your patches and contributions to this project. We’re hoping with all of the typing we save you on YAML, you might have enough time to create issues or make a pull request. Then someday, we can give your fingers the much-needed vacation they deserve.

Read More for the details.

2023 01 13

GCP – Transfer data from AWS to GCP using Storage Transfer Service

Cloud, Google Cloud gcp

Overview

Storage Transfer Service enables users to quickly and securely transfer data to, from, and between object and file storage systems, including Google’s Cloud Storage, Amazon S3, Azure Blob Storage, and on-premises data.

This blog walks you through the process of transferring data from AWS S3 to Google’s Cloud Storage in a secure manner using identity federation.

Identity Federation creates a trust relationship between Google Cloud and AWS. It allows you to access resources directly, using a short-lived access token, and eliminates the maintenance and security burden associated with long-term credentials such as the service account keys. Using Identity federation, you do not have to worry about rotating keys or explicitly revoking the keys when Storage Transfer Service is not in use.

Steps to configure storage transfer job to transfer data from AWS S3 to GCS

This section walks you through the process to set up infrastructure to transfer data from Amazon Web Services to Google Cloud securely.

Configurations on Google Cloud

Enable the Storage Transfer API under APIs and Services.

Open Cloud Shell in the Google Cloud project that you want to configure the transfer job in

Run gcloud auth print-access-token to generate the Authorization: Bearer token which will be used in the next step

Run the following command in the cloud shell to generate the service account:

code_block[StructValue([(u’code’, u”curl –location –request GET ‘https://storagetransfer.googleapis.com/v1/googleServiceAccounts/<Replace Project number>’ –header ‘X-Goog-User-Project: <Replace Project ID>’ –header ‘Authorization: Bearer <token>'”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4b8b34a210>)])]

Replace project number, project ID and token.

The output of this command will be in the format:

code_block[StructValue([(u’code’, u'{rn “accountEmail”: “<service account email>”,rn “subjectId”: “<service account subject ID>”rn}’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4b765f6890>)])]

NOTE: Make a note of the subjectId as this will be used in the AWS IAM role trust relationship policy.

Create a Cloud Storage bucket in Google Cloud

Give the service account “accountEmail” that you generated in the previous step the following IAM permissions:

Storage Object Viewer

Configurations on Amazon Web Services

In the AWS IAM console, create an IAM policy with the following:

code_block[StructValue([(u’code’, u'{rn “Version”: “2012-10-17”,rn “Statement”: [rn {rn “Effect”: “Allow”,rn “Action”: [rn “s3:Get*”,rn “s3:List*”,rn “s3:Delete*”rn ],rn “Resource”: “*”rn }rn ]rn}’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4b765f6490>)])]

NOTE: This policy can be further restricted to a single S3 bucket. s3:GetBucketLocation permission will be needed to fetch the object location.

After the policy is created, head to IAM roles tab and follow the steps below:

Select Create role

Select Web Identity

Select “Google” as the identity provider with audience as “accounts.google.com”

In the next step, to add permissions, Select the IAM policy that you created in step 1

Update the role name and Create the role

In the IAM Role console, Select the role you created and Click on the Trust relationships tab

Click Edit Trust Policy, update the following and update the policy:

code_block[StructValue([(u’code’, u'{rn “Version”: “2012-10-17”,rn “Statement”: [rn {rn “Effect”: “Allow”,rn “Principal”: {rn “Federated”: “accounts.google.com”rn },rn “Action”: “sts:AssumeRoleWithWebIdentity”,rn “Condition”: {rn “StringEquals”: {rn “accounts.google.com:sub”: “<Replace with Google Cloud service account subject ID>”rn }rn }rn }rn ]rn}’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4b765f6c10>)])]

Create an S3 bucket on AWS with objects to be transferred to Google Cloud

Storage transfer job configuration

Head to the Storage Transfer Service (STS) on the Google Cloud project

Select Create a transfer job

Source – Amazon S3, Destination – Google’s Cloud Storage

In the next step, enter the S3 bucket name and AWS IAM role ARN

Next, select the Cloud Storage bucket

Next, choose the settings that works best for your use case

Select the schedule and create the STS job.

Notice that the objects from the S3 bucket are being copied over to the GCS bucket. When you run it again, Storage Transfer Service does incremental transfers, skipping the data that was already copied.

Custom scheduler for Storage Transfer Service

STS currently supports a minimum sync schedule of 1 hour. Triggering a Cloud scheduler via Cloud Functions is a work around technique to reduce the sync schedule to minutes/custom schedule.

Event-driven STS for Cloud Storage

Storage Transfer Service now offers event-driven transfer, a serverless and easy-to-use replication service. STS can listen to event notifications in AWS or Google Cloud to automatically transfer data that has been added or updated in the source location. Event-driven transfers are supported from AWS S3 or Cloud Storage to Cloud Storage.

This feature is a good fit for use cases where you have a changing data source (e.g., new object insertion) that needs to be replicated to the destination in a matter of minutes.

You can trigger an event driven replication from AWS S3 to Google Cloud for ongoing data analytics and/or machine learning.

Event driven configuration on Storage Transfer Service:

The image below indicates a new file being transferred from AWS S3 to Cloud Storage via event driven STS:

Enabling AWS Event Notifications for SQS

On AWS S3:

Go to the bucket “Properties” tab and create “Event Notification”

Select All Object Create Events

Update the SQS ARN

Add SQS and S3 permissions to the IAM role being configured in the Storage Transfer Job

Once the setup is complete, the S3 bucket should be enabled to deliver notifications to the configured SQS queue. And the configured role should be able to access both SQS queue and S3 bucket for event-driven transfers.

On AWS SQS:

Create SQS queue for event driven transfers

In Access Policy, select “Advance” and add the sample policy below. This grants Amazon S3 permissions to publish messages to the SQS queue.

code_block[StructValue([(u’code’, u'{rn “Version”: “2012-10-17”,rn “Id”: “example-ID”,rn “Statement”: [rn {rn “Sid”: “example-statement-ID”,rn “Effect”: “Allow”,rn “Principal”: {rn “Service”: “s3.amazonaws.com”rn },rn “Action”: “SQS:SendMessage”,rn “Resource”: “arn:aws:sqs:us-east-1:123456789101:test-queue”,rn “Condition”: {rn “StringEquals”: {rn “aws:SourceAccount”: “123456789101”rn },rn “ArnLike”: {rn “aws:SourceArn”: “arn:aws:s3:*:*:source-bucket”rn }rn }rn }rn ]rn}’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4b76616c50>)])]

Note: Replace SQS ARN, Source account number and S3 bucket ARN

Make a note of the SQS ARN to configure it in the Storage Transfer Service Event Driven tab.

Create a Storage transfer job and observe your objects being replicated from AWS S3 to GCS bucket.

This completes the event driven set up for Storage Transfer Service.

Summary

In this blog, we used Storage Transfer Service to securely transfer data from AWS S3 to Google’s Cloud Storage. We also discussed the event-driven STS feature that can listen to event notifications in AWS to automatically transfer data that has been added or updated in the source location.

If you would like to learn more, check out the Storage Transfer Service documentation.

Read More for the details.

Google Cloud

High Level Design

Environment setup

Deploy Google Managed Prometheus (GMP) Query Interface in the cluster

Install Flagger in the Kubernetes cluster

Bootstrap the environment

Create a Cloud Deploy Pipeline

Canary Deployment

How it works

Next Steps

Google Kubernetes Engine Gateway controller is now GA for single cluster deployments

Get the help you need at every stage of the journey

We’re here to help

Data is everywhere, so why aren’t insights?

Unleashing insights from data in sheets

Bringing data to Connected Sheets

Getting started

Create content with search

Drill anywhere

Data storytelling with Slides & Sheets

Putting ThoughtSpot for Sheets to work

Sales

Marketing

Research

1. VPCs with billions and billions of IPv6 addresses

2. Non-overlapping private IPv6 address space

An Introduction to IPv6 on Google Cloud

Introducing the Google Cloud insurance claims reference architecture

How to use the reference architecture

What can you achieve with the reference architecture?

Why transform insurance with Google Cloud

Unify data lakes and warehouses with BigLake, now generally available

Same logs, same cost, more value with Log Analytics

New features in this release

Get started today

Migrate and modernize intelligently with Google Cloud Migration Center

Challenges: The need to put unstructured data to use more rapidly

Solution approach

Snorkel AI’s data-centric approach unlocks new ways of preparing ML training workloads

Solution details

Unified access to data stored on Google Cloud

Snorkel Flow + Google Cloud BigQuery

Real-world impact

Better together: Snorkel AI + Google Cloud

1. Mobile Wins (Again)

2. Customers Love a Morning Browse on Black Friday

4. Virtual Shopping Carts Flooded the Site

5. Shoppers Relied on Recommendations

Cyber 2023

Crate and Barrel boosts online customer experience with better site search powered by Lucidworks on Google Cloud

What DORA does for the European financial sector

How Google Cloud is preparing for DORA

Looking ahead

Reliability matters

Reliability is a journey

Reliability is about people

Organizations are complex, but your data architecture doesn’t need to be

Data Science with Dataproc on BigLake data

Learn More

Setup

Usage

How it works

Summary

Context and Background

Use-cases: Challenges and Problems Resolved

High level Architecture of Revionics Using GCP

Solution Architecture

Outcomes

The Built with BigQuery advantage for ISVs

Built with BigQuery: Zeotap uses Google BigQuery to build highly customized audiences at scale

Work arrangement shifts

Employee churn

Employee burnout

How to connect from GKE to Cloud SQL

Switching to the Cloud SQL Auth Proxy Kubernetes Operator

We want your feedback

Overview

Steps to configure storage transfer job to transfer data from AWS S3 to GCS

Configurations on Amazon Web Services

Storage transfer job configuration