Azure – General availability: Computed properties in Azure Cosmos DB for NoSQL
Computed properties make it easy to write NoSQL query logic once and reference it many times!
Read More for the details.
Cross-region DR allows you to switch to another region in the event of an outage.
Read More for the details.
Azure Cosmos DB for NoSQL built-in vector search powered by DiskANN is in public preview.
Read More for the details.
The Azure Cosmos DB Go SDK is now available with improved support for global distribution.
Read More for the details.
Easily change the capacity mode of an Azure Cosmos DB account from serverless to provisioned capacity using the Azure portal or Azure CLI.
Read More for the details.
We’re excited to announce that the Azure Cosmos DB integration with Vercel is now generally available.
Read More for the details.
Facturation.net is a leading Canadian provider of cloud-based invoicing and billing software. With many customers in Canada, its mission is to help doctors and their administrative staff with their medical billing and provide customized software and services.
“As our business grew, we realized that our existing Azure infrastructure was no longer meeting our needs. We were facing increasing costs and complexity, and we wanted to find a more scalable and cost-effective solution,” said Jacob Verret, CTO at Facturation.net. “After evaluating several options, we decided to migrate our production VMs to Google Cloud.”
The challenge was figuring out how to migrate Facturation.net’s production virtual machines (VMs) in Azure over a single weekend without any impact to its production workloads. The company needed a solution that was simple, reliable, and cost-effective.
After researching its options, Facturation.net decided to use Migrate to Virtual Machines (M2VM) on Google Cloud.
M2VM, which is part of the Google Cloud Migration Center, allows you to migrate virtual machines from VMware, AWS, and Azure environments to Compute Engine in a few minutes. With M2VM, you can migrate VMs safely, at high velocity, and at scale.
M2VM is designed to be simple and easy to use. It has a user-friendly interface and requires no special skills or expertise. The service provides several benefits that make it a great choice for cloud-to-cloud migrations, including:
Fast and efficient migration: M2VM uses a high-performance data transfer engine that can migrate VMs quickly and efficiently.
Proven reliability: Google Cloud customers have used M2VM to migrate hundreds of thousands of VMs from on-premises and cloud environments.
Enterprise-grade security: M2VM uses industry-standard security protocols to protect your data during migration.
Cost-effective: As a managed service, M2VM is a cost-effective solution that can help you save money on your cloud migration. It has no additional costs on top of those associated with the Google Cloud resources you use.
The Facturation.net team used the following step-by-step approach to migrate their VMs using M2VM:
The team created a detailed migration plan that included a list of the VMs to migrate, the target Google Cloud project, and the desired migration start and end times. They also made sure that the VMs were properly prepared for migration, including having the latest security patches installed and disabling any unnecessary services.
They enabled M2VM by creating a migration project in the Google Cloud console, added an Azure source, and defined the target project on Google Cloud for the migration process.
To start the migration process, the team selected the VMs to be migrated and the target Google Cloud project. M2VM then created a replication job for each VM. They monitored the progress of the replication jobs in the M2VM console.
Once the replication jobs were complete, they tested the migrated VMs to make sure they were working properly and then performed a cutover to switch traffic from the Azure VMs to the new Google Cloud VMs.
After the transition was complete, the team cleaned up the Azure resources that were no longer needed.
“We were very happy with the results of our migration using M2VM. The process was straightforward, and the intuitive user interface allowed us to migrate our VMs without any problems. On top of that, it was incredibly fast — we were able to migrate all of our production VMs over to Google Cloud in a single weekend,” said Alexander Grange, senior cloud architect at Facturation.net, who led the migration effort.
In addition, Facturation.net did not experience any data loss or corruption during the migration process, and since M2VM uses industry-standard protocols, the team was confident that the company’s data was safe and secure during the migration process. The cost of migration was also very reasonable, allowing Facturation.net to save money on its cloud migration.
Overall, thanks to Migration to Virtual Machines and the Google Cloud team, Facturation.net was able to move rapidly from Azure to Google Cloud — without impacting production.
Read More for the details.
Meeting client timelines and quality standards is a long-standing challenge for Google Cloud Partners, especially when navigating large legacy codebases, unfamiliar languages, and multiple tools.
Partner projects frequently involve code integration across multiple languages — including legacy ones — posing challenges for manual analysis and limiting the effectiveness of automation tools. Further, ensuring reliable error detection is crucial for partners assisting Google Cloud developers.
And despite using DevOps practices for faster solution development, partners rarely have holistic visibility into customer environments, making it harder to minimize errors and expedite delivery.
Today, generative AI (gen AI) is reshaping software development for Google Cloud partners, offering a strategic solution to balance speed, quality, and security. This blog post explores how Google Cloud partners can use Google’s gen AI alongside other Google Cloud services to redefine and enhance DevOps practices.
At Google Cloud, we understand that partners need powerful tools with the flexibility to integrate into existing workflows. Our gen AI is integrated across the Google Cloud platform so you can use it to enhance your DevOps practices quickly and efficiently.
For example, Gemini Code Assist completes your code as you write, and generates whole code blocks or functions on demand. Code assistance is available in multiple IDEs such as Visual Studio Code, JetBrains IDEs (IntelliJ, PyCharm, GoLand, WebStorm, and more), Cloud Workstations, and Cloud Shell Editor, and it supports 20+ programming languages, including Go, Java, JavaScript, Python, and SQL.
We always encourage our partners to review our Responsible AI guidelines as they use Google gen AI to enhance DevOps practices.
Now, let’s take a look at how to incorporate generative AI large language models (LLMs) with partner development operations so you can accelerate and transform how you develop solutions for end customers.
Most partners have sophisticated in-house development operations tools and processes for their customers. With our new reference architecture, you can modernize software delivery operations further by integrating Google gen AI products into those tools and processes.
The reference architecture below shows gen AI products working in conjunction with Google Cloud products to augment solution development operations for partner engagements.
Reference architecture – a gen AI powered DevOps automation solution
Partners can use this reference architecture to build a solution to streamline DevOps software delivery.
First, Google gen AI LLMs integrate with Continuous Integration and Continuous Deployment (CI/CD) pipelines, enabling automated testing and risk checks against your evolving codebase.
Then, Google’s monitoring tools provide the visibility needed to spot production issues faster and centralize troubleshooting, and generated code suggestions from Gemini Code Assist can be pushed into a Git repository for CI/CD deployments into client environments.
Partners can run this solution in their own environments to deploy and manage client environments, or deploy it directly into customer environments as well.
This reference architecture provides a comprehensive solution for automating code generation, review, and deployment; its key components are as follows:
1. Vertex AI: The heart of the gen AI platform
Gen AI LLMs: Vertex AI allows developers to interact with cutting-edge generative AI models for code generation, completion, translation, debugging, and fine-tuning. Developers can automate repetitive tasks, improve their productivity, and create innovative solutions. The following generative LLM variants work together within this solution to help developers:
Code-Gecko: Code-Gecko is an intelligent code-completion tool that integrates with Gemini 1.0 Pro. It provides real-time suggestions leveraging Gemini’s context understanding for efficient coding and to generate relevant code suggestions for developers.
Gemini Code Assist & Gemini 1.0 Pro: Gemini Code Assist bridges the IDE and Gemini 1.0 Pro, allowing developers to interact with Gemini. It facilitates user queries, provides contextual information, and generates code snippets based on the user’s requirements within the solution.
Gemini 1.5 Pro: Gemini 1.5 Pro is an upgrade to Gemini 1.0 Pro, performing more complex tasks and comprehensive code analyses. It supports coherent and contextually aware responses, multi-turn interaction, and code generation with documentation. In this design, Gemini 1.5 Pro provides a powerful AI assistant to help with complex coding use cases.
Vertex AI Search: Vertex AI Search leverages semantic search and ML models to index code snippets and documentation. In this architecture, it enables fast retrieval of code created by LLMs and facilitates various task-automation processes, such as code generation, testing, and documentation creation.
Vertex AI Agents is a cutting-edge agent building and conversational AI service that automates risk checks and generates fresh code suggestions. It integrates with CI/CD pipelines and allows developers to interact with the codebase through a chat-like interface.
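To make this concrete, here is a minimal sketch (not from the reference architecture itself) of how a pipeline step might call a Gemini model on Vertex AI through the Python SDK to generate a code suggestion; the project ID, location, and model name are illustrative assumptions.

```python
# Hedged sketch: project ID, location, and model name are illustrative assumptions.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.0-pro")
response = model.generate_content(
    "Write a Python function that validates an ISO 8601 timestamp string."
)
# The generated suggestion should be reviewed (and tested) before it is
# committed to the Git repository feeding the CI/CD pipeline.
print(response.text)
```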
2. Cloud Storage: The code repository
CodeBase: In this design, Cloud Storage serves as the central repository for all project artifacts: codebases, unit tests, documentation, and model training data, including generated code from the LLMs.
Triggers and build processes: Cloud Storage integrates with Cloud Build, enabling automatic trigger of builds and CI/CD pipelines upon changes to the codebase, to test and validate generated code.
3. BigQuery: Enhancing insights
Document metadata: BigQuery stores structured metadata about code artifacts in Cloud Storage, providing insights into code documentation, relationships between source code files and generated tests, and code authorship.
Analysis and insights: BigQuery enables analysis of this metadata along with logs and metrics collected from other components, providing insights into model usage patterns and areas for improvement.
4. Security and observability
IAM (Identity and Access Management): IAM provides granular role-based access control, ensuring appropriate permissions for developers, services, and the gen AI models themselves.
VPC (Virtual Private Cloud): VPC isolates the development environment, enhancing security by limiting access points and defining network controls and firewalls.
Cloud Logging and Cloud Monitoring: These services work together to collect logs from various components and track key performance indicators, aiding in troubleshooting, monitoring model performance, and detecting potential issues.
Our expertise in Search and keeping results grounded and factual translates to more effective solutions. With this architecture, you can leverage Google gen AI offerings to expedite solution development and streamline workflows while automating tasks like coding and code review. The end result is faster, lower-error delivery to clients and improved software development efficiency. By using this gen AI solution, you differentiate your organization with innovative, ready-to-use tools, demonstrating your commitment to adding value for your clients with cutting-edge technology.
We encourage you, our partners, to broaden your competencies in Google gen AI and integrate it across your workflows. Reach out to your Partner account team for the training options available to you. If you are interested in becoming a Google Cloud Partner, reach out to us here.
Read More for the details.
Dataflow Streaming Engine users know that there’s no such thing as an “average” streaming use case. Some customers have strict latency requirements that must be met even during traffic spikes. Others are more concerned with cost and want to run their streaming pipelines as efficiently as possible. The question is: Do you prefer lower peak latency or lower streaming costs for your workload?
For example, with a network threat detection use case, being able to identify and react to cyberattacks in real-time is crucial. In such real-time scenarios, it’s preferable to keep latency as low as possible by provisioning resources more aggressively. In contrast, other use cases like product analytics can tolerate a 30-60 sec delay, so keeping the costs in check by provisioning resources more conservatively is likely the correct decision.
An autoscaler’s algorithm significantly influences peak latency and pipeline costs. A more aggressive autoscaling strategy helps maintain low latency by promptly adding resources to process data when traffic increases. On the other hand, a less aggressive autoscaling approach aims to minimize costs by managing resources conservatively. The impact can be substantial. Consider the following typical streaming job, ingesting Pub/Sub events to BigQuery:
This specific Pub/Sub to BigQuery Dataflow pipeline demonstrates how a user can choose to reduce latency by 82% or reduce cost by 24.5% by changing the autoscaling hint value to 0.3 for minimal latency or to 0.7 for minimal cost.
Your specific streaming pipeline’s curve may have a very different shape than this one test pipeline’s, but the core idea still applies: by changing the autoscaling utilization hint value, Dataflow streaming users can set and modify their preferences for lower latency or cost.
At the core of autoscaler decision-making is the CPU utilization of the workers performing the pipeline’s computations. Dataflow’s streaming autoscaler tends to scale up when the current worker’s utilization exceeds the acceptable upper bound for worker CPU utilization, and scale down when current worker utilization drops below the utilization lower bound. You can set the autoscaling utilization hint to a higher or lower value using a Dataflow service option. Setting this value allows you to encode the decision about cost vs. latency for your use case.
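As a hedged illustration, the snippet below sketches how a streaming Beam pipeline might pass the utilization hint as a Dataflow service option when it is launched; the 0.5 value, project, topic, and table names are assumptions, so check the Dataflow documentation for the exact option name and accepted range.

```python
# Hedged sketch: option name, hint value, and resource names are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="your-project-id",
    region="us-central1",
    streaming=True,
    # Lower values bias toward latency, higher values toward cost.
    dataflow_service_options=["worker_utilization_hint=0.5"],
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            topic="projects/your-project-id/topics/events")
        | "ToRow" >> beam.Map(lambda b: {"raw": b.decode("utf-8")})
        | "Write" >> beam.io.WriteToBigQuery(
            "your-project-id:dataset.events_table",  # table assumed to exist
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
    )
```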
The hard part is to find the optimum hint value for a specific streaming pipeline. As a rule of thumb, you should consider reducing the autoscaling utilization hint to achieve lower latency when the pipeline:
Scales up too slowly: The autoscale lags behind traffic spikes and backlog seconds start to grow.
Scales down too much: Current worker CPU utilization is low and the backlog grows.
Conversely, you should consider increasing the autoscaling utilization hint when you observe excessive upscaling and want to reduce costs, provided that higher worker utilization and higher latency are acceptable for the use case.
There are no “universal” minimum cost or latency autoscaling utilization hint values. The shape of the cost-latency curve (like the one you see in the example graph above) is specific to a particular streaming job. Also, the optimum values may change or “drift” over time as the properties of the pipeline change due to, for example, variations in the traffic pattern.
Dataflow’s autoscaling UI provides insights on when it’s worth adjusting the autoscaling behavior. You can modify the autoscaling hint value in real time, without having to stop the job, to address the changing data load and meet your business requirements. The current worker CPU utilization metric is an important heuristic; aligning the autoscaling hint value with it is a good starting point.
To make it easier for you to evaluate and fine-tune the autoscaling performance to your preferences, we’ve also introduced additional dashboards and metrics in the Dataflow autoscaling UI.
In particular, you may want to start by observing the following four graphs:
Autoscaling: shows current and target worker counts and displays time-series autoscaling data, along with min / max and target number of workers
Autoscaling rationale: explains the factors driving autoscaling decisions (upscale, downscale, no change)
Worker CPU utilization: shows current user worker CPU utilization and the customer hint value (when it is actively used in the autoscaling decision [1]). This is an important factor in the autoscaling decisions.
Max backlog estimated seconds: gives an indication of pipeline latency. This is another major factor in the autoscaling decisions [2].
Thank you for reading this far! We are thrilled to bring these capabilities to our Dataflow Streaming Engine users and see different ways you use Dataflow to transform your business. Get started with Dataflow right from your console. Stay tuned for future updates and learn more by contacting the Google Cloud Sales team.
1. It is important to note that Customer Hint for Autoscaling is just one of the factors impacting autoscaler decisions. The autoscaler algorithm can override it due to other factors. Read more here
2. When backlog is high, the autoscaling utilization hint would be ignored by autoscaling policy to keep latency low. Read more here
Read More for the details.
Written by: Mark Swindle
While investigating recent exposures of Amazon Web Services (AWS) secrets, Mandiant identified a scenario in which client-specific secrets have been leaked from Atlassian’s code repository tool, Bitbucket, and leveraged by threat actors to gain unauthorized access to AWS. This blog post illustrates how Bitbucket Secured Variables can be leaked in your pipeline and expose you to security breaches.
Bitbucket is a code hosting platform provided by Atlassian and is equipped with a built-in continuous integration and continuous delivery/deployment (CI/CD) service called Bitbucket Pipelines. Bitbucket Pipelines can be used to execute CI/CD use cases like deploying and maintaining resources in AWS. Bitbucket includes an administrative function called “Secured Variables” that allows administrators to store CI/CD secrets, such as AWS keys, directly in Bitbucket for easy reference by code libraries.
CI/CD Secrets: CI/CD Secrets serve as the authentication and authorization backbone within CI/CD pipelines. They provide the credentials required for pipelines to interact with platforms like AWS, ensuring pipelines possess the appropriate permissions for their tasks. Secrets are often extremely powerful and are beloved by attackers because they present an opportunity for direct, unabated access to an environment. Maintaining confidentiality of secrets while balancing ease of use by developers is a constant struggle in securing CI/CD pipelines.
Bitbucket Secured Variables: Bitbucket provides a way to store variables so developers can quickly reference them when writing code. Additionally, Bitbucket offers an option to declare a variable as a “secured variable” for any data that is sensitive. A secured variable is designed such that, once its value is set by an administrator, it can no longer be read in plain text. This structure allows developers to make quick calls to secret variables without exposing their values anywhere in Bitbucket. Unless…
CI/CD pipelines are designed just like the plumbing in your house. Pipes, valves, and regulators all work in unison to provide you with reliable, running water. CI/CD pipelines are a complicated orchestration of events to accomplish a specific task. In order to accomplish this, these pipelines are highly proficient at packaging and deploying large volumes of data completely autonomously. As a developer, this creates countless possibilities for automating work, but, as a security professional, it can be a cause for anxiety and heartburn. Perhaps it’s a line of code with a hardcoded secret sneaking into production. Maybe it’s a developer accidentally storing secrets locally on their machine. Or maybe, as we have seen in recent investigations, it’s a Bitbucket artifact object containing secrets for an AWS environment being published to publicly available locations like S3 Buckets or company websites.
Bitbucket secured variables are a convenient way to store secrets locally in Bitbucket for quick reference by developers; however, they come with one concerning characteristic—they can be exposed in plain text through artifact objects. If a Bitbucket variable—secured or not secured—is copied to an artifact object using the artifacts: command, the result will generate a .txt file with the value of that variable displayed in plain text.
Mandiant has seen instances in which development teams used Bitbucket artifacts in web application source code for troubleshooting purposes, but, unbeknownst to the development teams, those artifacts contained plain text values of secret keys. This resulted in secret keys being exposed to the public internet where they were located and subsequently leveraged by attackers to gain unauthorized access.
Once a secured variable—such as an AWS Key—is copied to a .txt file in plain text, the secret has been leaked, and it’s up to the pipeline as to where that secret flows and how long until an attacker finds it.
The following are steps to recreate the secret leak in a Bitbucket environment. One important note—the commands detailed in this guide illustrate only one possibility, but there are several other methods that export secured variables to artifacts in Bitbucket. Administrators and developers should closely review any references to artifact objects in their bitbucket-pipelines.yml file or any other files in the repository.
Variables can be set at the repository level or the workspace level, as long as they are marked as “secured variables.”
The following lines of code execute the command printenv to copy all environment variables from Bitbucket to a .txt file called environment_variables.txt. This is a common practice in development when troubleshooting because developers need to review a wide range of variables for legitimate development purposes. Once the .txt file is created, the code passes it to a Bitbucket artifact object where it can be used by future stages in the pipeline, if necessary.
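The original post walks through this as a printenv step in bitbucket-pipelines.yml; purely for illustration, the Python sketch below reproduces what that step effectively does inside the build container (this is not the article’s snippet, and you should never do this with real secrets).

```python
# Illustration only: dump every environment variable (including any secured
# variables injected at runtime) into a plain-text file that the pipeline
# then publishes as an artifact. Never do this with real secrets.
import os

with open("environment_variables.txt", "w") as f:
    for name, value in sorted(os.environ.items()):
        f.write(f"{name}={value}\n")  # secured variables land here in plain text
```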
After exporting the .txt file, secrets can be read in plain text among all the variables in the Bitbucket environment. One note on this step—it is possible you will need to extract components of a .tar file as an additional step here. In this event, extract the .tar file using your data extraction tool of choice.
Once the secrets are printed to the environment_variables.txt file, they are free to flow out of Bitbucket through the pipeline and become exposed. Any combination of development mistakes, malicious intent, or accidental disclosure can lead to secret exposure and misuse by a threat actor.
Bitbucket Pipelines is a great platform for storing, collaborating, and deploying code. Bitbucket, however, is not a dedicated secrets manager, and storing secrets directly in Bitbucket introduces opportunities for secrets to be leaked. Safely protect your secrets when using Bitbucket Pipelines by:
Storing secrets in a dedicated secrets manager and then referencing those variables in the code stored in your Bitbucket repository
Closely reviewing Bitbucket artifact objects to ensure they are not exposing secrets as plain text files
Deploying code scanning throughout the full lifecycle of your pipeline to catch secrets stored in code before they are deployed to production
This is not an indictment against Bitbucket. Instead, it’s a case study in how seemingly innocuous actions can snowball into serious problems. We use the word “leak” for a specific reason. All it takes is one keystroke, one line of code, or one misconfiguration for a slow, seemingly untraceable drip of secrets to flow through your pipeline out into the world.
Read More for the details.
Amazon Lightsail now supports switching between dual-stack and IPv6-only bundles by removing or adding dynamic public IPv4 addresses on instances.
Previously, you would have to select an IPv6-only bundle and start a new instance from scratch to move from dual-stack to IPv6-only bundles. With this feature, you can switch between dual-stack and IPv6-only plans on a running instance using the ‘change networking type’ feature instead of recreating your applications on a new Lightsail instance. This makes it easier to test whether your application is supported on IPv6-only bundles and to use IPv4 addresses only when needed.
You can use this feature on the Lightsail console (accessed from the AWS console), the AWS Command Line Interface (CLI), and AWS SDKs in all AWS Regions supporting Lightsail. To learn more about this migration functionality, please see the documentation here.
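For readers scripting this, the hedged boto3 sketch below shows how the switch might be made programmatically; the instance name and Region are assumptions, so confirm the parameters against the Lightsail API reference.

```python
# Hedged sketch: instance name and Region are assumptions.
import boto3

lightsail = boto3.client("lightsail", region_name="us-east-1")

response = lightsail.set_ip_address_type(
    resourceType="Instance",
    resourceName="my-lightsail-instance",
    ipAddressType="ipv6",  # switch the running instance to an IPv6-only plan
)
for operation in response.get("operations", []):
    print(operation.get("status"))
```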
Read More for the details.
Organizations are increasingly looking to drive outcomes by harnessing real-time analytics. In the current AI era, it is crucial to deliver up-to-date information to AI systems that help make informed decisions, identify trends and anomalies, and implement proactive and effective interventions. To fully realize the benefits of real-time intelligence for visibility, predictions, and activation, you need to implement streaming infrastructure that is easy-to-use, robust, scalable, and cost efficient.
Google invented modern stream data processing when we published the original Dataflow paper describing our Dataflow service. The unique way Dataflow implements concepts such as windowing, triggers, checkpointing, and more ensures the continued processing of all kinds of data, including late-arriving data. Google has been named a Leader in the Forrester Wave™: Streaming Data Platforms, Q4 2023 report. Sanjeev Mohan, principal analyst at SanjMo and former Gartner VP, also recognized how Dataflow is well integrated with many other Google Cloud products to provide a full platform for real-time applications.
Using Google Cloud’s data, AI and real-time solutions, many enterprises are delivering and actioning real-time insights to drive significant business impact:
Spotify leverages Dataflow for large-scale generation of ML podcast previews and plans to keep pushing the boundaries of what’s possible with data engineering and data science to build better experiences for their customers and creators.
Puma increased their average order value by 19% by better understanding how to tailor content to customers and with access to real-time inventory levels up to 4x faster, helping shoppers find the right products at the nearest stores.
Compass works with local governments in Australia to improve the safety of their roads with real-time monitoring across 1.5M+ datasets processed daily from connected vehicles.
Tyson Foods is using Google Cloud for the next generation of smart factories using unstructured data, such as images or videos, to train vision models to monitor real-time IoT-connected sensors and optimize patterns. They rely on BigQuery for secure, repeatable, and scalable enterprise solutions.
Over the years, we’ve extended our streaming capabilities and democratized access to streaming in a number of ways. This includes enhancements to Dataflow providing flexibility over GPU and CPU usage, pipeline enrichment in real-time, new managed IO services and at-least once processing; new capabilities in BigQuery with continuous real-time query processing integrated with AI, and a new Apache Kafka service.
We added new features in Dataflow ML to make the most common machine learning use cases easier, more performant, and more cost effective. Dataflow’s new right fitting allows users to mix-and-match compute types to only use GPUs when necessary, reducing cost. The new Enrichment transform provides real-time ML feature enrichment that gracefully handles spikes and unexpected behavior in a Dataflow pipeline, reducing toil and accelerating your ability to leverage the latest data in your ML models.
The new IcebergIO connector streams data directly into Apache Iceberg data lake tables. IcebergIO is the first of many IOs that will be taking advantage of the new Managed IO feature in Dataflow. Dataflow Managed IO provides additional benefits like automatically updating the connector with newer versions or applying patches without any action required.
Dataflow streaming provides an exactly-once guarantee, meaning that the effects of data processed in the pipeline are reflected exactly once, even for late data. For lower latency and lower cost streaming data ingestion, we introduced the new at-least-once processing in which the input record is processed at least once – which is particularly helpful when the data source already provides those guarantees.
At Next ‘24, we announced the preview of continuous queries in BigQuery. Leveraging the infrastructure and techniques that power Dataflow, users can now directly create stream processing jobs to create real-time change streams based on the latest data coming into BigQuery. In addition, these real-time streams can be operated on with any AI or ML functions, including LLM operations using Vertex AI. Customers can do this with simple SQL, dramatically lowering the barrier for organizations and users to realize the benefits of real-time intelligence and streaming infrastructure.
At Next ‘24, we also extended support for all three major open source data lake formats in BigLake, including Apache Iceberg, Apache Hudi and Delta Lake natively integrated with BigQuery. This includes a fully managed experience for Iceberg, enabling support for streaming across all data types and even across clouds using BigQuery Omni. We also released a new whitepaper, BigQuery’s Evolution toward a Multi-Cloud Lakehouse, which is to be presented at the 2024 SIGMOD event.
Finally, at Next 2024, we announced the forthcoming release of a managed Apache Kafka service called Apache Kafka for BigQuery. This is a full end-to-end managed service for Apache Kafka that will automate operational and security work that comes with running such a service yourself. It is compatible with your existing applications and integrated into BigQuery to facilitate quick and easy loading of your Kafka streaming data into BigQuery via BigQuery’s high performance streaming ingest called the Storage Write API. You can express interest to be notified about the preview.
Refer to the documentation to learn more about Dataflow and BigQuery. If you are new to Dataflow, take the foundational training. We’re very excited to bring you all the latest innovations and can’t wait to see what you build with our real-time analytics solutions.
Read More for the details.
AWS Database Migration Service (AWS DMS) now supports Amazon S3 Parquet files as a source. Using AWS DMS, you can now migrate data in Parquet format from S3 to any supported AWS DMS target, provided the S3 Parquet data was generated by DMS. AWS DMS supports both full load and change data capture (CDC) migration modes for S3 Parquet source endpoints using the AWS DMS console, AWS CLI, or AWS SDKs in all Regions where DMS is available.
Read More for the details.
Amazon QuickSight now supports connectivity to Redshift data sources using an IAM role through GetClusterCredentialsWithIAM. This is an enhancement to the previously launched Redshift RunasRole feature, which now makes the database user/database group parameters optional, thereby implicitly tying the temporary user identity to the IAM credentials. This feature enables customers to use the Lake Formation-managed Redshift data share feature to support cross-account use cases, as documented here. Administrators can get started by creating an AWS Identity and Access Management (IAM) role with permissions that will be applied when a QuickSight user or API call runs a query on the data source. The IAM role is then assigned to a Redshift data source. With this role, a QuickSight user or API call has the role’s fine-grained permissions applied when running a query on that data source. This new feature is available in the following QuickSight regions: US East (N. Virginia and Ohio), US West (Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney and Tokyo), Europe (Frankfurt, Stockholm, Paris, Ireland and London), Canada (Central), South America (São Paulo), and the AWS GovCloud (US-West) Region. For more details, see Run queries as an IAM role in Amazon QuickSight.
Read More for the details.
Have you ever wondered if there is a more automated way to copy Artifact Registry or Container Registry Images across different projects and Organizations? In this article we will go over an opinionated process of doing so using serverless components in Google Cloud and its deployment with Infrastructure as Code (IaC).
This article assumes knowledge of coding in Python, basic understanding of running commands in a terminal and the Hashicorp Configuration Language (HCL) i.e. Terraform for IaC.
In this use case we have at least one container image residing in an Artifact Registry Repository that has frequent updates to it, that needs to be propagated to external Artifact Registry Repositories inter-organizationally. Although the images are released to external organizations they should still be private and may not be available for public use.
To clearly articulate how this approach works, let’s first cover the individual components of the architecture and then tie them all together.
As discussed earlier, we have two Artifact Registry (AR) repositories in question; let’s call them “Source AR” (the AR where the image is periodically built and updated, the source of truth) and “Target AR” (AR in a different organization or project where the image needs to be consumed and propagated periodically) for ease going forward. The next component in the architecture is Cloud Pub/Sub; we need an Artifact Registry Pub/Sub topic in the source project that automatically captures updates made to the source AR. When the Artifact Registry API is enabled, Artifact Registry automatically creates this Pub/Sub topic; the topic is called “gcr” and is shared between Artifact Registry and Google Container Registry (if used). Artifact Registry publishes messages for the following changes to the topic:
Image uploads
New tags added to images
Image deletion
Although the topic is created for us, we will need to create a Pub/Sub subscription to consume the messages from the topic. This brings us to the next component of the architecture, Cloud Run. We will create a Cloud Run deployment that will perform the following:
Parse through the Pub/Sub messages
Compare the contents of the message to validate if the change in the Source AR warrants an update to the Target AR
If the validation conditions are met, then the Cloud Run service moves the latest Docker image to the Target AR
Now, let’s dive into how Cloud Run integrates with the Pub/Sub AR topic. For Cloud Run to be able to read the Pub/Sub messages, we have two additional components: an Eventarc trigger and a Pub/Sub subscription. The Eventarc trigger is critical to the workflow, as it is what triggers the Cloud Run service.
In addition to the components described above, the below prerequisites need to be met for the entire flow to function correctly.
Cloud SDK needs to be installed on the user’s terminal so that you can run gcloud commands.
The project Service Account (SA) will need “Read” permission on the Source AR.
The Project SA will need “Write” permission on the Target AR.
VPC-SC requirements on the destination organization (if enabled)
Egress Permissions to the target repository from the SA running the job
Ingress permission for the account running the ‘make’ commands (instructions below) and writing to Artifact Registry or Container Registry
Ingress Permissions to read the PUB/SUB GCR Topic of the source repository
Allow [project-name]-sa@[project-name].iam.gserviceaccount.com VPC-SC ingress for the Artifact Registry method
Allow [project-name]-sa@[project-name].iam.gserviceaccount.com VPC-SC ingress for the Cloud Run method
Below, we talk about the Python code, Dockerfile, and Terraform code, which is all you need to implement this yourself. We recommend that you open our GitHub repository, where all the open source code for this solution lives, while reading the section below. Here’s the link: https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/devops/inter-org-artifacts-release
What we deploy in Cloud Run is a custom Docker container. It comprises the following files:
App.py: This file contains the variables for the source and target containers, as well as the execution code that runs when triggered by the Pub/Sub messages.
Copy_image.py: This file contains the copy logic that app.py leverages to run the gcrane command required to copy images from the Source AR to the Target AR.
Dockerfile: This file contains the instructions needed to package gcrane and the requirements needed to build the Cloud Run image
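The linked GitHub repository contains the actual implementation; as a rough sketch only (not the repository’s exact code), the core of app.py and copy_image.py might look like the following, where the image paths, environment variables, and Flask structure are assumptions.

```python
# Hedged sketch of the Cloud Run service; see the linked repository for the
# real implementation. Image paths, env vars, and message fields are assumptions.
import base64
import json
import os
import subprocess

from flask import Flask, request

app = Flask(__name__)

SOURCE_IMAGE = os.environ.get("SOURCE_IMAGE", "us-docker.pkg.dev/src-proj/repo/image-x")
TARGET_IMAGE = os.environ.get("TARGET_IMAGE", "us-docker.pkg.dev/tgt-proj/repo/image-x")
WATCHED_TAG = os.environ.get("WATCHED_TAG", "latest")


def copy_image(tag: str) -> None:
    # gcrane is packaged into the container image by the Dockerfile.
    subprocess.run(
        ["gcrane", "cp", f"{SOURCE_IMAGE}:{tag}", f"{TARGET_IMAGE}:{tag}"],
        check=True,
    )


@app.route("/", methods=["POST"])
def handle_pubsub():
    envelope = request.get_json()
    payload = json.loads(base64.b64decode(envelope["message"]["data"]).decode("utf-8"))
    # gcr notifications carry fields such as "action", "digest", and "tag".
    if payload.get("action") == "INSERT" and payload.get("tag") == f"{SOURCE_IMAGE}:{WATCHED_TAG}":
        copy_image(WATCHED_TAG)
        return ("Image copied", 200)
    return ("No matching image update; nothing to copy.", 200)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```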
Since we have now covered all of the individual components that are associated with this architecture, let’s walk through the flow that ties all the individual components together.
Let’s say your engineering team has built and released a new version of the Docker image “Image X” per their release schedule and added the “latest” tag to it. This new version sits in the Source AR, and when it is created, the AR Pub/Sub topic publishes a message reflecting that a new version of “Image X” has been added to the Source AR. This automatically causes the Eventarc trigger to invoke the Cloud Run service, which reads the messages from the Pub/Sub subscription.
Our Cloud Run service uses the logic written in App.py to check whether the action that happened in the Source AR matches the criteria specified (Image X with tag “latest”). If the action matches and warrants a downstream action, Cloud Run triggers Copy_image.py to execute the gcrane command that copies the image name and tag from the Source AR to the Target AR.
In the event that the image or tag does not match the criteria specified in App.py (for example, Image Y with tag “latest”), the Cloud Run process returns an HTTP 200 reply with the message “The source AR updates were not made to the [Image X]. No image will be updated.” confirming that no action will be taken.
Note: Because the Source AR may contain multiple images and we are only concerned with updating specific images in the Target AR we have integrated output responses within the Cloud Run services that can be viewed in the Google Cloud logs for troubleshooting and diagnosing issues. This also prevents unwanted publishing of images not pertaining to the desired image(s) in question.
Versatility: The Source and Target AR’s were in different Organizations
Compatibility: The Artifacts were not in a Code/Git repository compatible with solutions like Cloud Build.
Security: VPC-SC perimeters limit the tools we can leverage while using cloud native serverless options.
Immutability: We wanted a solution that could be fully deployed with Infrastructure as Code.
Scalability and Portability: We wanted to be able to update multiple Artifact Registries in multiple Organizations simultaneously.
Efficiency and Automation: Avoids a time-based pull method when no resources are being moved. Avoids human interaction to ensure consistency.
Cloud Native: Alleviates the dependency on third-party tools or solutions like a CI/CD pipeline or a repository outside of the Google Cloud environment.
If your Upstream projects where the images are coming from all reside in the same Google Cloud Region or Multi-region, a great alternative to solve the problem is Virtual repositories.
We have provided the Terraform code we used to solve this problem.
The following variables will be used in the code. These variables will need to be replaced or declared within a .tfvars file and assigned a value based on the specific project.
var.gcp_project
var.service_account
In conclusion, there are multiple ways to bootstrap a process for releasing artifacts across Organizations. Each method has its pros and cons; the best approach is determined by evaluating the use case at hand. The things to consider here are whether the artifacts can reside in a Git repository, whether the target repository is in the same Organization or a child Organization, and whether CI/CD tooling is preferred.
If you have gotten this far it’s likely you may have a good use case for this solution. This pattern can also be used for other similar use cases. Here are a couple examples just to get you started:
Copying other types of artifacts from AR repositories like Kubeflow Pipeline Templates (kfp)
Copying bucket objects behind a VPC-SC between projects or Orgs
Our solution code can be found here: https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/devops/inter-org-artifacts-release
GCrane: https://github.com/google/go-containerregistry/blob/main/cmd/gcrane/README.md
Configuring Pub/Sub GCR notifications: https://cloud.google.com/artifact-registry/docs/configure-notifications
Read More for the details.
The rise of generative AI has brought forth exciting possibilities, but it also has its limitations. Large language models (LLMs), the workhorses of generative AI, often lack access to specific data and real-time information, which can hinder their performance in certain scenarios. Retrieval augmented generation (RAG) is a technique within natural language processing that uses a two-step process to provide more informative and accurate answers. First, relevant documents or data points are retrieved from a larger dataset based on their similarity to the posed question. Then, a generative language model uses this retrieved information to formulate its response. As such, RAG comes in very handy in data analytics, leveraging vector search in datastores like BigQuery to enhance LLM capabilities. Add to that functionality found in BigQuery ML, and you can do all of this from a single platform, without moving any data!
In this blog, we take a deeper look at LLMs and how RAG can help improve their results, as well as the basics of how RAG works. We then walk you through an example of using RAG and vector search inside BigQuery, to show the power of these technologies working together.
LLMs have a lot of general knowledge and can answer general questions, as they are usually trained on a massive corpus of publicly available data. However, unless the model has been trained on a specific domain or discipline, it has limited domain knowledge. This limits its ability to generate relevant and accurate responses in specialized contexts. In addition, since LLMs are trained on static datasets, they lack real-time information, making them unaware of recent events or updates. This can lead to outdated and inaccurate responses. Furthermore, in many enterprise use cases, it’s crucial to understand the source of information used by LLMs to generate responses. LLMs often struggle to provide citations for their outputs, making it difficult to verify their accuracy and trustworthiness.
Simply put, RAG equips large language models (LLMs) with external knowledge. Instead of relying solely on their training data, LLMs can access and reference relevant information from external sources like databases or knowledge graphs. This leads to more accurate, reliable, and contextually relevant responses.
RAG shines when working with massive datasets, especially when you can’t fit an entire corpus into a standard language model’s context window due to length limitations. RAG injects additional supporting information into a prompt, leveraging data from multiple types of sources, effectively augmenting LLMs with an information retrieval system. The key to RAG lies in the retrieval query. This query determines how the LLM fetches relevant information from the external knowledge source.
Vector search is a powerful technique for finding similar items within large datasets based on their semantic meaning rather than exact keyword matches. It leverages machine learning models to convert data (like text, images, or audio) into high-dimensional numerical vectors called embeddings. These embeddings capture the semantic essence of the data, allowing BigQuery to efficiently search for similar items by measuring the distance between their vectors. As a result, vector search plays a crucial role in efficiently retrieving relevant information: by converting text and data into vectors, we can find similar pieces of information based on their proximity in the embedding vector space. This allows the LLM to quickly identify the most relevant documents or knowledge graph nodes related to the user’s query, leading to more accurate, reliable, and contextually relevant responses.
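As a toy illustration of “proximity in the vector space” (not part of the original post), the snippet below compares made-up three-dimensional embeddings with cosine similarity; real embedding models produce vectors with hundreds of dimensions.

```python
# Toy illustration: real embeddings have hundreds of dimensions; these 3-D
# vectors are made up purely to show what "closeness" means.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cappuccino = np.array([0.92, 0.31, 0.05])   # hypothetical embedding
latte = np.array([0.88, 0.40, 0.07])        # semantically close item
invoice = np.array([0.10, 0.15, 0.95])      # unrelated concept

print(cosine_similarity(cappuccino, latte))    # high similarity (small distance)
print(cosine_similarity(cappuccino, invoice))  # low similarity (large distance)
```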
As mentioned earlier, RAG is a technique within natural language processing that uses a two-step process to retrieve relevant information from a knowledge base, followed by a generative model that synthesizes answers based on this information, thereby providing more informative and accurate responses. A RAG system is usually broken down into two major parts, each with two components:
Retrieve and select: Employ semantic search to pinpoint the most relevant portions of your large dataset based on the query. This can be done via:
1. Vector search:
Your data (documents, articles, etc.) is converted into vectors, mathematical representations capturing their semantic meaning.
These vectors are stored in a specialized database like BigQuery, optimized for vector similarity search.
2. Querying:
When a user submits a query, it is also converted into a vector.
The vector search engine then compares this query vector to the stored document vectors, identifying the most similar documents based on their semantic closeness.
Augment and answer: Feed the extracted relevant parts to the language model, enhancing its answer generation process.
3. Augmenting the prompt:
The most relevant documents retrieved through vector search are appended to the original user query, creating an augmented prompt.
This augmented prompt, containing both the user query and relevant contextual information, is fed to the LLM.
4. Enhanced response generation:
With access to additional context, the LLM can now generate more accurate, relevant, and informed responses.
Additionally, the retrieved documents can be used to provide citations and source transparency for the generated outputs.
Given the number of moving parts, traditionally RAG systems require a number of frameworks and applications to come together. There are many RAG implementations and most of these stay as POCs or MVPs, given how daunting a task it is to meet all the challenges above. Solutions often end up being too convoluted and challenging to manage, undermining organizations’ confidence in bringing it into production within enterprise systems. You can end up needing an entire team to manage RAG across multiple systems, each of them with different owners, etc.!
BigQuery simplifies RAG in a number of ways: it can perform vector search, query both the data and the vectors, augment the prompt, and then enhance response generation — all within the same data platform. You can apply the same access rules across the RAG system, or to different parts of it, in a straightforward fashion. You can join the data, whether it is coming from your core enterprise data platform, e.g., BigQuery, or from vector search through BigQuery and an LLM. This means security and governance are maintained, and you don’t lose control of access policies. Furthermore, when complemented with BigQuery ML, you can get to production with less than 50 lines of code.
In short, using RAG with BigQuery vector search and with BigQuery ML offers several benefits:
Improved accuracy and relevance: By providing LLMs with relevant context, RAG significantly improves the accuracy and relevance of generated responses. Users can augment the knowledge within the LLMs with their enterprise data without data leaving the system they are operating in.
Real-time information access: Vector search allows LLMs to access and utilize real-time information, ensuring that responses are up-to-date and reflect the latest knowledge.
Source transparency: The retrieved documents can be used to provide citations and source information, enhancing the trustworthiness and verifiability of LLM outputs.
Scalability and efficiency: BigQuery offers a scalable and efficient platform for storing and searching large volumes of vectorized data, making it ideal for supporting RAG workflows.
Let’s showcase the integration of RAG and vector search within BigQuery with a pipeline that extracts common themes from product reviews for Data Beans, a fictional coffee franchise that we have used in previous demos. This process has three main stages:
1. Generating embeddings: BigQuery ML’s text embedding model generates vector representations for each customer review.
2. Vector index creation: A vector index is created on the embedding column for efficient retrieval.
3. RAG pipeline execution, with the runtime inside a BigQuery stored procedure:
3.1 Query embedding: The user’s query, such as “cappuccino,” is also converted into a vector representation using the same embedding model.
3.2 Vector search: BigQuery’s vector search functionality identifies the top K reviews that are most similar to the query based on their vector representations.
3.3 Theme extraction with Gemini: The retrieved reviews are then fed into Gemini, which extracts the common themes and presents them to the user.
You can find more about the Data Beans demo here.
In the example below, we have customer reviews stored as text within a BigQuery table. Customer reviews are free text by definition, but what if we were to try to extract the common themes for each of our menu items, for example common feedback about the cappuccino and the espresso?
This problem requires combining product data from the enterprise data platform (each menu item is stored in a BigQuery table) with free-text information coming directly from customer interactions. It is a good candidate for RAG within BigQuery, and it can be solved in just three steps.
Step 1 – We start by generating the embeddings directly within BigQuery using the ML.GENERATE_TEXT_EMBEDDING function. This lets us embed text that’s stored in BigQuery tables; in other words, we are creating a dense vector representation of a piece of text. For example, if two pieces of text are semantically similar, then their respective embeddings are located near each other in the embedding vector space.
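The original post includes the exact SQL; as a hedged sketch of this step, issued here through the BigQuery Python client, the statement might look like the following, where the project, dataset, table, and remote model names are assumptions.

```python
# Hedged sketch: project, dataset, table, and remote model names are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

client.query(
    """
    CREATE OR REPLACE TABLE `your-project-id.coffee.review_embeddings` AS
    SELECT *
    FROM ML.GENERATE_TEXT_EMBEDDING(
      MODEL `your-project-id.coffee.text_embedding_model`,
      (SELECT review_id, review_text AS content
       FROM `your-project-id.coffee.customer_reviews`),
      STRUCT(TRUE AS flatten_json_output));
    """
).result()  # each output row carries a text_embedding vector for one review
```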
Step 2 – Once we have the embedding table we can generate a vector index with the CREATE OR REPLACE VECTOR INDEX statements in BigQuery.
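Again as a sketch rather than the post’s exact statement, the index creation might look like this; the index name and options are assumptions, so check the CREATE VECTOR INDEX reference for what your region supports.

```python
# Hedged sketch: index name and options are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

client.query(
    """
    CREATE OR REPLACE VECTOR INDEX review_embedding_index
    ON `your-project-id.coffee.review_embeddings`(text_embedding)
    OPTIONS (index_type = 'IVF', distance_type = 'COSINE');
    """
).result()
```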
Step 3 – In the final step, we write the RAG logic and wrap everything into a BigQuery stored procedure. By combining vector search, query embedding and extracting the themes in a single stored procedure, RAG can be implemented in less than 20 lines of SQL. This allows us to pass the terms that we would like to retrieve into BigQuery through a single function call:
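A hedged, condensed sketch of that logic is shown below as a single parameterized query run from Python rather than the post’s stored procedure; the Gemini remote model name, dataset, and column names are assumptions.

```python
# Hedged sketch: model, dataset, and column names are assumptions; the post's
# version wraps equivalent SQL in a stored procedure instead.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

def common_themes(term: str) -> str:
    sql = """
    SELECT ml_generate_text_llm_result AS themes
    FROM ML.GENERATE_TEXT(
      MODEL `your-project-id.coffee.gemini_model`,
      (
        SELECT CONCAT(
                 'Extract the common themes from these customer reviews: ',
                 STRING_AGG(base.content, ' || ')) AS prompt
        FROM VECTOR_SEARCH(
          TABLE `your-project-id.coffee.review_embeddings`, 'text_embedding',
          (
            SELECT text_embedding
            FROM ML.GENERATE_TEXT_EMBEDDING(
              MODEL `your-project-id.coffee.text_embedding_model`,
              (SELECT @term AS content),
              STRUCT(TRUE AS flatten_json_output))
          ),
          top_k => 10)
      ),
      STRUCT(TRUE AS flatten_json_output));
    """
    job = client.query(
        sql,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("term", "STRING", term)]
        ),
    )
    return list(job.result())[0].themes

print(common_themes("cappuccino"))
```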
Finally, we can call our procedure and get the results back.
With this approach, data always stays within a single environment, and teams do not need to manage a number of application frameworks.
This example demonstrates how to perform RAG entirely within BigQuery, leveraging its built-in capabilities for text embedding, vector search, and generative AI. To summarize the technical aspects of the demo:
Text embedding: BigQuery ML’s ML.GENERATE_TEXT_EMBEDDING function was used to generate text embedding vectors for customer reviews.
Vector index: We created a vector index on the embedding column for efficient similarity search.
RAG implementation: A stored procedure was created to encapsulate the entire RAG process, including query embedding, vector search, and generation using Gemini.
Takeaways
BigQuery offers a powerful and unified platform for implementing RAG, eliminating the need for complex multi-service architectures.
Vector indexes enable efficient similarity search within BigQuery, facilitating effective retrieval of relevant information.
Stored procedures can streamline and automate complex AI processes within BigQuery.
RAG, combined with vector search and BigQuery, offers a powerful solution for overcoming the limitations of LLMs, allowing them to access domain-specific knowledge, real-time information, and source transparency, and paving the way for more accurate, relevant, and trustworthy generative AI applications. By leveraging this powerful trio, businesses can unlock the full potential of generative AI and develop innovative solutions across various domains.
And while larger context windows in LLMs like Gemini may reduce the need for RAG in some cases, RAG remains essential for handling massive or specialized datasets, and providing up-to-date information. Hybrid approaches combining both may offer the best of both worlds, depending on specific use cases and cost-benefit tradeoffs.
To learn more about BigQuery’s new RAG and vector search features, check out the documentation. Use this tutorial to apply Google’s best-in-class AI models to your data, deploy models and operationalize ML workflows without moving data from BigQuery. You can also watch a demonstration on how to build an end-to-end data analytics and AI application directly from BigQuery while harnessing the potential of advanced models like Gemini.
Googlers Adam Paternostro, Navjot Singh, Skander Larbi and Manoj Gunti contributed to this blog post. Many Googlers contributed to make these features a reality.
Read More for the details.
One of the benefits of Google Kubernetes Engine (GKE) is its use of a fully-integrated network model, which means that the Pod addresses are routable within the VPC network. But as your usage of GKE scales across your organization, you might find it difficult to allocate network space in a single VPC, and rapidly run out of IP addresses. If your organization needs multiple GKE clusters, it becomes a tedious task to carefully plan your network to ensure enough unique IP ranges are reserved for clusters within the VPC network.
You can use the following design to completely reuse the IP space across your GKE clusters. In this blog post, we present an architecture that leverages Private Service Connect to hide the GKE cluster ranges, but connects the networks together using a multi-NIC VM that functions as a network appliance router. This keeps the GKE cluster networks hidden but connected, allowing the reuse of the address space for multiple clusters.
Platform components:
Each cluster is created in an island VPC.
A VM router with multiple NICs is used to enable connectivity between the existing network and the island network.
The sample code uses a single VM as the router, but this can be expanded to a managed instance group (MIG) for a highly available architecture.
Connectivity for services:
For inbound connectivity, services deployed on GKE are exposed through Private Service Connect endpoints. This requires reserving static internal IP addresses in the enterprise network and creating DNS entries to enable routing.
For outbound connections from the cluster, all traffic is routed through the network appliance to the enterprise network.
1. Deploy basic infrastructure
You can provision the initial networking and GKE cluster configuration through the following public Terraform code. Follow the readme to deploy a cluster, and adjust the cluster’s IP addresses as needed.
2. Expose a service on GKE
Create a new subnet for the PSC Service Attachment. This subnet must be in the same region as the GKE cluster and be created with --purpose PRIVATE_SERVICE_CONNECT. The code also creates a PSC subnet; you can adjust the CIDR as needed.
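If you prefer doing this step from Python instead of gcloud, a hedged sketch with the Compute Engine client library might look like the following; the subnet name, CIDR, network, and region are assumptions that should match your Terraform variables.

```python
# Hedged sketch: names, CIDR, network, and region are assumptions.
from google.cloud import compute_v1

project = "your-project-id"
region = "us-central1"

subnet = compute_v1.Subnetwork(
    name="gke-psc-nat-subnet",
    ip_cidr_range="10.100.0.0/24",
    network=f"projects/{project}/global/networks/island-vpc",
    purpose="PRIVATE_SERVICE_CONNECT",
)

operation = compute_v1.SubnetworksClient().insert(
    project=project, region=region, subnetwork_resource=subnet
)
operation.result()  # block until the subnet is created
```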
Then, deploy a sample workload on GKE.
Now, expose the deployment through an internal passthrough Network Load Balancer.
Create a ServiceAttachment for the load balancer.
3. Connect a PSC endpoint
Retrieve the ServiceAttachment URL.
Using this URL, create a Private Service Connect endpoint in the connected VPC network.
4. Test access
From the connected network, you can access the service through curl.
This design has the following limitations:
It is subject to quotas regarding the number of VPC networks per project. You can view this through the command gcloud compute project-info describe --project PROJECT_ID
Limitations for Private Service Connect apply.
You can create a service attachment in GKE versions 1.21.4-gke.300 and later.
You cannot use the same subnet in multiple service attachment configurations.
At the time of this blog, you must create a GKE service that uses an internal passthrough Network Load Balancer to use PSC.
Due to these limitations, the Google Cloud resources created above (Subnet, internal passthrough Network Load Balancer, Service Attachment, PSC Endpoint) must be deployed for each GKE service that you want to expose.
This design is best suited for enterprises looking to save IP address space for GKE clusters by reusing the same address space for multiple clusters in a single project.
Since this design requires that these Google Cloud resources be deployed for every GKE service, consider using an Ingress Gateway (for example: Istio Service Mesh or Nginx) to consolidate the required entrypoints into GKE.
To learn more on this topic please checkout the links below:
Documentation: GKE Networking Models
Documentation: Hide Pod IP addresses by using Private Service Connect
Documentation: Publish services using Private Service Connect
Read More for the details.
Today, AWS End User Computing announced that Amazon WorkSpaces Web is now called Amazon WorkSpaces Secure Browser. With WorkSpaces Secure Browser, users can access private websites and software-as-a-service (SaaS) web applications, interact with online resources, or browse the internet from a disposable container. The service helps reduce the risk of data exfiltration by streaming web content – no HTML, document object model (DOM), or sensitive company data is transmitted to the local machine. Isolating the device, corporate network, and internet from each other helps reduce the surface area for browser-borne attacks while protecting sensitive company data.
Read More for the details.
AWS Control Tower customers can now submit up to 100 control operations concurrently. These operations can span multiple organizational units, reducing the operational burden from repetitive execution. Enabling multiple controls at scale provides a consistent, standardized configuration across multiple AWS accounts. To monitor the status of ongoing and queued control operations, customers can either navigate to the new ‘Recent Operations’ page in the AWS Control Tower console or use the new ‘ListControlOperations’ API. The AWS Control Tower library today has more than 500 controls that map to different control objectives, frameworks, and services. Customers can now choose to enable multiple controls for a specific control objective, such as ‘Encrypt data at rest’, in a single control operation to facilitate accelerated development and faster adoption of best-practice controls.
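As a hedged sketch, monitoring those operations from code might look like the following with boto3; the method maps to the ListControlOperations API named above, but the response field names shown here are assumptions to verify against the Control Tower API reference.

```python
# Hedged sketch: response field names are assumptions.
import boto3

controltower = boto3.client("controltower", region_name="us-east-1")

response = controltower.list_control_operations()
for operation in response.get("controlOperations", []):
    # Each entry describes an ongoing or completed control operation.
    print(operation)
```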
Read More for the details.
Today, AWS announces a new filter in AWS Resource Explorer to search for resources that support tags. This allows you to understand which resources can and cannot be tagged in order to better evaluate your tagging coverage in your organization or account. Currently, customers can use tag:none to view resources in their account that do not have tags so they can determine their tagging coverage across all resources. This query may return resources that cannot be tagged. Now, customers can use resourcetype.supports:tags in their search query to only return resources that are taggable.
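A short, hedged boto3 sketch of using the new filter is shown below; the Region and response field names are assumptions, so verify them against the Resource Explorer API reference.

```python
# Hedged sketch: Region and response field names are assumptions.
import boto3

explorer = boto3.client("resource-explorer-2", region_name="us-east-1")

# Return only resources whose type supports tagging.
response = explorer.search(QueryString="resourcetype.supports:tags")
for resource in response.get("Resources", []):
    print(resource.get("Arn"), resource.get("ResourceType"))
```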
Read More for the details.