Simplify your streaming pipelines with new Pub/Sub Single Message Transforms
Today, we’re introducing Pub/Sub Single Message Transforms (SMTs) to make it easy to perform simple data transformations right within Pub/Sub itself.
This comes at a time when businesses increasingly rely on streaming data to derive real-time insights, understand evolving customer trends, and make critical decisions that affect their bottom line and strategic direction. The sheer volume and velocity of streaming data present both opportunities and challenges. Whether you're generating and analyzing data, ingesting data from another source, or syndicating your data for others to use, you often need to transform that data to match your use case. For example, if you're providing data to other teams or customers, you may need to redact personally identifiable information (PII) from the messages before sharing them. And if you're using data you generated or sourced from somewhere else (especially unstructured data), you may need to perform data format conversions or other types of data normalization.
Traditionally, the options for these simple transformations within a message involve either altering the source or destination of the data (which may not be an option) or using an additional component like Dataflow or Cloud Run, which incurs additional latency and operational overhead.
Pub/Sub SMTs
An overarching goal of Pub/Sub is to simplify streaming architectures. We already greatly simplified data movement with Import Topics and Export Subscriptions, which removed the need to use additional services for ingesting raw streaming data through Pub/Sub into destinations like BigQuery. Pub/Sub Single Message Transforms (SMTs) are designed to be a suite of features that make it easy to validate, filter, enrich, and alter individual messages as they move in real time.
The first SMT is available now: JavaScript User-Defined Functions (UDFs), which allow you to perform simple, lightweight modifications to message attributes and/or message data directly within Pub/Sub via snippets of JavaScript code.
Key examples of such modifications include:
- Simple transforms: Perform common single message transforms such as data format conversion, casting, and adding a new composite field.
- Enhanced filtering: Filter based on message data (not just attributes), including filters based on regular expressions.
- Data masking and redaction: Safeguard sensitive information by masking or redacting fields that contain PII.
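The three kinds of modification listed above could be sketched as the following JavaScript functions. This is a minimal illustration, not the documented API: the `(message, metadata)` signature, the `message.data` string and `message.attributes` map, and the convention that returning `null` drops a message are assumptions made for this sketch, and the function and field names (`addFullName`, `keepErrors`, `maskSsn`, `firstName`, `severity`, `ssn`) are hypothetical.

```javascript
// Hypothetical UDF sketches. Assumptions (not confirmed by this post):
// each function receives (message, metadata), message.data is a JSON
// string, and returning null filters the message out.

// Simple transform: add a composite field and cast a string to a number.
function addFullName(message, metadata) {
  const data = JSON.parse(message.data);
  data.fullName = `${data.firstName} ${data.lastName}`;
  data.amount = Number(data.amount);
  message.data = JSON.stringify(data);
  return message;
}

// Enhanced filtering: keep only messages whose data matches a regex.
function keepErrors(message, metadata) {
  const data = JSON.parse(message.data);
  return /^(ERROR|FATAL)$/.test(data.severity) ? message : null;
}

// Masking: replace every digit that is followed by at least four more
// digits, so only the last four digits stay readable.
function maskSsn(message, metadata) {
  const data = JSON.parse(message.data);
  if (typeof data.ssn === 'string') {
    data.ssn = data.ssn.replace(/\d(?=(?:\D*\d){4})/g, '*');
  }
  message.data = JSON.stringify(data);
  return message;
}
```

Keeping each function focused on one concern makes it easier to attach them separately to a topic or a subscription, depending on who should see the transformed data.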
In order to stay true to Pub/Sub’s objective of decoupling publishers and subscribers, UDF transforms can be applied independently to a topic, a subscription, or both based on your needs.
JavaScript UDFs in Pub/Sub provide three key benefits:
- Flexibility: JavaScript UDFs give you complete control over your transformation logic, so they can serve a wide variety of use cases and transforms.
- Simplified pipelines: Transformations happen directly within Pub/Sub, eliminating the need to maintain extra services or infrastructure for data transformation.
- Performance: End-to-end latencies improve for streaming architectures because lightweight transformations no longer require additional products.
Pub/Sub JavaScript UDF Single Message Transforms are easy to use. You can add up to five JavaScript transforms on a topic, a subscription, or both. If a topic SMT is configured, Pub/Sub transforms the message with the SMT logic and persists the transformed message. If a subscription SMT is configured, Pub/Sub transforms the message before sending it to the subscriber. In the case of an Export Subscription, the transformed message is written to the destination. Please see the Single Message Transform overview for more information.
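To make the topic-then-subscription flow concrete, here is a small local simulation. It is only a sketch for reasoning about behavior, not the Pub/Sub runtime: `applyTransforms`, `deliver`, and the sample transforms are hypothetical names, and the sketch assumes that transforms run in the order listed and that a `null` return means the message is filtered out.

```javascript
// Local simulation only; this is not how Pub/Sub is implemented.
const MAX_TRANSFORMS = 5; // limit stated in the text above

// Run a list of transforms in order (ordering is an assumption).
function applyTransforms(transforms, message) {
  if (transforms.length > MAX_TRANSFORMS) {
    throw new Error(`at most ${MAX_TRANSFORMS} transforms are allowed`);
  }
  let current = message;
  for (const transform of transforms) {
    current = transform(current, {});
    if (current === null) return null; // assumption: null means filtered
  }
  return current;
}

// Hypothetical topic transform: tag every message before it is persisted.
const topicTransforms = [
  (m) => { m.attributes.ingested = 'true'; return m; },
];

// Hypothetical subscription transform: deliver only EU messages.
const subscriptionTransforms = [
  (m) => (m.attributes.region === 'eu' ? m : null),
];

function deliver(message) {
  // Topic SMTs run first; their output is what would be persisted.
  const stored = applyTransforms(topicTransforms, message);
  if (stored === null) return null;
  // Copy so subscription SMTs cannot alter the persisted message.
  const copy = JSON.parse(JSON.stringify(stored));
  return applyTransforms(subscriptionTransforms, copy);
}
```

In this model, calling `deliver` with a message whose `region` attribute is `'eu'` returns the tagged message, while any other region returns `null` for that subscriber even though the topic still holds the tagged message.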
Getting started with Single Message Transforms
JavaScript UDFs, the first Single Message Transform, are generally available starting today for all users. You'll find the new "Add Transform" option in the Google Cloud console when you create a topic or subscription in your Google Cloud project. You can also use the gcloud CLI to start using JavaScript Single Message Transforms today.
We plan to launch additional Single Message Transforms in the coming months, such as a schema validation/encoding SMT, an AI Inference SMT, and many more, so stay tuned for updates on this front.