GCP – Telegraph Media Group unlocks insights with a Single Customer View on Google Cloud
In today’s data-driven world, organizations across industries are seeking ways to gain a deeper understanding of their customers. A Single Customer View (SCV) — also known as a 360-degree customer view — has emerged as a powerful concept in data engineering, enabling companies to consolidate and unify customer data from multiple siloed sources. By integrating various data points into a single, comprehensive view, organizations can unlock valuable insights, drive personalized experiences, and make data-informed decisions. In this blog post, we will take a look at how Telegraph Media Group (TMG) built a SCV using Google Cloud and what we learned from our experience.
TMG is the publisher of The Daily Telegraph, The Sunday Telegraph, The Telegraph Magazine, Telegraph.co.uk, and the Telegraph app. We operate as a subscription-based business, offering news content through a combination of traditional print media and digital channels, including a website and various mobile applications. TMG initially operated a free-to-air, advertising-based revenue model, but over time, this model became increasingly challenging. Like many news media publishers, we saw long-term trends, such as a declining print readership, diminishing ad yields for content publishers, and volatility in ad revenue — all of which make revenue projections uncertain and growth unpredictable.
In 2018, we set out a bold vision to become a subscriber-first business, with quality journalism at our heart, to build deeper connections with our subscribers at scale. By embracing a subscription approach, TMG aimed to establish a more predictable revenue stream and enhance its advertising offerings, which yield higher returns. Our goal was to reach one million subscriptions within five years, and we reached our milestone in August 2023.
The SCV platform we have engineered leverages two primary data resources: a customer’s digital behavior across all digital domains and TMG’s subscription data. Additionally, it integrates data from third-party sources, such as partner shopping websites and engagement products like fantasy football or puzzles. These diverse data sources play a vital role in enriching the platform’s understanding of our audience and delivering a comprehensive news experience.
We can conceptualize the entire process of building the SCV in a few stages:
Data collection: The initial stage involves gathering data from various sources and loading it into BigQuery, which serves as a data lake. Data is extracted from a variety of different sources, using multiple methods, such as databases, APIs, or files, and ingested into BigQuery for centralized storage and future processing.
Data transformation: In this stage, the data retrieved from BigQuery is processed and transformed according to defined business rules. The data is cleansed, standardized, and enriched to ensure its quality and consistency. The data is stored in a new dataset within BigQuery in a structured format known as the SCV data model, where it can be easily accessed and analyzed.
Data presentation: Once the data has been transformed and stored in the BigQuery data lake, it can be organized into smaller, specialized datasets, known as data marts. These data marts serve specific user groups or departments and provide them with tailored and pre-aggregated data for consumption by third-party activation tools, such as email marketing systems, alongside reporting and visualization tools that enable internal decision-making processes.
Stage 1: Data collection
All of TMG’s subscription data is stored in Salesforce. We implemented a streamlined process to gather this data and store it in BigQuery.
First, we utilize a pipeline comprising containerized Python applications that run on Apache Airflow (specifically, Cloud Composer) every minute. This pipeline retrieves updated data from the Salesforce API and transfers it to Pub/Sub, a messaging service within Google Cloud.
Second, we created a real-time pipeline with DataFlow that reads data from Pub/Sub and promptly updates various tables in BigQuery, enabling us to gain real-time insights into the data.
We also perform a daily batch ingestion from Salesforce to ensure data integrity. This practice allows us to have a comprehensive and complete view of the data, compensating for any potential data loss that may occur during real-time ingestion.
We employ a similar approach to ingesting data from both Adobe Analytics, which monitors user behavior on TMG websites and apps, and Adobe Campaign, which tracks user behavior on communication channels. Since real-time availability is not essential for these datasets, batch processing is deemed sufficient for their ingestion and processing. Additionally, similar ingestion methods are applied to other data sources to ensure a consistent and unified data pipeline.
Stage 2: Data transformation
We employ the open-source Data Build Tool (DBT) for transforming our data, utilizing the power of BigQuery through SQL. By leveraging DBT, we translate all of our business rules into SQL for efficient data transformation. The DBT pipelines are containerized applications that run on an hourly basis and are orchestrated using Cloud Composer, which is built on Apache Airflow. As a result, the output of these data pipelines is a relational model that resides in BigQuery, delivering streamlined and organized data for further analysis and processing. During data transformation, we employ several important pipelines, including:
Salesforce Snapshot: This pipeline generates a snapshot of the Salesforce data from both real-time and batch tables. The snapshot reflects the latest available data in Salesforce and serves as a valuable source for other pipelines in the transformation process.
Source: This pipeline creates a table to store the source data, including source original customer ID and new customer ID. This information plays a crucial role in identifying customers in the original data source.
Customer: This pipeline creates a table that captures and presents detailed information, providing a comprehensive view of their attributes and characteristics.
Contact: This pipeline creates multiple tables that store various contact details of the customers.
Content Interaction: This pipeline generates a table that captures the digital behavior of customers, including their interactions with different content, enabling deeper analysis of customer engagement and preferences.
Subscription Interaction: This pipeline creates a table that tracks and stores subscription-related events and their associated details, providing insights into customer subscription behavior and patterns.
Campaign Interaction: This pipeline creates a table that stores detailed information about events related to communication behavior within different channels, enabling analysis of customer engagement with campaigns and marketing initiatives.
Stage 3: Data presentation
Similar to the transformation layer, DBT pipelines play a crucial role in transforming data from the SCV data model into different data marts. These data marts serve as valuable resources for further analysis and are also consumed by third-party applications. Presently, the three main consumers of the SCV data are Adobe Campaign, Adobe Experience Platform, and Permutive.
Adobe Campaign utilizes the SCV data to effectively target customers by sending relevant campaigns and personalized offers. By leveraging the comprehensive customer insights derived from this data, Adobe Campaign optimizes customer engagement and facilitates targeted marketing efforts.
Adobe Experience Platform leverages the SCV data to deliver tailored experiences to customers visiting the website. By utilizing the rich customer information available, the Adobe Experience Platform customizes the website experience to cater to individual customer preferences, enhancing customer satisfaction and engagement.
Permutive primarily relies on SCV demographic data to target customers with tailored advertisements on the Telegraph website and application. Permutive creates customer segments and integrates with Google Ad Manager to deliver personalized ads.
Prior to the implementation of the SCV, these consumers depended on various data sources, which often resulted in using data that was several days old. This delay imposed limitations on their ability to target customers multiple times within a day. However, with the integration of the SCV, they now have direct access to near real-time data, allowing them to consume and utilize the data as frequently as every 30 minutes. This significant improvement in data freshness empowers TMG to deliver more timely and relevant experiences to our target audiences.
Challenges of creating a SCV
Building a Single Customer View brings forth various challenges, particularly in constructing a data model that meets current requirements while remaining adaptable for future needs. To address this, we prioritize careful extension of the data model, aiming to incorporate new requirements within the existing framework whenever possible. Additionally, determining the appropriate data to include in the SCV is also critical. While businesses may desire to include all customer data, we recognize the importance of avoiding noise and include only relevant and valuable data to maintain a clean SCV.
Managing customer preferences for communication is another significant issue to resolve. Within the SCV, for example, customer preferences dictate how TMG is authorized to engage with them. However, these preferences can vary across different channels, including third-party platforms, potentially conflicting with the preferences stored in our first-party data. To mitigate this, we establish and implement hierarchical rules to carefully define permissions for each communication channel, ensuring compliance and minimizing legal implications.
Efficiently matching customers from third-party data to TMG’s first-party data is also crucial for unifying customers across multiple sources. To tackle this challenge, we employ a combination of exact and fuzzy matching techniques. We implement fuzzy matching using BigQuery User-Defined Functions (UDFs), which allows us to apply various algorithms. However, processing fuzzy matching on large data volumes can be time-consuming. We are actively exploring different approaches to strike a balance between accuracy and processing time, optimizing the matching process and facilitating more efficient customer data integration.
In conclusion, implementing a SCV on Google Cloud empowers TMG to leverage customer data effectively, helping us to drive growth, enhance customer satisfaction, and stay competitive. By harnessing the rich insights derived from a SCV, companies can make data-informed decisions and deliver personalized experiences that resonate with their customers. Overcoming the challenges inherent in building an SCV enables businesses to unlock the full potential of their data and achieve meaningful outcomes.
Read More for the details.