GCP – Meeting the need for speed with a cloud data warehouse
In our work with Google Cloud customers, we hear great stories of growth, change, and cloud success. We worked closely with Trendyol Group, a fast-growing e-commerce company based in Turkey, on their data warehouse migration project. Trendyol employs about 2,000 people, and the company’s e-commerce site gets about 50 billion page views per year, 5 billion visits per year, and 50 million monthly unique users. For this digital-native company, data has been at the heart of their business.
Trendyol Group was facing unprecedented growth, and the Trendyol data warehouse (DWH) team was challenged by the performance and scalability limits of their existing Vertica data warehouse, especially during the holiday shopping season and other busy retail periods. Performance issues had become critical over the preceding 18 months and were hurting the business: the DWH team realized that not being able to process data and deliver internal reports and dashboards on time was causing lost revenue and inaccurate supplier management. For example, the business couldn’t react quickly when suppliers made bad decisions or sold a product that didn’t actually have inventory.
The capacity limitations of the on-premises data warehouse forced the IT team to constantly tune performance and plan and scale capacity instead of focusing on business insights. Trendyol’s reporting team serves more than 600 users with roughly 2,000 workbooks and 7,000 views in Tableau. Prior to migration, Trendyol stored over 30 TB of data in their Vertica environment. In addition, there were over 300 slowly changing dimensions (SCDs) in the ETL pipelines, requiring the team to update 10% of the data every day, which led to an 11 TB truncate-insert during the ELT process.
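To make the SCD pain concrete: a truncate-insert rewrites the entire changed slice every day, even when only a fraction of the rows differ. In a warehouse with MERGE support such as BigQuery, the same refresh can be expressed as a single upsert. Here is a minimal sketch using the BigQuery Python client; the dataset, table, and column names are hypothetical, not Trendyol’s actual schema.

```python
# Minimal sketch of an SCD refresh expressed as a single MERGE in BigQuery,
# instead of truncating and re-inserting the whole slice. Dataset, table,
# and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

merge_sql = """
MERGE `analytics.dim_supplier` AS dim
USING `analytics.stg_supplier` AS stg
  ON dim.supplier_id = stg.supplier_id
WHEN MATCHED AND dim.attributes_hash != stg.attributes_hash THEN
  UPDATE SET
    name = stg.name,
    attributes_hash = stg.attributes_hash,
    updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  INSERT (supplier_id, name, attributes_hash, updated_at)
  VALUES (stg.supplier_id, stg.name, stg.attributes_hash, CURRENT_TIMESTAMP())
"""

job = client.query(merge_sql)  # MERGE runs as a DML query job
job.result()                   # wait for completion
print("Rows affected:", job.num_dml_affected_rows)
```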
The size of the data was weighing down the business. Business users couldn’t meet SLAs for the Monday morning financial reports required by executives. During those busy periods, the IT team had to spend time tuning workloads and killing long-running queries to ensure the reports finished on time. For example, business users couldn’t run aggregation queries spanning a three-year period due to capacity constraints; they were limited to a one-year time frame. By the time business users accessed a report, the data was already stale. Then, when Thursday came and users weren’t running as many queries, the DWH team found themselves with excess capacity.
With the impact of COVID-19, Trendyol needed to react quickly and cut off non-compliant products or suppliers while meeting the sudden increase in demand.
The DWH team knew they needed to auto-scale the workloads in a cost-effective way. They extended their Vertica environment for one more year while they started evaluating cloud data warehousing alternatives.
Cloud data warehouse decision criteria
The Trendyol team decided to look into a number of vendors, including Snowflake and Google Cloud. Their decision criteria for a cloud data warehouse included:
- Instant scalability. Given the variability in their analytical workload, this was a critical need so they could have capacity on demand to run the Monday morning reports.
- Reduced operational costs. Since the retail business is seasonal, Trendyol needed to keep their costs low, in line with demand.
- Uptime SLAs. Their analytical platform needed to be highly available to meet business needs, especially in these critical times. BigQuery now offers a 99.99% SLA.
- Backup and recovery. This is important so the team can look back in time in case there are errors in processing.
- Security. This is a key requirement, since they need to restrict access to personally identifiable information (PII) and sensitive data depending on roles.
- Ease of use. It was very important that business users could transition to the new cloud data warehouse platform without a learning curve and be productive immediately.
Evaluating cloud data warehouses
BigQuery’s comprehensive documentation and simple management interface let the Trendyol team set up BigQuery and fine-tune queries for their proof-of-concept trial. Other data warehouse vendors’ trials required a consultant to optimize and tune the environment. They were able to move the data into BigQuery by themselves and it just worked. They also used BigQuery features like time travel, which met backup and recovery requirements out of the box, and integrated Cloud Identity and Access Management (Cloud IAM) roles that met security requirements easily.
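The time travel feature lets a query read a table as it existed at an earlier timestamp, which is why it covered the backup and recovery requirement out of the box. A minimal sketch with the Python client, using a hypothetical analytics.orders table:

```python
# Minimal sketch of BigQuery time travel: rebuild a recovery copy of a
# table as it existed one hour ago, e.g. after a bad ELT run.
# Table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE OR REPLACE TABLE `analytics.orders_recovered` AS
SELECT *
FROM `analytics.orders`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
""").result()
```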
The most important feature in BigQuery for Trendyol was the separation of storage and compute, so that they would not have to pay for compute when not in use. Furthermore, it was easy to scale their workload up and down without the need for startup or shutdown time, which other tools required. The DWH team ran through a comprehensive evaluation of alternative data warehouse tools with a variety of benchmarks to represent their main workloads, including ELT, end-to-end BI, integration with BI, and a number of different OLAP queries. BigQuery was the preferred option for price and performance for each of the workloads.
Here are three example queries: OLAP-style queries with joins involving up to a billion rows, regex functions, and 20+ analytical functions.
1. Ad hoc queries representing power users: a join of four tables with heavy cartesian joins (632M, 162M, 13M, and 23K rows), regex functions, and two distinct counts.
2. ELT publish layer with analytical functions: a join of five tables (800M, 13M, 11M, 10M, and 360K rows), 20+ analytical functions, and FIRST_VALUE/LAST_VALUE with GROUP BY.
3. Publish layer example: a join of 13 tables including subqueries (274M, 262M, 135M, and 13M rows), with 10 GROUP BY operations.
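To give a flavor of these benchmarks, here is a minimal sketch of the second query shape: FIRST_VALUE/LAST_VALUE window functions over a large fact table, feeding a GROUP BY. All table and column names are hypothetical, not Trendyol’s actual schema.

```python
# Sketch of the second benchmark shape: FIRST_VALUE/LAST_VALUE window
# functions over a large fact table, aggregated by an outer GROUP BY.
# Table and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
WITH priced AS (
  SELECT
    product_id,
    FIRST_VALUE(price) OVER w AS opening_price,
    LAST_VALUE(price)  OVER w AS closing_price
  FROM `analytics.fact_price_history`
  WINDOW w AS (
    PARTITION BY product_id
    ORDER BY valid_from
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
)
SELECT
  product_id,
  ANY_VALUE(opening_price) AS opening_price,
  ANY_VALUE(closing_price) AS closing_price
FROM priced
GROUP BY product_id
"""

for row in client.query(sql).result():
    print(row.product_id, row.opening_price, row.closing_price)
```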
Testing results from Trendyol
- Concurrent queries: BigQuery was the most cost-effective and the fastest compared to the alternative solution. BigQuery allowed testing increased slots and sharing resources seamlessly across reservations, which wasn’t possible in the alternative solution.
- DML statement performance: Similar across platforms, including CTAS, updates, and inserts, but BigQuery was the most cost-effective.
- End-to-end runtime, including the BI run: BigQuery was faster.
- Ingestion times: BigQuery was an order of magnitude faster. The ingestion benchmark was 492 Parquet files totaling 63 GB (400 million rows, 50 columns, Snappy-compressed); see the load sketch after this list.
- ELT SCD phase: One of the largest dimensions generates more than 2.1 million updates and approximately 1 million inserts per run.
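As a reference point for the ingestion benchmark above, a bulk Parquet load of that shape can be reproduced with the BigQuery Python client roughly as follows; the bucket path and destination table are placeholders.

```python
# Sketch of a bulk Parquet load like the benchmark above: many
# Snappy-compressed Parquet files loaded from Cloud Storage in one job.
# Bucket and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://example-bucket/benchmark/*.parquet",  # wildcard covers all files
    "analytics.benchmark_table",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish

table = client.get_table("analytics.benchmark_table")
print(f"Loaded {table.num_rows} rows")
```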
Overall, BigQuery provided the best price for performance, and its predictable flat-rate pricing was key to the decision. In the past, the DWH team had purchased capacity ahead of time expecting it to be utilized, and often it wasn’t, creating significant cost and unpredictability. Now the team would be able to predict at the end of each month how much capacity they would use. And the ability to scale up and down in minute-level intervals with Flex Slots was not available from any other vendor.
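Flex Slots can be bought and released programmatically, which is what makes minute-level scaling practical. A minimal sketch, assuming the google-cloud-bigquery-reservation client library, with a placeholder project and region; the reservation and assignment setup that routes queries to these slots is omitted here.

```python
# Sketch of buying short-term Flex Slots through the BigQuery Reservation
# API and releasing them after a peak. Project and region are placeholders.
from google.cloud import bigquery_reservation_v1 as reservation

client = reservation.ReservationServiceClient()
parent = "projects/my-project/locations/EU"  # placeholder

# Purchase a 500-slot Flex commitment.
commitment = client.create_capacity_commitment(
    parent=parent,
    capacity_commitment=reservation.CapacityCommitment(
        slot_count=500,
        plan=reservation.CapacityCommitment.CommitmentPlan.FLEX,
    ),
)
print("Purchased:", commitment.name)

# ... run the Monday-morning reporting burst ...

# Flex commitments can be deleted shortly after purchase to stop the charge.
client.delete_capacity_commitment(name=commitment.name)
```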
Migrating to BigQuery
The Trendyol DWH team separated the migration into three main categories:
- BI/Tableau migration was done in two weeks. The team changed 50 source tables accessed by 2,000 workbooks and 7,000 views. Since Tableau has a native connection to BigQuery, it was easy to migrate. They used the same table and column names in BigQuery, matching the ones the Tableau reports were using, and it just worked. They also avoided using custom SQL in Tableau, eliminating the need to rewrite most of the reports. The team found BigQuery’s ANSI SQL-compliant dialect to be compatible with most of their requirements. Additionally, they had some custom SQL with a good amount of regular expressions, which was easily addressed by writing around 10 UDFs (see the sketch after this list).
- ETL: More than 1,500 ETL jobs are scattered across three tools (Attunity, custom Python scripts, and Kafka Connect). The team had been doing ETL on-premises, and now, in the second phase of the migration, they’ve started migrating the ETL to BigQuery.
- Data: There were 22 TB in Vertica to start, which the team moved to BigQuery. They used Attunity for SQL Server sources and Kafka Connect for cloud-based sources, while custom Python code integrated natively with the BigQuery JDBC driver. Within three months, the team had ingested 600 TB into BigQuery, an order of magnitude more than they had expected.
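The roughly 10 regex UDFs mentioned in the BI/Tableau bullet above can take the form of persistent SQL UDFs, created once and then referenced by the reports. The function name and pattern below are invented for illustration.

```python
# Sketch of a persistent SQL UDF wrapping a regular expression, like the
# ~10 UDFs mentioned in the BI/Tableau bullet. Name and pattern are invented.
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE OR REPLACE FUNCTION `analytics.extract_order_id`(url STRING)
RETURNS STRING
AS (
  REGEXP_EXTRACT(url, r'/orders/(\\d+)')
);
""").result()

# Reports can then call the UDF instead of embedding the regex:
rows = client.query(
    "SELECT `analytics.extract_order_id`('https://example.com/orders/42')"
).result()
```

Centralizing each regex in a UDF means a pattern change is a one-line redeploy rather than edits spread across thousands of workbooks.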
Currently, the Trendyol team stores 650 TB of data and 300+ SCDs in BigQuery and processes 11 TB of data daily. They mix and match flat-rate commitments, slot reservations, and Flex Slots to get the best pricing at any given time; for example, they can now handle fluctuations in demand by purchasing Flex Slots on demand. The data team can concentrate on creating value rather than spending time operationalizing the data warehouse.
The relationship between the IT and business teams has been transformed, too. There are now plenty of compliments on speed and scalability. The data team can now produce reports in an hour on Monday morning, meeting their SLAs comfortably. The ODS pipeline previously took two to three hours, depending on the day. Trendyol’s BigQuery migration has helped restore trust between the IT and business teams, enable data-driven decision making, save on costs, and meet customer needs quickly.
Learn more about Trendyol and about BigQuery’s data warehouse migration program.