GCP – Data warehouse migration tips: preparation and discovery
Data warehouses are at the heart of an organization’s decision-making process, which is why many businesses are moving away from the siloed approach of traditional data warehouses to a modern data warehouse that provides advanced capabilities to meet changing requirements. At Google Cloud, we often work with customers on data warehouse migration projects, including helping HSBC migrate to BigQuery, consolidating more than 600 reports and several related applications and data pipelines. We’ve also assembled a migration framework that highlights how to prepare for each phase of migration, reduce risk, and define a clear business case up front to win support from internal stakeholders.
While we offer a data management maturity model, we still receive questions, particularly about how to prepare for a migration. In this post, we’ll explore a few important questions that come up during the initial preparation and discovery phases, including the real-life impact of modernizing a data warehouse and how you can better prepare for and plan your migration to a modern data warehouse.
Tackling the preparation phase
An enterprise data warehouse has many stakeholders with a wide range of use cases, so it’s important to identify and involve the key stakeholders early in the process to make sure they’re aligned with the strategic goals. They can also help identify gaps and provide insight on potential use cases and requirements, which helps prioritize the highest-impact use cases and identify associated risks. These priorities can then be approved and tied to business metrics, which usually revolve around three main components:
People. To make sure you’re getting input and buy-in for your migration, start by aligning leadership and business owners. Then, explore the skills of the project team and end users. You might identify and interview each functional group within the team by conducting workshops, hackathons, and brainstorming sessions. As you work through issues, remember to consider how you’ll secure owner sign-off by setting success criteria and KPIs, such as:
- Time saved
- Time to create new reports
- Reporting usage increase
- Talent acquired through innovation
Technology. By understanding the current technical landscape and classifying existing solutions to identify independent workloads, you can more easily separate upstream and downstream applications to further drill down into their dependency on specific use cases. For example, you can cluster and isolate different ETL applications/pipelines based on different use cases or source-systems being migrated to reduce the scope as well as underlying risks. Similarly, you can couple them with upstream applications and make a migration plan which moves dependent applications and related data pipelines together.
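To make that clustering concrete, here’s a minimal sketch of a pipeline inventory grouped into candidate migration waves. The pipeline names, source systems, and downstream consumers are hypothetical placeholders; the point is simply that pipelines sharing a source system, along with their dependent applications, move together.

```python
from collections import defaultdict

# Hypothetical inventory of ETL pipelines, their source systems, and downstream consumers.
pipelines = [
    {"name": "orders_daily_load", "source": "oracle_sales", "consumers": ["sales_dashboard"]},
    {"name": "returns_daily_load", "source": "oracle_sales", "consumers": ["sales_dashboard"]},
    {"name": "clickstream_load", "source": "web_events", "consumers": ["marketing_reports"]},
    {"name": "inventory_sync", "source": "sap_inventory", "consumers": ["supply_planning"]},
]

# Group pipelines by source system so each wave moves dependent
# pipelines and their related data together, limiting scope and risk.
waves = defaultdict(list)
for pipeline in pipelines:
    waves[pipeline["source"]].append(pipeline["name"])

for source, names in waves.items():
    downstream = sorted({c for p in pipelines if p["source"] == source for c in p["consumers"]})
    print(f"Wave for {source}: pipelines={names}, downstream={downstream}")
```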
In addition to understanding current migration technologies, it’s key that you are clear on what you are migrating. This includes identifying the relevant data sources with an understanding of your data velocity, data regionality, and licensing, as well as identifying business intelligence (BI) systems with current reporting requirements and desired modernizations during the migration. For example, you might want to move that daily sales report to a real-time dashboard. You might also want to decide whether any upstream or downstream applications should be replaced by cloud-native applications, a decision that can be driven by KPIs such as:
- TCO of the new solution vs. functionality gains
- Performance improvements and scalability
- Reduced management overhead
- Risk of lock-in vs. using open source
Process. By discussing your process options, you can uncover dependencies between existing components, data access and governance requirements, and the ability to split migration components. For example, you should evaluate license expiration dependencies before defining any migration deadlines. Processes should be established to make effective decisions during the migration and keep it on track, using KPIs such as:
- Risk of data leakage and misuse
- Revenue growth per channel
- New services launched vs. cost of launching them
- Adoption of ML-driven analytics
A strong understanding of the processes you intend to put in place can open up new opportunities for growth. For example, a well-known ecommerce retailer wanted to drive product and services personalization. Their existing data warehouse environment did not provide predictive analytics capabilities and required investments in new technology. BigQuery ML allowed them to be agile and apply predictive analytics, unlocking increased lifetime value, optimized marketing investment, improved customer satisfaction, and increased market share.
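As a rough illustration of that pattern, BigQuery ML lets you train and apply a model with SQL from inside the warehouse. In the sketch below, the dataset, table, and column names (my_dataset.customer_features, purchased, and so on) are made-up placeholders, not the retailer’s actual schema.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# Train a simple purchase-propensity model with BigQuery ML.
# The dataset, tables, and columns below are hypothetical.
train_sql = """
CREATE OR REPLACE MODEL `my_dataset.purchase_propensity`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['purchased']) AS
SELECT recency_days, frequency_90d, avg_order_value, purchased
FROM `my_dataset.customer_features`
"""
client.query(train_sql).result()  # block until training finishes

# Score current customers with the trained model.
predict_sql = """
SELECT customer_id, predicted_purchased_probs
FROM ML.PREDICT(MODEL `my_dataset.purchase_propensity`,
                (SELECT * FROM `my_dataset.customer_features_current`))
"""
for row in client.query(predict_sql).result():
    print(row.customer_id, row.predicted_purchased_probs)
```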
Entering the discovery phase
The discovery process is mainly concerned with two areas: business requirements and technical information.
1. Understanding business requirements
The discovery process of a data warehouse migration starts with understanding business requirements and is usually shaped by a number of business drivers. Replacing legacy systems has implications on many fronts, ranging from new team skill-set requirements to managing ongoing license and operational costs. For example, upgrading your current system might require all of your company’s data analysts to be retrained, as well as additional licenses to be purchased. Quantifying these requirements, and associating them with costs, will allow you to make a pragmatic, fair assessment of the migration process.
On the other hand, proposing and validating potential improvement gains by identifying gaps in the current solution will add value. This can be done by defining an approach to enhance and augment the existing tools with new solutions. For example, for a retailer, the ability to deliver new real-time reporting will increase revenue, since it provides significant improvements in forecasting and reduced shelf-outs.
This retailer realized that shelf-outs were costing them millions in lost sales. They wanted to find an effective solution to predict inventory needs accurately. Their legacy data warehouse environment had reached its performance peak, so they wanted a cloud offering like BigQuery to help them analyze massive data workloads quickly. As a result of migrating, they were able to stream terabytes of data in real time and quickly optimize shelf availability, saving on costs and gaining other benefits like:
- Incremental revenue increase with reduced shelf-outs
- 2x accuracy vs. the previous predictive model
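For reference, the simplest way to get rows into BigQuery in near real time is its streaming insert API; a minimal sketch follows. The project, table, and field names are illustrative, and at terabyte scale you would typically front this with Dataflow or the Storage Write API rather than calling it row by row.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table with store_id, sku, on_shelf_qty, and event_time columns.
table_id = "my-project.retail.shelf_inventory_events"

rows = [
    {"store_id": "S001", "sku": "SKU-123", "on_shelf_qty": 4, "event_time": "2024-01-15T10:32:00Z"},
    {"store_id": "S002", "sku": "SKU-456", "on_shelf_qty": 0, "event_time": "2024-01-15T10:32:05Z"},
]

# insert_rows_json returns a list of per-row errors; an empty list means all rows landed.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Streaming insert errors:", errors)
```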
Business challenges that were previously perceived as too difficult to solve can be re-examined as new opportunities using new technologies. For example, the ability to store and process more granular data can help organizations create more targeted solutions. A retailer may look into seasonality and gauge how customer behavior changes when Christmas Day falls on a Monday versus another day of the week. This can only be achieved with the ability to store and analyze increased amounts of data spanning many years.
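As a small example of the kind of question this unlocks, the query below compares revenue for the week leading up to Christmas across years, labeled by the weekday Christmas fell on. The retail.daily_sales table and its sale_date and revenue columns are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Sum revenue for Dec 19-25 of each year and note which weekday Christmas fell on.
sql = """
SELECT
  EXTRACT(YEAR FROM sale_date) AS year,
  FORMAT_DATE('%A', DATE(EXTRACT(YEAR FROM sale_date), 12, 25)) AS christmas_weekday,
  SUM(revenue) AS christmas_week_revenue
FROM `my-project.retail.daily_sales`
WHERE sale_date BETWEEN DATE(EXTRACT(YEAR FROM sale_date), 12, 19)
                    AND DATE(EXTRACT(YEAR FROM sale_date), 12, 25)
GROUP BY year, christmas_weekday
ORDER BY year
"""
for row in client.query(sql).result():
    print(row.year, row.christmas_weekday, row.christmas_week_revenue)
```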
Last but not least: educating your users is key to any technology modernization project. In addition to the learning paths described above, this can be done by defining eLearning plans for self-study. Staff should also have time to be hands-on and start using the new system to learn by doing. You can also identify external specialized partners and internal champions early on to help bridge that gap.
2. Technical information gathering
To identify the execution strategy, you’ll want to answer the following question: will your migration focus on one solution layer at a time, or take an end-to-end lift-and-shift approach? Working through the points below can make this decision simpler:
- Identify data sources for upstream and downstream applications
- Identify datasets, tables, and schemas relevant to your use cases (see the sketch after this list)
- Outline ETL/ELT tools and frameworks
- Define data quality and data governance solutions
- Identify Identity and Access Management (IAM) solutions
- Outline BI and reporting tools
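Much of this inventory can be seeded from the warehouse’s own system catalog. As a small sketch, the query below lists every table in a BigQuery region with its type and creation time via INFORMATION_SCHEMA, which is useful for an existing BigQuery footprint or for validating what has landed after migration; the region-us qualifier is just an example.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Inventory all tables in the region as a starting point for the checklist above.
sql = """
SELECT table_schema AS dataset, table_name, table_type, creation_time
FROM `region-us`.INFORMATION_SCHEMA.TABLES
ORDER BY dataset, table_name
"""
for row in client.query(sql).result():
    print(row.dataset, row.table_name, row.table_type, row.creation_time)
```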
Further, it is important to identify some of the functional requirements before making a decision around buy or build. Are there any out-of-the-box solutions available in the market that meet the requirements, or will you need a custom-built solution to meet the challenges you’ve identified? Make sure you know whether this project is core to your business, and would add value, before deciding on the approach.
Once you’ve concluded the preparation and discovery phase, you’ll have some solid guidance on which components you’ll be replacing or refactoring with a move to a cloud data warehouse.
Visit our website to learn more about BigQuery.
Thanks to Ksenia Nekrasova for contributions to this post.