GCP – How the new Google Cloud to Neo4j Dataflow template streamlines data movement
Neo4j provides a graph database built for handling complex relationships and traversing large volumes of interconnected data. Google Cloud complements this with robust infrastructure for hosting and managing data-intensive workloads. Together, Neo4j and Google Cloud have developed a new Dataflow template, Google Cloud to Neo4j (docs, guide), that you can try from the Google Cloud console.
In this blog post, we discuss how the Google Cloud to Neo4j template can help data engineers and data scientists streamline the movement of data from Google Cloud to a Neo4j database, enabling richer data exploration and analysis.
Importing BigQuery and Cloud Storage data to Neo4j
Many customers leverage BigQuery, Google Cloud’s fully managed, serverless data warehouse, and Cloud Storage to centralize and analyze diverse data from various source systems, regardless of format. This integrated approach simplifies the complex task of managing data from different sources while maintaining stringent security measures. With the ability to store and process data efficiently in one location, organizations can analyze, forecast, and predict trends, yielding valuable insights for informed decision-making. BigQuery is the linchpin for aggregating and analyzing data. Read on to see how the Google Cloud to Neo4j Dataflow template streamlines the movement of data from BigQuery and Cloud Storage to Neo4j AuraDB, a fully managed cloud graph database service running on Google Cloud.
Using the Dataflow template
Unlike typical data integration approaches such as Python-based notebooks or Spark environments, Dataflow requires no coding at all. It incurs no cost while no jobs are running, and it leverages Google Cloud’s security framework for enhanced trust and reliability of your data workflows.
Dataflow is a strong solution for orchestrating data movement across diverse systems. As a managed service, Dataflow caters to an extensive array of data processing patterns, enabling customers to easily deploy batch and streaming data processing pipelines. And to simplify data integration, Dataflow offers an array of templates tailored to various source systems.
Fig 1: Architecture Diagram of Dataflow from Google Cloud to Neo4j
With the Google Cloud to Neo4j template, you can opt for the flex or classic template. For this illustration, we employ the flex template, which requires just two configuration files: the Neo4j connection metadata file and the job specification file.
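As a sketch, the connection metadata file is a small JSON document holding the Neo4j URI and credentials. The field names below follow the samples published in the Neo4j partner repository at the time of writing, and the host name and password are placeholders — verify the exact schema against the current documentation:

```json
{
  "server_url": "neo4j+s://<your-aura-instance>.databases.neo4j.io",
  "database": "neo4j",
  "auth_type": "basic",
  "username": "neo4j",
  "pwd": "<your-password>"
}
```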
The Neo4j partner GitHub repository provides a wealth of resources for this template: sample configurations, screenshots, and step-by-step instructions for setting up the data pipeline, including a walkthrough of transferring data from BigQuery to a Neo4j database.
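For orientation, here is a minimal sketch of what a job specification might look like for a single node target sourced from BigQuery. Every name below (project, dataset, columns, label) is illustrative, and the field layout is modeled loosely on the repository’s sample configurations — treat those samples as the authoritative schema:

```json
{
  "source": {
    "type": "bigquery",
    "name": "customers",
    "query": "SELECT customer_id, customer_name FROM my_project.my_dataset.customers"
  },
  "targets": [
    {
      "node": {
        "source": "customers",
        "name": "Customers",
        "mode": "merge",
        "mappings": {
          "labels": [ "\"Customer\"" ],
          "keys": [ { "customer_id": "customerId" } ],
          "properties": {
            "strings": [ { "customer_name": "name" } ]
          }
        }
      }
    }
  ]
}
```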
Once you have these two configuration files (the Neo4j connection metadata file and the job specification file), you are ready to use the Dataflow template to move data from Google Cloud to Neo4j. In the Cloud console, the Dataflow job creation page prompts you for both files.
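The same job can also be launched from the command line. A minimal sketch with `gcloud`, assuming the public flex template location and the `jobSpecUri`/`neo4jConnectionUri` parameter names documented for the template — the region and bucket paths are placeholders you would replace with your own:

```shell
#!/usr/bin/env bash
# Sketch: launching the Google Cloud to Neo4j flex template with gcloud.
# REGION, JOB_SPEC, and CONNECTION are placeholder values.
set -euo pipefail

REGION="us-central1"
TEMPLATE="gs://dataflow-templates-${REGION}/latest/flex/Google_Cloud_to_Neo4j"
JOB_SPEC="gs://my-bucket/config/job-spec.json"
CONNECTION="gs://my-bucket/config/neo4j-connection.json"

# Build the command as an array so it can be reviewed before running.
CMD=(gcloud dataflow flex-template run "bq-to-neo4j-$(date +%s)"
     --region "${REGION}"
     --template-file-gcs-location "${TEMPLATE}"
     --parameters "jobSpecUri=${JOB_SPEC},neo4jConnectionUri=${CONNECTION}")

# Print the assembled command; uncomment the last line to actually launch it.
printf '%s ' "${CMD[@]}"; echo
# "${CMD[@]}"
```

Keeping the launch commented out lets you inspect the fully expanded command (or wire it into CI) before submitting a billable job.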
You can find detailed documentation for this Dataflow template on the Neo4j documentation portal. Please refer to the following links: Dataflow Flex Template for BigQuery to Neo4j and Dataflow Flex Template for Google Cloud to Neo4j.
Simplify data migration between Google Cloud and Neo4j
The Google Cloud to Neo4j Dataflow template makes it easier to use Neo4j’s graph database with Google Cloud’s data processing suite. To get started, check out the following resources:
- Explore Neo4j within the Google Cloud Marketplace.
- Review the Google Cloud documentation on the Dataflow template.
- Walk through the step-by-step guide for setting up your pipeline and creating the Neo4j config files that are passed into the pipeline.
- Jump to the Cloud console to create your first job now!