Introducing Bigtable tiered storage: Save more data, longer, for less
Today, we’re pleased to introduce a new Bigtable storage tier for efficient management of massive datasets, now available in preview. This fully managed, cost-effective system automatically moves less frequently accessed data from high-performance SSDs to infrequent access storage, lowering your total cost of ownership. With tiered storage in Bigtable, you can access and modify data across both hot and cold tiers via a single interface. You no longer have to discard data to control costs: you can afford to keep the full picture of your application without compromising on critical historical insights.
Bigtable’s tiered storage architecture
Bigtable, Google Cloud’s key-value and wide-column store, is ideal for fast access to structured, semi-structured, or unstructured data, including time-series data from sensors, equipment, and operations in industries such as manufacturing and automotive.
High-volume data streams — including electric vehicle (EV) battery data, factory-floor machine status, and automotive telemetry from software-defined vehicles (SDVs) and in-vehicle infotainment (IVI) systems — are essential for driving business and technical objectives. These objectives range from driver personalization and optimized equipment maintenance schedules to logistics optimization and predictive maintenance. However, efficiently storing such vast quantities of data can become costly, particularly when it’s not frequently accessed.
Introducing Bigtable tiered storage
Bigtable’s new tiered storage feature can help you manage your storage costs while meeting regulatory data storage requirements. It automatically moves older, infrequently used data to a less expensive storage tier — where it remains available when needed — without impacting access to your more recent, frequently used data.
Bigtable’s new “infrequent access” storage tier works alongside your existing SSD storage, allowing you to store both frequently and infrequently used data in the same table and manage it all in one place. This feature works with Bigtable’s autoscaling to optimize your Bigtable instance resource utilization. Moreover, data in the infrequent access storage tier is still accessible alongside existing SSD storage through the same Bigtable API.
Key benefits of Bigtable tiered storage
- Unified management: Manage data in a single Bigtable instance without manually exporting infrequently accessed data to archival storage. With Bigtable tiered storage, you can reduce operational overhead and avoid manual data organization and migration.
- Automatic tiering: Set an age-based tiering policy, and Bigtable automatically moves data between SSD and infrequent access tiers. Keep data longer to meet regulatory compliance requirements without losing access to it.
- Cost optimization: Move historical data to infrequent access storage to lower storage costs. Infrequent access storage is up to 85% less expensive than SSD storage. This can significantly reduce overall storage expenses, as well as the operational overhead of manual data migrations.
- Increased storage capacity: Infrequent access storage increases the total storage space of your Bigtable node. This lets you store more data per node than you can with the standard Bigtable SSD node. A Bigtable node with tiered storage has 540% more capacity than a regular SSD node.
- Data accessibility for analytics and reporting: Use Bigtable SQL to query infrequently used data. You can then build Bigtable logical views to present this data in a format that can be queried when needed. This feature is useful for giving specific users access to historical data for reports, without giving them complete access to the table.
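To make the cost and capacity figures above concrete, here is a minimal arithmetic sketch. It assumes the best-case 85% discount (the article says "up to"), treats storage cost as proportional to the amount of data stored, and ignores compute, network, and any access charges. The function names and the `cold_fraction` parameter are illustrative and not part of any Bigtable API.

```python
# Figures quoted in the article; the discount is an upper bound ("up to 85%").
IA_DISCOUNT_UPPER_BOUND = 0.85  # infrequent access vs. SSD storage price
CAPACITY_INCREASE = 5.40        # "540% more capacity" than a regular SSD node


def relative_storage_cost(cold_fraction: float,
                          discount: float = IA_DISCOUNT_UPPER_BOUND) -> float:
    """Return storage cost as a fraction of an all-SSD baseline (1.0 = no savings).

    cold_fraction is the (hypothetical) share of data held in the
    infrequent access tier, between 0 and 1.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    hot_fraction = 1.0 - cold_fraction
    # Hot data is billed at the full SSD rate, cold data at the discounted rate.
    return hot_fraction + cold_fraction * (1.0 - discount)


def capacity_multiplier(increase: float = CAPACITY_INCREASE) -> float:
    """Convert 'N% more capacity' into a multiple of the baseline capacity."""
    return 1.0 + increase
```

For example, if 90% of a table's data sat in the infrequent access tier, `relative_storage_cost(0.9)` comes to roughly 0.235, or about 23.5% of the all-SSD storage cost under the best-case discount. Likewise, "540% more capacity" corresponds to about 6.4 times the capacity of a regular SSD node.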
Operational time-series data: an example
Bigtable is well-suited for time-series data such as sensor readings or vehicle telemetry, and the variety, speed, and volume of this data make it a good fit for Bigtable tiered storage. This data pattern includes:
- Varying schema: Systems often have multiple data sources with different structures. Bigtable’s flexible structure is helpful for managing these different sources.
- Time-based access patterns: The most recent data is often required for real-time operations and dashboards, while historical data is valuable for analysis and long-term trends.
- Archival needs: Data needs to be stored for long periods for compliance or analysis.
Consider a manufacturing plant that uses Bigtable for sensor data:
- The challenge: The plant collects data from sensors every second. This information is important, but storing all of it on SSD is expensive.
- The solution: The plant uses Bigtable tiered storage with an age-based rule:
  - Last 30 days: Data is stored on SSD for quick access.
  - 30 days to 1 year: Data is moved to the infrequent access storage tier for analysis.
  - Older than 1 year: Data is deleted by the garbage collection policy on the table. This period is fully configurable and can be extended, for example, to six years.
Note: You can access your infrequent access storage tier through the same Bigtable API that you use to access SSD storage.
The following command sets the 30-day tiering policy on an existing table:
```shell
gcloud beta bigtable instances tables update MyExistingSensorDataTable \
  --instance=MyManufacturingInstance \
  --tiered-storage-infrequent-access-older-than=30d
```
  - Monitor performance: Use Bigtable’s monitoring tools to track storage usage, latency, and throughput for both SSD and infrequent access tiers.
  - Adjust policy: Change the tiering policy based on your needs.
  - Structure the relevant sensor data as a logical view: Use SQL on data in the infrequent access tier to give the historical sensor information a relational data model.
- The results:
  - Simplified operations by managing all data in one Bigtable instance
  - Historical data is stored for compliance
  - Reduced storage costs
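The plant's age-based rule (30 days on SSD, up to one year in infrequent access, then garbage collection) can be modeled as a small classification function. This is only a sketch of the policy's intent: the tier labels, the function name, and the handling of the exact boundary values are assumptions made for illustration, and the code says nothing about when Bigtable physically moves or deletes data.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the manufacturing example in the article.
SSD_MAX_AGE = timedelta(days=30)         # tiering policy: older data leaves SSD
RETENTION_MAX_AGE = timedelta(days=365)  # garbage collection: older data is deleted


def expected_tier(cell_time: datetime, now: datetime) -> str:
    """Return where a cell written at cell_time is expected to live at time now.

    The returned labels ("ssd", "infrequent_access", "deleted") are
    hypothetical names used only in this sketch.
    """
    age = now - cell_time
    if age < SSD_MAX_AGE:
        return "ssd"
    if age < RETENTION_MAX_AGE:
        return "infrequent_access"
    return "deleted"


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    for days in (1, 45, 400):
        print(days, expected_tier(now - timedelta(days=days), now))
```

Extending retention to six years, as the example mentions, would only change `RETENTION_MAX_AGE` in this model.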
Example cost savings with a 500TB NoSQL database using Bigtable tiered storage.
Best practices when using tiered storage
- Write your data with timestamps: Include accurate timestamps in your data to enable age-based tiering.
- Read your data using timestamp range filters: Use timestamp range filters to ensure your reads go to the correct storage tier. For SSD-only reads, timestamp range filters are required to maintain SSD performance.
- Monitor performance: Check performance metrics to find bottlenecks and adjust your tiering policy.
- Use autoscaling: Use autoscaling to change resources automatically based on your needs.
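As a sketch of the timestamp-range practice above, the helper below computes the range an SSD-only read would pass to a timestamp range filter. It relies on two properties of Bigtable: cell timestamps are expressed in microseconds since the Unix epoch, and timestamp ranges include the start and exclude the end. The 30-day default mirrors the example policy, and the function names are illustrative. Building the actual filter with a client library is left out so the example stays self-contained.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def ssd_only_range(now: datetime,
                   tier_age: timedelta = timedelta(days=30)) -> Tuple[int, int]:
    """Return a half-open [start, end) range covering data younger than tier_age.

    tier_age should match the table's tiering policy; 30 days is the
    value used in the article's example, not a Bigtable default.
    """
    return to_micros(now - tier_age), to_micros(now)


if __name__ == "__main__":
    start, end = ssd_only_range(datetime.now(timezone.utc))
    print(f"read cells with {start} <= timestamp < {end}")
```

Keeping the range strictly inside the tiering age is the point of the practice: a read bounded this way should only need data that the policy keeps on SSD.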
Get started today
Bigtable tiered storage helps manage costs and simplifies data management, especially for time-series data. It lets you keep important data accessible while managing the expenses of storing large historical datasets. This is helpful for businesses using large amounts of time-series data, such as those in manufacturing, automotive, and IoT. To learn more and get started, enable Bigtable tiered storage for your table.
