GCP – Announcing BigQuery Encrypt and Decrypt function compatibility with Sensitive Data Protection
Organizations collect vast amounts of data to create innovative solutions, perform ground breaking research, or optimize their designs. With this comes the responsibility to ensure data is adequately protected to meet regulatory, compliance, contractual or internal security requirements.
For organizations that want to move their data warehouses from on-premises to cloud-first systems, such as BigQuery, protecting sensitive data from unauthorized access or accidental exposure is crucial. Using encryption-based tokenization is a vital tool to create an additional layer of defense and fine-grained data control.
In addition to storage-level encryption, whether using Google-managed or customer-managed keys, BigQuery now has seamless integration with Sensitive Data Protection supporting native SQL functions that allow interoperable deterministic encryption and decryption between BigQuery and Sensitive Data Protection.
In short, this makes it easier to protect sensitive data across a variety of scenarios:
Protect sensitive data in BigQuery: Securely protect data containing personally identifiable information (PII), healthcare records, or financial data at query time while maintaining compliance with regulations.
Share sensitive data securely: Collaborate with external parties (partners or consumers) while keeping sensitive information protected by sharing encrypted data (encrypted externally with Sensitive Data Protection), providing decryption keys separately and decrypting with function in BigQuery.
Compatible tokenization anywhere you need it: Whether you create tokens with the Sensitive Data Protection APIs for workloads outside of BigQuery or create tokens natively in BigQuery, you can join, aggregate, and keep referential integrity where you need it.
Improved performance: Enhanced performance for example, like functions, leverages BigQuery distributed architecture to execute Sensitive Data Protection token-based encryption and decryption tasks natively in parallel across multiple BigQuery nodes to significantly accelerate the operations.
Using Sensitive Data Protection functions in BigQuery
Here are the steps to get you started:
1. Identify your sensitive data: Use the Sensitive Data Protection discovery service to pinpoint BigQuery tables and columns containing confidential information.
2. Generate your encryption keys: generate your data keys and use Cloud KMS to protect your data keys.
3. Apply encryption: Use DLP_DETERMINISTIC_ENCRYPT to encrypt the identified data fields.
<ListValue: [StructValue([(‘code’, ‘CREATE OR REPLACE FUNCTIONrn`project_name.dataset.sdp_encrypt`(column_to_encrypt STRING,rn column_ad STRING) AS ((rn SELECTrn DLP_DETERMINISTIC_ENCRYPT(DLP_KEY_CHAIN(“gcp-kms://projects/<project>/locations/<location>/keyRings/<keyring name>/cryptoKeys/<key-name>”,rnb’\012\044\000\066\206\201\007\….’),rn column_to_encrypt,rn column_ad )));’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5d4c060d30>)])]>
<ListValue: [StructValue([(‘code’, ‘CREATE TABLE `project_name.dataset.sdp_demodata_enc` ASrnSELECTrnID,rndemo_dataset.sdp_encrypt(Card_Number,”CC”) AS Encrypted_CC_Num,rndemo_dataset.sdp_encrypt(SSN,”SSN”) AS SSN_encrypted,rnFROM `project_name.dataset.sdp_demodata`;’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5d4c060880>)])]>
4. Store and process securely: Continue working with your encrypted data within BigQuery, safe in the knowledge that it’s protected.
Query from encrypted table
Run aggregates on encrypted columns:
<ListValue: [StructValue([(‘code’, ‘SELECT SSN_encrypted, count(Encrypted_CC_Num) as Card_count FROM `project_name.dataset.sdp_demodata_enc` GROUP BY SSN_encrypted;’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5d4c060b20>)])]>
5. Decrypt when needed: Use DLP_DETERMINISTIC_DECRYPT to access the original data only when necessary at query time and for authorized users.
<ListValue: [StructValue([(‘code’, ‘CREATE OR REPLACE FUNCTIONrn`project_name.dataset.sdp_decrypt`(column_to_decrypt STRING,rn column_ad STRING) AS ((rnSELECT DLP_DETERMINISTIC_DECRYPT(DLP_KEY_CHAIN(“gcp-kms://projects/<project>/locations/<location>/keyRings/<keyring name>/cryptoKeys/<key-name>”,rnb’\012\044\000\066\206\201\007\….’),rn column_to_decrypt,rn column_ad )));’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5d4c060a90>)])]>
Query from decrypted records:
<ListValue: [StructValue([(‘code’, ‘SELECTrnID, rndataset.sdp_decrypt(Encrypted_CC_Num,”CC”) AS Card_Number_decrypted,rndataset.sdp_decrypt(SSN_encrypted,”SSN”) AS SSN_decrypted rnFROM `project_name.dataset.sdp_demodata_enc`;’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e5d4c0608e0>)])]>
Next steps
Sensitive Data Protection and BigQuery data security functions are powerful tools for protecting sensitive data in the cloud. By understanding how they function, and how their capabilities can be best used, you can enhance your data security posture, reduce the risk of data breaches, and help with the confidentiality of sensitive information while safeguarding your privacy.
Ready to dive deeper? Check out the Sensitive Data Protection: DLP-compatible encrypt function documentation for detailed instructions.
Read More for the details.