Introduction
In today's data-driven world, where security and privacy are paramount, the application platforms built and maintained by IT organizations must adhere to a substantial number of compliance and governance regulations at all times. The General Data Protection Regulation (GDPR) is one such regulatory standard, designed to protect the privacy and personal data of individuals within the European Union. Even if you're not intentionally selling software services to European markets, if you have users in areas of the world protected by GDPR, you are required to meet its standards regarding how you collect, store, and process personal data, with significant penalties for non-compliance. As software engineers and members of the SecDevOps community, it's crucial to leverage robust tools and practices to ensure that our applications adhere to these regulations.
Google Cloud Platform (GCP) offers powerful tools such as Sensitive Data Protection (SDP) that can help enforce GDPR compliance. In this article, we'll demonstrate how to use this tool to enforce specific GDPR requirements. By leveraging SDP, we can continuously monitor our GCP environment and ensure that our data handling practices comply with GDPR standards.
Why GDPR is Important
GDPR is essential for several reasons. It enhances the protection of personal data for EU citizens, giving them more control over how their data is used. It also establishes a single set of data protection rules across the EU, simplifying compliance for international businesses.
To illustrate how GCP's SDP can help enforce GDPR compliance, we'll focus on one of its key requirements: Data Relevancy and Minimization. In a GDPR-compliant system, personal data should be adequate, relevant, and limited to what is necessary for the purposes for which it is processed.
In this article, we’ll demonstrate how GCP's SDP can provide a robust framework for GDPR compliance. We'll walk through the steps to set up automatic detection of sensitive data via SDP, and show how SDP helps adhere to data minimization principles.
Enforcing Data Minimization
Approach
To enforce the Data Minimization requirement, we will start by creating an SDP job that scans a Cloud Storage bucket for sensitive data. This SDP job identifies and classifies sensitive information, such as personally identifiable information (PII), and publishes its findings to a dashboard accessible to SecDevOps.
This approach ensures that sensitive data is promptly identified and addressed. Additionally, the Cloud Storage bucket can be configured with a lifecycle policy that automatically deletes data older than 30 days. This policy helps ensure that stale data is regularly purged, reducing the risk of unnecessary data retention. These combined tactics create a robust system for continuously monitoring and minimizing sensitive data, ensuring compliance with GDPR requirements.
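The 30-day lifecycle policy described above can be expressed as a standard Cloud Storage lifecycle configuration. As a minimal sketch, a small Python helper can generate the JSON (the shape follows the documented GCS lifecycle format) for later use with `gsutil lifecycle set`:

```python
import json

def make_delete_after_days_rule(days: int) -> dict:
    """Build a Cloud Storage lifecycle config that deletes objects older than `days` days."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": days},
            }
        ]
    }

# Write the policy to a file that `gsutil lifecycle set` can consume.
policy = make_delete_after_days_rule(30)
with open("lifecycle.json", "w") as f:
    json.dump(policy, f, indent=2)
```

The `age` condition is measured in days since each object's creation time, which is exactly the "purge stale data" behavior we want for data minimization.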
Storage Bucket Setup
Create the Storage Bucket
Before configuring SDP to enforce our GDPR Data Minimization requirement, we will create a Cloud Storage bucket named hellogdpr-data-dev and configure it with a 30-day retention policy. Any data uploaded to the bucket that is more than 30 days old will be automatically deleted by Cloud Storage, keeping your platform in lockstep with many of GDPR's data minimization directives.
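If you prefer the command line over the console, the bucket setup above can be sketched with gsutil. This is an illustrative sequence, not a prescribed one: the project ID and region are placeholders, and it assumes an authenticated gcloud environment:

```shell
# Create the bucket (placeholder project and region; adjust to your environment).
gsutil mb -p my-project-id -l us-east1 gs://hellogdpr-data-dev

# lifecycle.json: delete objects older than 30 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://hellogdpr-data-dev
```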
Upload Sensitive Data
We will now upload a mock JSON file called sensitive-data.json to our new bucket. This file contains a fake US Social Security Number (900-12-3456). This is a safe SSN to use, as it falls in a range reserved by the Social Security Administration for precisely these types of testing purposes:
{
  "name": "Jerry Jingleheimer",
  "ssn": "900-12-3456"
}
By creating this Cloud Storage bucket and uploading the mock data file, we have now set up a baseline dataset for the SDP scans that we will configure next.
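The upload step can also be done from the command line; here is a sketch assuming the bucket created earlier and an authenticated gcloud environment:

```shell
# Create the mock file locally, then copy it into the bucket.
cat > sensitive-data.json <<'EOF'
{ "name": "Jerry Jingleheimer", "ssn": "900-12-3456" }
EOF
gsutil cp sensitive-data.json gs://hellogdpr-data-dev/sensitive-data.json
```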
Sensitive Data Protection
GCP’s Sensitive Data Protection (SDP) service is designed to help organizations identify, classify, and protect data stored within their cloud infrastructure that may be sensitive or even illegal in nature. SDP uses advanced machine learning and pattern matching techniques to automatically detect sensitive information such as personally identifiable information (PII), financial data, and health records across many sources, including Cloud Storage, BigQuery, Cloud SQL, Secrets, and more.
SDP operates by scanning these configured resources for sensitive information and reporting its findings to a centralized dashboard. It offers several key benefits, including real-time data discovery, best-in-class data classification, and detailed reporting. One of the standout features of SDP is its ability to provide actionable insights and recommendations based on the scans’ findings. This helps organizations take proactive measures to secure their sensitive data, such as implementing data access controls, encrypting data, or redacting sensitive information from datasets. Additionally, SDP integrates seamlessly with other GCP security tools, such as Security Command Center (SCC), to offer a unified view of an organization's security and compliance status.
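To make the detection idea concrete, here is a local, regex-based simulation of the kind of match an SSN infoType performs. This is purely illustrative and is not the SDP API: the real US_SOCIAL_SECURITY_NUMBER detector is far more sophisticated, weighing contextual signals and excluding invalid number ranges.

```python
import re

# Simplified SSN pattern: three digits, two digits, four digits, dash-separated.
# Real SDP detection combines pattern matching with ML-based context analysis.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_ssn_candidates(text: str) -> list[str]:
    """Return substrings that look like US Social Security Numbers."""
    return SSN_PATTERN.findall(text)

record = '{ "name": "Jerry Jingleheimer", "ssn": "900-12-3456" }'
print(find_ssn_candidates(record))  # ['900-12-3456']
```

Even this toy version shows why automated scanning beats manual review: the pattern runs uniformly over every object in the bucket, with no human in the loop to miss a file.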
Enabling SDP
The first step is to enable SDP services for our GCP project. This step is necessary to ensure that the service is available for configuration and use in the subsequent steps of our GDPR compliance enforcement.
In your GCP console, open the navigation menu and go to APIs & Services, then Library. Search for Sensitive Data Protection API and activate it by clicking Enable:
Configure IAM Permissions
To ensure that SDP can access and manage your GCP resources, you need to grant appropriate IAM roles. In the GCP Console, go to IAM & Admin and then select IAM. Add the following roles to the SDP service account (format: service-<PROJECT_ID>@dlp-api.iam.gserviceaccount.com):
- Security Center Administrator
- Logs Writer
- Environment and Storage Object Viewer
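For reference, the same role grants can be sketched with gcloud. The role IDs below are my best-guess mappings of the console role names above, and the project ID is a placeholder; verify both in the IAM console before use:

```shell
# Placeholders: set these to your project's values.
PROJECT_ID=my-project-id
SDP_SA="service-${PROJECT_ID}@dlp-api.iam.gserviceaccount.com"

for ROLE in roles/securitycenter.admin \
            roles/logging.logWriter \
            roles/composer.environmentAndStorageObjectViewer; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SDP_SA}" \
    --role="$ROLE"
done
```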
By enabling SDP, you have set up the foundational service needed to monitor and protect your GCP environment. This service is now available for configuration, allowing us to integrate it into our GDPR compliance strategy.
Sensitive Data Scanner Configuration (SDP Job)
Now that SDP is enabled, let's set up a job to scan our bucket once a day. Start by going to the SDP dashboard in the GCP console. In the Discovery section, click the Enable button under the Cloud Storage option:
This will bring you to a page where you can configure your SDP scan:
- For Select a discovery type, select Cloud Storage and then Continue
- For Select a scope, select Scan bucket and then select your Cloud Storage bucket from above. Click Continue
- Leave the Manage Schedules and Select Inspection template default values as-is. This will produce daily scans and use an inspection template that looks for US_SOCIAL_SECURITY_NUMBER matches, amongst many other types of sensitive data. Click Continue to advance through both of these sections
- For Add Actions you may optionally enable Publish to Security Command Center if you are already using that service. Click Continue
- For Set location to store configuration, select Multi-region and then Continue
- Finally, under Review and Create, click the blue Create button
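The default inspection template referenced in the steps above already covers US_SOCIAL_SECURITY_NUMBER among many other infoTypes. If you later want a narrower custom template, a minimal InspectTemplate body looks roughly like the sketch below (field names follow the DLP v2 REST API; treat this as an illustration, not a complete template):

```json
{
  "displayName": "ssn-only-template",
  "inspectConfig": {
    "infoTypes": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
    "minLikelihood": "POSSIBLE",
    "includeQuote": true
  }
}
```

Raising minLikelihood (for example to LIKELY) trades recall for fewer false positives in the scan reports.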
You will now see your configured scan set up and ready to go under SDP’s Discovery >> Scan Configurations tab. Within 24 hours, SDP will automatically kick off and scan your bucket, and you should end up seeing a report similar to this when you next return to SDP’s Dashboard >> Profiles >> Projects tab:
As you can see, it's letting us know that it ran a scan and found something that has been categorized as High Risk! If you click into that entry, you’ll see a screen very similar to this:
Conclusion
By setting up GCP’s Sensitive Data Protection (SDP) service to routinely scan a Cloud Storage bucket and flag any sensitive data it detects, you can address GDPR’s Data Relevancy and Minimization requirements. This hopefully highlights how GCP’s built-in smart tooling can help automate the enforcement of critical data protection standards, ensuring your applications remain compliant around the clock.
Pricing of SDP largely depends on the volume of data being scanned, the source of that data, and the frequency with which it is routinely inspected. A Free Tier is available for trial and experimentation that covers (as of the writing of this article) data up to 1GB in size.
By leveraging SDP, you can automate many of GDPR’s reporting, alerting, and enforcement requirements, providing peace of mind that your data handling practices align with regulatory standards.
Author Bios
Zac Harvey is a Senior Software Engineer at Jahnel Group, Inc., a custom software development firm in Schenectady, NY. At Jahnel Group, we're passionate about building amazing software that drives businesses forward. We're not just a company - we're a community of rockstar developers who love what we do. From the moment you walk through our door, you'll feel like part of the family. To learn more about Jahnel Group's services, visit jahnelgroup.com or contact Jon Keller at jkeller@jahnelgroup.com