Mapping and cataloging the personal information collected from users is time-consuming, error-prone, and relies on hunting down information from multiple departments. For many teams, creating an accurate data flow map will be the hardest part of completing a data protection impact assessment (DPIA) under GDPR Article 35, or any privacy impact assessment (PIA).
Even for smaller businesses with limited departments and fewer software offerings, determining what data exists and how it moves can be a challenge. The same goes for adhering to Article 30 of the GDPR, which expects controllers to keep records of their data collection and processing activities.
The easiest way to map personal data in your business, whether it is PI, PII, or any other variation of user data, is to automate as much of the process as possible. Preparing for GDPR compliance, implementing a privacy information management system (PIMS) such as ISO/IEC 27701, or working toward future privacy regulation can all be made easier with automated data flow mapping.
Why automation should be at the core of your processes
Asking engineers and department heads to self-assess, and to do so regularly, will undoubtedly lead to an incomplete view of the personal data you store. Even teams that use a privacy by design approach will suffer from human error. Marketing will add a new form or tracking cookie without adding it to the consent list. New features will ship that use an unchecked API. In fact, assessing vendor risk is one of the hardest and most important parts of mapping data flows.
This means you, or the data protection officer (DPO) at your organization, will need regular check-ins with each department, and must trust that all data is accounted for and documented. You will need to know that all actions related to data usage go through you before they go live. That is a big ask, even for teams with the best of intentions.
Automation can assist in mapping the data, not only the first time a piece of data is introduced into your organization, but consistently throughout its lifecycle. This means the data protection officer on your team can regularly monitor new third party vendors and data processors, watch for changes to the codebase of your apps that introduce new types of personal information, and even receive alerts if anything unexpected, like a shadow API, shows up in the organization.
Instead of relying on teams to report in with data decisions, the DPO can receive automated reports, triage incoming notifications, and spend less time chasing individual reports. This leaves more time to focus on meeting regulatory requirements.
How automated data mapping works
Many companies say they map your data, but what they mean is that they manually go through and interview your teams. Or worse, they give you a wizard-like tool so you can go through and interview your team yourself. This process is time-consuming, costly, and has the same potential for human error that we mentioned earlier.
True automated systems work by scanning codebases or running on part of the infrastructure—like in a gateway. They offer features that can:
- Look for types of data that might be personal information or sensitive information.
- Identify the source of the collection.
- Locate where the data is stored.
- Assess what processing activities are performed, and determine if they match your data policies.
- Determine where personal information moves locally within the org and if it moves outside to a third party.
- Help identify privacy risks and security risks.
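To make the first capability above concrete, detection often starts with pattern matching over source text or data samples. The sketch below is a deliberately minimal illustration; the pattern set and the `scan_text` helper are hypothetical, and real tools rely on far more robust detection (checksums, context analysis, ML classifiers) than bare regexes.

```python
import re

# Hypothetical patterns for a few common PII types. A production scanner
# would use validated, context-aware detectors, not simple regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scan_text(text: str, source: str) -> list[dict]:
    """Return one finding per match: the data type and where it was seen."""
    findings = []
    for data_type, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({
                "type": data_type,
                "source": source,
                # Store only a preview, never the raw value.
                "value_preview": match.group()[:4] + "…",
            })
    return findings

sample = "Contact jane.doe@example.com or call 555-867-5309."
for finding in scan_text(sample, source="signup_form_handler.py"):
    print(finding)
```

The key design point is that the scanner records *where* each type of data appears without retaining the values themselves, which keeps the inventory itself from becoming a privacy liability.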
Good automated data flow mapping (ADFM) tools organize this information into a dashboard for review. The process can be broken down into three parts: discovery, tagging, and continuous monitoring.
Discovery
An automated data flow mapping tool will move through your applications and identify any personal information it finds. This discovery process will record the type of data, where and in what system it was found, where it goes, and where it is stored. Automated discovery is a great way of finding third party services and integrations that leak data. Once all the information is found, it is placed into an information inventory that can then be classified and mapped.
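Conceptually, each discovery finding becomes an entry in that information inventory. The record structure below is an illustrative sketch, not the schema of any particular tool; the field names and the "mailchimp" integration are assumptions for the example.

```python
from dataclasses import dataclass, field

@dataclass
class InventoryRecord:
    """One entry in the data inventory produced by discovery."""
    data_type: str       # e.g. "email", "postal_address"
    found_in: str        # system or code location where it was detected
    collected_from: str  # source of collection, e.g. a signup form
    stored_in: str       # datastore where the value ends up
    shared_with: list[str] = field(default_factory=list)  # downstream services

# Hypothetical record for an email address discovered in a signup handler.
record = InventoryRecord(
    data_type="email",
    found_in="api/handlers/signup.py",
    collected_from="web signup form",
    stored_in="postgres:users.email",
    shared_with=["mailchimp"],  # a third party integration found by the scan
)
print(record)
```

Capturing the source, storage location, and downstream recipients in one record is what lets the tool later draw the flow map and answer Article 30-style questions about where each category of data lives.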
Tagging and classification
Knowing personal data exists within part of your application isn't enough to be compliant. You need to tag it in a way that allows you to follow its movement through your organization, and even outside it. Regulations like the California Privacy Rights Act (CPRA), which amended the California Consumer Privacy Act (CCPA), require special treatment for personal information that is considered sensitive. This tagging process adds context to the data. An ADFM can assist in categorizing data, flagging sensitive information, and even identifying where it moves, at a service level and potentially even a geographical level. Knowing when information crosses borders is increasingly important as the GDPR and regulations like it include clauses that prohibit or limit moving data outside their borders.
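The tagging step can be pictured as a lookup that attaches sensitivity and cross-border context to each discovered item. This is a minimal sketch under stated assumptions: the classification table, the `tag` helper, and the EU home region are all illustrative, and real sensitivity categories come from the regulations you target.

```python
# Illustrative classification table. Real categories would come from the
# regulations in scope (e.g. what the CPRA treats as sensitive).
SENSITIVITY = {
    "health_record": "sensitive",
    "government_id": "sensitive",
    "email": "personal",
    "page_view": "non-personal",
}

HOME_REGION = "EU"  # assumption: the org processes data under the GDPR

def tag(data_type: str, destination_region: str) -> dict:
    """Attach sensitivity and cross-border context to a discovered item."""
    return {
        "data_type": data_type,
        "sensitivity": SENSITIVITY.get(data_type, "unclassified"),
        "crosses_border": destination_region != HOME_REGION,
    }

# A government ID flowing to a US service gets flagged on both counts.
print(tag("government_id", destination_region="US"))
```

Unrecognized data types fall back to "unclassified" rather than being silently dropped, which is exactly the kind of item a DPO would review by hand.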
Continuous monitoring
While an automated data flow mapping tool shows immediate value when you first begin using it, the true value comes from its ability to continuously monitor for changes. With the right approach, your team can receive instant notifications when new data is discovered or when existing data is used or transported in an unexpected way, and can even generate regular accountability reports.
With enough information, the system can even alert you of increased vendor risk if it detects a third party processor with known vulnerabilities or breaches. This provides real-time insights into how data is moving through your application and where it may be at risk.
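At its core, this kind of monitoring is a diff between the approved baseline inventory and the latest scan. The sketch below shows the idea with `(data_type, location)` pairs; the `diff_scans` helper and the example locations are hypothetical.

```python
def diff_scans(baseline: set[tuple], latest: set[tuple]) -> list[str]:
    """Compare two scans (each a set of (data_type, location) pairs)
    and produce an alert for anything new since the approved baseline."""
    return [f"ALERT: new {dtype} detected in {loc}"
            for dtype, loc in sorted(latest - baseline)]

# Approved inventory vs. a later scan that finds an unexpected new flow.
baseline = {("email", "users_db"), ("ip_address", "access_logs")}
latest = baseline | {("email", "analytics_vendor")}

for alert in diff_scans(baseline, latest):
    print(alert)  # surfaces the email flow to the analytics vendor
```

In practice a tool would also diff on processing purpose and destination region, but the principle is the same: anything outside the reviewed baseline gets escalated for human triage.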
With an automated system in place, the DPO or responsible member of your team can then review all data, adjust or add any tags if needed, and ensure it aligns with your company's privacy and data policies.
What automated data flow mapping doesn't do
Automated data mapping tools sound like the perfect solution. They remove the time-consuming parts of data management, but they can't do everything. Don't trust any solution that markets itself as "hands-off". No automated system can perfectly tell you where your data is going, how long it is stored, or even which regions it lives in. AI and machine learning can help improve classification and detection, but they only make informed assumptions. These tools should always include a review process.
The data protection officer oversees their output. They need to tag unique data that wasn't automatically handled, keep track of the responsible "data owners" for each change, and handle the administrative tasks of interacting with the regulatory authorities for each data protection law. They can compare your usage of personal information against existing third party sharing and processing contracts, and confirm that your processes match those in your privacy policy and data protection addendum. While an automated tool can give you an estimate of the risk level a vendor poses, only your team can add the context needed to make a business decision.
These tools are one part of your data compliance toolchain. They can help export data flow maps and even assist in generating parts of your privacy or data protection documents, but they are not a replacement for dedicated team members with expertise in data and privacy compliance.
Beyond manual auditing
Data flow maps are not the complete solution to any compliance plan. That said, they demonstrate a commitment to data security and privacy that less-complete alternatives do not. Regulators look favorably on businesses that provide detailed data maps and processing documentation. One of the more challenging parts of an audit is proving to auditors that you are actually doing what you say you are. These maps, derived from actual code and data, can show auditors that your data practices match your data promises.
Automating the discovery and personal information tagging process makes your governance, risk, and compliance tasks less tedious and frees up your team's time to focus on using user data more responsibly. It also injects privacy by design principles directly into your software development lifecycle. By integrating tools like ADFMs into your processes, you move closer to a proactive approach to data privacy and management rather than a reactive one.