
Driving data enablement through data regulation

As the global data-protection landscape matures, an increasingly robust set of data-privacy laws, backed by active regulators, is pushing governments and private-sector organizations alike to invest significantly in data protection or face financial penalties, business disruption, and public censure.

Over more than two decades, much of the world witnessed a gradual migration of personal and professional life from in-person to digital interactions. Then, less than two years ago, digitization arrived nearly overnight for many people in the few remaining areas of daily life where nondigital interactions had persisted, education and medicine among them. That shift is reflected in a corresponding increase in the amount of data the world collectively produces: more than a tenfold increase over the last decade, and more than a 50 percent increase in 2020 alone. This brings current estimates to a massive 64 zettabytes, a number that continues to grow each day.

The world’s data is being analyzed and monetized, raising concerns and driving legislative and regulatory activity. Yet despite a proliferation of new data-protection laws1 and recent headline-grabbing fines, there is an opportunity for data users and regulators to collaborate to do more with this data while potentially reducing the risk to the people who generate it. That opportunity sits just on the other side of data-protection laws.

Existing data-protection laws are important, especially because some uses of data must be tied directly to a particular data subject. For example, doctors need to record patients’ lab results in electronic health records so they can review them later and see what may have changed. The same is true of a child’s school report card or an individual’s statement from a local credit union. This data is personal to the individual and, because of what it may reveal about that person and their life, it is sensitive. Hence the need for laws, like those already in place across most of the globe, that require such data to be used in an appropriate and agreed-upon fashion.

The unregulated data space

In addition to understanding what data and activities fall within the scope of existing data-protection laws, it is also important to understand and cultivate the space that sits just on the other side of these laws: the unregulated data space2.

Not all data derives its value from being attached to an identified or identifiable person. Of the vast, growing nebula of data the world produced last year, only a small fraction is useful specifically because it relates to a specific, identifiable individual. A far larger portion retains significant utility even after it is transformed so that it no longer points to any individual. Such data can provide important insight into groups of people and into society more generally.

Healthcare researchers training a machine-learning algorithm to better diagnose heart disease, for example, are not interested in the electronic health record of any single individual. They are far more interested in the millions of nameless, unidentifiable patient records on which the algorithm will be trained. Each anonymous patient’s contribution can ultimately benefit every patient, for example through faster and more accurate diagnoses, even though no patient’s identity is ever revealed to the researchers.

However, continued advances in statistics and technology have shown, time and again, how elusive true anonymization of personal data actually is. As a result, data-protection authorities have historically set a very high bar for establishing that data originating from individuals has been sufficiently derisked to fall outside the scope of the data-protection laws that protect those individuals3.

With the safe harbor of anonymization challenging to achieve in most instances4, and the intermediate status of pseudonymization (under the GDPR, the LGPD, and similar data-protection regimes) still carrying significant privacy-compliance burdens and risks, there has been significant and promising investment in other privacy-enhancing technologies (PETs). PETs include straightforward data aggregation as well as more complex technical approaches (e.g., differential privacy, multiparty computation, and synthetic data) and structural proposals (e.g., data trusts). While these techniques all have their place, some are unsuitable for many data-use cases (aggregation), some may reduce the utility of data for certain analyses (synthetic data), and some may be complicated to implement in practice (multiparty computation, data trusts)5. As a result, data-protection laws currently impede data sharing and analysis across certain research, healthcare, and other use cases.

The benefits of achievable anonymization

A more widely achievable anonymization standard, paired with explicit safeguards against reidentification, could create a powerful incentive for current users of personal data to pursue a data-derisking path. For instance, personal data that has been stripped of unique, direct identifiers, such that the party in possession of the data has no reasonable means of reidentifying it6, could then be confirmed as anonymous data. This classification could be paired with two safeguards: first, a binding and enforceable promise by the data user neither to seek reidentification nor to share the anonymized data with other data users; and second, a continued commitment from data-protection authorities to take action against unauthorized reidentification. Finally, in higher-risk situations, such as where sensitive personal data is being considered for anonymization, an impact assessment could be required to evaluate the sensitivity of the data and, in turn, the likelihood that applying a more achievable anonymization standard would place the underlying data subjects at meaningful risk. Collectively, these steps set out a framework that could reposition how anonymization and data-protection laws intersect, allowing for a more achievable anonymization standard while pairing it with clearer requirements and accountability for the use of anonymized data.

What could be the result of adjusting anonymization in this way? Enabling data users to achieve anonymization through steps more fully within their own control, rather than under today’s standard, which places greater emphasis on the capabilities of outside parties, might encourage current users of personal data (e.g., companies, governments, academics) to reexamine whether anonymized data could deliver all or nearly all of the value they currently derive from personal data. For some, this could mean greater data utility at lower cost and with a lower risk of noncompliance.

That, in turn, could shrink the global store of personal data as data users swap personal data for anonymized data. Less personal data could benefit data subjects by making it easier to identify and manage the fewer authorized repositories and uses of their personal data. It could also reduce the impact of many data-mishandling incidents, since the inadvertent exposure or intentional exfiltration of anonymized data would in many cases pose less risk to an individual than the loss of the same data without such anonymization.

In many ways, the current iteration of data-protection regimes has been deemed a success. But as the amount of data, personal and otherwise, continues to expand, policymakers and regulators may benefit from considering whether lowering the high walls around personal-data anonymization might reduce the amount of unnecessary personal data that persists in the world while raising the utility of the derisked data that replaces it. In so doing, they could use the tools of data regulation to drive data enablement for the benefit of all.

1 From 2014 through 2021, the number of countries covered by the DLA Piper data protection handbook nearly doubled, from 63 to 121.

2 Not to be confused with the not-yet-regulated data space, including uses of personal data that fall outside of current laws and/or active enforcement in some jurisdictions but are clear candidates for future legislative expansion and regulatory attention.

3 For example, Recital 26 of the GDPR addresses pseudonymization and anonymization, but the interpretation of anonymization in practice has emerged over time, including through regulatory guidance such as the still oft-cited 2014 Article 29 Data Protection Working Party opinion on anonymisation techniques. Uncertainty about how to achieve anonymization persists today.

4 Some laws do provide clear, achievable standards for derisking, e.g., de-identification of Protected Health Information under the Health Insurance Portability and Accountability Act (HIPAA) in the United States. HIPAA is notable in part because of its objective, data-focused test for when data has been sufficiently derisked to fall outside the scope of the regulation. This approach is in the global minority and contrasts with regulations such as the GDPR, which take a more subjective, attacker-focused approach to the concept (see Recital 26).

5 Even the more advanced privacy-enhancing technologies are not unambiguously accepted as meeting the anonymization bar under key data-protection regimes (e.g., it is unsettled whether multiparty computation establishes anonymity under the GDPR).

6 This understanding of anonymization is consistent with the draft “Anonymisation, pseudonymisation and privacy enhancing technologies guidance” published earlier this year by the UK’s data protection authority, the Information Commissioner’s Office (ICO).