The future of data-loss prevention

By Jim Boehm, Anatoly Brevnov, Lucy Shenton, and Daniel Wallance

This past year was the worst on record for breaches of enterprise data, and the number of attacks is expected only to grow. Some examples illustrate the high stakes: Wawa, a convenience-store chain, suffered a breach that went undetected for nine months, which led to the sale on the dark web of 30 million exfiltrated customer records. Luxembourg’s regulators recently set a new precedent for General Data Protection Regulation (GDPR) penalties, levying a €746 million fine on Amazon for insufficient personal-data handling practices.

In light of these trends, adopting robust data-loss-prevention (DLP) practices, both through in-house capabilities and vendor solutions, has become more critical than ever. Organizations are now looking to rapidly modernize their DLP efforts to address these challenges.

The changing nature of DLP

To keep pace with the heightening complexity of managing cybersecurity, DLP approaches have undergone significant transformation over the past few years. We have seen improvements in these approaches across data inspection, data discovery, exfiltration notification, enforcement, and data management, among others.

Through internal research and interviews with a diverse set of experts on data loss, we identified three bleeding-edge trends in DLP that could have the largest impact on the space over the next three to five years. In this post, we discuss these trends in detail, including the gaps they address and how organizations can best harness them to further mitigate the risk of managing confidential data.

Three priority areas for DLP

Behavior analysis and contextual heuristics

Leading organizations with access to large data sets and strong capabilities in machine learning have begun using contextual heuristics (for example, log-in time, user behavior, and mouse movements) to identify, flag, and characterize potentially malicious activity. This approach entails collecting data from multiple endpoints, passing it through behavioral-analytics tools to identify anomalous behaviors, and inferring contextual information such as intent, secondary actors, and root causes.
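As a rough illustration of the anomaly-identification step, the sketch below flags log-in hours that deviate strongly from a user's own baseline by using a simple z-score. It is a minimal example under stated assumptions, not a description of any vendor's method: the function name, the threshold, and the sample data are hypothetical, and a production behavioral-analytics tool would combine many more signals.

```python
from statistics import mean, stdev


def flag_anomalous_logins(history_hours, new_hours, threshold=3.0):
    """Return the new log-in hours that deviate from the user's baseline.

    history_hours: past log-in hours (0-23) for one user (hypothetical data).
    threshold: number of standard deviations treated as anomalous (an assumption).
    Note: hours are treated as plain numbers, so wraparound at midnight is ignored.
    """
    mu = mean(history_hours)
    sigma = stdev(history_hours)
    if sigma == 0:
        # No variation in the baseline: anything different is flagged.
        return [h for h in new_hours if h != mu]
    return [h for h in new_hours if abs(h - mu) / sigma > threshold]


# Hypothetical baseline: the user normally logs in between 8:00 and 10:00.
baseline = [8, 9, 9, 10, 8, 9, 10, 9]
flagged = flag_anomalous_logins(baseline, [9, 3, 10])
print(flagged)  # the 3:00 log-in is the only one far outside the baseline
```

A single-signal check like this would generate many false positives on its own, which is one reason the approach described above draws on data from multiple endpoints.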

Such contextual information can be integrated into both in-house and vendor DLP solutions as well as policy decision points in identity and access-management tools to determine user access to data based on additional context. It can also inform the process of incident triage and targeted enforcement of access controls. For instance, upon detecting an incident, an organization’s DLP system can automatically revoke access for the suspected actor and inferred secondary actors.

To implement this capability, organizations must have sufficient telemetry to collect data across the technical estate as well as advanced analytics. While many vendors offer nascent versions of these capabilities as part of their DLP tools, only high-tech organizations are seriously exploring this functionality today, mostly using custom-built solutions.

Privacy integration and regulatory compliance

Common data-management capabilities, such as classification, dynamic alerting, and rule-based enforcement, are increasingly being combined with compliance-management solutions to proactively prevent regulatory violations. As one example, data transfers containing personally identifiable information (PII) of EU citizens can be automatically flagged and halted to comply with Schrems II, which lays out requirements for secure data transfer between the European Union and the United States and can carry fines of up to 4 percent of global annual revenues per violation. Leading organizations use both automated and manual tagging of data in compliance with the relevant regulations for a given region. Rules that control data transfer based on data tags can be encoded using existing DLP infrastructure. Automatic enforcement and notification can then be applied based on these rules.
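The tag-based rules described above might be encoded along the following lines. This is a minimal sketch, not legal guidance or a real DLP product's rule syntax: the tag name "eu_pii", the region codes, and the rule table are hypothetical placeholders for whatever an organization's classification scheme defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    tags: frozenset          # data tags applied by classification (hypothetical names)
    source_region: str
    destination_region: str


# Hypothetical rule table: (tag, source, destination) combinations to halt.
BLOCKED = {("eu_pii", "EU", "US")}


def evaluate(transfer):
    """Return an (action, reason) pair for a proposed data transfer."""
    for tag in sorted(transfer.tags):
        if (tag, transfer.source_region, transfer.destination_region) in BLOCKED:
            return ("halt", f"blocked by rule on tag '{tag}'")
    return ("allow", "no matching rule")


cross_border = Transfer(frozenset({"eu_pii", "finance"}), "EU", "US")
internal = Transfer(frozenset({"eu_pii"}), "EU", "EU")
print(evaluate(cross_border))
print(evaluate(internal))
```

Keeping the rules in a data structure rather than in code makes it easier to update them as regulations change and to feed the same rules into notification and reporting.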

In addition, DLP solutions can be integrated with regulatory-reporting technology to automatically generate audit-ready compliance dashboards and reports, which increase transparency and reduce the burden of compliance. These reports can show where relevant PII is located, what enforcement mechanisms exist, and any violations that have occurred.

To implement this capability, organizations must have largely automated, global compliance solutions that can be integrated with their policy decision points and data-classification tools. For large, global organizations, this requirement is particularly difficult to meet, especially in regions where regulations continue to evolve. As a result, few organizations have yet achieved fully automated integration, though this milestone is on the horizon.

Audio-data exfiltration

Advances in natural-language-processing voice recognition, paired with AI-based speech-to-text technology, enable organizations to implement DLP to safeguard audio data. Along with using optical character recognition and regular-expression matching to detect keywords and patterns in text documents, organizations can expand the breadth of their scrutiny for sensitive data by analyzing audio (and video) files sent within the network. For certain industries and use cases involving highly sensitive data, these controls can be extended to detect leakage of restricted data in live conversations (for example, in boardrooms or on selected phone lines).
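Once audio has been transcribed, the same keyword and pattern matching used for text documents can be applied to the transcript. The sketch below assumes a transcript string has already been produced by a separate transcription step, which is not shown. The pattern names, the keyword list (including the code name "project falcon"), and the sample sentence are all hypothetical.

```python
import re

# Hypothetical detection patterns; a real deployment would maintain its own list.
PATTERNS = {
    # 16 digits, optionally separated by single spaces or hyphens.
    "card_number_like": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    # Restricted keywords, matched case-insensitively.
    "restricted_keyword": re.compile(
        r"\b(?:confidential|project falcon)\b", re.IGNORECASE
    ),
}


def scan_transcript(text):
    """Return (label, matched_text) pairs for every pattern hit in the transcript."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    return findings


transcript = "The Project Falcon budget is on card 4111 1111 1111 1111 today."
findings = scan_transcript(transcript)
for label, hit in findings:
    print(label, "->", hit)
```

Transcription errors will affect what patterns like these can catch, so matches are probably best treated as prompts for review rather than as confirmed incidents.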

This capability can be implemented through either in-house solutions and open-source APIs or specific vendor tools, but many organizations are hesitant to add such capabilities, since they can be perceived as intrusive by employees and clients. In considering whether to implement live-audio inspection, organizations should strongly consider the potential impact on the company’s culture and move forward only if the risk is sufficiently high.

If an organization does decide to integrate live-audio inspection capabilities, it should clearly communicate to employees that audio-data monitoring is in place and explain exactly why the solution is important. Implementation should also be targeted and risk based, so that employee trust is not compromised by increased monitoring.

Three key steps to improve DLP

To advance DLP practices, organizations should take three key steps:

  • Identify current capabilities that can be integrated. Advanced DLP requires organizations to pinpoint sensitive data and determine its location. This exercise relies on robust asset management, data classification, monitoring, access control, and compliance. The first step in enhanced DLP is understanding the current ecosystem of these solutions, how they are integrated, and how to improve their performance in tandem. By taking into account the current ecosystem of tech capabilities, organizations can find opportunities to further integrate existing tooling in line with advanced concepts, such as zero trust.1
  • Develop advanced analytics and AI capabilities. As the volume of data and organizational technical complexity continue to grow, advanced DLP is increasingly reliant on the automated identification and management of data. In IBM’s 2021 Cost of a data breach report, respondents with mature security analytics programs had data breaches that cost nearly 33 percent less than those at organizations whose programs were less mature.2 Developing in-house capabilities in advanced analytics and artificial intelligence enables organizations to not only improve their own in-house data-management solutions but also better integrate vendor tools and gain a clearer picture of their data-loss risk, making incidents easier to prevent and contain.
  • Identify high-risk areas to strategically apply advanced DLP practices. To optimally manage cybersecurity within a given budget, organizations should identify where their high-risk information sits, as well as how it moves across the estate, and uncover their greatest risk areas (as measured by the value of governed systems and data). The organization can then strategically prioritize the application of advanced DLP controls to achieve the greatest impact.
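The prioritization in the third step can be sketched as a simple ranking exercise. The example below scores each asset by value multiplied by exposure and then funds controls in rank order until the budget runs out. The scoring formula, field names, and figures are illustrative assumptions only; organizations will have their own risk models.

```python
def prioritize(assets, budget):
    """Greedily pick assets for advanced DLP controls, highest risk score first.

    Each asset is a dict with hypothetical fields:
      value        - relative value of the governed systems and data
      exposure     - estimated likelihood of loss, between 0 and 1
      control_cost - cost of applying advanced DLP controls
    """
    ranked = sorted(assets, key=lambda a: a["value"] * a["exposure"], reverse=True)
    selected, spent = [], 0
    for asset in ranked:
        # Skip any asset whose controls no longer fit in the remaining budget.
        if spent + asset["control_cost"] <= budget:
            selected.append(asset["name"])
            spent += asset["control_cost"]
    return selected


assets = [
    {"name": "customer_db", "value": 9, "exposure": 0.8, "control_cost": 5},
    {"name": "hr_share", "value": 6, "exposure": 0.5, "control_cost": 3},
    {"name": "marketing_wiki", "value": 2, "exposure": 0.9, "control_cost": 2},
    {"name": "board_audio", "value": 10, "exposure": 0.4, "control_cost": 4},
]
chosen = prioritize(assets, budget=9)
print(chosen)
```

A greedy pass like this is easy to explain to stakeholders but does not guarantee the best possible use of the budget; it is meant only to show how value and exposure can drive the ordering.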

Tightening regulations, an ever-evolving cyberthreat landscape, and increasingly complex data management have elevated the need for comprehensive transparency into where important data is located and where it flows. As a result, organizations are advancing their DLP capabilities to protect themselves against breaches and regulatory fines. The future lies in the application of advanced analytics, machine learning, and contextual heuristics and their integration with privacy and reporting solutions. For the highest-risk areas, advanced applications such as audio-data exfiltration tools can be added.

Many organizations already have foundational DLP capabilities, either vendor offerings or an in-house function. The challenge lies in advancing and scaling these capabilities over the immediate horizon to effectively mitigate ever-increasing cyberrisk.

Jim Boehm is a partner in McKinsey’s Washington, DC, office; Anatoly Brevnov is a consultant in the New York office, where Daniel Wallance is an associate partner; and Lucy Shenton is an associate partner in the Berlin office.

1 Zero trust is a cybersecurity model based on the premise that trust is never granted implicitly but must be continually evaluated. Zero-trust architecture is an end-to-end approach to enterprise resource and data security that encompasses identity, credentials, access management, operations, endpoints, hosting environments, and the interconnecting infrastructure.
2 Ponemon Institute and IBM Security, Cost of a data breach report, IBM Corporation, 2021.