Realizing more value from data projects

by Dale MacDonald, Margarita Młodziejewska, Aziz Shaikh, and Henning Soller

Traditional wisdom holds that capturing the full value from data starts with identifying the right use cases. There is no question that the appropriate selection of use cases and the right level of ownership on the business side are essential to create real value from a company’s data assets. However, approaches based purely on individual use cases often overlook the importance of building the right technological enablers. Many data projects are scoped and implemented to meet an acute need, but they often neglect to include (or delay the consideration of) features or design elements that are crucial to integrating and operating at scale. As a result, this way of working often adds organizational and financial costs after the initial use case is implemented.

“If we do A, we will get B benefit” is a logical approach in most situations, but it is less so in data projects. In fact, this modular rather than systemic or big-picture focus often leads to missed opportunities and ongoing costs.

By contrast, the benefits of more expansive investments in data projects can be significant. Our experience suggests that the right implementation of investments across the data value chain could reduce execution time for use cases by three to six months and the total cost of data ownership by 10 to 20 percent.

In this post, we focus on the technical enablers necessary to realize value from data (to learn more about the business enablers necessary to capture value from data at scale, see “How to unlock the full value of data? Manage it like a product”). We discuss common ways in which organizations can enhance the value extracted from data, identify typical antipatterns (responses to recurring problems that are often ineffective and possibly counterproductive), and outline several no-regrets actions to consider before scaling data efforts.

Missed opportunities from focusing on specific use cases

The full value of data projects tends to come less from a specific deliverable and more from synergies with other capabilities and data in the system. Consider a retailer that wants to use data to develop customer profiles. Should it sell those insights to outside organizations or use them to drive additional sales for itself? The value of the data project would be linear if the data were sold as a product, but it could be exponential if the data were used along with the retailer’s overall data system to generate more sales. Using the data internally would also keep it out of competitors’ hands.

Failing to consider relevant use cases can lead to missed opportunities. For example, we have seen banks develop their target architecture and data landscape without taking actual business use cases into account; the resulting platforms were unusable for the advanced analytics and risk use cases that would have been most helpful to the business side.

Missed opportunities can also come from a lack of automations to reduce manual tasks and enablers that allow for reuse of models and data. These considerations and components are often not prioritized during standard valuation processes because they are difficult to value correctly. Nonetheless, they’re critical to an effective data ecosystem. Consider the consequences: Failing to prioritize scalable infrastructure limits an organization’s ability to meet changing needs and forces projects to eventually be refactored. Failing to address maintainability up front tends to create avoidable rework and ongoing maintenance that doesn’t create new value for the organization. Overlooking the automation of data quality and standardization tasks introduces the risk of costly manual errors. Finally, not setting up the right governance and monitoring systems risks data breaches and fines that endanger the health of the project and of the larger organization.

In other words, failing to prioritize these foundational components limits the value that can be extracted from data projects, slows value extraction, and makes replacement and upgrades to these components more difficult and expensive as time goes on. The resulting systems may be able to achieve individual project goals but are unlikely to create long-term value, at least compared to data systems designed with integration in mind.

Ongoing costs

In addition to missed opportunities, a focus on individual use cases rather than the larger data ecosystem often results in ongoing costs and substantial technical debt. The amount of value that is lost as avoidable ongoing costs is opaque to most organizations. From an ecosystem perspective, avoidable workloads are a by-product of the use-case-focused approach to data projects.

The most visible costs are related to duplicated shareable resources or multiple systems that perform the same type of work: multiple databases that house the same data, manual tests that are repeated when they could have been automated, or data that must be remodeled because a previous use case’s model was not reusable. This kind of avoidable work amounts to solving the same problem for every project in platform-specific ways or having multiple teams solve the same problem in isolation.

The ongoing costs of coordination and upkeep may be even more significant. When data projects are not implemented with integration into the larger ecosystem as part of their design, it is often necessary to deconstruct and decode features from their outputs to access a critical piece of information for a downstream operation. Because these operations are often complex and fragile, scope creep and technical debt are common outcomes. Perversely, the technical debt often becomes an argument against refactoring and improvement efforts further upstream.

In addition to creating ongoing costs, these issues tie up talent and reduce the ROI of technology investments. Organizational capacity that is avoidably dedicated to the day-to-day upkeep of multiple systems is not available to anticipate and meet upcoming demands, which can lead to failures. Indeed, our experience suggests that it is common for 8 percent of an organization’s data workforce to be dedicated to work that does not add value, usually as a result of deferring an expansive organization-level data strategy.

A vicious cycle can develop when organizations fail to meet market demand, causing data projects to underdeliver on value and sometimes even resulting in loss. These challenges are significant, but a shift toward no-regrets moves and a forward-looking orientation can help organizations lay a more solid foundation.

Integration and enablement technologies as solutions

A holistic approach to data projects includes and requires enablement technologies throughout the data flow (exhibit).

In our experience, the enablers that make this possible fall into two groups: no-regrets moves and the requirements of next-generation use cases.

No-regrets moves

AI-enabled data quality optimization, compliance by design, platforms for master data management, and a service-oriented organization and platform are all foundational for extracting value from data projects.

  • Next-generation data quality optimization uses AI to assess data sets across key data quality dimensions and automatically recommend corrections for errors; users can selectively apply or edit the recommended modifications (see the sketch after this list).
  • Compliance by design in security, data governance, and privacy sets up security controls, collaboration tools, and forensic tools to ensure appropriate transparency. In our experience, more companies are moving toward “by design” approaches and policy as code, in which rules and regulations are expressed as code so they can be enforced within clear guardrails.
  • Platforms for managing master data, customer relationships, and reference data can provide a base for global, secure, scalable, and resilient architecture and support a master data model that’s informed by prioritized use cases.
  • A service-oriented organization and platform can help the organization deliver value to users by facilitating outcomes without requiring users to own the underlying costs and risks. This platform should enable self-service and define clear ownership of data and of any relevant automations, consumption models, and performance metrics.
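
To make the first of these moves concrete, the following is a minimal sketch of automated data quality assessment and correction recommendations. It assumes a pandas DataFrame with hypothetical column names (“email,” “country”), and the AI-assisted recommendation step is represented by simple rule-based checks that a production system would replace with learned models.

```python
import pandas as pd

# Hypothetical validity rule for one column type.
EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"


def assess_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Score each column on basic quality dimensions."""
    return pd.DataFrame({
        "completeness": 1 - df.isna().mean(),   # share of non-null values
        "uniqueness": df.nunique() / len(df),   # share of distinct values
    })


def recommend_fixes(df: pd.DataFrame) -> list[dict]:
    """Propose corrections; users can review and selectively apply them."""
    recommendations = []
    # Illustrative rule: flag missing or malformed email addresses.
    if "email" in df.columns:
        bad = df[~df["email"].fillna("").str.match(EMAIL_PATTERN)]
        for idx in bad.index:
            recommendations.append({
                "row": idx, "column": "email",
                "issue": "missing or invalid format",
                "suggestion": "standardize or request re-entry",
            })
    # Illustrative rule: impute missing country codes with the most frequent value.
    if "country" in df.columns and df["country"].isna().any():
        mode = df["country"].mode().iloc[0]
        recommendations.append({
            "column": "country", "issue": "missing values",
            "suggestion": f"impute with most frequent value '{mode}'",
        })
    return recommendations


if __name__ == "__main__":
    customers = pd.DataFrame({
        "email": ["a@example.com", "not-an-email", None],
        "country": ["DE", None, "CH"],
    })
    print(assess_quality(customers))
    for fix in recommend_fixes(customers):
        print(fix)
```

The key design point is the separation between assessment and application: the system only recommends modifications, and a user (or a downstream policy) decides which ones to apply.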

Next-generation use-case requirements

Beyond the foundational elements, the no-regrets moves also include the enablers that next-generation use cases require.

  • Unified data pools, or a single repository or mesh, can make data more accessible to the entire organization. The organization would need to select the data platform solution that best fits its needs: a data lake, lakehouse, data mesh, or data fabric.
  • Standardized integration with third-party data providers allows external data to be easily accessed as a consumable service. Companies may incorporate additional public or vendor data to enrich their analytics.
  • Dashboards and data democratization make data easily findable and accessible through a central data portal. This approach also allows users to easily view new data and integrate it into existing data assets.
  • MLOps toolchains and automation can create a fully automated pipeline for data operations and machine learning operations (see the sketch after this list).
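
As an illustration of the last item, below is a minimal sketch of an automated pipeline that chains data validation, training, evaluation, and conditional model registration. It uses scikit-learn with a bundled sample data set as a stand-in for real business data; the accuracy threshold and the file-based “registry” are illustrative assumptions, and a production toolchain would hand these steps to dedicated orchestration and model-registry tooling.

```python
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_THRESHOLD = 0.9                # illustrative quality gate
REGISTRY_DIR = Path("model_registry")   # stand-in for a real model registry


def validate_data(X, y):
    """Fail fast if the data does not meet basic expectations."""
    assert len(X) == len(y), "feature/label length mismatch"
    assert len(X) > 0, "empty data set"
    return X, y


def train(X_train, y_train):
    """Train a simple baseline model."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model


def evaluate(model, X_test, y_test) -> float:
    """Return the metric used by the quality gate."""
    return accuracy_score(y_test, model.predict(X_test))


def register(model, accuracy: float) -> Path:
    """Persist the model only after it clears the gate."""
    REGISTRY_DIR.mkdir(exist_ok=True)
    path = REGISTRY_DIR / f"model_acc_{accuracy:.3f}.joblib"
    joblib.dump(model, path)
    return path


def run_pipeline():
    X, y = validate_data(*load_iris(return_X_y=True))
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = train(X_train, y_train)
    accuracy = evaluate(model, X_test, y_test)
    if accuracy >= ACCURACY_THRESHOLD:
        print(f"registered: {register(model, accuracy)}")
    else:
        print(f"rejected: accuracy {accuracy:.3f} below threshold")


if __name__ == "__main__":
    run_pipeline()
```

The point of the sketch is the shape of the pipeline rather than the model itself: every stage is a small, testable step, and promotion to the registry happens automatically only when the quality gate is met.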

To capture the full value of their data, organizations should invest across the full value chain rather than limiting their efforts to isolated use cases. No-regrets moves can lay the groundwork for effective data transformations and sustainable programs.

Dale MacDonald is a senior engineer in McKinsey’s Seattle office, Margarita Młodziejewska is a consultant in the Zürich office, Aziz Shaikh is a partner in the New York office, and Henning Soller is a partner in the Frankfurt office.