Scaling AI for success: Four technical enablers for sustained impact

by Gerry Aue, Pepe Cafferata, Roman Drapeko, Margaux Penwarden, and Vaibhav Sinha

AI has emerged as a transformative force, revolutionizing industries and redefining the way organizations operate. The average number of AI capabilities in an organization has doubled in the past five years. The rapid growth of AI and machine learning (ML) is a testament to their immense potential to drive innovation, boost efficiency, and unlock new revenue streams for businesses. Organizational leaders often expect that with every subsequent AI use case or application, they will achieve faster speed to market and lower costs.

However, many organizations find themselves struggling to maintain this momentum as they scale their AI efforts. They often face significant challenges that lead to longer delivery timelines. Many organizations can successfully conduct experiments and create proof-of-concept models but have trouble transitioning to production-ready models. Even when an organization successfully gets an ML model up and running, the model’s performance will eventually degrade, or it risks becoming obsolete because of changes in underlying data or business requirements. Scaling AI also carries its own risks, such as productivity erosion, and it requires organizations to maintain high standards of security, regulatory, and ethical compliance. As AI projects scale, data teams may also struggle to maintain productivity because of increasing complexity, inefficient collaboration, and a lack of standardized processes and tools.

Through our research, we have identified four technical enablers that organizations should use to scale AI successfully: incorporating data products such as feature stores, using code assets, implementing standards and protocols, and harnessing the technology capabilities of machine learning operations (MLOps). These four enablers can address prevailing challenges related to AI, as multiple organizations at varied levels of data and AI maturity have demonstrated through their successful implementations.

Data products: The power of feature stores

Data products such as feature stores reduce time to market and add trust to AI and machine-learning use cases.

When developing new ML models, data teams frequently confront challenges with the quality of data, its availability for development, and maintaining and monitoring models. Data may contain errors, missing values, bias, or outliers that affect the speed of model development and the quality of model performance. In addition, data from diverse sources may be difficult to access and integrate because of inefficient data governance. These challenges ultimately increase development and maintenance costs for organizations and affect their ability to generate meaningful, actionable insights from their data.

Organizations can overcome these challenges by offering a comprehensive range of data products. Specifically, the feature store has emerged as an accelerant for AI. As a centralized marketplace for storing, managing, and sharing features—the signals that ML models consider—feature stores optimize the process of feature engineering and create consistency across different projects. By providing a unified platform where data scientists can collaborate and reuse features, feature stores help eliminate duplication of effort and accelerate the development and deployment of ML models.

As feature stores are continually enriched with new use cases, teams can leverage these additions for their immediate and future use cases. For example, data scientists working on a churn model create customer features and add them to the feature store. Later, another team building a risk model can use the same features as a foundation for new deliverables. Without a feature store, a data scientist would have to spend time building these features from scratch. Instead, each new use case can reuse and repurpose previously built features, reducing development time and producing a more consistent output.
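The reuse pattern described above can be sketched in a few lines of Python. This is a toy, in-memory illustration and not a production feature store: the `FeatureStore` class, the feature names, and the customer fields are all hypothetical and chosen only to mirror the churn and risk example.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
FeatureFn = Callable[[Row], float]


class FeatureStore:
    """Toy in-memory registry that maps feature names to the logic computing them."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str, fn: FeatureFn) -> None:
        # Refuse silent overwrites so every team sees one definition per feature.
        if name in self._features:
            raise ValueError(f"feature '{name}' is already registered")
        self._features[name] = fn

    def build(self, names: List[str], row: Row) -> Dict[str, float]:
        # Compute the requested features for one raw record.
        return {name: self._features[name](row) for name in names}


def tenure_months(row: Row) -> float:
    return row["days_since_signup"] / 30.0


store = FeatureStore()

# The churn team defines customer features once and publishes them.
store.register("tenure_months", tenure_months)
store.register("avg_monthly_spend",
               lambda r: r["total_spend"] / max(tenure_months(r), 1.0))

customer = {"days_since_signup": 300.0, "total_spend": 500.0, "missed_payments": 2.0}
churn_features = store.build(["tenure_months", "avg_monthly_spend"], customer)

# Later, the risk team adds one new feature and reuses the existing tenure feature.
store.register("missed_payment_rate",
               lambda r: r["missed_payments"] / max(tenure_months(r), 1.0))
risk_features = store.build(["tenure_months", "missed_payment_rate"], customer)
```

In this sketch the risk team writes one new feature instead of three, and both models compute tenure from the same definition, which is the consistency benefit the example above describes.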

Feature stores not only accelerate the development of individual use cases but also help maintain version control of features and track data lineage, significantly enhancing the overall quality and governance of the AI/ML pipeline. The combination of streamlined feature management and improved governance leads to more accurate, effective, and trustworthy ML models. Furthermore, feature stores can facilitate the scaling of AI/ML projects across the organization. By using the cumulative knowledge and expertise contained within the feature store, businesses can rapidly develop and deploy new ML models, fostering innovation and driving growth in a fast-paced, competitive environment.

Code assets: The building blocks for AI/ML projects

Building AI infrastructure with reusable code assets can lead to its long-term sustainability.

Organizations have just begun to recognize the complexity and importance of data and ML engineering and treat these areas as an extension of software engineering by adopting similar best practices. Organizations that don’t use software engineering best practices on data projects could face higher costs in maintaining the code base. In addition, organizations risk making business decisions based on code that isn’t comprehensively tested.

Using code assets such as reusable packages and modules is an essential best practice of software engineering, and organizations should consider implementing them as they develop AI/ML projects. Designing software with code packages and modules is akin to constructing a building using prefabricated components. Prefabricated code elements can help data teams expedite the development process, reduce costs, and support a consistent, maintainable, and flexible software structure, which can result in the long-term success and sustainability of AI/ML initiatives.
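As a minimal sketch of what a prefabricated component might look like, the Python below defines two small data-preparation helpers and composes them into one shared function that different projects can call. The function names, the choice of median imputation, and the sample values are illustrative assumptions, not a prescribed design.

```python
from statistics import median
from typing import List, Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values: Sequence[float], lower: float, upper: float) -> List[float]:
    """Clamp every value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def prepare_column(values: Sequence[Optional[float]],
                   lower: float, upper: float) -> List[float]:
    """Shared preparation step: impute first, then clip."""
    return clip_outliers(impute_median(values), lower, upper)


# Two different projects call the same tested component with their own settings.
churn_tenure = prepare_column([3.0, None, 48.0, 500.0], lower=0.0, upper=120.0)
risk_balance = prepare_column([None, -50.0, 200.0], lower=0.0, upper=10_000.0)
```

Because both projects depend on `prepare_column`, a fix or improvement to the shared helper reaches every use case at once instead of being repeated in each code base.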

Reusable code packages also reduce duplication, allowing data teams to focus on collaboration, innovation, and strategic tasks instead of the minutiae of coding. This modular approach means AI/ML projects become leaner because resources can be allocated more efficiently. It also makes it easier for organizations to modify, expand, or repurpose projects. They can then continuously improve AI/ML projects in response to evolving market conditions, customer demands, or regulatory requirements. For example, a large bank in Brazil reduced the time to impact of ML use cases from 20 weeks to 14 weeks by adopting best practices in MLOps and data operations (DataOps).

Standards and protocols: Paving the path to scaling AI

As with the construction of buildings, a well-defined set of standards and protocols ensures safety, consistency, and efficiency while scaling data and AI within an organization.

Given the speed and demand of AI transformation, organizations must embrace standards and protocols to scale AI effectively. Establishing a robust framework of standards and protocols provides guidance for data teams on how to build, evaluate, and deploy ML models. By using this framework, data teams follow a standardized approach while developing AI use cases and adhere to the guardrails required for their industry (for example, not using features that could create bias).

When scaling AI, organizations can achieve success by implementing three notable aspects of standards and protocols: engineering standards; data and ML life cycle best practices; and regulations, compliance, and ethics.

Standard software engineering technologies

Organizations can adopt standard software engineering technologies to maximize the value of their AI investments. Continuous integration/continuous deployment (CI/CD) and automated testing frameworks allow organizations to automate the building, testing, and deployment of AI. With these technologies, all ML models follow a standard deployment pattern set by the organization and are effectively integrated into the broader IT infrastructure. In addition, fostering a culture of collaboration and shared responsibility through these new technologies can reduce time to market, minimize errors, and enhance the overall quality of AI applications. For example, a leading Asian bank implemented new protocols to scale AI as well as the tooling to enforce them, which helped reduce the time to impact of ML use cases from 18 months to less than five months.
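One way an automated test can act as a deployment gate in a CI/CD pipeline is sketched below in Python. Everything here is a stand-in: `threshold_model` substitutes for a trained model, `HOLDOUT` for a held-out evaluation set, and the 0.75 minimum accuracy for whatever bar an organization's own standards define.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[float, int]  # (input score, true label)


def threshold_model(x: float) -> int:
    """Stand-in for a trained model: predicts 1 when the score is at least 0.5."""
    return 1 if x >= 0.5 else 0


def accuracy(model: Callable[[float], int], examples: Sequence[Example]) -> float:
    """Share of examples the model labels correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


def deployment_gate(model: Callable[[float], int],
                    holdout: Sequence[Example],
                    min_accuracy: float) -> bool:
    """Return True only if the model clears the agreed accuracy bar."""
    return accuracy(model, holdout) >= min_accuracy


# Hypothetical held-out examples; the last one is deliberately misclassified.
HOLDOUT = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.45, 1)]


def test_model_meets_minimum_accuracy() -> None:
    # A test runner such as pytest would collect this function by its test_ prefix;
    # a failing assertion would stop the pipeline before deployment.
    assert deployment_gate(threshold_model, HOLDOUT, min_accuracy=0.75)
```

Running the same gate for every model is what makes the deployment pattern standard: the check lives in the pipeline rather than in each team's judgment.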

Data and ML best practices

Emphasizing data and ML best practices is paramount to successfully scaling AI applications within an organization. By implementing a series of clearly defined protocols, organizations can streamline the analytics process. Such protocols typically define how organizations approach new projects, ingest data, engineer ML features, and build and test models. After a model is deployed, closely monitoring its performance and conducting maintenance become essential to achieving the best possible performance.

These best practices must be codified into comprehensive guides that explain the sequence of activities, important deliverables, and roles of various stakeholders, such as data scientists, engineers, and business professionals. Organizations that adopt these best practices can scale AI more efficiently and foster a culture of cross-functional collaboration.

Ethical and legal implications

Finally, as ML models grow in their sophistication and societal reach, it is critical that they operate within the bounds of legal and ethical norms. Without clear rules and guidelines in place from the outset, ML models become increasingly difficult and more time intensive to correct as they develop, which limits their scalability. Having a good understanding of applicable rules, compliance needs, and ethical considerations helps organizations operate within the limits of laws and societal expectations. Organizations that embrace regulatory compliance and ethical best practices as part of their AI development process can mitigate risks by requiring that ML models conform to codified compliance guidelines prior to release. The reliability of these practices also helps organizations build trust with their stakeholders and increases the longevity of their AI endeavors.
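A codified guideline can be as simple as a check that runs before release. The Python sketch below assumes a hypothetical list of prohibited features and a hypothetical `approve_for_release` step; real compliance rules depend on the industry and jurisdiction and would be far more extensive.

```python
from typing import Iterable, List, Set

# Hypothetical guardrail: attributes a model may not use as inputs.
PROHIBITED_FEATURES: Set[str] = {"gender", "ethnicity", "religion"}


def find_violations(model_features: Iterable[str],
                    prohibited: Set[str] = PROHIBITED_FEATURES) -> List[str]:
    """Return the prohibited features a model uses, compared case-insensitively."""
    used = {name.lower() for name in model_features}
    return sorted(used & prohibited)


def approve_for_release(model_features: Iterable[str]) -> bool:
    """A model passes this check only if it uses no prohibited feature."""
    return not find_violations(model_features)


compliant = approve_for_release(["tenure_months", "avg_monthly_spend"])
violations = find_violations(["tenure_months", "Gender", "religion"])
```

A check like this catches only direct use of a listed attribute. It does not detect proxies for those attributes, so it complements rather than replaces a broader ethical and legal review.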

MLOps technology capabilities: From experimentation to live ops

A well-defined operational strategy for machine-learning models fosters trust in data-driven decision making.

Organizations can fully leverage their ML investments by introducing technology that efficiently transitions ML models from experimentation to production and facilitates ongoing maintenance and productivity once these models are deployed. This is where MLOps comes into play.

MLOps refers to the technology and best practices that help ensure ML models are robust and efficient before deployment by automating key tasks, facilitating collaboration between teams, and providing reliable deployment pipelines and monitoring mechanisms. This seamless delivery supports the quality and reliability of ML models, fostering trust in data-driven decision-making processes. It also minimizes the risk of performance issues once ML models are live, helping companies reduce the time and resources needed for ML models to generate real-world impact.

Furthermore, MLOps helps prevent ML models from becoming obsolete or unused because of performance degradation or drift. Once ML models are deployed, they must be continuously monitored and improved. MLOps technology provides tool kits to monitor models in production and alert maintenance teams when performance starts degrading. This helps organizations prevent costly errors and ensures that ML models stay relevant as data and business requirements evolve. In Europe, the Middle East, and Africa (EMEA), for example, our research has shown that top-performing insurance companies prioritize continual investment in advanced technology such as MLOps and an automation-first approach to scale their AI/ML products and keep pace with the evolving AI landscape.
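The monitoring idea can be illustrated with a small Python sketch that tracks accuracy over a rolling window of recent predictions and signals when it falls too far below a baseline. The `PerformanceMonitor` class, the baseline, the tolerance, and the window size are assumptions made for illustration; production MLOps tooling typically tracks many more signals, including shifts in the input data.

```python
from collections import deque
from typing import Deque, Optional


class PerformanceMonitor:
    """Tracks rolling accuracy of a live model and flags degradation."""

    def __init__(self, baseline_accuracy: float, tolerance: float, window: int) -> None:
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.window = window
        # Only the most recent `window` outcomes are kept.
        self.outcomes: Deque[bool] = deque(maxlen=window)

    def record(self, prediction: int, actual: int) -> None:
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        # Wait for a full window so a few early errors do not trigger an alert.
        if len(self.outcomes) < self.window:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = PerformanceMonitor(baseline_accuracy=0.9, tolerance=0.1, window=4)

# Four correct predictions: the model is healthy.
for prediction, actual in [(1, 1), (0, 0), (1, 1), (1, 1)]:
    monitor.record(prediction, actual)
healthy_alert = monitor.needs_attention()

# Two wrong predictions push rolling accuracy to 0.5, below the 0.8 alert level.
for prediction, actual in [(1, 0), (0, 1)]:
    monitor.record(prediction, actual)
degraded_alert = monitor.needs_attention()
```

In practice the alert would notify the maintenance team or trigger retraining; here it is reduced to a boolean so the logic stays visible.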

By adopting MLOps solutions, organizations can set up their businesses to remain agile, competitive, and at the forefront of innovation now and into the future.

No matter where organizations are in their AI/ML journey, they can effectively scale their capabilities by investing in data products, code assets, standards and protocols, and MLOps technology capabilities. Embracing these enablers not only streamlines the development and deployment process but also facilitates continuous improvement and adaptability in the face of changing market dynamics. Ultimately, businesses that prioritize these enablers will be better positioned to harness the power of AI, maintain a competitive edge, and drive success in an increasingly data-driven world.

Gerry Aue is a partner in McKinsey’s Guatemala City office, Pepe Cafferata is a senior partner in the São Paulo office, and Roman Drapeko is a distinguished data engineer in the London office, where Margaux Penwarden is a principal data scientist and Vaibhav Sinha is a principal data engineer.