Scaling AI in the sector that enables it: Lessons for semiconductor-device makers

Artificial intelligence has significant value-creation potential in the semiconductor industry. How can semiconductor companies deploy AI at scale and capture this value?


Artificial intelligence/machine learning (AI/ML) has the potential to generate huge business value for semiconductor companies at every step of their operations, from research and chip design to production through sales. But our recent survey of semiconductor-device makers shows that only about 30 percent of respondents stated that they are already generating value through AI/ML. Notably, these companies have made significant investments in AI/ML talent, as well as the data infrastructure, technology, and other enablers, and have already fully scaled up their initial use cases. The other respondents—about 70 percent—are still in the pilot phase with AI/ML and their progress has stalled.


We believe that the application of AI/ML will dramatically accelerate in the semiconductor industry over the next few years. Taking steps to scale up now will allow companies to capture the full benefits of these technologies.

This article focuses on device makers, including integrated device manufacturers (IDMs), fabless players, foundries, and semiconductor assembly and test services, or SATS (for more information on our research, see sidebar, “Our methodology”). In a future article, we will look more closely at the implications for equipment players.

AI’s role in tackling the challenges ahead

Because of their high capital requirements, semiconductor companies operate in a winner-takes-most or winner-takes-all environment. Consequently, they have persistently attempted to shorten product life cycles and aggressively pursue innovation to introduce products more quickly and stay competitive. But the stakes are getting increasingly high. With each new technology node, expenses rise because research and design investments, as well as capital expenditures for production equipment, increase drastically as structures get smaller. For example, research and design costs for the development of a chip increased from about $28 million at the 65 nanometer (nm) node to about $540 million at the leading-edge 5 nm node (Exhibit 1). Meanwhile, fab construction costs for the same nodes increased from $400 million to $5.4 billion.

Costs for chip design and fab construction have soared as chips become increasingly complex.
We strive to provide individuals with disabilities equal access to our website. If you would like information about this content we will be happy to work with you. Please email us at: McKinsey_Website_Accessibility@mckinsey.com

As companies attempt to increase productivity within research, chip design, and manufacturing while simultaneously accelerating time to market, AI/ML is becoming an increasingly important tool along the whole value chain.

Our research shows that AI/ML now contributes between $5 billion and $8 billion annually to earnings before interest and taxes at semiconductor companies (Exhibit 2). This is impressive, but it reflects only about 10 percent of AI/ML’s full potential within the industry. Within the next two to three years, AI/ML could potentially generate between $35 billion and $40 billion in value annually. Over a longer time frame (gains achieved four or more years in the future), this figure could rise to between $85 billion and $95 billion per year. That amount is equivalent to about 20 percent of the industry’s current annual revenue of $500 billion and almost equal to its 2019 capital expenditures of $110 billion. While a significant portion of this value will inevitably be passed on to customers, the competitive advantage of capturing it, particularly for early movers, will be impossible to ignore.

Artificial intelligence could generate $85 billion to $95 billion for semiconductor companies over the long term.

AI/ML use cases in the semiconductor industry

Our comprehensive map of AI/ML use-case domains—areas that contain multiple specific use cases—spans the entire value chain for semiconductor-device makers (Exhibit 3). A use-case domain can also extend across several value-chain activities. For example, the demand-forecasting and inventory-optimization domain is relevant to manufacturing, procurement, and sales and operations planning.

A comprehensive heat map of use cases allows individual companies to focus and set priorities.

Industry-wide, manufacturing will accrue the most value from AI/ML (Exhibit 4). This is not a surprise, given the capital expenditures, operating expenditures, and material costs involved in semiconductor fabrication. The greatest relative spend reduction will occur in research and design, primarily resulting from the automation of chip design and verification. We will investigate the main use cases in the next section.

Artificial intelligence will deliver the most value by reducing manufacturing costs, but the largest relative impact will be in R&D.

AI/ML use cases in manufacturing

Manufacturing is the semiconductor industry’s largest cost driver, and AI/ML use cases will deliver the most value—about 40 percent of the total—here. They can reduce costs, improve yields, or increase a fab’s throughput. Over the long term, we estimate that they will decrease manufacturing costs (both cost of goods sold and depreciation) by up to 17 percent. Consider a few examples.

Adjustment of tool parameters. When defining steps in process recipes, semiconductor companies typically specify one constant time frame for each one. But the time frame required for some individual wafers may show statistical or systematic fluctuations, so a process could keep running after it has produced the desired outcome (for instance, a particular etch depth). That can lengthen timelines, waste material, or even damage the chip.

To achieve greater accuracy, semiconductor companies can use live tool-sensor data, metrology readings, and tool-sensor readings from previous process steps, allowing machine-learning models to capture nonlinear relationships between process time and outcomes, such as etch depth. The data collected might include electric currents in the etching process, light intensity in lithography, and temperatures in baking. With these models, optimal process times can be implemented on a per-wafer or per-batch basis to shorten processing time, improve yield, or both, thus decreasing cost of goods sold (COGS) and increasing throughput.
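The idea of setting process time per wafer can be illustrated with a minimal sketch. It assumes a hypothetical history of etch runs, each pairing one tool-sensor reading (plasma current) with the etch rate later confirmed by metrology, and it stands in a simple nearest-neighbor average for the machine-learning model. All names and numbers are invented for illustration; a production model would draw on many more signals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EtchRun:
    """One historical run (hypothetical fields, synthetic values)."""
    plasma_current_a: float  # tool-sensor reading during the run, in amperes
    etch_rate_nm_s: float    # etch rate confirmed by metrology, in nm per second


def predict_etch_rate(history: Sequence[EtchRun], current_a: float, k: int = 3) -> float:
    """Average the etch rate of the k runs whose sensor reading is closest.

    A nearest-neighbor average makes no linearity assumption, so it can
    follow a nonlinear relationship between sensor reading and etch rate.
    """
    if not history:
        raise ValueError("history must contain at least one run")
    nearest = sorted(history, key=lambda run: abs(run.plasma_current_a - current_a))[:k]
    return sum(run.etch_rate_nm_s for run in nearest) / len(nearest)


def process_time_s(history: Sequence[EtchRun], current_a: float,
                   target_depth_nm: float, k: int = 3) -> float:
    """Per-wafer process time: target depth divided by the predicted rate."""
    return target_depth_nm / predict_etch_rate(history, current_a, k)


# Synthetic history: the rate rises faster than linearly with current.
history = [
    EtchRun(1.0, 2.0),
    EtchRun(1.1, 2.3),
    EtchRun(1.2, 2.7),
    EtchRun(1.3, 3.2),
    EtchRun(1.4, 3.8),
]

slow_wafer_time = process_time_s(history, current_a=1.0, target_depth_nm=100.0, k=1)
fast_wafer_time = process_time_s(history, current_a=1.4, target_depth_nm=100.0, k=1)
```

In this sketch the wafer processed at the higher current receives a shorter recipe time than the wafer processed at the lower current, rather than both running for one constant time frame.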

Visual inspection of wafers. This step, which helps ensure quality by detecting defects early in the front-end and back-end production process, is frequently conducted during production—for example, using cameras, microscopes, or scanning-electron microscopes. Those images are still commonly evaluated manually by operators for potential defects, however, leaving them subject to error and backlogs and driving up costs.

Modern wafer-inspection systems, made possible by advances in deep learning for computer vision, can be trained to detect and classify defects on wafers automatically, with an accuracy on par with or better than human inspectors. Specialized hardware, such as tensor-processing units, and cloud offerings enable automated training of computer-vision algorithms. This, in turn, allows for faster piloting, real-time inference, and scalable deployment.

With this approach, companies can obtain early insights on potential process or tool deviations, allowing them to detect problems earlier and improve yields, all while reducing costs.

AI/ML use cases in research and chip design

AI/ML use cases can help semiconductor companies optimize their portfolios and improve efficiency during the research and chip-design phase. By eliminating defects and out-of-tolerance process steps, companies can avoid time-consuming iterations, accelerate yield ramp-up, and decrease the costs required to maintain yield. They may also automate the time-consuming processes related to physical-layout design and the verification process.

Although we are not yet at the point where AI/ML acceleration can be applied to all designs and to all stages of chip design, we do not see a fundamental reason why it cannot penetrate further over time. Therefore, AI/ML may eventually reduce the current R&D cost base by as much as 28 to 32 percent, which is even higher than the gains expected from manufacturing.

Automated yield learning in integrated circuit design. If there are missteps during integrated circuit (IC) design, semiconductor companies have to undertake multiple costly and complicated iterations based on feedback from manufacturing.

Semiconductor companies may avoid this problem by deploying ML algorithms to identify patterns in component failures, predict likely failures in new designs, and propose optimal layouts to improve yield. During the process, IC designs are broken down into key components with the support of AI-based analytics. The algorithms then compare these component structures with existing designs to identify problematic locations within the layout of individual microchips and improve the design. Thus, AI- and ML-aided design can significantly reduce COGS, increase terminal yields, and decrease time to market for new products. It can also decrease the effort required to maintain the terminal yield.

Other areas. All other functions, including planning, procurement, sales, and pricing, will benefit from AI/ML use cases. Often, these use cases are not specific to the semiconductor industry and are partially established in other industries, thus allowing implementation to occur more rapidly. Overall, applying AI/ML use cases to additional functions could yield up to $20 billion in annual value.

Six critical enablers for successful AI/ML implementation at scale

To assist semiconductor companies with AI/ML transformations and deploy use cases at scale, we focus on six enablers that are part of the McKinsey playbook for digital and analytic transformations: the creation of a strategic road map, talent strategy, agile delivery, technology, data, and adoption and scaling (Exhibit 5).

Our research indicates that six enablers are critical for successful implementation of artificial intelligence at scale.

Creation of a strategic road map

Above all, scaling AI/ML efforts must be a strategic priority for companies. The initial effort, which involves coordinating data, agreeing on priority use cases, and encouraging collaboration among the right business, data-science, and engineering talent, is too great to be successful as a bottom-up project.

Ideally, the AI/ML effort will be linked to clear business targets, giving business units and business functions a joint interest in making the transformation successful. For example, companies could identify cost savings for predictive maintenance and provide resources for the appropriate AI/ML use case. The resulting savings would help the function that sponsored the use case and provided the appropriate resources, allowing it to achieve its business targets. Such gains will give functions a strong incentive to support AI/ML implementation. Setting clear business targets will also help companies measure the benefits of each use case over time.

In line with their defined targets, companies should identify specific business domains and value levers that will be their focus. They can then select relevant use cases that allow them to apply these levers.

When prioritizing use cases in the strategic road map, companies should emphasize their total value, feasibility, and time to value. As their experience and capabilities grow, they can undertake additional use cases that are more difficult to implement or take longer to achieve. As they determine the value of potential use cases, companies should examine levers that often get overlooked, such as the competitive advantages associated with decreased time to market and higher quality. Such details will allow them to size and prioritize initiatives accurately.
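One way to make this prioritization concrete is a simple scoring rule. The sketch below is only an illustration: the formula, the 36-month planning horizon, and the example figures are assumptions, not part of the research described here. It multiplies estimated annual value by a feasibility rating and discounts use cases that take longer to pay off.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class UseCase:
    """A candidate use case (all figures are illustrative placeholders)."""
    name: str
    annual_value_musd: float  # estimated value at full scale, in $ million per year
    feasibility: float        # 0.0 (very hard) to 1.0 (straightforward)
    months_to_value: float    # expected time until value is realized


def priority_score(use_case: UseCase, horizon_months: float = 36.0) -> float:
    """Hypothetical score: value x feasibility x a linear time-to-value discount."""
    if use_case.months_to_value >= horizon_months:
        return 0.0
    time_factor = 1.0 - use_case.months_to_value / horizon_months
    return use_case.annual_value_musd * use_case.feasibility * time_factor


def rank(use_cases: Sequence[UseCase]) -> List[UseCase]:
    """Return use cases ordered from highest to lowest priority score."""
    return sorted(use_cases, key=priority_score, reverse=True)


candidates = [
    UseCase("automated layout design", 30.0, 0.3, 24.0),
    UseCase("tool-parameter adjustment", 10.0, 0.8, 6.0),
    UseCase("wafer visual inspection", 12.0, 0.7, 9.0),
]

ranked_names = [uc.name for uc in rank(candidates)]
```

With these placeholder inputs, the highest-value use case ranks last because it is the least feasible and slowest to deliver, which mirrors the advice to start with attainable use cases and take on harder ones as capabilities grow.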

After setting their priorities, semiconductor companies must allocate sufficient resources to their AI/ML initiatives and investigate supportive partnerships with third parties that have complementary skills, rather than trying to reinvent the wheel themselves. Some larger players may have the spending power required to develop most capabilities in-house, as well as sufficient data from their large installed tool fleet to train AI/ML models, allowing them to retain full control over all associated intellectual property. Given the required resources, smaller players might find it beneficial to leverage commercially available solutions where available, or to partner with others to develop or share algorithms, or to create joint data-sharing platforms that increase the amount of information available to train models. Examples of potential partners include other semiconductor-device makers, companies involved in electronic design automation, hyperscale cloud providers, or equipment OEMs.

Talent strategy

Most companies that successfully implement AI/ML create a centralized organization, such as a center of excellence (COE), that focuses on such initiatives. This group serves as a clear home for the new talent required and is responsible for defining common standards and building a central repository for best practices and knowledge. Some of the leading semiconductor companies have already made significant investments in AI/ML COEs that include hundreds of engineers.

When hiring technology staff for the central team, semiconductor companies should carefully balance the role composition to ensure that it has the right capabilities to move from pilot to full scale-up of a use case. For example, data scientists and data engineers are required for piloting an AI/ML use case, but ML engineers, infrastructure architects, or full-stack developers are needed to drive the scale-up. Typically, semiconductor companies do not have employees with these profiles and must recruit them externally.

The centralized AI/ML function cannot be isolated from the business and functions in which it will deploy use cases. To build connections, people with business/operations domain expertise, such as R&D designers, process engineers, and equipment engineers, should be included in the AI/ML function. These team members have a critical role in identifying AI/ML use cases and also act as ambassadors for AI/ML solutions within the organization.

Likewise, successful companies will ensure that local sites, both fabs and functions, add data-science expertise to their AI/ML teams. The employees trained to become “data citizens” can work jointly with specialist roles from the AI/ML COE to lead use-case selection and support implementation in cross-functional teams.

Agile delivery

To avoid a situation where AI/ML use cases become stuck in a “proof-of-concept” spiral with limited use or scale, teams should focus on achieving business value, with a heavy emphasis on iterative improvement.

An agile approach, which is central to software development, can help semiconductor companies attain this focus. Although AI/ML development involves intense discovery and exploration, semiconductor companies should receive continuous feedback from people who use insights from their models. Many agile teams have found success by leveraging the vertical-sliver approach, which involves creating an end-to-end analytic pipeline that includes data ingestion, modeling, recommendation development, and deployment to users—typically business owners or engineers who work on the fab floor—in the first or second sprint. The vertical-sliver approach may be counter to many established practices since semiconductor companies typically only make changes within manufacturing engineering when they are completely certain that the shift will deliver perfect results.
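A vertical sliver can be very thin. The following sketch, built on invented tool names, an assumed temperature limit, and a deliberately naive model, wires together all four stages named above (data ingestion, modeling, recommendation development, and deployment to users) so that each stage can be improved in later sprints without changing the overall flow.

```python
from statistics import mean
from typing import Dict, Iterable, List


def ingest(raw_rows: Iterable[str]) -> Dict[str, List[float]]:
    """Ingestion: parse 'tool_id, temperature' rows into readings per tool."""
    readings: Dict[str, List[float]] = {}
    for row in raw_rows:
        tool_id, value = row.split(",")
        readings.setdefault(tool_id.strip(), []).append(float(value))
    return readings


def model(readings: Dict[str, List[float]]) -> Dict[str, float]:
    """Modeling: a placeholder that just averages each tool's readings."""
    return {tool: mean(values) for tool, values in readings.items()}


def recommend(averages: Dict[str, float], limit_c: float) -> List[str]:
    """Recommendation: flag tools whose average exceeds the assumed limit."""
    return [
        f"Check {tool}: mean bake temperature {avg:.1f} C exceeds {limit_c:.1f} C"
        for tool, avg in sorted(averages.items())
        if avg > limit_c
    ]


def deliver(messages: List[str]) -> List[str]:
    """Deployment: printing stands in for a dashboard or fab-floor alert."""
    for message in messages:
        print(message)
    return messages


def run_sliver(raw_rows: Iterable[str], limit_c: float = 120.0) -> List[str]:
    """Run the whole pipeline end to end."""
    return deliver(recommend(model(ingest(raw_rows)), limit_c))


rows = ["bake-01, 118.0", "bake-01, 119.0", "bake-02, 123.0", "bake-02, 125.0"]
alerts = run_sliver(rows)
```

The point of the sketch is the shape rather than the content: users see a recommendation in the first sprint, and their feedback then guides which stage to deepen next.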

From an operational perspective, agile teams are beneficial because they reduce dependencies on people outside the team. Typically, it is difficult to avoid such dependencies since there are often organizational divisions among data owners, AI/ML experts, and IT infrastructure. But agile AI/ML teams are cross-functional and encompass all needed expertise for the use case even if some members are only included for a limited number of sprints. Agile teams can also leverage self-serve resources, such as access to data and infrastructure.

The shift to agile AI/ML delivery should occur as soon as possible and will be more likely to gain traction if top leaders lend their support and companies attempt to change mindsets as well as processes.

Technology

Within the fabs, successful companies establish a connectivity layer for real-time access to relevant data sources, including production and measurement tools, auxiliaries, facilities, and others. Tool OEMs can help ensure this connectivity, which is particularly essential for manufacturing use cases. We will explore the role of tool makers in enabling AI/ML in a second article.

Semiconductor companies also require a common data-integration layer. This layer first combines the data before deploying the analytics engines and use cases in a development environment. For best results, semiconductor companies must find ways to combine data and use cases from different tool vendors to limit complexity and prevent multiple Internet of Things stacks in parallel silos.

Successful companies will leverage both edge and cloud computing to support their AI/ML use cases. Since some tools generate tremendous amounts of data, edge-computing capabilities—deploying the AI/ML use case within or close to the tools—are often required for real-time applications. Cloud solutions provide economies of scale and enable links among different fabs, increasing the pool of training data for use cases. (Semiconductor companies are historically cautious around data security, however, so they may limit deployment of sensitive data to on-premises solutions.)

Data

Semiconductor companies have several hundred tools in each fab, some of which generate terabytes of data, and it would be impossible to examine every piece of information. To ensure maximum effectiveness and efficiency, players must prioritize data that might enable multiple use cases since this will have a much greater impact than a single initiative.

Even if players limit the amount of information analyzed, their AI/ML initiatives will still require extensive time and resources, such as sufficient numbers of data engineers on AI/ML teams. Strict data-governance policies are required to ensure that existing data and newly generated data are immediately ready for use, consistently high in quality, and trustworthy. Successful companies typically have a dedicated data-governance team to ensure data consistency as well as the quality of new and existing data.
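Part of such a governance policy can be automated as a quality gate that runs before data reaches any model. The sketch below is a minimal illustration with hypothetical record fields and rules (missing values, an allowed value range, and increasing timestamps per tool); real policies would cover far more.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SensorRecord:
    """One tool-sensor reading (hypothetical schema)."""
    tool_id: str
    timestamp_s: float
    value: Optional[float]  # None models a missing reading


def quality_issues(records: Sequence[SensorRecord],
                   low: float, high: float) -> List[Tuple[int, str]]:
    """Return (record index, issue) pairs for records that break a rule."""
    issues: List[Tuple[int, str]] = []
    latest_ts: Dict[str, float] = {}  # latest timestamp seen so far, per tool
    for index, record in enumerate(records):
        if record.value is None:
            issues.append((index, "missing value"))
        elif not low <= record.value <= high:
            issues.append((index, "value out of range"))
        previous = latest_ts.get(record.tool_id)
        if previous is not None and record.timestamp_s <= previous:
            issues.append((index, "timestamp not increasing"))
        else:
            latest_ts[record.tool_id] = record.timestamp_s
    return issues


records = [
    SensorRecord("etch-01", 0.0, 1.2),
    SensorRecord("etch-01", 1.0, None),
    SensorRecord("etch-01", 0.5, 1.3),
    SensorRecord("etch-01", 2.0, 9.9),
]

found = quality_issues(records, low=0.0, high=5.0)
```

A gate like this gives a data-governance team a single, shared definition of "ready for use" that every use case can rely on.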

Adoption and scaling

First, semiconductor companies should stringently focus on the scalability of prioritized use cases, beginning in the design phase. Experts from multiple sites or fabs must be included early on to ensure that use cases can later be deployed across locations. Some semiconductor companies are creating focus groups within the fab landscape to plan for scale-up. For specific domains, they pick a fab to serve as the lead site, and it then identifies use cases, collects requirements from the other fabs, creates the implementation plan, and ensures transfer of knowledge. As noted earlier, semiconductor companies will need to prioritize use cases for deployment based on their value after full scale-up.

Second, semiconductor companies should ensure that the entire organization follows standards and best-known methods (BKMs) when developing and scaling up use cases. Codifying and enforcing the use of BKMs across the organization can ensure that solutions are sustained and improved over time, allowing machine learning to gain maximum scale across sites. Typically, the central AI/ML team oversees this critical task.

Finally, semiconductor companies must seamlessly integrate use cases into an end user’s digitized workflows to ensure adoption. Many companies overlook this step, but this oversight has major consequences. In our survey, nearly half of semiconductor-device makers stated that lack of integration was the second-biggest problem in scaling AI/ML use cases. If organizations form tight links between the AI/ML function and the business side, it will be significantly easier to take the user’s perspective when initially designing the use case.


The semiconductor industry is at a turning point, and companies that don’t devote significant resources to AI/ML strategies could be left behind. Although semiconductor companies may take different approaches, depending on business model, experience with AI/ML, and strategic priorities, the goal is the same: to take productivity and innovation to new levels.
