
What high-reliability organizations get right

Technology isn’t the only—or even the most important—reason high-reliability organizations outperform their peers.

As Industry 4.0 continues to advance, unleashing new capabilities at breathtaking speed, it’s all too easy for business leaders to rely solely on technology to drive operational improvement. Automation, advanced analytics, digital performance management, cloud computing, and machine learning all offer powerful ways for organizations to achieve new heights in operational performance.

But the costs and effort these technologies and platforms entail can often exceed their payoff. The expectations surrounding them, it turns out, are often inflated. Take, for example, advanced analytics-driven predictive maintenance. As a means of boosting reliability, it is not the panacea many think it is. Without engineers who are trained in data analysis and in developing solutions based on those analytics, companies cannot possibly expect to realize the full potential of the technologies. Often, there are simpler, more cost-effective ways to accomplish the same goal.

Moreover, technology alone does not make for excellence in reliability. In industries that live by the laws of science, leaders often underestimate the role of management processes and skills in reliability-engineering success.

Research we conducted in a cross-section of predominantly heavy-asset industries reveals what distinguishes high-reliability organizations (HROs) from the rest. These companies focus as much on the enablers—the rigorous processes, role clarity, and accountability systems—as they do on the Industry 4.0 technologies.

Yet as essential as these enablers are, they’re still not enough. HROs also focus on talent: they put a premium on certain skills that other companies don’t, and they invest more in professional development. Finally, HROs structure their organizations according to how centralized the function and its accountability are. To be sure, advanced technologies can deliver dramatic improvements, but ultimately, it’s the human element that spells success.

The three core business practices that drive reliability

We selected eight best-in-class reliability organizations from a cross-section of industries, based on internal reliability metrics (such as percentage of downtime and overall reliability) and external performance benchmarks and industry awards for operational excellence. We then interviewed in depth a dozen current and former leaders of these organizations to identify the organizations’ key characteristics and practices.
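For readers who want to make a metric such as percentage of downtime concrete, the following is a minimal sketch under stated assumptions. The article does not say how the study computed its metrics; the definition used here (downtime hours divided by scheduled operating hours) is a common convention, and the function names are hypothetical.

```python
# Illustrative only: a common convention for downtime percentage,
# not the definition used in the study (which the article does not give).

def downtime_percentage(downtime_hours: float, scheduled_hours: float) -> float:
    """Return the share of scheduled operating time lost to downtime, in percent."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= downtime_hours <= scheduled_hours:
        raise ValueError("downtime_hours must be between 0 and scheduled_hours")
    return 100.0 * downtime_hours / scheduled_hours


def availability_percentage(downtime_hours: float, scheduled_hours: float) -> float:
    """Return the complement of downtime: the share of scheduled time the asset ran."""
    return 100.0 - downtime_percentage(downtime_hours, scheduled_hours)


if __name__ == "__main__":
    # Hypothetical asset: 36 hours down in a 720-hour (30-day) month.
    print(downtime_percentage(36, 720))      # 5.0
    print(availability_percentage(36, 720))  # 95.0
```

Whatever definition an organization adopts, the point made in the text still holds: the metric is useful for benchmarking only if it is defined once and applied consistently across sites.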

As varied as our study sample was—it spanned the mining, energy, power generation and distribution, pharmaceuticals, airline, and military sectors—all the organizations adhere to three fundamental business practices (Exhibit 1).

We strive to provide individuals with disabilities equal access to our website. If you would like information about this content we will be happy to work with you. Please email us at: McKinsey_Website_Accessibility@mckinsey.com

HROs implement robust reliability processes.

All eight organizations follow strong reliability processes across their operations, from the ground level up. In this respect, they stand out from the average reliability organization, whose processes are either lacking specifics or inconsistently followed.

For example, HROs clearly define the assets critical to their operations, ensuring that the list is not merely well understood but also considered in decision making. They are skillful in disseminating the definitions and standards throughout their companies. They create equipment-reliability strategies and execute them by strictly following preventive-maintenance schedules, closely monitoring equipment health, and identifying issues and resolving them proactively or promptly.

HROs also engage in root-cause problem solving to determine underlying issues and implement holistic, practical solutions. Their reliability engineers draw on a variety of data sources, tools, capabilities and subject-matter expertise.

Another common practice among HROs is that they all have robust systems in place for managing, preserving, disseminating, and updating their reliability knowledge base—including both reliability analysis and reliability design standards. For instance, these companies effectively share learnings from every reliability event, and update their equipment-design standards and work processes accordingly to ensure the event is not repeated. Finally, HROs hold other functions accountable to execute the reliability processes they’ve put in place.

One manager from an energy company noted that his organization eschews advanced reliability techniques or “fancy predictive maintenance models,” relying instead on traditional root-cause problem solving and defect-elimination approaches to get results. Another interviewee, a former submarine officer, put it plainly: the reason there’s rarely a failure of critical equipment “is twofold: the design is robust, and things just get done when they need to get done. Period.” HROs employ systematic methods to carry out root-cause problem solving on the front lines.

They define roles clearly and institutionalize knowledge.

Roles and responsibilities are clearly defined and well understood by operational leaders as well as all those with whom they work: the plant managers, maintenance leaders, operators, technicians, supply-chain managers, and so forth. Each member of the organization has a clear understanding of the role they play in driving reliability, so the guidelines for dealing with and engaging personnel are unambiguous. A leading pharmaceutical company in our research, for example, rotates personnel, including reliability engineers, to give them a firsthand understanding of the critical roles in the organization and how they interact.

They set accountability at the executive level and delegate it down.

HROs believe accountability resides at the top. To ensure that, they set executive compensation according to reliability-specific metrics and outcomes. These organizations establish clear corporate reliability standards and communicate them well: for example, they’re included in capability models and reliability metrics, which are tracked publicly on scoreboards. In addition, HROs discuss outcomes at all levels of the organization, from the frontline control room to the boardroom. At a major power generation company, executive sponsorship is considered a key success factor. “Senior executives really bought into frontline support for reliability and communicated its importance for our business clearly and frequently.”

HROs put people first

HROs recognize that it takes more than technical expertise to make a great reliability engineer. Our research revealed that to attract and retain the best and the brightest, HROs follow three specific talent-management practices: they pay higher salaries, emphasize communication and coordination skills relevant to the reliability engineer’s role as cross-functional problem solver, and provide well-defined career options and paths.

HROs offer higher pay than their peers.

We analyzed two years’ worth of job postings from the companies in our sample, comparing their pay levels with those of their competitors. Salaries at the HROs—average, as well as the low and high end of pay scales—were 15 percent higher than those of their peers (Exhibit 2). HROs also reward high performance; several we interviewed have developed specific key performance indicators (KPIs) for performance-based compensation and bonuses.


The fact that HROs pay better is hardly surprising: in any industry, offering a higher salary is an obvious way to attract top talent. But it is by no means the only way. Nor does it guarantee talent retention or reliability success.

HROs prize communication and problem-solving skills.

They appreciate that technical expertise alone is not enough for a first-rate reliability engineer. The ability to solve problems and the ability to communicate up, down, and across the organization, in ways that earn trust and support, are critical skills. In fact, HROs rank communication among the three most critical skills in job candidates, followed by problem solving, a skill most companies omitted from their top eight (Exhibit 3).


HROs recognize the importance of being able to communicate equally well with the frontline and management; their engineers are adept at translating technical issues into layman’s terms that their non-technically trained peers can understand. Effective problem solvers are solid conceptual thinkers who can understand a problem or condition by identifying patterns and connections that reveal which underlying issues to address. They’re also leaders: HROs give more weight to leadership skills than do the more than 70 other organizations in our comparison set.

Creativity and rigor in problem solving are always valuable, particularly at a time when organizational complexity is growing and the costs of failure (financial, social, and environmental) are ever-increasing. Each day that a plant is sidelined can translate into millions of dollars lost; a malfunction that releases toxins in the environment can cause untold damage and even loss of life. Similarly, today’s higher-stakes operating environment heightens the importance of communication skills: specifically, the ability to engage different teams in constructive dialogue, raise concerns and potential issues proactively, and foster consensus on the appropriate action to take.

HROs create attractive paths for career advancement.

Reliability engineers typically follow one of three main career tracks, each with different rates of retention (Exhibit 4).

  • Field-based engineer to operations manager: Entry-level personnel with technical degrees make up this track. While each organization is different, we see common patterns. Generally, there is little opportunity for advancement in the initial years. Near the five-year mark, field engineers have two options if they want to stay with the company: they can either become a supervisor, or switch career tracks. There is no ultimate role in reliability engineering for reliability-focused personnel. Not surprisingly, this track experiences the lowest retention levels: fewer than one in four remain in it to become field operations leaders.
  • Junior to senior field-based subject-matter expert (SME): These individuals typically have five years of industry experience or an advanced degree (or both), and usually spend the first few years as a functional area expert. At that point, their choices are to become a senior SME, change tracks, or leave the company. Engineers in this track have greater longevity than those in the first track; 50 percent remain in the track, most likely because companies effectively prescreen for this role, and senior onsite SME is considered an ultimate role.
  • Site-based engineer to corporate SME: From the get-go, these engineers (either entry-level or experienced technical personnel) know that corporate-level opportunities await them at a specific career milestone. They will be able to choose among different tracks, including corporate-level functional expert. Not surprisingly, employees on this track have the highest retention levels, as they have more opportunity to progress to higher-level roles and influence decision making across multiple sites. However, companies must manage expectations and performance effectively to ensure that the corporate SME stays connected with site operations and continues to deliver value across the network.

HROs demonstrate that they value quality talent by investing accordingly in their reliability bench. They establish well-defined career tracks to give their engineers ample opportunities for professional and personal development. The HROs we studied employ a variety of talent-management and -retention practices. All strive to offer numerous career-path options, even within the technical or managerial ranks.

Moreover, HROs are committed to training their people on an ongoing basis, whether through internal classroom sessions for professional certification, paid time off to attend industry-sponsored events, on-the-job training, or informal mentoring.

At a major energy company, field-based reliability employees are “truly engineers, closely linked to the equipment,” as a former manager noted. The company considers them high-potential employees and “gives them the option to pursue other technical, commercial, or managerial roles.”

The former head of reliability at another major energy company commented on the multiple career options of field reliability engineers, including moving into operations or risk management. “More importantly,” the leader added, “their career is well-managed from the start, with performance reviews every six months. We tend to keep our good engineers.”

HROs recognize that form follows function

For all their similarities, HROs vary widely in structure, according to the specific characteristics and challenges of their industry, their overall organization structure and culture, and the nature of their products. Essentially, there are four basic archetypes that vary along two dimensions: the strength of the central reliability function, and the degree to which accountability is centralized, that is, who tracks asset reliability and who is ultimately responsible for outcomes (Exhibit 5).


A center of excellence.

With this archetype, the corporate center establishes rigorous reliability protocols, but each local site is responsible for demonstrating its adherence to processes and procedures. This archetype works well for organizations whose assets are uniform and where there is little variability or change in the production process.

At a major airline, each fleet has a dedicated team of reliability engineers who work remotely to monitor equipment and oversee basic maintenance (when heavy maintenance is needed, they travel to sites). The center’s reliability analysts study big data and their analysis informs decisions about preventive maintenance. Reliability engineers engage subject-matter experts from the central corporate-reliability function (along with other technical resources) to help make recommendations. The senior manager of the company’s large maintenance organization reports directly to the COO.

Command-and-control.

In this model, a central team designs, implements, and enforces the reliability programs and standards throughout locations. This approach works well for industries or companies that are very process-oriented and where there are distinct differences between production or operating facilities. In such companies, strong oversight is needed; there is a high risk of catastrophic failure, and repeat failures are unacceptable. A top-down approach ensures all facilities and businesses comply with corporate reliability standards.

At a third energy company, a functional group sets global reliability strategy, and a small central SWAT team implements procedures throughout the sites. (SWAT, because they are quick, tactical, and execute with precision.) Integrity and reliability teams are responsible for site-level reliability. Each of the company’s business units has a reliability engineer who oversees maintenance and implementation for each discipline (for example, equipment rotation or electrical operations). The functional group works with the local reliability engineer, operators, technicians, operations managers, and vice presidents, all of whom are versed in the reliability standards that apply to their roles.

Bottom-up reliability.

Here, local entities define reliability standards and practices and are responsible for reliability outcomes. Technicians conduct tests and file reports to central reliability teams with the help of process engineers. In some cases, reliability work is outsourced. Such an approach is well suited to decentralized businesses where no two sites are alike in their culture, their operating environment, or both.

A major resource company illustrates the benefit of this model, with its dozens of facilities and assets that include all manner of heavy equipment, refineries, and processing operations. Although reliability programs and accountability are decentralized, the company defines metrics for use across different sites—and considers it a priority to make them transparent to the entire organization.

Corporate oversight.

With this archetype, local operations define the reliability program, but the corporate center tracks (and has ultimate responsibility for) outcomes. The central office often develops KPIs for use enterprise-wide. This model is effective for businesses whose products vary considerably and which require relatively tailored processes across facilities (such as pharmaceutical companies). The hybrid structure makes sense, given the strong local leadership that can be relied upon to carry out reliability without direct responsibility for outcomes. Crucial prerequisites include either having strong local reliability processes and capabilities, or having other robust processes and strict metrics that reinforce reliability excellence, such as through quality control.

At a leading pharmaceutical company with dozens of facilities, corporate maintains consistency in reliability practices by establishing clearly defined KPIs, which local sites report on via dashboards. The central team also shares best practices (and failures) companywide. It holds weekly meetings with local facilities to review key topics and issues, and provides resources for major initiatives, such as implementing new technologies.

But it’s field engineers who lead such initiatives. Local reliability personnel also focus on root-cause investigations (RCIs) and failure mode and effects analyses (FMEAs). Led by a maintenance manager and general manager, the local engineers, maintenance group, project teams, and planning and scheduling teams work in concert with the central reliability function.
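The four archetypes described above can be read as a two-by-two grid. The sketch below is one simplified interpretation of the text, not a reproduction of Exhibit 5: it assumes each dimension can be reduced to a yes-or-no question (does the central function set the reliability standards, and does the corporate center hold ultimate responsibility for outcomes?). The names `Archetype` and `classify_archetype` are hypothetical, and real organizations will often sit between the cells.

```python
from enum import Enum


class Archetype(Enum):
    CENTER_OF_EXCELLENCE = "A center of excellence"
    COMMAND_AND_CONTROL = "Command-and-control"
    BOTTOM_UP = "Bottom-up reliability"
    CORPORATE_OVERSIGHT = "Corporate oversight"


def classify_archetype(central_sets_standards: bool,
                       central_owns_outcomes: bool) -> Archetype:
    """Map the two dimensions onto the four archetypes.

    Assumption: both dimensions are treated as binary, which is a
    simplification of the descriptions in the article.
    """
    if central_sets_standards and central_owns_outcomes:
        # Central team designs, implements, and enforces the programs.
        return Archetype.COMMAND_AND_CONTROL
    if central_sets_standards:
        # Center sets protocols; each site is responsible for adherence.
        return Archetype.CENTER_OF_EXCELLENCE
    if central_owns_outcomes:
        # Sites define the program; corporate tracks and owns outcomes.
        return Archetype.CORPORATE_OVERSIGHT
    # Sites define standards and are responsible for outcomes.
    return Archetype.BOTTOM_UP


if __name__ == "__main__":
    for standards in (True, False):
        for outcomes in (True, False):
            print(standards, outcomes, classify_archetype(standards, outcomes).value)
```

The binary framing is deliberate: it makes the distinction between who defines the program and who answers for the results explicit, which is the contrast the archetype descriptions turn on.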

A driving principle and enterprise priority

Reliability engineering emphasizes statistical analysis, but experience and history show that quantitative methods alone are insufficient for success. In many of the most dramatic operational failures in modern times, miscommunication or poor decision making exacerbated a fundamental engineering failure—in some cases resulting in catastrophic loss of life that could have been averted. Such events stand as sobering reminders of the importance of rigorous management, transparency, and accountability in reliability engineering.

Today, supply-chain complexity, heightened business interdependencies, competitive and financial pressures, and intensified public and regulatory scrutiny all mean that reliability organizations, regardless of industry, have their work cut out for them. Leaders cannot expect Industry 4.0 technologies alone to be a cure-all. Rigorous reliability processes, role clarity, and clear accountability structures that align with the broader organization are all essential components of reliability success. So are talent management and development. But high-reliability organizations go one step further: they make reliability an explicit priority, not just an afterthought. As one reliability manager put it: “Everyone at our company takes reliability seriously, not just the reliability engineers. They know that reliability is a top priority, and one of our main criteria for success.”

About the author(s)

Matt Gentzel is a partner in McKinsey’s Pittsburgh office, where Bill McDonnell is a business analyst; Ethan Hessney is an engagement manager in the New York office; and Joël Thibert is an associate partner in the Santiago office.
