Like companies in other industries, banks are racing to take advantage of the opportunities and manage the risks that the digital economy creates. To do so, they will need computing platforms that provide greater agility at lower cost. As global head of Goldman Sachs’s technology division, Don Duet has led the development and execution of the firm’s private-cloud strategy, as well as its thinking about opportunities in the public cloud. “None of this marks a sudden or abrupt shift in strategy for the firm. It’s always been about making continual progress,” he says. In this edited interview conducted by McKinsey’s James Kaplan at Goldman Sachs’s headquarters in New York, Duet discusses the firm’s use of a private-cloud infrastructure—the challenges and risks it faced in conceiving of and launching the platform almost a decade ago and the benefits the firm is realizing through this technology.
McKinsey: Everyone’s talking about the use of digital technologies in banking—what’s changed? Are we at an inflection point in the industry?
Don Duet: I joined Goldman Sachs near the beginning of the IT revolution, when distributed computing was just taking hold. I was able to help design some of the first trading systems for the firm, which were based on Windows PC architecture. Since then, I’ve been in the middle of the shift toward distributed architecture, then the Internet—and now cloud platforms, mobile platforms, and big data.
Through all of this, technology has remained a core competency for Goldman Sachs. Technology engineers make up roughly one-third of our workforce. I think that number is pretty representative of how much we value technology and how much we believe our investments in technology can enable the business.
We’ve been investing in technology for a long period of time—more than two decades—so none of this marks a sudden, abrupt shift in strategy for the firm. It’s a process of continual transformation—moving more and more core parts of our business into models in which things are done electronically, at higher scale, and delivered in a more seamless fashion.
As macro forces like open-source software and cloud architectures have created more opportunity to innovate at a higher pace and lower cost, we’ve seen a general movement in the industry toward digital frameworks and digital business models. Think about how much digital literacy there is today compared with even 10 or 15 years ago. Our customers, our employees, and all the people we interact with across the industry are much more technology literate and want to be empowered through technology. Customers are demanding that we reinvent and recast many of the experiences they have with the firm. We’re rethinking how we do things and the way we articulate our services and solutions to our customers and to ourselves.
McKinsey: How so? What are clients telling you they want?
Don Duet: We work in an industry where there is mostly an institutional focus. But across every aspect of our business, clients are increasingly demanding services that are both personalized and efficient. So we’ve been focused on recasting a number of experiences that would’ve been analog. Customers and clients want to access our services and solutions online, using the devices they choose within the time frames they need. So we’re making investments along those lines and getting products to market. It used to be that the API [application programming interface] to Goldman Sachs was a phone call; increasingly, it’s actually an API.
McKinsey: The cloud seems to have become a major element in the firm’s digital strategy, as it has for a lot of other companies. Can you talk more about Goldman’s efforts to establish a private-cloud infrastructure?
Don Duet: The journey began almost ten years ago, almost before the term cloud was part of the business vocabulary. At that time, we had reached a certain size and scale of technology investment that forced us to consider how agile we were. We needed to be more responsive in meeting business demands, and we wanted to drive down cost of ownership. We realized that we had to have foundational agility in our infrastructure to deliver computing and solutions to our businesses in different parts of the world, create new software products and services for our customers, and otherwise succeed in different parts of the business. So we adopted an x86 architecture that would enable us to run large-scale grid computing, which allowed us to improve risk management for our derivatives businesses and products. It was a very different model for us; we had been mostly building very specialized pieces of IT infrastructure for specific business products and solutions.
We invested in things like our virtual-desktop environment; we wanted to ensure that people would have access to the firm’s applications and services wherever they were. This building you’re in—in fact, every Goldman Sachs building worldwide—has no PCs sitting underneath people’s desks anymore. Our computer processing happens in data centers around the globe, and bandwidth and results are served up wherever our people are. We designed an architecture that allows the firm to bend to where our people are versus having everyone bend to where the firm is.
Over the past four years, we’ve focused on how we could bring some of these same concepts and principles into our core business processing. We’ve developed a uniform architecture for running an internal cloud that’s much more of a shared environment. We enable our business products and software solutions to craft their requirements against that uniform pool. As a result, we’ve gotten more agile. The change is allowing us to reduce a lot of operational risk and has enabled us to become much more efficient. For instance, we’re able to free up computing cycles that would otherwise be stranded. So if no one is using computing power at the end of the day in Japan, we can free up that processing power for the people in New York, who are in the middle of their market day. This year we will have 85 percent of our distributed workloads running on the cloud, so you can imagine the potential impact.
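To make the idea of reclaiming stranded cycles concrete, here is a minimal sketch of a shared pool lending idle capacity from a quiet region to a busy one. It is an illustration only, not a description of Goldman Sachs's actual scheduler: the `Region` class, the `lend_idle_cores` function, the greedy allocation rule, and all core counts are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical regional slice of a shared compute pool."""
    name: str
    total_cores: int
    local_demand: int  # cores that local workloads need right now

    @property
    def idle_cores(self) -> int:
        # Capacity that would sit stranded if it could not be shared.
        return max(self.total_cores - self.local_demand, 0)

    @property
    def shortfall(self) -> int:
        # Demand that the region's own hardware cannot cover.
        return max(self.local_demand - self.total_cores, 0)


def lend_idle_cores(regions: list[Region]) -> dict[str, dict[str, int]]:
    """Greedily lend idle cores to regions with a shortfall.

    Returns a mapping of {borrower: {lender: cores_lent}}.
    """
    idle = {r.name: r.idle_cores for r in regions}
    loans: dict[str, dict[str, int]] = {}
    for borrower in regions:
        need = borrower.shortfall
        for lender_name in idle:
            if need == 0:
                break
            grant = min(idle[lender_name], need)
            if grant > 0:
                loans.setdefault(borrower.name, {})[lender_name] = grant
                idle[lender_name] -= grant
                need -= grant
    return loans


# Assumed numbers: Tokyo is past its market close, New York is mid-session.
regions = [
    Region("tokyo", total_cores=1000, local_demand=100),
    Region("new_york", total_cores=1000, local_demand=1600),
]
loans = lend_idle_cores(regions)
print(loans)  # {'new_york': {'tokyo': 600}}
```

With dedicated, per-business infrastructure, the 900 idle cores in the quiet region could not be reached by anyone else. In a uniform pool, part of that capacity covers the busy region's shortfall.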
McKinsey: What advantages have you captured from using a private-cloud infrastructure?
Don Duet: I’d say reducing risk to the business has been the biggest value driver. But it’s hard to measure that quantitatively. It’s easier to measure how much additional computing power we can get, for instance, or the efficiency of our computing footprint. The uniform structure of our private-cloud infrastructure has allowed us to reduce complexity, which is enormously important for managing risk. We have benefited from having fewer design points, a more manageable footprint, and automated server provisioning. We can respond to failures more quickly.
We’ve also moved from an environment in which it could take months to launch or update an application to where it now takes days, sometimes even minutes. We have teams that provision and manage the cloud infrastructure, independent of all the different consumers that are coming. They can assess demand and plan capacity based on actual consumption and known business opportunities. Better capacity planning translates into faster turnaround and much more responsiveness without, again, creating pools and islands of computing that ultimately increase risk and reduce efficiency. So when we need to set up facilities in, say, China or other parts of the world, it is substantially simpler to get people and processes on the ground up and running faster. The change has also empowered people to think differently about computing and what is required to do certain tasks. It’s changed the dynamics of our engineering team and how they address problems.
McKinsey: How has internal demand been affected?
Don Duet: Every year we seem to grow our computing power, whether you measure it in cores or gigaflops or whatever. The private cloud has allowed us to change the dialogue in the technology division from being confrontational (why is this not done, or how do I get this done?) to being focused on problem solving. But demand hasn’t changed that much; it continues along at a pretty steady rate. On the supply side, we are now ordering equipment in bulk. We have a forward-forecasting and planning cycle, so we’ll buy thousands of machines at once, which enables us to work with a range of OEMs. Previously we would have had to work with a few select vendors. We’re thinking about design differently: we are building hardware in ways that allow for generic but highly tuned functions.
McKinsey: How did you manage the transition to the new platform?
Don Duet: We have a complex technology environment, with more than 4,000 critical systems and applications in the firm. Some comprise one or two pieces of software; others might comprise thousands of pieces of software that need to be carefully orchestrated to work properly.
We’ve sought to dramatically improve engineers’ and developers’ experiences—for example, making it easier for software developers to test applications by providing appropriate computing power and services, and empowering them to make modifications by giving them the right controls to test and implement new features. We share our cloud strategy widely across the firm, and we constantly measure how much progress we’re making, which applications still need to be migrated, and other areas for improvement. We’ve created incentives for people to move their workload to the cloud because they’re seeing a lot of benefit right away.
A few years back, we did a meaningful reorganization within the technology division. We were vertically oriented, with teams that focused on different parts of the business. We wanted to be more like an agile start-up that can go from essentially nothing to running products within months, with little capital investment. To do that, we created a platforms team, moving many people in our division into different roles. This team uniformly supports and delivers core cloud-based services, applications, and data-related services across all business units and groups within the organization. More of our developers now sit on teams that are aligned with the business. They are finding that their ability to go from concept to product is much simpler.
McKinsey: How have your skills requirements and talent-management approaches changed?
Don Duet: We have a global footprint, so we recruit everywhere around the world, and we have about 9,000 engineers in our technology division. The size of our group can be challenging at times, but it is also a great asset to have that sheer amount of human capital. We recognize that we’re in a competitive market for talent, so we focus a lot on making sure that we create a value equation for employees, where the roles people have at Goldman Sachs are innovative and exciting, and where individuals feel like they can have a direct impact on the business.
When it comes to skills requirements, our software engineers, who historically have operated at an abstract level far from the core of our infrastructure, are moving a lot closer to the core. They’re learning more about it. They’re becoming systems owners and participating in decisions about how it gets constructed to support their applications. We think that’s healthy. It creates a better-educated workforce in our software-engineering community. It’s prompting them to make better design decisions when they’re building solutions, which ultimately creates positive outcomes for the business. Meanwhile, most of the engineers in our infrastructure functions are becoming subject-matter experts—in networks, or storage, or computing. But many are also becoming software engineers who are actually building solutions and services for the cloud environment. If you look back five or six years, maybe 10 percent of our environment and infrastructure teams would have included software engineers. Today that number is probably closer to 50 percent. The fact is, we need both deep technical expertise and generalists who can apply integrated approaches to solve technology problems.
McKinsey: With the private cloud up and running, how are you thinking about analytics, DevOps, and other new tools and organizational approaches associated with digitization?
Don Duet: As I said before, we see things as a progression. With our new architecture around our internal cloud, we’ve been able to run multitenant computing for our business applications.
We’re now taking that a step forward to think about data: How do we build a very large, scalable way of managing business intelligence? That’s a big initiative for us. Our cloud architecture enables our developers to think about data as an asset to be shared separately from the applications themselves: with appropriate controls, how can the firm’s data be integrated and stored in places where it can be accessed by more people? Being able to bring different types of data sets together, form new data sets, and learn from that information creates business intelligence for us. I don’t think we could’ve delivered or built a large data lake if we did not have the underlying cloud capabilities we have today.
Now, DevOps? That’s an interesting and important concept for us. To this point, we’ve been mostly focused on creating an environment that is as close to “no ops” as possible. So we’ve been investing in a system-defined model, moving things that have traditionally been in people’s experience and knowledge into well-defined components of code. We’ve been considering how to bring concepts like machine learning and deep learning into the organization. But for a lot of these initiatives, it will be a multiyear journey, moving from human time to machine time.
McKinsey: How are you thinking about the public cloud?
Don Duet: The future of our business is in how we enable a digital community. One of the big challenges in getting to that state is determining how we could put our data and applications anywhere, not just on the servers that we own or the data centers that we own and manage. We’ve identified a few steps. We’re participating in the development of industry standards, joining standards bodies, and working actively with many of the large companies that are providing for and facilitating the public infrastructure, helping to address questions relating to design and security. How do we build architectures that will allow us to bring our data and content to the public cloud but where we can maintain our own encryption keys and manage them ourselves?
Security clearly will be the most important enabling factor, particularly for us. We’re often a custodian or steward of sensitive client information. If you can’t secure information, and if you can’t do that intrinsically—writing a contract and signing a contract is just not good enough—that becomes a backdoor risk. No security model is completely perfect, but we’d need one that allows for us to have information and content independent from point of control. That’s incredibly important, and the products that will allow for that are coming.
Data movement is another critical enabling factor—how easy is it to connect to a third-party data center and get things done? Vendor lock-in is another big issue. Having a heterogeneous environment is important. It enables activity and flexibility. Particularly when you’re talking about the cloud infrastructure, companies will need to rely on a collection of services and solutions. They must be able to put computing in multiple data centers but treat them as though they are in one. So we’re focused on using containers and other open-source technologies to ensure uniform run time. I can now build a container for my file system—essentially a wrapper for the code, system tools, system libraries—using the open-source Docker technology or other container technologies that are adhering to the Docker run time, operate it within any vendor’s environment, and I know I’m going to get the same results. That’s a big deal for those of us concerned about interoperability and fostering community.
The containers also have become a useful tool for our software developers, a way for them to understand infrastructure in a different way than they’re used to. In the long term, containers will also make it easier for us to bridge out to external clouds.
McKinsey: Any advice for others considering adopting cloud infrastructure?
Don Duet: One of the things we’ve learned, even from our experience with the private cloud, is that you can’t simply move a problem from one place to another. The transition from bespoke infrastructure to cloud infrastructure can become a teachable moment, a time to solve fundamental problems, to really invest in automation of base functions and in the empowerment of application developers. The goal should be to reduce the frictional middle parts of the IT infrastructure and create a more seamless, end-to-end solution.