AI’s potential to generate code quickly and easily is transforming how companies build software. However, achieving real impact from AI requires organizations to fundamentally rethink the entire product development life cycle. To better understand the evolution of product development in the era of gen AI, McKinsey Senior Partner Martin Harrysson and Partner Prakhar Dixit spoke with Tariq Shaukat, CEO of Sonar. Sonar’s technology is designed to help companies produce cleaner, more secure code, a task that becomes more complicated as AI enters development workflows. Shaukat shares how AI is changing developer roles, the challenges of maintaining code quality and trust, and the importance of rethinking productivity, accountability, and risk in this new landscape.
This interview has been edited for length and clarity.
Prakhar Dixit: Could you tell us a bit about Sonar and your journey as Sonar’s CEO over the last couple of years?
Tariq Shaukat: Sonar started 17 years ago as an open-source project founded by three developers in Geneva who wanted to help engineering teams manage and improve code quality.
For the first decade, we were essentially an open-source project with millions of users. Over time, as developers loved the platform, enterprise leaders began asking for features tailored to their needs. That is when we evolved into an open-core business.
Today, we have more than seven million developers using Sonar. Since I joined Sonar in 2023, we have leaned heavily into AI, because as code generation accelerates, organizations need guardrails to ensure that their software remains trusted, maintainable, and compliant.
Martin Harrysson: Given that Sonar sits at the intersection of developers and tooling, what are you seeing in terms of AI’s impact on software development?
Tariq Shaukat: Not every company allows AI yet, often due to trust and security concerns, but adoption is growing quickly. Almost every developer wants to use AI tools now, which was not true a few years ago. Start-ups are writing nearly all their code with AI, and larger enterprises are moving from experimentation with AI to real production use, with a clearer understanding of what the models can and cannot do.
We have been benchmarking coding models and found that each one has a “personality.” Some are verbose, some introduce security or maintainability issues, and some are elegant but brittle. For example, newer models like GPT-5 solve problems better overall but use about three times as many lines of code as earlier versions, which increases complexity and introduces more potential bugs.
The takeaway is that as models get better, they also become harder to review and maintain. Developers now face the challenge of checking far larger volumes of code for subtle issues that are harder to spot.
Prakhar Dixit: As the tools keep getting better, what else needs to happen for AI to truly improve productivity across the product life cycle?
Tariq Shaukat: We have moved beyond code generation being the bottleneck. The challenge now is clarity and trust.
AI is literal, not creative, so architecture and design matter much more. You need to specify up front exactly what you want the AI to build.
Then, downstream, we face the “code trust” bottleneck. Humans need to review and verify all the code produced. Many organizations will need to build stronger code review practices to ensure that AI-generated code can safely enter production.
Prakhar Dixit: That ties into how teams work together today and how that might need to evolve. Are you seeing organizations change their operating models or talent mix?
Tariq Shaukat: Definitely. No one becomes a developer just to be a copy editor for AI, but the job is changing as catching errors becomes more important.
In the past, code review was something only top engineering companies did. Now, every large organization needs to make it part of their core discipline. Teams need to identify issues earlier in the development cycle, ideally inside the IDE [integrated development environment] itself. Otherwise, the volume of generated code becomes unmanageable.
This change does not just affect engineers; it affects product managers and designers, too. Tools like Figma and other generative UI [user interface] systems mean that product and UX [user experience] teams can now prototype in functional code, which reshapes their workflows and expectations.
Martin Harrysson: Despite these advances, many organizations still do not see major productivity gains. Why do you think that is?
Tariq Shaukat: A lot of the hype focuses on “greenfield” examples—start-ups building from scratch. In reality, most companies have millions of lines of existing code, along with demanding internal standards and external regulatory requirements. That complexity slows adoption. This lagging productivity is not a failure of technology; it simply takes time to integrate these tools.
We use the concept of “vibe and verify.” You can vibe—that is, generate code quickly—but you must verify it before deployment. Verification ensures the code fits your environment, meets compliance requirements, and can be trusted.
This is similar to what we saw with cloud computing. It took years for enterprises to overcome integration, training, and governance hurdles. AI will follow the same pattern.
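As an illustration of the “vibe and verify” idea Shaukat describes, a pre-deployment gate might look something like the minimal sketch below. The specific tools shown (pytest for tests, ruff as a stand-in static-analysis step) are placeholder assumptions for the sketch, not anything Sonar or Shaukat prescribes; an organization would substitute whatever checks it actually relies on.

```python
"""Minimal sketch of a "vibe and verify" gate.

Assumption: the project runs pytest for tests and some static-analysis
command (ruff is used here purely as a placeholder). Swap in the checks
your organization actually trusts before promoting AI-generated code.
"""
import subprocess
import sys

# Each check is a command that must exit 0 before code can be promoted.
CHECKS = [
    ["pytest", "-q"],        # behavioral verification: the test suite still passes
    ["ruff", "check", "."],  # placeholder static-analysis / lint step
]


def verify() -> bool:
    """Run every check in order; return True only if all of them pass."""
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Verification failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True


if __name__ == "__main__":
    # Exit non-zero so a CI pipeline blocks deployment when verification fails.
    sys.exit(0 if verify() else 1)
```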
Prakhar Dixit: Measurement is another challenge. How should companies think about tracking success?
Tariq Shaukat: It is still an unsolved problem. Many current metrics are misleading. You can say, “30 percent of our code is written by AI,” but you do not know whether that code is good or bad.
Productivity is not just about output; it is about maintainability, quality, and reduced rework. We need better combinations of input and output metrics to track productivity. Inputs can include issue backlog and defect rates, while outputs cover shipping timelines and product quality.
In essence, measurement in AI-assisted development should not differ much from traditional software. It just requires more nuance. Overemphasizing a single metric, like cycle time, can create new problems elsewhere in the system.
Martin Harrysson: Accountability seems increasingly important as AI takes on more tasks. How do you see that evolving?
Tariq Shaukat: Accountability starts with defining what “trustworthy” means for your organization. A large manufacturer and a start-up will have very different thresholds.
Most enterprises care about explainability, transparency, and repeatability. They are uncomfortable with the same AI model evaluating its own output. Using independent models for verification—for example, one model checking another—is emerging as a best practice.
At Sonar, we focus heavily on explainability. Our system is largely deterministic, supplemented with AI. That means we can reproduce the same results every time and explain exactly why something was flagged.
Until organizations define what “trustworthy code” means for them, they cannot assign accountability properly. My advice is to start by articulating what matters most: explainability, evidence, reproducibility, or speed. Once that is clear, accountability follows.