Putting gen AI to work in the People function

Our people analytics tools have come a long way. What began as static dashboards has evolved into systems that surface personalized insights for decision makers and employees. When gen AI entered the picture, we saw an opportunity, along with a set of hard questions. What could a gen AI agent meaningfully contribute? Should it define metrics? Retrieve data? Summarize trends? Or even explain why things happen?

Very quickly, one issue dominated our thinking: trust.

People dashboards deal with sensitive topics—headcount, attrition, hiring. Users expect answers that are not only fast, but authoritative. A wrong answer about attrition is not just a technical error; it undermines confidence in the entire system. And once trust is lost, it is hard to win back.

Even seemingly simple questions can be deceptively complex. Take a basic FAQ such as “How do we define attrition?” On the surface, this feels straightforward. In practice, it is something of a tightrope walk. Large language models are designed to be helpful. If a user asks a question the system has not been explicitly prepared for, the chatbot will often respond anyway, drawing on general training data rather than signaling uncertainty. The result can be an answer that sounds plausible and confident yet is completely misaligned with how the organization actually works.

Early on, we saw just how risky that could be. More than once, the chatbot produced responses that felt right, but weren’t. That experience made one thing clear: if gen AI was going to sit inside our people dashboards, it needed stronger guardrails.
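To make the idea of guardrails concrete, here is one pattern for definition questions, sketched in Python: answer only from an internal glossary, and signal uncertainty when a term is missing rather than letting the model fall back on its general training data. The glossary entries, names, and fallback message are illustrative assumptions, not our production code.

```python
# Illustrative sketch only: answer definition questions from an internal glossary,
# and decline rather than guess when the term is not covered.
# The glossary contents and names below are placeholders, not real definitions.

INTERNAL_GLOSSARY = {
    "attrition": "<the organization's approved definition of attrition>",
    "headcount": "<the organization's approved definition of headcount>",
}

def define(term: str) -> str:
    """Return the internal definition, or explicitly signal uncertainty."""
    definition = INTERNAL_GLOSSARY.get(term.strip().lower())
    if definition is None:
        # The key guardrail: do not fall back on the model's general training data.
        return (f"I don't have an approved definition for '{term}' yet. "
                "Please check with the people analytics team.")
    return definition
```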

Definitions were only the first hurdle. Most users do not just want explanations; they want answers to concrete questions. “How many analysts left last March?” “What does attrition look like for this office?” That meant building a chatbot that could retrieve and reason over real data.

Here, we ran into a second challenge. To operate efficiently, LLMs are optimized to be economical with tokens. For complex analytical questions, that can mean compressing reasoning or skipping intermediate steps. In a consumer setting, that might be acceptable. In people analytics, it isn’t. Skipped logic can lead to skipped context—and that is where errors creep in.

Rather than pushing the technology to do everything, we made a deliberate, and at times uncomfortable, choice to constrain it. We could have allowed the chatbot to generate queries dynamically. Instead, we limited its role to translating user intent into a controlled set of pre-built tools. When a user asks, “How many analysts departed in March last year?”, the chatbot does not generate its own query. It maps the question to an internal concept like attrition, executes the pre-written query associated with that concept to retrieve the data, applies the appropriate filters, and summarizes the result.

This approach trades some flexibility for reliability. It also makes the system less “magical” than some gen AI demos. But it dramatically reduces the risk of confident wrong answers—and helps build user trust over time.
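As an illustration of that constrained design, the sketch below maps a question onto one of a closed set of pre-built tools and runs only the pre-written query behind it. The tool names, query text, and helper functions are hypothetical, and the simple keyword classifier stands in for what would in practice be an LLM call.

```python
# Illustrative sketch of constrained routing: the chatbot selects from a closed set
# of pre-built tools and never generates SQL of its own. All names and queries here
# are hypothetical placeholders.

# Pre-written, reviewed queries tied to the dashboard's dataset.
TOOLS = {
    "attrition": "SELECT COUNT(*) FROM departures WHERE month = :month AND role = :role",
    "headcount": "SELECT COUNT(*) FROM headcount_snapshots WHERE month = :month",
}

def classify_intent(question: str) -> str | None:
    """Stand-in for an LLM call that maps a question onto one internal concept."""
    q = question.lower()
    if any(word in q for word in ("left", "departed", "attrition")):
        return "attrition"
    if "headcount" in q or "how many people" in q:
        return "headcount"
    return None

def run_query(sql: str, filters: dict) -> list[dict]:
    """Stand-in for executing the pre-written query with user-supplied filters."""
    print(f"Executing: {sql} with {filters}")
    return [{"count": 12}]  # dummy result for the sketch

def answer(question: str, filters: dict) -> str:
    concept = classify_intent(question)
    if concept is None:
        # Outside the supported scope: say so instead of improvising an answer.
        return "I can only answer headcount and attrition questions right now."
    rows = run_query(TOOLS[concept], filters)
    return f"{rows[0]['count']} people matched the {concept} query for those filters."

print(answer("How many analysts departed in March last year?",
             {"month": "2024-03", "role": "analyst"}))
```

The design choice is deliberate: the model decides which tool to call and with which filters, but the SQL itself is fixed and reviewed ahead of time.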

Our first steps

We have started small, on purpose.

Our first chatbot is embedded in one of our simplest dashboards, covering headcount, arrivals, and attrition. Limiting scope reduces the number of combinations we have to anticipate and makes failures easier to spot. From there, we built a four-step process that sits behind every response; users never see most of these steps, but all of them matter.

First, the system interprets the user’s question and translates it into organizational meaning, deliberately weighting our internal dictionary over any external or generic definitions (for example, our definition of attrition, not a commonly used external one).

Second, it routes the request to the appropriate data retrieval tool (headcount or attrition, for example); each tool is backed by a curated set of SQL queries tied to the dashboard’s dataset.

Third, it executes the tool, applies filters, and summarizes the results in clear, plain language.

Finally, and critically, we log every prompt and response. This allows us to see what people are actually asking, where the system struggles, and how real-world usage differs from what we anticipated.
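For the logging step, something as simple as appending every exchange to a structured log is enough to start with. A minimal sketch, assuming a JSONL file and a handful of fields; the path and schema are placeholders rather than our actual setup:

```python
# Minimal sketch of step four: append every prompt/response pair to a structured log
# for later review. The file path and fields are placeholders, not our actual schema.

import json
from datetime import datetime, timezone

LOG_PATH = "chatbot_interactions.jsonl"  # hypothetical location

def log_interaction(prompt: str, response: str, tool: str | None) -> None:
    """Record one exchange, including which pre-built tool (if any) handled it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "tool": tool,
        "response": response,
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```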

That last step has proven especially valuable. Some of the most useful insights have not come from QA testing, but from watching how colleagues interact with the tool and where it breaks.

We are intentionally testing with a small group of trusted collaborators before expanding further. This gives us space to learn, fix, and iterate without overpromising. It also helps establish credibility: users can see that we are experimenting thoughtfully, not deploying gen AI for its own sake.

What does this mean for people analytics teams?

One of the more counterintuitive lessons from this work is that, in the short term, chatbots increase the workload for people analytics teams rather than reducing it. Designing guardrails, monitoring questions, reviewing failures, and refining definitions all require sustained effort. The logging we put in place is not just a technical feature: it is what allows the system to improve at all.

That investment, however, is what makes scale possible. As gen AI begins to handle more routine, well-bounded questions, such as basic headcount or attrition queries, analysts can spend more time on higher-value work: interpreting patterns, connecting signals across domains, and advising on complex decisions where context matters.

At the same time, user expectations are rising. As people become more comfortable asking questions in natural language, they increasingly expect fast, precise answers and, eventually, insights that span people, finance, operations, and client data. Meeting those expectations safely requires more than a chatbot.

Our experience so far has made a few things clear. Reliable answers depend on data that is well-engineered and consistently defined. They require careful design of how models reason and respond. And they rely heavily on people who understand the business context well enough to spot when something looks right but isn’t. In that sense, gen AI does not replace people analytics skills or judgment; it makes them more visible and more important.