When the Wrong KPI Starts Making Sense

Measuring AI usage instead of results looks like a category error. Most of the time, it is. But sometimes, it is less about measurement and more about forcing a shift in how people operate.

May 26, 2026

Recently, I heard something that sounded like a punchline. It wasn’t.

In some software companies, developers are now evaluated not just on delivery or quality, but on how many AI tokens they burn each month. The more you prompt, the better your KPI. Tokens as progress.

The first reaction is obvious: why reward people for spending more? Tokens are not free. On the surface, this looks like incentivizing waste.

But look closer, and it gets more interesting. Not more correct, just more revealing. This is less a KPI than a lever to force a behavioral shift.

Activity Is Not Outcome. Even If It’s Measurable

As a KPI, token usage is a clean example of measuring activity instead of outcome. The distinction is not subtle. Tie evaluation to tokens, and you get more prompting, more interaction, more output. What you do not get is better software, faster delivery, or higher quality.

This is the familiar pattern often called Goodhart's law: once a metric becomes a target, it loses its value as a signal. The system optimizes for the number, not the impact. From a distance, it looks like progress. Up close, it is mostly motion.

You Are Not Buying Tokens. You Are Buying Behavior

Yet companies persist. Which suggests the real goal is elsewhere.

What is really being purchased is not tokens, but a shift in behavior. The aim is to make AI interaction default, not optional. Left alone, most people stick to what they know, underestimate new tools, and avoid the friction of learning something that does not yet feel essential.

So the organization reaches for a blunt tool: remove the choice. Use AI, or fall behind. Eventually, enough people adapt, and the system shifts from experimentation to habit. It is not elegant, but it works. Until it doesn’t.

The Productivity Paradox Nobody Likes to Mention

The catch is that more output does not guarantee better outcomes. Developers can generate more code, product managers more documents, designers more options, journalists more content. Yet at the organizational level, speed and quality do not always improve. Sometimes they slip.

Output is not outcome. More code means more to review, test, and maintain. More content means more noise. More options mean more decisions. AI rarely removes work; it just moves it, often from creation to validation and correction.

There is also the perception gap. People often feel faster with AI, even when they are not. The interaction creates a sense of acceleration, while hidden costs pile up. This is typical of transition phases: the shape of work changes before the results do.

From Doing the Work to Supervising It

Underneath all this, the roles themselves are shifting. Developers move from writing code to orchestrating it, from direct problem-solving to supervising machine-generated solutions. Product managers shape and evaluate AI outputs. Designers explore wider spaces before narrowing in. Customer care agents supervise more than they compose. Even journalists, in organizations that limit AI in reporting, use it for distribution, summarization, and research.

The pattern is not replacement, but reallocation. New habits, new judgment, and new definitions of good work follow.

Where Forced Adoption Actually Creates Leverage

Forced adoption sometimes creates leverage, but only in the right places. Where work is repetitive, structured, and easy to check, and where mistakes are cheap, AI speeds up iteration. Product managers draft faster, designers explore more options, customer care teams respond more consistently, journalists repackage content across formats. The value is in acceleration, not in replacing judgment.

In these cases, the gain is speed and breadth of exploration, not the outsourcing of decisions. The path to better choices gets shorter, but the choices remain yours.

Where It Quietly Breaks Things

The same approach breaks down in areas that depend on judgment, context, or accountability. Decision-making, prioritization, editorial voice, and complex reasoning do not improve when forced through AI. Here, the tool adds friction. People game the metric, offload thinking they should keep, or spend more time fixing AI output than doing the work themselves.

This is where the original instinct holds: the KPI is wrong. In these areas, mistakes are expensive and quality is hard to measure.

What a More Mature Approach Looks Like

More mature organizations move on. The question shifts from ‘did you use AI’ to ‘did it matter.’ Activity gives way to outcomes, volume to impact. AI becomes just another tool, visible in the workflow only when it adds value.

Judgment returns, including the choice not to use AI when it adds nothing. Experimentation is separated from production. Over time, the organization builds capability, not just usage. The goal shifts from forcing adoption to making effective use unavoidable.

It’s Not Just About Developers

This pattern is not limited to software. The same dynamic plays out in product management, design, customer care, journalism, marketing, analytics. Forced usage speeds adoption where work is structured and execution-heavy. It quietly erodes quality where judgment and context matter.

The real mistake is not force, but lack of precision. Treat every task as equally automatable, and the results will be equally mediocre.

So Was the KPI Stupid?

As a long-term measure, it is hard to defend. As a temporary lever to break inertia, it sometimes works. The trouble starts when organizations stop at the metric and keep optimizing for tokens long after the real goal, changing how people work, should have taken over.

In the end, this is not about tokens or even productivity. It is about reshaping habits. And that is always more complex, and more fragile, than any KPI can capture.

Director’s Fallacy
