I came across something very interesting last week. Sherwin Wu just pulled back the curtain on engineering at OpenAI, and what he revealed changes everything we thought we knew about measuring productivity.
95% of their engineers use Codex. Engineers who embrace AI tools open 70% more pull requests than their peers. The average PR review time dropped from 10-15 minutes to 2-3 minutes. Engineers are managing 10-20 parallel AI coding threads instead of writing code themselves.

As someone who’s spent the last seven years building engineering analytics, here’s what keeps me up at night: the metrics we’ve been measuring are answering yesterday’s questions.
And honestly? That’s the most exciting problem I’ve ever faced.
Let’s be direct about what’s happening.
When an engineer at OpenAI manages 20 parallel Codex threads and opens 70% more PRs than their peers, what are we actually measuring?
Every traditional engineering metric assumes humans write the code. That assumption just broke.
We’re trying to measure productivity with tools designed for a world where engineers typed into IDEs for 6 hours a day. That world ended sometime in 2023, and most engineering leaders haven’t noticed yet.
Here’s what Sherwin’s observations reveal: the job description changed, but the measurement framework didn’t.
The old job: Write code, review code, deploy code.
The new job: Orchestrate AI agents, steer parallel workstreams, make judgment calls AI can’t make, review AI output for strategic alignment.
The engineers opening 70% more PRs aren’t “more productive” in the traditional sense. They’re better at managing AI agents. They’re better at prompt engineering. They’re better at knowing which problems to delegate and which require human judgment.
That’s a completely different skill set. And we’re measuring it with the wrong instruments.
It’s like measuring a Formula 1 driver’s performance by tracking how hard they press the gas pedal. The pedal pressure doesn’t matter, the lap time does. But we’re still reporting on pedal pressure because that’s what our dashboards were built to show.
If 95% of OpenAI’s engineers are building with Codex, and the share of AI-written code is rising across the industry, what should engineering leaders actually measure? Here’s what I’m seeing with our customers who are furthest along this curve:
The bottleneck isn’t “how fast can we write code?” anymore. It’s “how fast can we decide what to build, validate it’s the right thing, and course-correct when it’s not?”
The engineers thriving in AI-augmented environments make faster, better decisions. They kill bad ideas earlier. They validate assumptions quicker. They iterate on product direction, not syntax.
Traditional metrics don’t capture this. PR count doesn’t show decision quality. Commit frequency doesn’t show strategic thinking.
Sherwin mentions engineers running 10-20 parallel Codex threads. That’s not about typing speed; it’s about managing complexity.
The best engineers know how to decompose work into parallel threads, decide which problems to delegate and which need human judgment, and review AI output for strategic alignment.
This is a learned skill. Some engineers pick it up immediately. Others struggle. The productivity gap Sherwin mentions (between AI-embracing engineers and others) isn’t about who types faster. It’s about who adapts faster.
And right now, most engineering orgs have no visibility into who’s adapting and who’s falling behind.
When AI can generate the implementation, the engineer’s value shifts to defining the problem correctly.
Bad prompt: “Write a user authentication system.”
Good prompt: “Write a user authentication system that handles edge case X, integrates with our existing session management, follows our security standards Y, and accounts for the scaling constraint Z we hit last quarter.”
The difference is context, judgment, and institutional knowledge. AI doesn’t have that. Humans do.
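To make that concrete, here’s a minimal sketch of what “providing context” could look like in tooling terms. The EngineeringContext structure and build_prompt helper are hypothetical names, not a real Waydev or OpenAI API; the point is that institutional knowledge travels with the task instead of living only in the engineer’s head.

```python
# Hypothetical sketch: packaging institutional context with a task
# before handing it to an AI agent. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class EngineeringContext:
    edge_cases: list[str]          # known failure modes, e.g. from past incidents
    integration_points: list[str]  # existing systems the change must respect
    standards: list[str]           # internal security / style requirements
    constraints: list[str]         # scaling or cost limits the team has hit

def build_prompt(task: str, ctx: EngineeringContext) -> str:
    """Wrap a bare task description in the context an AI agent lacks."""
    sections = [
        f"Task: {task}",
        "Handle these edge cases: " + "; ".join(ctx.edge_cases),
        "Integrate with: " + "; ".join(ctx.integration_points),
        "Follow these standards: " + "; ".join(ctx.standards),
        "Respect these constraints: " + "; ".join(ctx.constraints),
    ]
    return "\n".join(sections)

prompt = build_prompt(
    "Write a user authentication system",
    EngineeringContext(
        edge_cases=["expired tokens on clock-skewed clients"],
        integration_points=["existing session management service"],
        standards=["internal security standard Y"],
        constraints=["scaling constraint Z we hit last quarter"],
    ),
)
print(prompt)
```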
So the question becomes: How do we measure whether engineers are providing the right context to AI agents?
Is it quality of AI-generated code after human review? Is it defect rates in AI-assisted PRs vs. human-written PRs? Is it time from prompt to production-ready code?
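One of those candidates is easy to prototype. Here’s a hedged sketch comparing defect rates and review times between AI-assisted and human-written PRs; the PullRequest shape and the ai_assisted flag are assumptions about what your tooling could tag (say, via commit trailers), not an existing schema.

```python
# Sketch of one candidate metric: defect rate and review time,
# split by authorship mode. Data shape is assumed, not a real API.
from dataclasses import dataclass

@dataclass
class PullRequest:
    ai_assisted: bool       # tagged at merge time, e.g. via commit trailers
    caused_defect: bool     # later linked to an incident or bug fix
    review_minutes: float

def compare(prs: list[PullRequest]) -> dict[str, dict[str, float]]:
    """Defect rate and mean review time for each authorship mode."""
    out = {}
    for label, group in [
        ("ai_assisted", [p for p in prs if p.ai_assisted]),
        ("human_written", [p for p in prs if not p.ai_assisted]),
    ]:
        if not group:
            continue
        out[label] = {
            "defect_rate": sum(p.caused_defect for p in group) / len(group),
            "avg_review_minutes": sum(p.review_minutes for p in group) / len(group),
        }
    return out

sample = [
    PullRequest(True, False, 2.5),
    PullRequest(True, True, 3.0),
    PullRequest(False, False, 12.0),
]
print(compare(sample))
```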
Here’s where this gets uncomfortable for most CTOs and VPs of Engineering.
Your current dashboards are measuring the wrong things.
If you’re still reporting to your board on lines of code, commit counts, PR velocity, and story points, you’re reporting on theater. You’re measuring the appearance of productivity, not the reality of value creation.
Worse, you’re potentially optimizing for the wrong behaviors. If engineers know they’re measured on PR count, and AI lets them open 70% more PRs, guess what happens? PR count goes up, but business outcomes might not.
The scary part: Most engineering leaders know this, but don’t know what to measure instead.
The frameworks we’ve relied on were all designed for human-written code. They’re not wrong, exactly. They’re just incomplete for an AI-augmented world.
At Waydev, we’ve been having versions of this conversation with engineering leaders for the last 18 months, and the questions keep evolving faster than the answers.
Here’s our thesis on where engineering analytics needs to go:
Stop measuring how much code gets written. Start measuring how quickly validated ideas reach customers.
The relevant metrics: time from idea to validated customer outcome, how early bad bets get killed, and how quickly teams course-correct when the evidence says they’re wrong.
AI compresses the implementation phase. So measure the phases AI can’t compress: problem definition, strategic alignment, outcome validation.
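As a sketch of what a decision-velocity metric could look like in practice: the Idea record and its timestamps below are hypothetical, and in reality they’d come from whatever planning and product-analytics tools you already run.

```python
# Sketch of a decision-velocity metric: elapsed time from "committed
# to the idea" until "outcome validated with customers". Hypothetical
# data shape; a killed idea simply never gets a validation timestamp.
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Idea:
    decided_at: datetime             # when the team committed to building it
    validated_at: datetime | None    # when customer impact was confirmed; None if killed

def decision_velocity(ideas: list[Idea]) -> dict[str, float]:
    shipped = [i for i in ideas if i.validated_at is not None]
    days = [(i.validated_at - i.decided_at).days for i in shipped]
    return {
        "median_days_decision_to_validation": median(days) if days else 0.0,
        # killing bad ideas early is healthy, so track it explicitly
        "kill_rate": 1 - len(shipped) / len(ideas) if ideas else 0.0,
    }

ideas = [
    Idea(datetime(2024, 1, 2), datetime(2024, 1, 20)),
    Idea(datetime(2024, 1, 5), None),  # killed before shipping
    Idea(datetime(2024, 2, 1), datetime(2024, 2, 10)),
]
print(decision_velocity(ideas))
```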
The engineer managing 20 AI threads isn’t “20x more productive.” They’re leveraging a different system architecture.
What matters: the throughput and quality of the whole human-AI system, how well work gets decomposed, delegated, and reviewed, not any individual’s raw output.
This shifts measurement from “how productive is Engineer X?” to “how efficient is the human-AI system?”
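Here’s a toy version of that system-level view. The AgentThread fields are illustrative, but the design choice matters: the denominator is scarce human attention, not headcount, so twenty parallel threads don’t automatically read as “20x productivity.”

```python
# Sketch of a system-level efficiency ratio: net shipped work per hour
# of human steering and review, across all threads one engineer runs.
from dataclasses import dataclass

@dataclass
class AgentThread:
    merged: bool            # did the thread produce shipped work?
    reverted: bool          # was that work later rolled back?
    human_minutes: float    # steering + review time the orchestrator spent

def system_efficiency(threads: list[AgentThread]) -> float:
    """Net shipped threads per human hour invested in the whole system."""
    net_shipped = sum(1 for t in threads if t.merged and not t.reverted)
    human_hours = sum(t.human_minutes for t in threads) / 60
    return net_shipped / human_hours if human_hours else 0.0

# One engineer steering four threads; reverted work doesn't count.
threads = [AgentThread(True, False, 25), AgentThread(True, True, 10),
           AgentThread(False, False, 5), AgentThread(True, False, 20)]
print(round(system_efficiency(threads), 2))  # 2.0 net shipped per human hour
```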
Traditional metrics tell you what happened. In an AI-augmented environment, you need metrics that predict what’s about to happen.
Sherwin mentions that top performers become disproportionately more productive with AI. The gap is widening.
Can you identify your top AI-augmented performers before they leave for a startup that will 10x their leverage? Can you see which teams are adopting AI tools and which are resisting? Can you predict which engineers will thrive in the next 2-3 years?
These are leading indicators. Most dashboards show lagging indicators.
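A leading indicator can be as simple as trend detection. This sketch flags teams whose share of AI-assisted PRs is flat or falling week over week; the data shape is an assumption for illustration, not a Waydev API.

```python
# Sketch of a leading indicator: flag teams whose AI-assisted PR share
# is not rising, before it shows up in lagging delivery metrics.
def adoption_trend(weekly_share: dict[str, list[float]],
                   min_slope: float = 0.0) -> list[str]:
    """Return teams whose AI-assisted PR share is flat or falling."""
    flagged = []
    for team, shares in weekly_share.items():
        if len(shares) >= 2 and shares[-1] - shares[0] <= min_slope:
            flagged.append(team)
    return flagged

weekly_share = {
    "payments": [0.30, 0.45, 0.60],   # adopting fast
    "platform": [0.20, 0.18, 0.15],   # quietly resisting
}
print(adoption_trend(weekly_share))   # ['platform']
```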
Your engineering org is no longer just humans. It’s humans + AI agents.
Eventually, you’ll need to measure the output of your human engineers, the output of their AI agents, and the quality of the hand-offs between them.
The winning orgs will be the ones who optimize the collaboration between humans and AI, not just one or the other.
Here’s what I believe, even though it makes our product roadmap significantly harder:
Most engineering metrics will need to be reinvented in the next 3 years.
The frameworks we built at Waydev, and the ones the entire industry built, were optimized for measuring human engineering teams. They worked because the assumptions held: humans write code, humans review code, humans deploy code.
Those assumptions are breaking. Fast. At some companies, AI already writes the majority of code. At OpenAI, 95% of engineers are already building with Codex. Within 2-3 years, Sherwin believes we’ll see one-person billion-dollar startups.
If that’s true, and I think it is, then measuring “how many PRs did your team open this sprint?” is like measuring how many horses your company owns in 2024. Technically measurable. Completely irrelevant.
At Waydev, we saw this shift coming. Not because we’re clairvoyant, but because our customers at the frontier started asking questions traditional metrics couldn’t answer.
For the past 18 months, we’ve been working with engineering orgs where 50%+ of code is AI-generated, where engineers orchestrate AI agents more than they write functions, where the old playbook has already broken down.
What we’ve learned is shaping the next evolution of engineering analytics. The new measurement frameworks we’re deploying center on decision velocity over code volume, human-AI system efficiency over individual output, and leading indicators of adoption over lagging activity counts.
These aren’t experiments anymore. They’re becoming core capabilities in Waydev, because the companies winning in AI-augmented environments are the ones measuring the right things.
The shift is clear: Engineering Intelligence needs to evolve from measuring coding activity to measuring strategic leverage. We’re building for that future, not retrofitting old frameworks.
Here’s the exciting part, and why I’m more energized about Waydev’s mission now than ever:
Engineering leadership is about to get way more strategic.
When AI handles the implementation, the human contribution shifts entirely to judgment, strategy, and orchestration. That’s a higher-leverage role.
CTOs won’t spend board meetings defending story points. They’ll discuss decision velocity, human-AI leverage, and how engineering investment translates to business outcomes.
This is a better conversation. More strategic. More tied to business outcomes.
But it requires different data. Different metrics. Different dashboards.
That’s what we’re building.
Sherwin says the next 2-3 years will be the most exciting in tech history. I believe him.
For engineering leaders, the challenge is this: your job is changing faster than your measurement tools.
You can’t manage what you can’t measure. And right now, most of what matters in an AI-augmented engineering org is invisible to traditional dashboards.
The orgs that figure out the new measurement framework first will compound advantages faster than their competitors. They’ll identify top AI-augmented performers earlier, optimize human-AI workflows better, and make faster strategic decisions.
The orgs that keep measuring lines of code and PR velocity will optimize for the wrong things and wonder why their “productivity improvements” don’t translate to business outcomes.
At Waydev, our job is to help engineering leaders see clearly in this transition. To measure what actually matters, not what’s easy to measure. To provide visibility into the human-AI system, not just the human part.
Because here’s what I know for sure: the future of software development isn’t about writing more code. It’s about making better decisions about what code to write, and orchestrating AI agents to write it.
And if that’s the future, then the measurement frameworks need to evolve too.
Get a demo call with Waydev to explore the new frameworks we’re creating for AI-augmented teams.
Let’s build that future together,
Alex