This does not reflect the opinion of my employer
TL;DR: we are DDoSing each other with code reviews, but it doesn't have to be this way.
Software engineering was solved until about last year. More or less.
Most projects got finished. We know which roles to hire and what levels they should be. We even have a default process,
Scrum. If you can't lead an engineering team by yourself, just rub some Scrum on it. It won't be the fastest team in the
world, but by golly they will finish the project at some point.
Projects still have lots of problems of course. But we're not bemoaning our ability to ship software like we did back
in the '90s and '00s. We know we can do it. We can ship in spite of the problems around us.
Tech leads and project leads are a big reason for this success. They guarantee engineering outcomes. They collaborate
with design and product. They lead the technical design. They are the primary reviewer for the project. And what happens
when you make someone responsible for the technical execution of a project? They go into the important code reviews,
even the ones they're not assigned. Even if they don't say a word, you know they're reading over the code looking for
unhandled error cases and race conditions, making sure there aren't fatal flaws under the surface.
This "lead" is a strawman of sorts; it doesn't have to be one person. Maybe you had a cabal of 3 engineers out
of 10 who guided the engineering, or you had a small stacked team and they could all truly retain context and hot swap
for each other. But for the remainder of the post I will talk about "the lead" and we will all know what I mean.
The lead is the hub in a "hub-and-spoke" team model, since code reviews flow into this central point. Sometimes you'll
send individual reviews to other people. But again, if someone is responsible for the technical delivery of a project,
you know they're looking at what everyone's doing. You can't avoid this fanout, and it has been our secret sauce for a
while. But our new magic wand is turning this fanout into an antipattern.
The magic wand
OK, well, something changed. We have a new magic wand. This magic wand vomits code at an impossible rate. And the worst
part: the magic wand is pretty good! It's in a dangerous sweet spot. It can generate an entire "working"
website, soup to nuts, from a single large prompt. And hopefully, you took the time to make sure that the API keys
aren't on the client... and that the endpoints check auth... and it's doing something with CORS... and 2 dozen other
fiddly bits necessary for launching in production... and it's not leaking debug errors with important information in
5xx responses... and before long you realize that the generated project wasn't even 50% of the way to a production
system.
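One of those fiddly bits, sketched as a hypothetical helper (the names and shape are mine, not from any particular framework): debug detail must never reach a 5xx response in production.

```python
import traceback


def error_response(exc: Exception, debug: bool = False) -> dict:
    """Build a 5xx response body; hide internals unless debug is on.

    Hypothetical helper for illustration, not any framework's API.
    """
    if debug:
        # Fine locally, but this leaks stack traces and internals
        # (paths, table names, secrets in messages) if it ships.
        detail = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        return {"status": 500, "error": detail}
    # Production: a generic message, nothing about the failure's internals.
    return {"status": 500, "error": "Internal server error"}
```

A generated project tends to ship the `debug=True` branch by default; noticing that is exactly the kind of review work the rest of this post is about.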
This magic wand is pressuring our tech and project leads. Don't believe me? Go ask one who works with a lot of agentic
coders. "How has code review been feeling lately?" And you'll get a sigh and they'll tell you, "these tools are great
but it's a lot to keep up with." I don't know where the performance ceiling is for these tools. But it's obvious that
they will produce code faster and faster over the immediate future. This pressure will only increase.
This is creating an interesting problem. The hub will get DDoSed in all of these hub-and-spoke
team models. This means that your most senior engineers will be spending a disproportionate amount of time
reading and reviewing code.
This will create an even more interesting problem: a paradox! Our most senior engineers will practice less with
these new tools, because they're spending their day sweating over line 351 and asking themselves "is it
REALLY okay for this module to take on a dependency to the database?" because these are the kinds of questions
that lead to decisions that avoid serious problems down the line. But the more junior members of the team are spending
their time getting better with agentic programming. They may even start to drive how it's used at the company, while
the more senior engineers begin to lack the experience to make these judgment calls themselves.
Software engineering isn't solved anymore. But we're still following the old rules and ignoring the magic wand
and its impact.
What can we do about it?
Here's the disappointing part of the post: I don't know!
But you should have seen that answer coming. I told you that software engineering isn't solved anymore! How can I tell
you a solution if I don't believe it's solved?
As a consolation prize, I want to highlight some tools and experiments I think are promising in the short term. The
situation is evolving rapidly enough that I can't assert an expiration date on these.
Pair programming / the buddy system
When I joined Google in 2010, Google had a regimented code review system. I'm sure it still does, but I haven't
worked there for a decade and I can't be bothered to ask anyone there now. Every changelist
needed approval by another engineer. Between you and the reviewer, someone needed to be in OWNERS for that directory
and someone needed to have "readability," a.k.a. clearance to write code in that language. And even if you had both
permissions, someone still needed to explicitly approve your CL.
But there was a neat workaround. If you pair programmed a CL with a second person, you didn't need to get it reviewed
by an external party, assuming that you and your pair had OWNERS and readability. This might not have been written
down anywhere, but it was a logical application of the rules. One person sent out the changelist and
the other approved it in the system. The two of you also happened to be coauthors, but that wasn't forbidden at the time.
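Those rules can be sketched as a simple predicate (my paraphrase of the policy, not Google's actual tooling):

```python
def review_ok(author: str, approver: str,
              owners: set, readability: set) -> bool:
    """Paraphrased approval rules for a changelist.

    Between the author and the approver, someone must be in OWNERS for
    the directory and someone must hold readability for the language,
    and the approver must be a second person. A pair-programmed CL
    passes because the pair jointly satisfies both requirements.
    """
    pair = {author, approver}
    return (author != approver
            and bool(pair & owners)
            and bool(pair & readability))
```

Under this reading, a pair where one member has OWNERS and the other has readability can approve each other's work without waiting on an outside reviewer.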
And that was a big deal at Google. Code reviews could get really bogged down. Some people just didn't review code
that often, and some people just loved bogging down reviews in nitpicks that couldn't be found in any style guide. I
knew a platform team that only reviewed external changes once a week, and if they left comments you needed to wait for
the next week to hope to God they hit approve. A shortcut was a big deal.
But nobody pair programmed. I sure didn't. I hate pairing unless we're bug hunting or someone's getting training. It
feels like a waste to burn 2x the engineering time when everything is going well.
But it's a potential solution to the hub-and-spoke problem with the magic wand. Here's what I'm imagining: a team
consists of staff/junior, staff/senior, and senior/senior pairs. These aren't permanent pairings; they're just today's
arrangement. Each pair prompts together and looks at the output together. The pairing has enough combined seniority
that it can own technical decisions. They have the authority to decide that their code can be shipped.
This has an important caveat. These pairings must understand when they need outside input. They need to gossip
to the other pairings if they need to highlight an architectural decision or a bad assumption. And if the
pair cannot come to an agreement on a decision, they need to find tiebreakers. But ideally these are exceptions; they
would be prompting together and reviewing together. By the end, both engineers agree on the technical outcome and own
the decisions.
In fact, this would become part of the definition of a junior, senior, or staff engineer: how much you can be trusted
to ask for input when you need it.
They don't have to literally sit with each other for the whole day. They just need to both be responsible for the
prompting and agree with the direction of the final code that ships. They don't need to sit together when they're
updating documentation or having meetings or shitposting on Slack. But at a certain point you're having a conversation
about it and making sure the architecture is reasonable and the verification is correct, and ensuring that you don't
need to raise any problems with the team.
How would this look for a team of four ICs and one project lead? Maybe you have two senior/senior
pairings, one staff/junior pairing, and then the final floating engineer is situational. Maybe they're performing
individual IC work that will be reviewed later with one of the pairings. Maybe the project lead is kinda doing
two pairing assignments at once (instead of effectively the four they had previously). I don't really care; it's your
team. You figure it out. But the important thing is that the project lead's workload doesn't scale with team size; the
number of pairings does.
I haven't literally pair programmed with someone else yet in this manner. But I've worked on some two-engineer projects
recently and it felt pretty good. Each of you has a default reviewer, and nobody is getting overwhelmed by N magic
wands.
This has some benefits. First, it provides a concrete path to hire and train junior engineers for your
organization. Even if you believe that the software engineering occupation will be decimated over and over by
advances in the technology until finally one of Sundar, Sam, or Dario is holding the head of the last engineer, admit
that you still need a way to teach new people how to do it. Second, it provides a role for staff engineers as a level,
which obviously I appreciate as a staff engineer.
Product engineers
For a few months, I've been saying that I need to become a product manager before a product manager
becomes an engineer. It turns out that this role already existed, but I arrived at it independently. With the
increase in coding velocity, it becomes possible to start projects closer to the final implementation than
ever before.
When I finally caught up on my unread backlog of The Pragmatic Engineer newsletters recently, I found an issue with
the subject line "The product-minded engineer",
which was an interview with the author of the book of the same name.
This was a book about the need to grow your empathy with the user, and ways that technical skills and product skills
can mesh together.
Why is this important? Look at the areas in LLMs that are seeing rapid development and rapid adoption. They're
all dev-tool related! Developers can be insanely productive nowadays, assuming they don't need to figure out
what someone else needs them to build. But as soon as the topic is not "development" the process grinds to a halt.
But most companies aren't like that. If I've learned anything from working for B2B and B2C companies, it's that you
can't possibly guess what people need without an obsession over qualitative and quantitative feedback. Thus, I believe
that engineering is going to be more and more vital in the discovery phase of projects, where you're not even sure
what to build. The ultimate software engineers will be the ones who can perform the product discovery work themselves,
the ones who get better at producing up-front prototypes and iterating on those prototypes.
Have you ever seen a designer in a user research session, tweaking upcoming mocks as a participant speaks to tailor
them to that participant? Or chatting with a PM and calling an audible to tweak a major part of the mocks before the next session?
Engineers who can do this kind of work will become more valuable because they will go beyond just putting hypotheticals
in front of people for reaction. They will be able to produce working systems for reaction. And sure, maybe they are only 50% prototypes
and there is still a bunch of productionization work. But it's clear how adding more firepower to the earliest product
iterations will only improve discovery.
I'm sure someone's gonna be like "oh no, the LLM will just be the product manager and the designer and the researcher."
Really? You're going to do research for a dating app by putting Codex in front of someone and having them explore
a user interview with questions like "So, puny human, is your situation more about copulation or procreation?" I don't
see it.
So yeah, I think there will be a period of time where the lines between discovery and execution will blur. I've never
worked for a proper startup, so it's possible I'm just making an assertion like "more and more companies will need
to act like a startup" or something. But I'll let the startup people assert that for me.
I think this will help address the hub-and-spoke problem because at the start of the project, you start with a
system that is already halfway there. You just need to refactor and add tests and productionize. This will reduce
the scope of projects (or more accurately, move a lot of scope to the discovery phase) and reduce the during-the-project
review workload.
AI code review
This one is exasperating. You have a magic wand that generates a pull request description, commit message, and code.
And now you want to check if that magic wand did good work. So you wave the same magic wand -- but held differently! --
and now it's going to see why this code was such a bad idea? It sounds stupid when you say it out loud.
But at the moment, they're actually pretty good; I'd wager that they find more nitty-gritty problems than I do. They
do all of the rote callsite checking that you might overlook. They catch swapped parameters of the same type
by noticing name mismatches. They will notice when you try to set a dangerous or weird config value.
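The swapped-parameter case looks like this (a hypothetical example of mine, not output from any particular bot):

```python
def transfer(src_account: str, dst_account: str, amount: int) -> str:
    """Move money from src_account to dst_account."""
    return f"{amount} moved from {src_account} to {dst_account}"


payer, payee = "alice", "bob"

# A type checker is happy here: both arguments are strings. But a
# name-aware reviewer can flag that the payee is being passed as the
# source. It should be transfer(payer, payee, 100).
receipt = transfer(payee, payer, 100)
```

No compiler catches this; a human might skim past it; a bot that compares argument names to parameter names flags it for free.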
I mean, not RELIABLY. Half of the comments are horrible.
"Oh no, you changed this!", said the bot.
"Buddy, that's the whole point", said Jake.
But it's a good first pass. I'd be comfortable if my company adopted this rule: "You can't ask for human review
until you do a pass with the bot and satisfy its comments." It removes silly errors so that the code reviewer can spend
time focusing on the big picture.
I don't think this is some panacea. If an agent produced a major architectural flaw, I don't expect its corresponding
reviewer to notice the flaw either. But it adds more value than noise at this point.
To summarize
- We used to love hub-and-spoke team structures, where reviews would fan in to a lead engineer responsible for technical
execution.
- LLMs have increased execution velocity, putting lead engineers under additional strain.
- We need to rethink how to scale teams without scaling the lead's workload.
- This isn't a solved problem, but there are a few options.
- Working in pairs / the buddy system, where the pair has enough authority and responsibility to make decisions and
ship.
- Getting engineers more involved in the discovery process.
- Having the bots help out with code review, to remove obvious problems before a human looks at it.