Parallel Agents and the New Bottleneck in Software Engineering

January 2025

Powerful LLM agents have shaken the foundations of software engineering.

For well over half a century, code has looked remarkably similar. Scientists were writing imperative software in Fortran in the 1950s in much the same style we still write today. The syntax evolved, the tooling improved, and the hardware became absurdly faster, but software engineering remained a deeply human process of understanding systems and translating intent into code.

LLMs have changed that. But not in the way most people think.

The Fantasy of Parallel Software Engineers

A common vision of the future looks something like this:

  • Spin up 10 agents
  • Assign 10 tickets
  • Merge 10 PRs
  • Achieve 10x output

In practice, this breaks down quickly in production environments.

Yes, agents can technically work in parallel. You can isolate them using git worktrees, separate branches and local environments. You can even get them to independently “complete” tickets.

But there is an intrinsic bottleneck: review.

The agents can complete 10 tickets in a day, but reviewing that work is effectively single-threaded. It requires human focus, direction and an understanding of the larger system.
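A rough way to see the bottleneck is to treat agents and review as two stages of a pipeline, where merged output is capped by the slower stage. The sketch below is only illustrative: the function names and the numbers (45 minutes per review, three hours of review time a day) are assumptions I picked for the example, not measurements.

```python
def review_capacity(review_minutes_available: float, minutes_per_review: float) -> int:
    """Whole PRs one reviewer can get through in a day."""
    return int(review_minutes_available // minutes_per_review)


def merged_per_day(agent_prs_per_day: int, capacity: int) -> int:
    """Merged output is capped by the slower of the two stages."""
    return min(agent_prs_per_day, capacity)


def backlog_after(days: int, agent_prs_per_day: int, capacity: int) -> int:
    """Unreviewed PRs that pile up if agents keep producing every day."""
    backlog = 0
    for _ in range(days):
        backlog += agent_prs_per_day          # agents open new PRs
        backlog -= min(backlog, capacity)     # the reviewer clears what they can
    return backlog


# Hypothetical numbers: 10 agent PRs a day, 45 minutes per review,
# 180 minutes of focused review time available.
capacity = review_capacity(review_minutes_available=180, minutes_per_review=45)
print(merged_per_day(10, capacity))    # 4
print(backlog_after(5, 10, capacity))  # 30
```

Under these assumptions, adding more agents changes nothing about merged output. It only makes the review queue grow faster.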

And most importantly: the work often is not actually complete.

The biggest issue I see with autonomous agents is confidence without understanding.

They will:

  • make major architectural mistakes,
  • misunderstand surrounding patterns,
  • violate abstractions,
  • duplicate logic,
  • silently break conventions,
  • claim tasks are complete when they are not.

Even branch management is surprisingly fragile. Agents regularly lose track of their worktree boundaries, make changes on the wrong branch or accidentally bleed changes across contexts.
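One cheap mitigation is a guard that runs before an agent commits and refuses to continue if it is in the wrong worktree or on the wrong branch. The following is a minimal sketch, not a hardened tool: `check_worktree`, `run_git` and the expected root and branch values are hypothetical names for this example. The two git commands it relies on, `git rev-parse --show-toplevel` and `git rev-parse --abbrev-ref HEAD`, are standard.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A function that runs a git command in a directory and returns its stdout.
GitRunner = Callable[[Sequence[str], Path], str]


def run_git(args: Sequence[str], cwd: Path) -> str:
    """Run git with the given arguments and return trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def check_worktree(
    cwd: Path,
    expected_root: Path,
    expected_branch: str,
    git: GitRunner = run_git,
) -> list[str]:
    """Return a list of problems; an empty list means it is safe to commit."""
    problems = []
    # Root of the worktree that contains cwd.
    root = Path(git(["rev-parse", "--show-toplevel"], cwd))
    # Current branch name, or "HEAD" when the checkout is detached.
    branch = git(["rev-parse", "--abbrev-ref", "HEAD"], cwd)
    if root.resolve() != expected_root.resolve():
        problems.append(f"wrong worktree: {root} (expected {expected_root})")
    if branch != expected_branch:
        problems.append(f"wrong branch: {branch} (expected {expected_branch})")
    return problems
```

The `git` parameter exists so the check can be exercised with a fake runner in tests. In an agent harness, a non-empty result would be a reason to stop and ask a human rather than to auto-correct.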

The issue is not code generation anymore. The issue is system comprehension.

AI scales local implementation much faster than it scales global understanding. That distinction matters enormously.

Where LLMs Actually Excel

I think people make a mistake when they talk about AI capability as if it is uniform across all engineering work.

It isn’t. The usefulness of an LLM depends heavily on the topology of the task.

Finding Bugs

Bug fixing is primarily about understanding.

A production bug can often be fixed with a one-line change. The difficult part is identifying:

  • where the issue originates,
  • which assumptions failed,
  • whether the bug represents a deeper systemic issue.

LLMs are genuinely useful here, but mostly as thinking partners:

  • exploring code,
  • tracing execution paths,
  • debating hypotheses,
  • identifying edge cases,
  • accelerating investigation.

The actual fix is often trivial.

Broad Structural Changes

This is where AI struggles most.

Large interconnected systems require:

  • architectural awareness,
  • historical context,
  • understanding tradeoffs,
  • understanding team conventions,
  • understanding future direction.

AI can assist in tracing a single prop through a React tree or refactoring an isolated utility function, but broad system evolution still belongs primarily to engineers.

The larger the blast radius, the more important human judgment becomes.

Single File or Isolated Function Changes

This is where LLMs are legitimately excellent.

Constrained contexts with well-defined boundaries are ideal for current models.

But even here there are subtle issues.

The model often copies patterns from elsewhere in the codebase without communicating where those patterns came from. This creates a hidden review burden:

  • was this duplication intentional?
  • should this logic be shared?
  • does a utility already exist?
  • is this pattern deprecated?

The generated code may look correct while still slowly degrading the architecture.

Ticket Creation

This is one of the highest leverage uses of AI today.

Most tickets historically were terrible:

  • vague titles,
  • incomplete requirements,
  • missing acceptance criteria,
  • no technical context.

Given good templates and sufficient repository context, AI can often produce better tickets than humans do. This is not because the model understands the product better, but because it is extremely good at structured synthesis.

The reduction in effort here is enormous, potentially as much as 80%.
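The ticket problems listed above are mechanical enough that a template can be checked in code, whether a human or a model wrote the ticket. Here is a minimal sketch of that idea. The `Ticket` fields and the `lint_ticket` helper are hypothetical, and the word-count threshold for a "vague" title is a crude heuristic chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ticket:
    title: str
    requirements: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    technical_context: str = ""


def lint_ticket(ticket: Ticket, min_title_words: int = 4) -> list[str]:
    """Return one finding per template section that is missing or too thin."""
    findings = []
    if len(ticket.title.split()) < min_title_words:
        findings.append("title is too short to be specific")
    if not ticket.requirements:
        findings.append("no requirements listed")
    if not ticket.acceptance_criteria:
        findings.append("no acceptance criteria")
    if not ticket.technical_context.strip():
        findings.append("no technical context")
    return findings


for finding in lint_ticket(Ticket(title="Fix bug")):
    print(finding)
```

A check like this only confirms that the sections exist. Whether the requirements are the right ones still needs a person who understands the product.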

PR Creation

LLMs are also very good at writing PR descriptions.

Given:

  • commit history,
  • ticket information,
  • templates,
  • repository context,

they can usually explain:

  • what changed,
  • why it changed,
  • how it works.

Where they still struggle is validation:

  • testing instructions,
  • rollout concerns,
  • production risks,
  • edge cases.

Unit Tests

AI performs surprisingly well here, to the point where it becomes tempting to stop reviewing the tests carefully. That is dangerous.

The failures are often subtle:

  • mocking the very component that should be under test,
  • testing implementation details instead of behavior,
  • generating assertions that still pass after breaking the feature,
  • creating tests that merely mirror the implementation.

The tests look comprehensive while silently providing little protection.
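The first failure mode is easy to show concretely. In the sketch below, the `Pricing` class and both checks are hypothetical examples of mine. The weak check patches the very method it claims to test, so it passes even against a deliberately broken implementation, while the behavioral check catches the bug.

```python
from unittest import mock


class Pricing:
    def discounted(self, price: float, percent: float) -> float:
        """Apply a percentage discount and round to cents."""
        return round(price * (1 - percent / 100), 2)


class BrokenPricing(Pricing):
    def discounted(self, price: float, percent: float) -> float:
        return price  # bug: the discount is never applied


def weak_check(pricing: Pricing) -> bool:
    # Anti-pattern: the method under test is replaced by a mock,
    # so the assertion only verifies the mock's canned return value.
    with mock.patch.object(pricing, "discounted", return_value=90.0):
        return pricing.discounted(100.0, 10.0) == 90.0


def behavioral_check(pricing: Pricing) -> bool:
    # Exercises the real method and asserts on observable results.
    return (
        pricing.discounted(100.0, 10.0) == 90.0
        and pricing.discounted(80.0, 25.0) == 60.0
    )


print(weak_check(Pricing()), weak_check(BrokenPricing()))              # True True
print(behavioral_check(Pricing()), behavioral_check(BrokenPricing()))  # True False
```

A quick review habit that follows from this: break the feature on purpose and confirm that at least one test fails.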

Planning is Becoming Overrated

One of the stranger trends I see emerging is excessive AI-generated planning. Massive implementation plans. Massive design docs. Massive task decompositions.

In reality, for many engineering tasks, reviewing the plan takes longer than simply doing the work.

AI-generated plans often contain:

  • subtle logical flaws,
  • incorrect assumptions,
  • invented abstractions,
  • over-engineered decomposition,
  • false certainty.

The verbosity creates the illusion of rigor.

For many practical engineering tasks, shorter feedback loops are more effective than giant speculative planning exercises.

The Role of the Engineer is Moving Up the Stack

As implementation becomes cheaper, engineering value shifts upward.

The bottleneck is no longer typing speed.

It is:

  • system understanding,
  • architectural judgment,
  • prioritization,
  • coordination,
  • review quality,
  • operational awareness.

This is why I’ve found developer dashboards increasingly valuable.

A modern engineer is managing far more than code.

The dashboard I currently want includes:

  1. PR pipeline statuses
  2. Jira ticket statuses
  3. What teammates are currently working on
  4. Deployment monitoring
  5. Important communications
  6. AI-driven coaching and mentorship insights
  7. Tracking completed work against engineering competencies

The engineer increasingly becomes the orchestrator of a complex socio-technical system:

  • humans,
  • agents,
  • CI pipelines,
  • deployment systems,
  • communication platforms,
  • product requirements,
  • operational metrics.

The implementation layer is being partially commoditized. The coordination layer is not.

The Big Picture Still Belongs to Engineers

LLMs are not replacing software engineers. But they are radically changing what software engineering is.

The industry spent decades optimizing implementation speed:

  • better languages,
  • better frameworks,
  • better tooling,
  • better abstractions.

Now implementation itself is becoming abundant. But understanding remains scarce. And software engineering has always ultimately been a problem of understanding.

Matthew Martin