Tyler Schultz Writing

Agent Orchestration

As I have been exploring deeper automation, I have found that orchestrating agents in more deliberate ways can be more useful than simple prompting. That may be obvious, but the way to go about it is interesting.

It starts pretty simple. Call out to a model, send a prompt, and get a response back.

Start | Prompt | GPT-4 | Review this code.

This is probably where most agent workflows begin, but as soon as we want to do more complex work, the broad prompt starts to feel inadequate. "Review this code" can mean many different things.
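In code, the starting point is just one round trip. This is a minimal sketch; `call_model` is a hypothetical stand-in for whatever client library you actually use, and here it only returns a canned string so the example runs on its own.

```python
def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub: a real version would call your model provider's API.
    return f"[{model}] response to: {prompt}"

# One broad prompt, one response back.
review = call_model("gpt-4", "Review this code.")
```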

Break Into Specific Reviews

The next move is to split the broad prompt into more specific review prompts. Instead of asking for a general review, we ask for a specific lens.

Prompt 1 | Review concurrency | Look for actor isolation, async tasks, and data races.
Prompt 2 | Review Glass design | Look for SwiftUI Liquid Glass and interface issues.
Prompt 3 | Review the story | Compare the code to the requested behavior.

This helps the agent focus. Each turn now has a specific job it is meant to perform, instead of one broad instruction trying to cover every kind of review at once.
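Splitting the broad prompt can be as simple as mapping each lens to its own instruction. The `call_model` stub below is hypothetical and just echoes its input so the sketch stays self-contained.

```python
# Each lens gets its own focused instruction instead of one broad "review this".
FOCUSED_PROMPTS = {
    "concurrency": "Review concurrency: actor isolation, async tasks, data races.",
    "glass": "Review Glass design: SwiftUI Liquid Glass and interface issues.",
    "story": "Review the story: compare the code to the requested behavior.",
}

def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub standing in for a real model call.
    return f"findings for: {prompt}"

# Run each focused review as its own turn.
reviews = {lens: call_model("gpt-4", p) for lens, p in FOCUSED_PROMPTS.items()}
```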

The Context Problem

The problem is that all of those investigations are now happening in the same context. The thread contains the diff, the concurrency notes, the design notes, the story notes, and the synthesis.

The more specific the reviews become, the more the main context fills up with intermediate work.

Main thread | PR diff | The shared input
Main thread | SwiftUI notes | Design investigation
Main thread | Concurrency notes | Architecture investigation
Main thread | Story notes | Product investigation
Main thread | Final summary | Mixed with all prior context
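You can see the problem if you model the thread as a single message list: every investigation appends its notes to the same context, so by the time the summary turn runs it is carrying everything. The message shapes below are illustrative, not any particular API.

```python
# One shared context: every review's intermediate notes pile up in it.
context = [{"role": "user", "content": "<PR diff>"}]

for lens in ("concurrency", "design", "story"):
    context.append({"role": "user", "content": f"Review {lens}."})
    context.append({"role": "assistant", "content": f"<{lens} notes>"})

# The synthesis turn now sits on top of every intermediate investigation.
context.append({"role": "assistant", "content": "<final summary>"})
```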

Introduce Sub-Agents

Sub-agents are awesome because each agent gets its own context as well as one specific goal to focus on. It can go deep on that goal without filling the main thread with every detail.

This is similar to how teams are built today. Design, QA, product, and engineering all look at the same work from different angles. They each have a specific role in deciding whether something is ready.

Main thread | PR diff | The shared input
Sub-agent | SwiftUI reviewer | Design investigation in its own context
Sub-agent | Concurrency reviewer | Architecture investigation in its own context
Sub-agent | Story reviewer | Product investigation in its own context
Return | Focused reports | Only the findings come back

Add An Orchestrator

Once sub-agents exist, something needs to organize them. That is the orchestrator.

The orchestrator receives the PR, creates the sub-agents, waits for their reports, and returns one coherent finding to the user.

Main thread | Orchestrator | Creates focused review tasks
Main thread | PR diff | The shared input
Dispatch | SwiftUI reviewer | Reports design findings
Dispatch | Concurrency reviewer | Reports architecture findings
Dispatch | Story reviewer | Reports product-fit findings
Main thread | Combine sub-agent outputs | Merge the useful findings
Return | Final PR review | One synthesized answer
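Put together, the orchestrator is the function that fans the work out and merges what comes back. This is a sketch under the same assumptions as before: `review_sub_agent` is a hypothetical stand-in for a real sub-agent call, and synthesis here is just concatenation.

```python
def review_sub_agent(lens: str, diff: str) -> str:
    # Hypothetical stub: each specialist returns only its report.
    return f"{lens} findings for {diff}"

def orchestrate(diff: str) -> str:
    # Fan the PR out to the specialists...
    reports = [
        review_sub_agent(lens, diff)
        for lens in ("SwiftUI", "concurrency", "story")
    ]
    # ...then synthesize one coherent answer from the reports.
    return "PR review:\n" + "\n".join(f"- {r}" for r in reports)
```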

Selective Routing

A lot of the time we have PRs that are solely focused on UI or data flow, which means we might not need to run every sub-agent every time.

So the orchestrator needs a discovery step. Before it fans work out to sub-agents, it should inspect the PR and decide what kind of review is actually useful.

Main thread | Orchestrator | Decides which reviewers should run
Sub-agent | Research | Inspect the PR and identify review needs
Main thread | Determine selected reviewers | Choose which specialists should run
Skip | SwiftUI reviewer | Not needed for this change
Run | Concurrency reviewer | Relevant for async ViewModels
Run | Story reviewer | Relevant for product-fit review
Main thread | Combine sub-agent outputs | Merge the useful findings
Return | Final PR review | One synthesized answer
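A discovery step can be very cheap to prototype. The `discover` function below is a hypothetical, file-name-based heuristic just to show the shape; in practice the research sub-agent would actually read the diff and reason about it.

```python
def discover(diff_files: list[str]) -> set[str]:
    # Hypothetical heuristic: pick reviewers from the files the PR touches.
    selected = set()
    if any(f.endswith("View.swift") for f in diff_files):
        selected.add("SwiftUI")
    if any("ViewModel" in f for f in diff_files):
        selected.add("concurrency")
    selected.add("story")  # the product-fit review always runs
    return selected

# A data-flow-only PR: no View files, so the SwiftUI reviewer is skipped.
selected = discover(["FeedViewModel.swift", "FeedService.swift"])
```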

Model Choice

Let's take this one step further by also choosing a model for each role. The orchestrator needs to run at our highest reasoning level because it owns the planning, the routing, and the final report.

The research step is different. It is read-only, and the goal is to inspect the PR quickly without editing code. That can run on a faster coding model while the main thread stays on the stronger model for judgment.

The reviewers can run lower because their jobs are intentionally narrow. A SwiftUI reviewer, concurrency reviewer, or story reviewer is not responsible for the whole answer. Each one looks through a specific lens and reports back the useful findings.

GPT-5.5 high | Main thread | Orchestrator | Owns planning and routing
GPT-5.3 Codex medium | Sub-agent | Research | Read-only PR investigation
GPT-5.5 high | Main thread | Determine selected reviewers | Choose which specialists should run
GPT-5.5 low | Reviewer | SwiftUI reviewer | Focused UI review
GPT-5.5 low | Reviewer | Concurrency reviewer | Focused async review
GPT-5.5 low | Reviewer | Story reviewer | Focused product-fit review
GPT-5.5 high | Main thread | Combine sub-agent outputs | Merge and rank findings
GPT-5.5 high | Return | Final PR review | One synthesized answer
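The role-to-model assignment is really just configuration. A sketch, using the model names from the table above (swap in whatever tiers your provider exposes):

```python
# Role -> (model, reasoning effort). Names follow the table; adjust to taste.
MODEL_FOR_ROLE = {
    "orchestrator": ("gpt-5.5", "high"),          # owns planning and the final report
    "research":     ("gpt-5.3-codex", "medium"),  # fast, read-only discovery
    "reviewer":     ("gpt-5.5", "low"),           # narrow, focused lenses
}

def pick_model(role: str) -> str:
    model, effort = MODEL_FOR_ROLE[role]
    return f"{model} ({effort})"
```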

Recap

We started with a single prompt, which still has its uses, but we ended with an orchestrator that can adapt to real workflows.

The orchestrator can scale up or down depending on the complexity of the work. Sometimes one prompt is enough. Sometimes the workflow needs research, routing, specialist reviewers, and a final synthesis.

That is the bigger shift: these tools are no longer just better autocomplete or better chat. Used well, they become building blocks for engineering workflows that can reason, route, specialize, and report back with structure.

Start

Start | Prompt | GPT-4 | Review this code.

Orchestrated workflow

GPT-5.5 high | Main thread | Orchestrator | Owns the workflow
GPT-5.3 Codex medium | Sub-agent | Research | Read-only discovery
GPT-5.5 high | Main thread | Select reviewers | Route the work
GPT-5.5 low | Reviewer | Specialists | Focused checks
GPT-5.5 high | Return | Final review | Findings merged and ranked

Final Thoughts

I can imagine where we can take this in the future. There are so many different workflows that could benefit from these orchestrators.

A product-manager story groomer could pull in context from adjacent user stories, inspect the code to find relevant testing steps, and maybe even include Slack comments to give the story better business backing.

I am sure this structure is going to change as agents become more capable and our tooling becomes more advanced. But as we keep experimenting with all of this, it is fun finding new ways to tackle problems.