Tyler Schultz Writing

/goal

It was cool to test out the new /goal command in Codex. I was able to move a large Python backend to 100% unit test coverage, then use that coverage to start a refactor to Rust. I can see how this will be useful in the future, and what new questions it poses.

Use /goal when a task needs Codex to keep working across turns toward a verifiable stopping condition.

OpenAI

The first thing I thought to try was code coverage. It is an easy goal to verify:

Make sure all of the backend logic has unit test coverage.

It also had real value beyond the experiment, because better tests would make the eventual Rust refactor safer. And it felt like the kind of task that would take a while to complete. I was right. It ran for nearly sixteen hours.

[Image: Codex terminal output showing Goal achieved after 15 hours and 55 minutes]
The first useful result was not the Rust code. It was a finished validation pass after nearly sixteen hours.

Unit Test Refactor

With any refactor like this, you need a source of truth. Before changing the implementation, something has to define what the system already does. I knew from the preceding weeks that the backend was working correctly, so for this project the existing Python codebase became the source of truth for the tests.

So the unit-test goal was not asking Codex to redesign anything yet. It was asking Codex to read the current behavior and write tests from it. For each module, the job was to find the reachable logic, identify the branches and edge cases, add tests, run them, and keep iterating until the backend was covered.
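To make "keep iterating until the backend was covered" concrete, here is a minimal sketch of a coverage check an agent could run after each pass. It assumes the report follows the shape of coverage.py's JSON output (a top-level "files" mapping whose entries carry a "missing_lines" list). The function names are hypothetical, and this is not necessarily the check Codex used in this project.

```python
def uncovered_files(report: dict) -> dict[str, list[int]]:
    """Map each file that still has unexecuted lines to those line numbers.

    Assumes the shape of coverage.py's JSON report:
    {"files": {"pkg/mod.py": {"missing_lines": [3, 12], ...}, ...}}
    """
    gaps: dict[str, list[int]] = {}
    for path, data in report.get("files", {}).items():
        missing = data.get("missing_lines", [])
        if missing:
            gaps[path] = sorted(missing)
    return gaps


def goal_met(report: dict) -> bool:
    """The stopping condition: no file has any missing lines left."""
    return not uncovered_files(report)
```

With coverage.py, running `coverage json` after the test run writes such a report to coverage.json, which can be loaded with `json.load` and passed to `goal_met`. Line coverage is only one possible definition of "covered"; branch coverage would need a stricter check.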

By the end, Codex had generated roughly 2,200 unit tests across a few hundred files.

I am still validating the full result, but so far the daily runs with this library have been working correctly. Codex even found a bug and fixed it itself.

Rust Migration

Once I had these tests in place, I wanted to try something more ambitious: refactoring the entire backend to Rust. I had heard Rust could be much more performant, and again, it seemed like a good experiment for the /goal command.

The unit tests gave the /goal command a verifiable result to work against. I could keep the tests as they were, run them against the Rust-backed implementation, and use the same suite to verify that the conversion was still correct.

That is the kind of task where /goal starts to make sense. The work is large enough to take a long time, but the stopping condition is concrete: keep migrating logic, keep running the tests, and do not move on unless the behavior still matches.
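One way to picture "the behavior still matches" is a parity check that runs the same inputs through the original function and its replacement. The sketch below is illustrative only: `py_total_cents` and `rust_total_cents` are hypothetical stand-ins (the second is plain Python pretending to be a Rust-backed binding), and the actual project reused its existing unit tests rather than a helper like this.

```python
from typing import Any, Callable, Iterable


def find_mismatches(
    reference: Callable[..., Any],
    candidate: Callable[..., Any],
    cases: Iterable[tuple],
) -> list[tuple]:
    """Return (args, expected, actual) for every case where outputs differ."""
    mismatches = []
    for args in cases:
        expected = reference(*args)  # behavior of the original implementation
        actual = candidate(*args)    # behavior of the migrated implementation
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Hypothetical original Python logic.
def py_total_cents(prices: list[int], quantity: int) -> int:
    return sum(prices) * quantity


# Hypothetical stand-in for a function exposed by a Rust extension module.
def rust_total_cents(prices: list[int], quantity: int) -> int:
    return sum(price * quantity for price in prices)
```

An empty list from `find_mismatches(py_total_cents, rust_total_cents, cases)` means the two implementations agree on those cases. It says nothing about inputs that were never tried, which is why the breadth of the original test suite matters.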

Overly Ambitious

My first Rust goal was too vague. I did not clearly describe the Rust module structure I wanted. I also never said that the new Rust code was free to depart from the old Python layout.

So Codex kept moving, but in the wrong shape. Around seven or eight hours into the goal, I had a nearly 30,000-line lib.rs file. I stopped it, reorganized the architecture, and resumed with clearer constraints.
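A cheap guardrail against this kind of drift is a file-size check that runs alongside the tests. The following is a minimal sketch and not something used in this project. The function name and the 1,000-line default budget are arbitrary assumptions.

```python
from pathlib import Path


def oversized_sources(root: str, max_lines: int = 1000, suffix: str = ".rs"):
    """List (relative_path, line_count) for source files over the line budget."""
    offenders = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        # Count lines lazily so very large files are not loaded into memory.
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > max_lines:
            offenders.append((str(path.relative_to(root)), line_count))
    return offenders
```

Including a constraint like "no source file over N lines" in the goal itself gives the agent a structural rule to check, in addition to the behavioral one the tests provide.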

That was the main lesson:

/goal will keep going.

That is the feature. It is also the risk. If the direction is right, the hours compound. If the direction is wrong, they compound there too.

A Better Approach

The better workflow was slower than just typing "/goal rewrite this in Rust." It made more sense to talk through the codebase first, make the architecture explicit, create a Markdown plan, and then let Codex work the goal from that plan, checking items off as it went.

01 Talk through the codebase
02 Decide on the architecture
03 Write a Markdown plan
04 Start /goal from that plan
05 Let Codex check off progress
06 Keep running tests

The plan mattered more than I expected. Codex had a durable path to follow, and I had something concrete to review. Suddenly, the long-running work felt less like a black box. That is the version of /goal I would reach for again: large, verifiable tasks where the plan can become the thing the agent works against.
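If the plan uses Markdown task-list checkboxes ("- [ ]" and "- [x]"), progress can be read mechanically. This is an assumption about the plan's format, and the helper below is a hypothetical sketch rather than part of the workflow described here.

```python
import re

# Matches task-list items such as "- [ ] Port scheduler" or "  * [x] Done".
TASK = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def plan_progress(markdown: str) -> tuple[list[str], list[str]]:
    """Split a Markdown plan's checkbox items into (done, pending) lists."""
    done: list[str] = []
    pending: list[str] = []
    for line in markdown.splitlines():
        match = TASK.match(line)
        if not match:
            continue  # headings, notes, and ordinary bullets are ignored
        bucket = done if match.group(1).lower() == "x" else pending
        bucket.append(match.group(2).strip())
    return done, pending
```

A reviewer can then skim the pending list instead of the whole diff, and an empty pending list combined with a passing test run is a natural stopping condition for the goal.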

Still TBD

The Rust performance numbers are still TBD, but the workflow lesson is already useful. The question is no longer just, "Can the agent build it?" Increasingly, it is, "What should I build?"

The next experiment I want to try is in a codebase where I have stronger taste: a legacy Swift app. Something like moving an older architecture toward modern Swift patterns with Observable, cleaner view models, and a more current app structure.

That feels like a better test of the workflow. Python to Rust taught me what /goal can do with a clear validation loop. Swift would let me judge the architecture more directly, because I know what good looks like there.