It now has eight live endpoints, a tiered pricing page and an alpha banner warning people the schemas might still move under them. None of that was the plan. The plan was to understand how UK rail data actually fits together, and to work out how to use an LLM properly on something real instead of a toy. The product is just what happened while I was doing that.
I'm still not sure it becomes a business. I'm completely sure the way I learned to work on it was worth the time, and that part I now use every day on everything else.
Why rail data made a good sandbox
One station has more names than you'd believe, and that's the whole problem. King's Cross is KGX to the fares system, KNGX to the timetable and 54311 to the movement feeds, with a handful more codes besides (NLC, ATCO, UIC), each from a different corner of the railway. You don't need to hold those in your head. Nobody ever agreed on one name, so every feed brought its own. It gets worse with size: a big terminus like London Bridge doesn't have one timetable code, it has a cluster of them, roughly one per platform group, so even "which code is the station" isn't a clean question. Then there's the live side. Darwin (the real-time running feed) pushes updates at you as a stream, while the reference data turns up separately, each source on its own schedule and in its own shape. Before you can render a single departure board you've written a reconciliation layer and a code-mapping table, and now you own both of them.
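As a rough sketch of what that code-mapping table looks like in practice, here is a minimal version in Python. It is illustrative only, not Headcode's actual code: the field names `crs`, `tiploc` and `stanox` are labels I'm assuming for the three code systems, the King's Cross values are the ones quoted above, and "Example Terminus" with its codes is made up to stand in for a station with a cluster of timetable codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    """One physical station and the codes different feeds know it by."""
    name: str
    crs: str                  # three-letter code (the "KGX" style)
    stanox: str               # numeric code seen in the movement feeds
    tiplocs: tuple[str, ...]  # timetable codes; a big station can have several


# Illustrative rows only. The second station and all of its codes are
# invented placeholders, not real reference data.
STATIONS = [
    Station("London Kings Cross", crs="KGX", stanox="54311", tiplocs=("KNGX",)),
    Station("Example Terminus", crs="XXX", stanox="00000", tiplocs=("EXMPLA", "EXMPLB")),
]


def build_index(stations):
    """Map every (code system, code) pair back to its station."""
    index = {}
    for st in stations:
        keys = [("crs", st.crs), ("stanox", st.stanox)]
        keys += [("tiploc", t) for t in st.tiplocs]
        for key in keys:
            if key in index:
                # Two stations claiming the same code is a data error worth surfacing.
                raise ValueError(f"duplicate code {key}")
            index[key] = st
    return index


INDEX = build_index(STATIONS)


def resolve(system, code):
    """Return the station for a code in a given system, or None if unknown."""
    return INDEX.get((system, code.upper()))
```

With this in place, `resolve("tiploc", "KNGX").crs` gives back `"KGX"`, and both timetable codes of the made-up terminus resolve to the same station. The real table is harder mainly because the rows come from several sources that disagree and change on different schedules.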
That mess is exactly why it was good to learn on. Bounded enough that you can actually finish it, awkward enough that you can't bluff your way through. You either understand how the feeds relate or your departure board quietly shows the wrong train.
The workflow grew up alongside the project
The workflow I use now didn't arrive fully formed. It evolved, and the project is where each step earned its place.
I started where most people start: a prompt and a plan. Ask for a thing, get a plan back, let it build. That's fine for small, self-contained work. It fell apart the moment a task touched code the model hadn't really looked at. It would produce something plausible and confidently wrong, and then I'd spend longer unpicking it than the thing was worth, either fixing it by hand or trying to prompt my way back out, which sometimes just dug the hole deeper. The failure wasn't the model being bad. It was me asking it to act on intent it didn't have.
So the requirement moved to the front. Before any code, I'd work up a proper PRD with the model: I'd set the direction and push back, it would draft and fill in. These weren't paragraphs of good intentions. They grew into real documents, with the goals stated plainly and the out-of-scope list stated just as plainly, functional requirements (what it does) sitting next to non-functional ones (how fast, how reliable, what the limits are), a sketch of the technical architecture, the phases it would be built in, how it would be tested and what might go wrong along the way. Then I'd break that down into tasks small enough to review one at a time. The output got noticeably better, because the model was working to a brief instead of filling in the gaps itself. It stayed focused on the thing in front of it, and it had helped write the thing that kept it there.
The step that changed the most came later, and it wasn't obvious to me at the start. A PRD is only as good as your understanding of the code it lands in. So before writing the PRD, I'd have the model research the existing codebase and write up how the relevant part actually works, what's already there and which patterns to follow.
The thing that made the research useful was a hard rule: document the codebase as it exists today, and nothing else. No suggested improvements, no root cause analysis, no critique, no refactoring proposals, no architecture it wished were there. Only what exists, where it lives, how it works and how the parts connect. Left to its own instincts a model will reach for the fix, because pointing out problems reads as helpful. Forbidding all of that kept the output objective, a technical map of the system as built with the opinion stripped out. Adding that step changed the relationship. The model stopped being the author and became something that I directed, sent to find things out and report back rather than left to decide what to build. The research feeds the PRD, the PRD feeds the tasks, and I'm steering at every handoff.
Review runs through all of it. I read the research, the PRD, the tasks, the plans, not just the final diff. That order matters more than it sounds. If the research is wrong, the PRD inherits the error and every task underneath it inherits it again, and by the time you're reviewing code you're three layers downstream of the actual mistake. The review at the top is worth far more than the review at the bottom.
That's the honest line between this and vibecoding. It was never about whether I used the AI. It was about whether I ever let go of understanding what it was doing, and I made a point of not letting go.
What not rushing bought me
It wasn't all done this way from the start. The early parts of Headcode were built the way most things get built with an LLM, by prompting, getting something working and moving on. The discipline came later. As the research-into-PRD-into-tasks process settled, designing before implementing became the default, and the project quietly split into a scrappy first phase and a deliberate second one.
You can see the join. The later work (the bulk of the API surface, the endpoint groups, the schema as it stands now) was designed before any of it was written, with the spec settled while the code was still hypothetical. The early prompted bits I've mostly gone back and rebuilt to the same standard, because once you've felt the difference the scrappy version nags at you.
What fell out of that patience is an API where the schema is the contract. It's OpenAPI-first, with a downloadable spec you can point a client generator or contract tests at. Identifiers resolve cleanly: hand it any code system and it gives you back all of them, so the reconciliation table that would normally be a thing you maintain becomes a field you read.
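To make "a field you read" concrete from the consumer's side, here is a small sketch in Python. Everything in it is hypothetical: the response body, the `codes` field and its keys are assumptions for illustration, and the hand-written type table is only a stand-in for what the published OpenAPI spec actually pins down.

```python
import json

# Hypothetical response body for a station lookup. The field names are
# assumptions for illustration; the real shapes are in the published spec.
SAMPLE_RESPONSE = """
{
  "name": "London Kings Cross",
  "codes": {"crs": "KGX", "tiploc": ["KNGX"], "stanox": "54311"}
}
"""

# A hand-written stand-in for what a schema would pin down: field -> type.
EXPECTED_CODES = {"crs": str, "tiploc": list, "stanox": str}


def check_contract(payload):
    """Return a list of contract violations (an empty list means it conforms)."""
    codes = payload.get("codes")
    if not isinstance(codes, dict):
        return ["codes: missing or not an object"]
    problems = []
    for field_name, expected_type in EXPECTED_CODES.items():
        if field_name not in codes:
            problems.append(f"codes.{field_name}: missing")
        elif not isinstance(codes[field_name], expected_type):
            problems.append(f"codes.{field_name}: expected {expected_type.__name__}")
    return problems


station = json.loads(SAMPLE_RESPONSE)
violations = check_contract(station)  # no violations for the sample above
stanox = station["codes"]["stanox"]   # the mapping is now just a field you read
```

A real contract test would load the downloaded spec and validate against that rather than a hand-written dictionary, but the shape of the check is the same: the client asserts what the schema promises and fails loudly when a response drifts from it.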
Vibecoding the same idea would have got me a convincing departure board demo and a wall the moment the identifier resolution got hard. Rail data punishes building before thinking, which is precisely what made it a good teacher.
Where it actually is
Headcode is in alpha. There's no self-serve signup. Access is by request: you email me with what you're building and I send a token. That's deliberate while the data and the schemas are still settling.
I genuinely don't know whether there's external demand for it. It might just stay a personal project, something I experiment with and build other things on top of, now that I've got the rail data in a clean format to start from. That alone was worth doing. I want to build visualisations on top of it, possibly a small app, and a clean API I control is reason enough to have built the thing. If people turn up actually wanting the data, it might grow into a small SaaS. What I'm not going to do is manufacture a roadmap I don't believe in, or pretend there's urgency around a project I started in order to learn.
The part I kept
Whatever Headcode turns into, the workflow has already paid for itself. I went in wanting to learn how to use an LLM well on a real codebase, and the research-into-PRD-into-tasks pipeline, with review at every layer, is now simply how I work with one. The product is a maybe. The method, I kept.
If you want to see what came out of it, Headcode lives at headcode.dev, and the API docs (endpoints, schemas and the OpenAPI spec) are open to browse at docs.headcode.dev.