
Over the past year I've noticed a pattern in conversations about AI usage at work. Ask someone how their company is handling AI tooling and you'll almost always get one of two answers. Either they've gone all in: every engineer gets a Copilot, Cursor, or Claude licence, token budgets don't exist, and leadership has basically said "spend what it takes, we'll figure out the ROI later." Or they're in the other camp: tight budgets, a handful of engineers with access, and everything scoped to a pilot project with measurable outcomes before anyone else gets a look-in.

There isn't much middle ground. Companies pick a lane and commit to it, and that choice says more about their culture than any AI strategy document ever could. But the longer I watch this play out, the more I think both camps are arguing about the wrong number. One side doesn't track cost at all and the other tracks it obsessively. Yet the thing that actually decides the bill, and whether any of this is sustainable, is one most of them never touch: the harness their models run through, and who controls it. That's what I want to get into here.


The All-In Approach

The all-in companies aren't subtle about it. Every engineer gets a Copilot, Cursor, Claude or ChatGPT Enterprise seat as part of onboarding. There's no token budget to track, no approval chain to navigate. The philosophy is straightforward: AI is the biggest productivity shift since the internet, and the companies that embed it deepest and fastest will pull ahead. Worrying about per-seat costs right now is like worrying about electricity bills in 1890.

In practice this looks like engineers using AI for everything. PR reviews get run through a model before a human sees them. Architecture discussions start with an AI-generated proposal that the team then critiques. Onboarding documentation gets drafted by pasting the codebase into a chat window and asking for a summary. Some teams are running AI agents that pick up tickets, open pull requests, and write their own tests, with varying degrees of success.

The cultural shift is the part that interests me most. When everyone has unlimited access, the conversation changes from "should we use AI for this?" to "why wouldn't we?" That's a genuine acceleration. Junior engineers who might have spent a week figuring out a Kafka consumer are shipping in a day because they've got a model walking them through it step by step. Senior engineers are spending less time on boilerplate and more time on the hard problems they actually enjoy.

The downside, of course, is that unlimited access doesn't come with unlimited judgement. I've written before about what happens when you run AI-generated SQL without understanding it, and that problem scales with the size of the organisation. When everyone's moving fast and the AI is confidently wrong about something subtle, the blast radius gets bigger.


The Cautious Approach

The other camp is just as deliberate, but their starting assumption is different. They see AI as a tool that needs to prove its value before it gets rolled out broadly. So they scope a pilot: maybe five engineers on a single project, maybe a specific workflow like test generation or documentation. Someone owns the budget spreadsheet. Someone else is tracking which prompts produced useful output and which ones burned tokens on nothing.

The philosophy here is that cost optimisation is a first-class concern, not something you figure out after the invoices land. These companies want to know what they're getting for their money before they commit to a hundred seats. They're not anti-AI (most of them are genuinely interested), but they're treating it like any other tooling investment rather than a cultural transformation.

What this looks like in practice is more restrained. A small team experiments, reports back, and leadership decides whether to expand. Token budgets are real and sometimes tight enough that engineers think twice before asking the model to rephrase a function they wrote six months ago. Project selection is careful: you pick something where the AI's contribution can actually be measured, so you have data to justify the next round of spending.

I understand the logic. If you're running a team where every pound matters, you don't hand out AI subscriptions like stickers at a conference. But there's a cost to caution too, and I think it shows up in ways that don't appear on the budget spreadsheet.


Who Actually Controls the Cost?

Here's the part both camps tend to underestimate: a lot of the cost was never theirs to control in the first place. The two biggest levers (which model you run and which harness you run it through) sit largely with the providers, not with you.

The model is the obvious one. Running everything through the frontier model versus a cheaper, smaller one can be an order of magnitude difference on the invoice, for output that's often good enough either way. But the harness matters just as much and gets talked about far less. The same task, on the same model, can burn wildly different amounts of tokens depending on the tool wrapping it. A harness that re-reads the entire codebase on every turn, or pads each request with a bloated system prompt, will quietly cost you several times what a leaner setup would for an identical result.
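To make the two levers concrete, here is a rough back-of-the-envelope sketch. Every number in it (the prices per million tokens and the token counts per task) is invented for illustration and doesn't come from any provider's price list. The point is only that model choice and harness overhead multiply together.

```python
def task_cost(input_tokens: int, output_tokens: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one task under simple per-token pricing."""
    total = input_tokens * price_in_per_m + output_tokens * price_out_per_m
    return total / 1_000_000


# Hypothetical prices in dollars per million tokens (illustrative only).
FRONTIER = {"price_in_per_m": 10.0, "price_out_per_m": 50.0}
SMALL = {"price_in_per_m": 1.0, "price_out_per_m": 5.0}

# Hypothetical token usage for the same task producing the same output.
# The bloated harness simply sends far more input to get there.
LEAN = {"input_tokens": 40_000, "output_tokens": 4_000}
BLOATED = {"input_tokens": 200_000, "output_tokens": 4_000}

for model_name, model in (("frontier", FRONTIER), ("small", SMALL)):
    for harness_name, harness in (("lean", LEAN), ("bloated", BLOATED)):
        cost = task_cost(**harness, **model)
        print(f"{model_name:8} {harness_name:8} ${cost:.2f}")
```

With these made-up inputs, the gap between the cheapest and most expensive combination is far larger than either lever produces on its own.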

You can watch this happen in the tools you already use. Something like Claude Code doesn't send a fixed system prompt. It assembles one on the fly for every request, pulling in your project instructions, the definition of every tool it can call, the skills you've enabled, the MCP servers you've connected, and a pile of environment context on top. Add more skills, connect more servers, write more project config, and that prompt grows, and it gets sent again on every turn of the conversation. Under per-token pricing that's a standing cost most people never see, because the harness assembles it for you and never shows you the system prompt it built. Caching softens the repeated static part, but it doesn't make it free, and the parts that change from request to request aren't cached at all.
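The standing cost described above can be roughed out in a few lines. This is a simplified model built on assumptions: the prompt splits cleanly into a static part and a per-turn dynamic part, cache hits are billed at a fixed fraction of the normal input price, and cache writes and expiry are ignored. The token counts, the price, and the discount are all hypothetical.

```python
def standing_prompt_cost(static_tokens: int, dynamic_tokens: int, turns: int,
                         price_in_per_m: float, cached_fraction: float) -> float:
    """Dollars spent re-sending the assembled system prompt over a conversation.

    static_tokens:   identical every turn (tool definitions, project instructions)
    dynamic_tokens:  changes every turn (environment context), so never cached
    cached_fraction: share of the normal input price charged on a cache hit
                     (assumption; use 1.0 to model no caching at all)
    """
    if turns <= 0:
        return 0.0
    first_turn = static_tokens + dynamic_tokens  # nothing is cached yet
    later_turn = static_tokens * cached_fraction + dynamic_tokens
    billable = first_turn + later_turn * (turns - 1)
    return billable * price_in_per_m / 1_000_000


# Hypothetical conversation: 20k static tokens, 2k dynamic, 50 turns.
with_cache = standing_prompt_cost(20_000, 2_000, 50, 10.0, cached_fraction=0.1)
no_cache = standing_prompt_cost(20_000, 2_000, 50, 10.0, cached_fraction=1.0)
print(f"with caching:    ${with_cache:.2f}")
print(f"without caching: ${no_cache:.2f}")
```

Even in this toy version caching cuts the bill sharply without bringing it to zero, and the uncached dynamic part ends up being a large share of what remains.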

That has an awkward implication for the cautious camp. You can count seats and cap budgets all you like, but if the real spend is being driven by model choice and harness efficiency, you're optimising the wrong variable. And it's just as awkward for the all-in camp: you're building on pricing and tooling decisions the provider can change under you at any time. A model gets more expensive, a harness gets chattier in an update, and your costs move without you having touched a thing.

This is getting harder to ignore as the billing model shifts. The industry is quietly moving from flat-fee subscriptions to consumption-based pricing, and the early numbers are startling. In a June 2026 report, Gartner found that nearly a quarter of technology leaders are already spending between $200 and $500 per developer each month on tokens, with around 6% over $2,000. It also projects that by 2028 AI coding costs will overtake the average developer's salary. That headline deserves a caveat: Gartner's "average" is a global one, pegged to roughly $2,000 a month, not a senior Western salary. But the direction of travel is the point. When a harness can quietly burn 50,000 tokens on a single test-suite run, your spend is set by tooling decisions the provider can change under you, not by how many seats you bought.

I've watched this play out up close. When the bill lands on the company card rather than your own, spending thousands of dollars a month on tokens stops feeling like spending at all. I've seen engineers burn through more in a month than their whole tooling budget used to be for a year and not think twice about it, because it simply isn't their money on the line. That would be fine if the models held their price, but they don't. Each new frontier model tends to arrive more capable and more expensive than the last, often close to double, and usage only ever climbs. A way of working that already looks careless at today's prices doesn't become sustainable when the underlying cost doubles. It becomes less so.


The Trade-offs

Neither approach is cost-free. The all-in camp gets speed and cultural momentum, but they're burning through budget on something whose ROI is genuinely hard to measure, and they're building a dependency on tools that might change pricing or disappear tomorrow. The cautious camp has control and cost visibility, but they risk falling behind competitors who are iterating faster, and they're potentially demoralising engineers who see peers elsewhere shipping with better tools.

| Dimension | All-In | Cautious |
| --- | --- | --- |
| Speed | High. Immediate developer acceleration and little context-switching friction. | Lower. Rollouts and multi-step approval gates tend to slow things down. |
| Cost Control | Poor. Unpredictable consumption invoices driven largely by developer usage. | Strong. Predictable seat metrics and explicit token caps. |
| Innovation | High but chaotic. Plenty of organic experimentation, often alongside tool sprawl. | Measured. Scoped wins that can be slow to spread beyond the pilot. |
| Satisfaction | High. Engineers tend to feel trusted and well equipped. | Mixed. Those still waiting on access can feel sidelined. |
| Retention Risk | Lower. The organisation is visibly keeping pace. | Higher. Talent can drift from teams stuck in permanent evaluation. |

The risk I think about most with the cautious approach isn't the money they're saving, it's the talent cost. Good engineers know what tooling is available elsewhere. If they're stuck on a team that's still "evaluating" AI while their friends at other companies are shipping with it daily, that's a retention problem with a price tag that doesn't show up on the AI budget line.


Is There a Middle Ground?

I don't think this has to be binary. The answer that makes the most sense to me is structured experimentation: give engineers access, but with guardrails that create visibility without creating friction. Let people use the tools, but measure what's actually working and redirect effort towards the patterns that produce results.

That's closer to how I work personally. I use AI heavily, but never as a black box. I write a PRD first, break the work into small, reviewable tasks, and treat the AI's output as a draft that I'm responsible for, not a solution I'm rubber-stamping. The model generates, I review. That separation, with generation and judgement living in different hands, is the part that keeps me from repeating the mistakes I made when I lost that database.

It's also why I've spent time building my own tooling and harnesses rather than living entirely inside off-the-shelf ones. Earlier I said the harness is one of the levers the provider controls. Building your own is how you take some of it back. When you own the loop, you decide how much context gets sent on each turn, which tools and MCP servers the model can actually reach, and where a task should stop and hand back to you. That control shows up directly on the invoice: the same work, on the same model, costs a fraction of what it does through a harness that reloads the world on every request. It's more effort up front, but it turns token spend from something that happens to you into something you decide.
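As one small illustration of deciding how much context gets sent on each turn, here is a minimal sketch of trimming conversation history to a token budget. It isn't taken from any particular harness. The `Message` type, the `trim_context` helper, and the four-characters-per-token estimate are all assumptions made for the example, and a real loop would use the provider's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly four characters per token.
    return max(1, len(text) // 4)


def trim_context(system: Message, history: list[Message],
                 budget: int) -> list[Message]:
    """Keep the system message plus the newest history that fits the budget."""
    remaining = budget - estimate_tokens(system.text)
    kept: list[Message] = []
    # Walk backwards so the most recent turns are kept first.
    for message in reversed(history):
        cost = estimate_tokens(message.text)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return [system, *kept]


system = Message("system", "You are a careful coding assistant.")
history = [Message("user", f"turn {i}: " + "x" * 400) for i in range(10)]
trimmed = trim_context(system, history, budget=350)
print(f"sent {len(trimmed) - 1} of {len(history)} history messages")
```

Dropping old turns is the bluntest possible policy. Summarising them or pinning key files are obvious refinements, but the budget stays a number you set rather than one the tool picks for you.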

Owning the loop also means you stop paying frontier prices for work that doesn't need them. Not every step of a task wants the same model. The expensive, heavy-thinking models earn their keep on high-level work such as planning an approach, breaking a problem into reviewable steps, and weighing up an architecture. But once the plan exists, most of the implementation is narrow, well-specified work that a cheaper, faster model handles perfectly well. Route the thinking to the expensive model and the grunt work to the cheap one, and the bill drops again with no real hit to the output. That kind of routing is hard to pull off inside an off-the-shelf tool that runs everything through whichever single model it defaulted to.
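The routing idea can be sketched in a few lines. The model names below are placeholders rather than real product identifiers, and the set of "thinking" step kinds is just my reading of the paragraph above. A real router would likely use richer signals than a label.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider offers.
EXPENSIVE_MODEL = "big-reasoning-model"
CHEAP_MODEL = "small-fast-model"

# Step kinds treated as heavy-thinking work (an assumption for this sketch).
THINKING_STEPS = {"plan", "decompose", "architecture"}


@dataclass
class Step:
    kind: str
    prompt: str


def choose_model(step: Step) -> str:
    """Send high-level reasoning to the expensive model, the rest to the cheap one."""
    return EXPENSIVE_MODEL if step.kind in THINKING_STEPS else CHEAP_MODEL


steps = [
    Step("plan", "Outline an approach for adding rate limiting."),
    Step("implement", "Write the token-bucket class from step 1 of the plan."),
    Step("implement", "Add unit tests for the token-bucket class."),
]
for step in steps:
    print(f"{step.kind:10} -> {choose_model(step)}")
```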

You can see the payoff in the wild. Mitchell Hashimoto recently described running exactly this kind of split, using one model as a planner and architect, a different one as the coder, and then the first model again as a judge to check the work. The numbers are the striking part. At API pricing he put the planning and judging steps in the region of a few dollars, against the $50 or more that a single full round trip through one frontier model would typically cost. It's an early experiment, and he's the first to say the longevity isn't proven, but the shape of the saving is hard to argue with. Same work, broken across the right models at the right price points, for a fraction of the bill.
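The general shape of a planner, coder, and judge split might look like the sketch below. This is not Hashimoto's actual setup, which I haven't seen. The prompts, the APPROVE/REJECT convention, and the retry limit are all assumptions, and the models are passed in as plain functions so the loop runs without any API.

```python
from typing import Callable

ModelCall = Callable[[str], str]  # takes a prompt, returns the model's text


def plan_code_judge(task: str, planner: ModelCall, coder: ModelCall,
                    max_rounds: int = 2) -> tuple[str, bool]:
    """Plan with one model, implement with another, judge with the first."""
    plan = planner(f"Write a step-by-step plan for: {task}")
    code, feedback = "", ""
    for _ in range(max_rounds):
        code = coder(f"Implement this plan:\n{plan}\n{feedback}")
        verdict = planner(
            "Does this satisfy the plan? Reply APPROVE or REJECT with reasons.\n"
            f"Plan:\n{plan}\nCode:\n{code}"
        )
        if verdict.strip().upper().startswith("APPROVE"):
            return code, True
        feedback = f"Reviewer feedback:\n{verdict}"
    return code, False  # hand back to a human after max_rounds


# Stub models standing in for real API calls (hypothetical behaviour).
def stub_planner(prompt: str) -> str:
    return "1. Print a greeting." if prompt.startswith("Write") else "APPROVE"


def stub_coder(prompt: str) -> str:
    return "print('hello')"


code, approved = plan_code_judge("greet the user", stub_planner, stub_coder)
print(approved, code)
```

The expensive model only ever sees short prompts (the task, the plan, and the finished code), which is where the saving would come from. An unapproved result falls back to a person rather than looping forever.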

I'm not the only one going down this road. There's a small but growing group of engineers doing the same, a lot of them building on Pi, a deliberately minimal coding-agent harness that you're meant to reshape around your own workflow rather than bending yourself to fit it. That framing is the whole point. People are shaping harnesses to match the way they actually work, and tuning them to get the best out of the specific models they're driving. That's exactly the control the off-the-shelf tools don't hand you. There's enough here for its own article, which I'll write separately. For now the point is simply that owning the harness isn't hypothetical. People are already doing it, and it changes both what the tools cost and how well they fit.

Info: The approach I landed on isn't complicated: write a plan, break it into tasks, review every piece of AI output before it touches anything real. The structure matters more than the specific tool or model you're using. Without it, you're just hoping the AI doesn't lead you off a cliff.

I don't see why that pattern couldn't scale to a team or a company. Give people access. Expect them to use it. But also expect them to own what they ship, to understand the code they're committing, and to stay curious enough to catch the model when it sounds certain and isn't. The guardrail isn't a token budget. It's a culture of reviewing output before trusting it.


Conclusion

I don't think either extreme gets it right. The all-in approach risks the kind of blind trust that cost me a database. The cautious approach risks paralysis dressed up as prudence, and in an industry that moves as fast as ours, that's its own kind of expensive.

The answer is almost certainly somewhere in between. Give people access to AI tools. They're genuinely useful and they're not going away. But teach them to use those tools with judgement. Create a culture where questioning the AI's output isn't seen as a lack of skill but as a basic professional reflex, the same way you'd review a colleague's pull request even if they were the best engineer on the team.

What matters isn't how much you spend or how many tokens you burn. It's whether that spend is something you control or something that just happens to you. The teams that come out of this well won't be the ones with the biggest budgets, or the strictest ones. They'll be the ones who own the harness their models run through, route the right model at the right cost to the right work, and keep a human in the loop whose judgement can tell a right answer from one that only sounds right. Spend you can't see or steer is the real risk. Spend you own is just a tool doing its job. That ownership, not the size of the invoice, is what decides whether AI makes your team better or just makes them faster at being wrong.