The Salary You Didn't Hire

The inference bill that made me stop wasn't the biggest one. It was the one that stopped looking like a software cost and started looking like a payroll line. Same shape: a large number, recurring, load-bearing, the kind you agree to once and then pay every month without re-reading. I'd added an always-on agent to do a job, and somewhere in there it had quietly become an employee. The kind you pay whether or not it did much that month.

That is the part nobody warned me about when the pitch was "let AI do it." For my whole career, software had one magic property that made automation a free lunch: once you'd built the thing, running it again cost essentially nothing. Add a feature, automate a task, serve one more user: the marginal cost rounded to zero. That is the assumption under every "just automate it" you have ever heard, and it is the assumption an AI agent breaks.

An agent has a metabolism. Every time it works, it eats: tokens in, tokens out, on every step of every loop, re-reading the whole conversation each pass because the model has no memory between turns. Leave it always on and it eats while it idles, waiting for something to do. This is not a rounding error at scale. Solo founders running always-on agents are watching monthly AI bills climb into the hundreds of thousands, numbers that sit uncomfortably close to the salaries the agents were supposed to save. The cost didn't disappear when you replaced the person. It moved to a different line of the same budget and changed its name.
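To make the metabolism concrete, here is a minimal sketch of how input tokens pile up when every step resends the whole transcript. The function name and every number in it are hypothetical, chosen only to show the shape: the total grows roughly with the square of the step count, not in a straight line.

```python
def loop_input_tokens(steps: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across a loop that resends the full transcript each step."""
    total = 0
    transcript = system_tokens  # the system prompt is the starting history
    for _ in range(steps):
        total += transcript             # the whole history goes in again as input
        transcript += tokens_per_turn   # this step's output and tool result join the history
    return total


# Hypothetical loop: 1,000-token system prompt, 500 new tokens per step.
ten_steps = loop_input_tokens(10, 1_000, 500)
twenty_steps = loop_input_tokens(20, 1_000, 500)
print(ten_steps, twenty_steps)
```

Doubling the number of steps more than triples the input tokens in this example, which is why a long-running loop costs far more than its step count suggests.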

An always-on agent isn't free leverage. It's an employee with a metabolism, and it eats whether or not it works.

Put the agent on the payroll

The first discipline is just to admit what you did. You didn't add a feature. You made a hire. So treat it like one: give it a fully-loaded monthly cost, the way you know what a person on your team costs before you know if they're worth it.

Almost nobody does this. We track headcount obsessively, one number everyone in the company can recite, and we let the agents accumulate uncounted because each one felt like "just an API call." Then the bill arrives as one lump and you cannot say which agent earned its keep and which one is the intern who shows up, runs all day, and produces nothing you'd miss. A payroll you can't itemize is a payroll you can't manage.

The move is unglamorous and it works: give every agent a name, a job, and a monthly cost. The moment each one has a salary next to it, the questions that manage a team start working on your software too. Is this role worth what it's paid? Is it full-time work or did I put a full-time agent on a part-time job? Would I re-hire it today?
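A minimal sketch of what itemizing looks like, assuming you already tag each model call with the name of the agent that made it. The record fields, the agent names, and the prices are all hypothetical; substitute your provider's real rates and whatever your logs actually record.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens, not any provider's real rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def itemize(usage_records):
    """Sum one month's usage records into a cost per named agent."""
    bill = defaultdict(float)
    for rec in usage_records:
        cost = (rec["input_tokens"] * PRICE_PER_MTOK["input"]
                + rec["output_tokens"] * PRICE_PER_MTOK["output"]) / 1_000_000
        bill[rec["agent"]] += cost
    return dict(bill)


# Hypothetical usage log: one record per model call, tagged with the agent's name.
records = [
    {"agent": "support-triage", "input_tokens": 1_000_000, "output_tokens": 100_000},
    {"agent": "support-triage", "input_tokens": 500_000, "output_tokens": 50_000},
    {"agent": "nightly-digest", "input_tokens": 2_000_000, "output_tokens": 0},
]
payroll = itemize(records)
for name, cost in sorted(payroll.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ${cost:.2f}")
```

The only design choice that matters here is the tag: if a call isn't attributed to a named agent when it is made, no amount of arithmetic later will split the lump.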

Cheaper than a person is not automatic

Here's the assumption I want to kill, because I believed it too: that an AI hire is obviously cheaper than a human one. Sometimes it is. Often it is a bad hire wearing a cheap price tag, and you can't tell which until you've costed it.

The agent is a bargain when it does work that is high-volume, repetitive, and would genuinely have needed a person, and it does that work in a tight loop that finishes fast. It is a terrible hire when it runs frontier-model calls twenty-four hours a day to do something a cron job and a cheap model could handle at a fiftieth of the cost, or when it re-reads a novel's worth of context on every trivial step, or when it "reasons" for six paragraphs before answering a yes-or-no question. None of that shows up in a demo. All of it shows up on the invoice, thirty days later, as a number that looks a lot like a salary you didn't mean to commit to.

So audit your agents the way you'd audit roles you inherited. Not "is this cool," but "what does this cost per month, what does it produce, and what is the cheapest version of it that still does the job." Most teams have never run that audit once. The first time you do, you find the intern.
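The audit itself can be sketched in a few lines, assuming you can put a monthly cost and a count of useful outputs next to each agent. The `Role` fields, the threshold, and the example agents are hypothetical, and "useful output" is whatever unit you would actually miss if it stopped.

```python
from dataclasses import dataclass


@dataclass
class Role:
    name: str
    monthly_cost_usd: float
    useful_outputs: int  # work produced this month that you would miss


def audit(roles, max_cost_per_output):
    """Return (name, verdict) pairs, most expensive role first."""
    report = []
    for r in sorted(roles, key=lambda r: r.monthly_cost_usd, reverse=True):
        if r.useful_outputs == 0:
            verdict = "intern: costs money, produces nothing"
        elif r.monthly_cost_usd / r.useful_outputs > max_cost_per_output:
            verdict = "overpaid: look for a cheaper version"
        else:
            verdict = "earning its keep"
        report.append((r.name, verdict))
    return report


# Hypothetical roster and a hypothetical ceiling of $5 per useful output.
roster = [
    Role("support-triage", 1200.0, 4000),
    Role("nightly-watcher", 900.0, 0),
    Role("report-writer", 600.0, 20),
]
report = audit(roster, max_cost_per_output=5.0)
for name, verdict in report:
    print(f"{name}: {verdict}")
```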

The levers that turn an expensive hire cheap

The good news is the same as with a real team: the cost of an employee is not fixed, it's managed. And the levers that make an AI hire cheap are boring, mechanical, and mostly things you can do this week. This is the whole reason a few people can run serious volume without a serious payroll: the same agent, doing the same job, can cost like a mid-level salary or like a rounding error depending on how you run it.

One agent, one job, three bills

Naive (frontier API, whole transcript re-read every call, always on): $9,000/mo

It rivals the salary it replaced.

Plus prompt caching and a pruned context window: $3,000/mo

Stop paying to re-read what never changed.

Plus a small open model on your own hardware: $600/mo

The hire becomes a rounding error.

Magnitudes are illustrative: the shape of the argument, not a benchmark. The levers and their rough order are real and live in The Economics of a Token. Same agent, same work: the bill is a decision, not a fact.

Three of those levers do most of the work, and I've written about all of them from the engineering side. Prompt caching: the unchanging part of every prompt, the system instructions, the rules, the early turns, gets read once at full price and then costs a fraction on every call after. Skip it and you pay full freight, every loop, to re-read words that never changed. Pruning the context: stop carrying the entire transcript on every pass, keep what's live, summarise the rest, and you cut the tokens each step eats. And the big one, the small model: a smaller open model in a tight loop finishes in fewer tokens and costs less per token, and once your volume is real, running it on hardware you own instead of renting every token off someone else's meter is what drops the floor out from under the bill.
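Here is a rough cost model for those three levers, as a sketch under stated assumptions rather than a benchmark. Every price, token count, and the cached-read discount below is hypothetical. The model also simplifies: it ignores the one-time full-price read that fills the cache, and it treats a small model as a lower per-token price rather than modelling owned hardware as the fixed cost it really is.

```python
def monthly_cost(calls, static_tokens, live_tokens, output_tokens,
                 in_price, out_price, cached_discount=1.0):
    """Monthly cost in USD. Prices are per million tokens.

    cached_discount is the fraction of the input price paid for the
    unchanging prefix (1.0 means no caching at all).
    """
    per_call = (static_tokens * in_price * cached_discount  # system prompt, rules
                + live_tokens * in_price                     # transcript carried this call
                + output_tokens * out_price) / 1_000_000
    return calls * per_call


CALLS = 100_000  # hypothetical monthly call volume

# Naive: full-price prefix, 20k tokens of transcript on every call.
naive = monthly_cost(CALLS, 4_000, 20_000, 500, in_price=3.0, out_price=15.0)

# Caching (prefix billed at 10%) plus pruning (transcript cut to 4k tokens).
cached_pruned = monthly_cost(CALLS, 4_000, 4_000, 500, in_price=3.0, out_price=15.0,
                             cached_discount=0.1)

# Small model: much lower per-token prices and shorter outputs.
small_model = monthly_cost(CALLS, 4_000, 4_000, 300, in_price=0.2, out_price=0.6,
                           cached_discount=0.1)

print(f"naive ${naive:,.0f}, cached+pruned ${cached_pruned:,.0f}, small ${small_model:,.0f}")
```

The absolute numbers mean nothing outside this example. What carries over is that each lever acts on a different term of the same per-call sum, which is why they stack.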

Notice what these have in common. Every one of them is the software version of managing a person's time. You don't pay a senior engineer to do data entry. You don't make anyone re-read the entire company wiki before every task. You don't keep someone on the clock overnight to watch a queue that fires twice a day. Costing an agent well is just refusing to pay frontier rates for busywork, applied to a worker that never complains about it.

Why a small team lives or dies here

This is the essay in the series where the whole promise gets tested, so let me be blunt about the stakes. The pitch of a small team is leverage: a few people plus systems doing the work of a department. But if every "AI hire" quietly costs like a real hire, the leverage is a mirage. You haven't escaped the big-company cost structure. You've rebuilt it with fewer humans and a scarier bill, and you've kept none of the savings you were told the smaller headcount would bring.

The edge only exists if your AI headcount is genuinely cheap, and cheap is a choice, not a gift the technology hands you. This is why I keep insisting that owning your cost structure is strategy and not thrift. The company across the street renting every token pays a floor it does not control; the small team that caches, prunes, runs small models, and owns its hardware pays a floor it built. Same agents, same work, two different businesses, and the difference is entirely in whether someone treated the payroll as something to manage.

The question was never "human or AI." That framing lets you skip the only question that matters. It's "what does this hire cost, fully loaded, and is it earning it." Ask that of every agent you run, name each one, price each one, fire the interns, and the small team stays small on purpose instead of small until the bill catches up. The cheapest employee you'll ever have still shows up on payroll. You just have to be the one who reads it.