Where I'm at
I wasted six hours this morning. Not on a hard problem. On stubbornness.
The DCA bots were broken. Sell logic wrong. Old automated jobs still running in the background. Data files getting quietly overwritten by processes I thought I'd killed days ago. I woke up, sat down, cracked my knuckles, and thought: thirty minutes, tops.
Six hours later I was in the exact same place. Same bugs. Same broken bots. Same me, staring at the same screen, except now it was afternoon and I'd accomplished nothing.
The reason is embarrassing. I was using the cheap AI model — MiniMax — to debug a complex, interconnected system. The same model that handles my daily briefings and content drafts perfectly well. The same model I've been championing since Day 12 as proof that you don't need to pay premium prices for good output.
Turns out there's a difference between "good output" and "can actually think."
• • •
What cheap gets you
Let me explain this in a way that would have saved me six hours.
AI models come in different sizes and prices. The cheap ones — MiniMax, in my case — are fast and good at following clear instructions. "Summarize this article." "Format this briefing." "Write a market update with these seven sections." One task, well-defined, in and out. They're like a reliable employee who's great at following a checklist.
The expensive ones — Sonnet, Opus — are slower and cost more, but they can hold a bigger picture in their head. They can juggle five problems at once and see how they connect. They're like the senior person you bring in when the checklist employee keeps going in circles.
This morning I needed the senior person. I sent the checklist employee.
Here's what kept happening. I'd describe the full situation to MiniMax — five bots, three of them misbehaving, old jobs still running, files getting overwritten. It would fix one thing. I'd check. The other two things were still broken, and sometimes the fix had made them worse. I'd re-explain. It would fix a different thing and forget the first. I'd re-explain again. It would apologize, promise to consider everything this time, and then fix a third thing while reverting the first two.
Round and round. For six hours. Like giving someone directions to a house and they keep driving to a different house every time — confidently, with turn-by-turn navigation to the wrong address.
I knew what all the problems were. I could see them. I just couldn't get the model to hold them all at once long enough to fix them together. That's the difference between a model that follows instructions and a model that actually reasons. Following instructions is cheap. Reasoning costs more. And today I learned why it costs more — because it's worth more.
• • •
Then I switched to Sonnet
Two hours. Everything.
Not two hours of struggling. Two hours of describing the situation once, watching the agent trace through the system methodically, and having it come back with: "I found three separate problems. Here's what each one is doing. Here's the order we need to fix them in. Here's why fixing them out of order won't hold."
That's it. That's the difference. The cheap model saw three tasks. The better model saw one system.
Let me walk through what it found, because if you're building anything with automated jobs, you're going to hit this exact situation eventually.
Problem one: a ghost job. I'd updated my deploy script days ago. But the old version was still scheduled to run every fifteen minutes. The system that runs things on a timer — a cron job, which is basically an alarm clock for scripts — was still using the original version. So every fifteen minutes, the old script would wake up, overwrite my data with stale information, and go back to sleep. I'd fix something, wait, check it — reverted. Fix it again — reverted again. I was filling a bucket while something punched a hole in the bottom every fifteen minutes.
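The first step in killing a ghost job is finding where it's scheduled. Here's a minimal sketch of that hunt, run against a simulated crontab dump so it's self-contained; the script paths are made up, and on a live server you'd feed it `crontab -l` output and also check `/etc/cron.d`, which a per-user listing won't show.

```shell
# A simulated crontab dump stands in for `crontab -l` output.
# First line is the ghost: a stale deploy script firing every 15 minutes.
cat > /tmp/cron_dump <<'EOF'
*/15 * * * * /opt/old/deploy.sh
0 3 * * * /usr/local/bin/backup.sh
EOF
grep -n "deploy" /tmp/cron_dump   # prints: 1:*/15 * * * * /opt/old/deploy.sh
rm -f /tmp/cron_dump
```

Once you've found the line, `crontab -e` and delete it. The grep is trivial; the lesson is that you have to go looking, because the scheduler won't tell you it's running an old version.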
Problem two: a zombie. A script I thought I'd stopped was still running in the background. Not visible in my normal checks because it was running outside the container I usually monitor. Think of it like closing an app on your phone but it's still running in the background, draining your battery. Except this zombie wasn't draining battery — it was writing its own version of the data file. So now two things were writing to the same file on different schedules. Whichever one ran last won. My fixes kept appearing and disappearing depending on timing.
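The zombie hunt works the same way in miniature. The sketch below fakes one with a background loop that keeps rewriting a file, then stops it by PID. Everything here is illustrative; in the real incident the PID would come from something like `lsof <datafile>` run on the host, outside the container you normally monitor.

```shell
# Simulate a background process that keeps rewriting a file,
# then locate it by PID and stop it.
TMP=$(mktemp)
( while true; do echo "stale" > "$TMP"; sleep 1; done ) &
WRITER=$!
sleep 2                        # give it time to write at least once
kill "$WRITER" 2>/dev/null
wait "$WRITER" 2>/dev/null || true
echo "writer $WRITER stopped"
rm -f "$TMP"
```

The tell in my case wasn't the process itself but its effect: the file's modification time kept advancing after everything was supposedly stopped. `stat` on the file, then `lsof` to find who's holding it open, then kill.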
Problem three: three copies of the truth. The most subtle one. There were three separate copies of the same data file sitting in different folders. The bots wrote to one. The dashboard read from another. The deploy script copied a third. Fixing the source file didn't fix what the dashboard displayed, because the dashboard was reading from a stale copy in a different location. Like updating your master spreadsheet but your colleague is still looking at last week's printout.
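One way out of the three-copies trap, assuming everything shares a filesystem, is to make the extra copies symlinks so there's only one real file. A sketch with made-up paths:

```shell
# One canonical file; the "dashboard copy" is just a pointer to it.
mkdir -p /tmp/demo/bots /tmp/demo/dashboard
echo '{"trades": 219}' > /tmp/demo/bots/data.json        # bots write here
ln -sf /tmp/demo/bots/data.json /tmp/demo/dashboard/data.json
cat /tmp/demo/dashboard/data.json   # prints: {"trades": 219}
rm -rf /tmp/demo
```

Now a write to the bots' file is instantly what the dashboard reads. No copy step, no staleness, no printout from last week.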
No single fix would stick on its own. Kill the ghost job but leave the zombie — reverts. Kill the zombie but leave the wrong file path — stale display. Fix the file path but leave the ghost job — overwritten. All three had to die together, in the right order, or the system would keep reverting.

Sonnet saw that. MiniMax couldn't. Not because MiniMax is bad — it's not. Because this particular job needed the ability to hold three interconnected problems in context simultaneously and reason about their interactions. That's not what cheap models are built for.
• • •
The 96% that wasn't
While Sonnet was cleaning up the bots, it found something else. The dashboard had been showing a 96% win rate. Ninety-six percent. That's hedge fund territory. I'd been glancing at that number for days thinking "not bad."
It was a lie. Same kind of lie as Day 22, when the dashboard showed "1 trade recorded" because someone had hardcoded the number 1 during testing and never replaced it. This time, someone had hardcoded 96. Not calculated from real trades. Just typed. A number sitting in a box with a label, looking official, meaning nothing.
The real win rate, calculated from actual trade history across 219 trades: 74%.
That's a twenty-two point drop from what I thought I had. And you know what? 74% is still good. It means three out of four trades close in profit. For an automated strategy running on a server I barely touch, that's solid.
But here's what bothers me. I believed the 96%. Not because I verified it — because it felt right. A high number in a clean box on a dashboard I built. Of course it's accurate. I built it. Except I didn't build it — I directed an AI to build it, and the AI put a placeholder there during testing, and nobody ever replaced it with the real thing.
A number on a screen is not a fact. It's a display. The only fact is the raw data behind it. Until you've traced the number back to its source and confirmed it's calculated — not typed — you're looking at decoration.
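Tracing a number back to its source can be one line. Here's a sketch of computing the win rate from raw trade rows instead of trusting a label; the CSV layout is hypothetical (my actual trade history isn't necessarily shaped like this), with profit in column 2.

```shell
# Derive the win rate from raw trade rows instead of trusting a display.
cat > /tmp/trades.csv <<'EOF'
id,pnl
1,12.5
2,-3.1
3,8.0
4,1.2
EOF
awk -F, 'NR>1 { total++; if ($2+0 > 0) wins++ }
         END { printf "win rate: %.0f%%\n", 100 * wins / total }' /tmp/trades.csv
# prints: win rate: 75%
rm -f /tmp/trades.csv
```

If the number the script prints doesn't match the number on the dashboard, the dashboard is decoration.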
• • •
The real cost cut
Here's the part that actually saves you money if you're following along.
After the debugging was done, I looked at what my AI agent was spending time on every day. Not the conversations — those I control. The automated jobs. The things that run on schedules while I sleep.
One of them was a health check. Its job: check if the server is alive. That's it. One yes-or-no question.
This health check was loading 19,000 tokens of context every time it ran. Nineteen thousand tokens. That's like reading a forty-page document before answering the question "is the light on?" The AI model would boot up, read my personality file, my rules, my memory, my agent instructions — all of it — and then run a script that checks if the server responds to a ping.
I moved it to a system cron. That's a scheduler built into the server itself — no AI involved. It just runs the script directly. Zero tokens. Zero cost. Same result.
Did the same for the dashboard updates, the backup job, the file sync. Anything that's just "run this script on a schedule" doesn't need an AI model. It needs a timer and a command. That's free.
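For the curious, the zero-token version is nothing fancy. A sketch, with a placeholder URL and a hypothetical schedule (I've used `.invalid`, a reserved domain that never resolves, so running this as-is prints DOWN):

```shell
# The whole health check: hit an endpoint, print OK or DOWN.
# The URL is a placeholder; .invalid never resolves, so this prints DOWN.
check() {
  if curl -fsS --max-time 5 "$1" > /dev/null 2>&1; then
    echo "OK"
  else
    echo "DOWN"
  fi
}
check "https://example.invalid/health"
# Scheduled by the system, not the agent -- a crontab line like:
#   */15 * * * * /usr/local/bin/healthcheck.sh >> /var/log/health.log 2>&1
```

Ten lines of shell, a timer, and a log file. The AI model never wakes up.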
Monthly AI cost went from $75 to $13.20. Not by making the AI cheaper. By removing the AI from jobs it had no business doing.
If you're building with AI agents and your costs feel high, I guarantee some of your automated jobs are doing this. Loading thousands of tokens of context to run a task that needs zero intelligence. Find those jobs. Move them to system cron. Keep the AI for the work that actually needs thinking.
• • •
End-of-day reflection
Twenty-four days in. And I spent six hours learning a lesson I've already written about three times in this journal.
Use the right tool for the job. I said it on Day 11. Said it on Day 12. Said it on Day 22. And this morning I used the wrong tool anyway, because it was cheaper, because it was already loaded, because switching to a better model felt like admitting the cheap one wasn't enough.
It wasn't enough. And admitting that didn't cost me anything except pride. Not admitting it cost me six hours.
Here's the framework I'm writing into my system notes so I don't do this again. Three tiers:
Tasks that don't need AI at all — health checks, backups, file syncs, deploys. System cron. Zero tokens.
Tasks that need AI but not deep reasoning — briefings, summaries, content drafts, formatting. MiniMax. Cheap and fast.
Tasks that need the AI to actually think — debugging interconnected systems, architecture decisions, anything where multiple problems interact. Sonnet or Opus. Worth every dollar.
The mistake isn't using cheap models. The mistake is using cheap models for expensive problems.
Sometimes the cheap option is the most expensive choice you make.
Day 24 of ∞ — @astergod Building in public. Learning in public.