Day 14 · Sunday, March 1, 2026

Day 14.
The Dedup Problem.

Repetition, blind spots, and building a system that remembers what it already said.

Where I'm at

Fourteen days in. The briefings were broken — not the content, but the repetition. Every morning, every evening, the same stories. NVDA earnings. Tesla crashes. Anthropic ethics. Same stuff, different timestamp.

I'd open the morning briefing and think "didn't I read this last night?" I had. Almost word for word.

That's not a briefing. That's a loop.

The system doesn't know it already told me something. Each briefing runs in an isolated session. No memory of the previous one. So it searches the same queries, finds the same top results, writes the same analysis, and delivers it like it's fresh insight. Three times a day, the same insight.

It's the agent equivalent of that friend who tells you the same story every time you see them — except I built this friend and I'm paying for every retelling.

The Dubai problem

This evening — well, this afternoon my time, this evening in the Gulf — news broke. A drone strike hit an airport somewhere between Dubai and Iran. Or Kuwait. The details shifted as the hours passed, as breaking news always does.

First reports said Dubai. Then it was a military facility nearby. Then something about commercial flights being diverted.

I found out from X. Not from my system.

My briefings missed it. All of them. I pulled up the cron prompts to understand why. The search queries were "crypto market news today," "AI release announcement," "stock market movers." Narrow. Finance-specific. Nowhere in any prompt was there a query for conflict, geopolitics, commodities disruption, or breaking world news.

The briefings were built to track what NVDA did. They couldn't tell me what just exploded.

That's when the real problem clicked. The repetition issue and the blind spot issue were two symptoms of the same design flaw: the briefings were optimized for a narrow slice of reality. Deep on the topics I'd explicitly told them to watch. Completely blind to everything else.

A drone strike in the Gulf can move oil prices, crash airline stocks, spike defense contractors, and shift crypto sentiment in an hour. My system saw none of it because nobody told it to look. As a trader, that's not a minor gap. That's a dangerous one.

Building the fix

I built two tools in a few hours. Both simple in concept. Both painful in practice.

First: breaking-news.py. Instead of searching two or three finance-specific queries, it scans eight different angles — geopolitical, war and conflict, commodities, macro-economic, financial markets, crypto, tech, and regulatory. The stuff that moves markets but never shows up in a search for "crypto news today." The world is wider than my portfolio. The briefings need to reflect that.
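The eight-angle scan can be sketched roughly like this. The query strings and function names here are illustrative assumptions, not the actual contents of breaking-news.py — the point is the shape: one query per angle, results pooled and deduplicated.

```python
# Illustrative sketch of the multi-angle scan. Queries and names are
# assumptions; the real tool's queries aren't shown in the post.
QUERY_ANGLES = {
    "geopolitical": "breaking geopolitical news",
    "conflict": "military strike attack news today",
    "commodities": "oil gas supply disruption news",
    "macro": "central bank macroeconomic announcement",
    "markets": "stock market movers today",
    "crypto": "crypto market news today",
    "tech": "major AI release announcement",
    "regulatory": "financial regulation enforcement news",
}

def scan_all_angles(search):
    """Run every angle through a search callable and pool unique headlines."""
    seen, pooled = set(), []
    for angle, query in QUERY_ANGLES.items():
        for headline in search(query):
            if headline not in seen:
                seen.add(headline)
                pooled.append((angle, headline))
    return pooled
```

The `search` callable is injected so the same loop works against whatever search backend the cron job already uses.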

Second: briefing-dedup.py. This one tracks every story the system reports. Hashes the content, stores it in a local file, auto-purges after 72 hours. Before each briefing runs, it checks what's already been covered. If it's old news with no new development, skip it. If there's a genuine update — new details, market reaction, follow-up — include it with context.
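The core of the dedup idea — hash each story, remember the hash with a timestamp, purge after 72 hours — fits in a few functions. A minimal sketch, assuming a local JSON file and whitespace/case normalization (both assumptions; the real briefing-dedup.py may differ):

```python
# Sketch of briefing-dedup.py's core idea. STORE path and the
# normalization rule are assumptions, not the real implementation.
import hashlib
import json
import time
from pathlib import Path

STORE = Path("briefing_seen.json")
TTL = 72 * 3600  # 72 hours, in seconds

def story_hash(text: str) -> str:
    # Normalize whitespace and case so trivial rewording still matches.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def load_seen() -> dict:
    seen = json.loads(STORE.read_text()) if STORE.exists() else {}
    cutoff = time.time() - TTL
    return {h: ts for h, ts in seen.items() if ts > cutoff}  # auto-purge

def already_covered(text: str, seen: dict) -> bool:
    return story_hash(text) in seen

def record(text: str, seen: dict) -> None:
    seen[story_hash(text)] = time.time()
    STORE.write_text(json.dumps(seen))
```

The purge happens on load rather than via a separate job, so a stale file never grows unbounded.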

Simple concept. Took forever to implement. The core problem: each briefing runs in an isolated cron session with no memory of the last one. That's the trade-off with cron jobs — they're cheap and reliable, but they don't share state. The morning session doesn't know what the evening session said. There's no thread connecting them. Just three separate jobs executing the same queries against the same internet and hoping something changed.

The fix: write to a shared JSON file on the server. Each briefing reads it before running, adds its stories after finishing, and a cleanup job clears entries older than three days. Now the morning knows what the evening said. The evening knows what the morning covered. State lives on disk, not in the model's head.
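The read-before, append-after lifecycle is the whole trick. A hypothetical wrapper (file path, field names, and the three-day constant are all assumptions for illustration):

```python
# Hypothetical briefing lifecycle: read shared state, skip covered
# stories, write the briefing, append what was covered, prune old entries.
import json
import time
from pathlib import Path

STATE = Path("briefing_state.json")  # shared by every cron session
MAX_AGE = 3 * 24 * 3600  # three days, in seconds

def run_briefing(fetch_stories, write_briefing):
    state = json.loads(STATE.read_text()) if STATE.exists() else []
    covered = {entry["id"] for entry in state}

    fresh = [s for s in fetch_stories() if s["id"] not in covered]
    write_briefing(fresh)

    now = time.time()
    state += [{"id": s["id"], "ts": now} for s in fresh]
    state = [e for e in state if now - e["ts"] < MAX_AGE]  # cleanup pass
    STATE.write_text(json.dumps(state))
```

Each cron session is still isolated; only the file on disk connects them, which is exactly the point.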

It's the same lesson from Day 9 — rules in a file are just documentation. Instructions need to be in the execution path. Except here, it's not rules. It's memory. The system needed to remember what it already said, and since the model can't do that across sessions, the infrastructure has to.

The sitemap cleanup

While I was in fix-it mode, I uploaded a new sitemap. The old one had nine dead guide URLs — wrong filenames from early deploys — and was missing sixteen actual pages that existed on the site but Google didn't know about.

Twenty-five errors in a forty-page site. Forty-four URLs now. Deployed. Verified. Submitted to Google Search Console.

It's the small things that matter for SEO. One broken link and Google starts trusting the whole sitemap less. Nine broken links and it probably ignores you. Given the Day 7 discovery that the site was already invisible, every indexing detail matters right now.

The two-briefing decision

I also cut the third briefing. Morning, afternoon, evening — three times a day was too much. The afternoon briefing overlapped with the morning one, just with slightly newer prices and the same analysis. Two good briefings are worth more than three mediocre ones.

Now it's two: 13:30 CET (pre-market, right before US open) and 22:30 CET (after-hours, right after close). Those are the two moments that actually matter for my trading.

The Alert Guardian fills the gaps — now checking every three hours instead of six, scanning for anything that needs immediate attention. Less noise, better timing. Lower cost, better coverage. The system got smaller and smarter on the same day.
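In crontab terms, the new schedule is three lines. These entries are illustrative — they assume the server clock runs on CET and invented script paths; the actual job definitions aren't shown in the post:

```
# Assumes server time is CET; script paths are hypothetical.
30 13 * * *  /usr/bin/python3 /opt/briefings/briefing.py pre-market
30 22 * * *  /usr/bin/python3 /opt/briefings/briefing.py after-hours
0  */3 * * * /usr/bin/python3 /opt/briefings/alert_guardian.py
```

If the server runs UTC, the hour fields would need shifting, and a DST change would silently move the briefings relative to the US market open.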

What I learned

Three things.

First, breadth beats depth for breaking news. The old briefings were deep on finance but blind to everything else. One drone strike in the Gulf and I had nothing. A system that only watches what you expect will never catch what you don't.

Second, dedup is a system problem, not a content problem. I kept trying to prompt my way out of repetition — "don't repeat stories from the last briefing" — but the sessions had no memory. You can't prompt your way around statelessness. You need state. The infrastructure has to solve what the model can't.

Third, less is more. Two briefings at the right times beat three at arbitrary ones. The audit from Day 12 already cut the cron count. Today's cut went deeper — not just removing jobs, but questioning the schedule itself. When do I actually need information? Twice a day, aligned to the market. Everything else is noise.

End of day

Day 14. Two weeks in. The system now remembers what it said. It scans wider than finance. It repeats less. It costs less.

But the real lesson today was about blind spots — not the system's, mine. I built briefings around the topics I already track. Finance, crypto, AI. My comfort zone. The system faithfully reflected my own narrow focus back at me, and I mistook it for comprehensive coverage.

It took a drone strike I heard about on X to realize my automated intelligence system was less informed than my timeline.

The next time something explodes in the Gulf, I'll know about it. Not because I was scrolling. Because the system was watching.

Day 14 complete. The system is wider, leaner, and has a memory. Two weeks from zero to here. The hardest part was never the building. It was seeing what I'd missed.

Day 14 of ∞ — @astergod Building in public. Learning in public.

Want to learn what I learned on this day?

Play Day 14 in the Learning Terminal →