If Your Math Agent Can't Check Itself, It's a Poet
Most math “reasoning” demos are storytelling.
They sound convincing. They look sophisticated.
And then you do the one thing Olympiads—and reality—require:
you check the result.
If your math agent can’t do exact arithmetic, validate constraints, and run sanity checks…
it’s not a solver.
It’s a poet wearing a lab coat.
Math Olympiad “Reasoning” Isn’t the Secret. Checks Are.
🐉 The Reasoning Dragon (what it really is)
The “reasoning” failure mode isn’t that models can’t think.
It’s that they produce chains nobody can audit, with no witnesses:
- hidden assumptions
- sloppy arithmetic
- missing boundary cases
- no falsifiers
- no verification step
Humans get fooled by long explanations. Math doesn’t care.
The Stillwater rule
Reasoning must be checkable.
Not just plausible.
In practice, that means:
- Exact computation where possible (integers/fractions, not floating vibes)
- Multiple test cases (don’t trust one example)
- Witness-first steps (intermediate claims you can verify)
- Falsifiers (what would break this?)
- Receipts (the checks you ran)
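Here is a minimal sketch of the first few rules, using only Python's standard library. The helper names (`is_prime`, `first_counterexample`) and both sample claims are illustrative assumptions, not part of any particular agent. It shows exact fractions where floats drift, and a falsifier search that refuses to trust a handful of passing examples.

```python
from fractions import Fraction

# Exact computation: binary floats drift, fractions do not.
float_ok = (0.1 + 0.2 == 0.3)
exact_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))


def is_prime(n: int) -> bool:
    """Trial division; slow but exact, which is the point here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(claim, candidates):
    """Falsifier search: return the first input that breaks the claim, else None."""
    for x in candidates:
        if not claim(x):
            return x
    return None


# Hypothetical claim 1: "n^2 + n + 41 is always prime." It passes many spot checks.
euler_falsifier = first_counterexample(lambda n: is_prime(n * n + n + 41), range(100))

# Hypothetical claim 2: "the sum of the first n odd numbers is n^2."
odd_sum_falsifier = first_counterexample(
    lambda n: sum(range(1, 2 * n, 2)) == n * n, range(200)
)

# The receipts: exactly which checks ran and what they returned.
receipts = {
    "float_equality": float_ok,
    "fraction_equality": exact_ok,
    "euler_first_counterexample": euler_falsifier,
    "odd_sum_first_counterexample": odd_sum_falsifier,
}
```

The first claim survives every input from 0 to 39 and breaks at 40, where the value is 1681 = 41 × 41. A search that finds no counterexample, as with the second claim, is evidence rather than proof, so the receipt should record the range that was tested.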
Poet vs Solver (the only split that matters)
Poet (looks smart)
- one-shot solution
- confident tone
- long narrative
- no tests
- no invariants
Solver (is correct)
- defines constraints clearly
- derives intermediate witnesses
- checks arithmetic exactly
- validates edge cases
- outputs a verdict + receipts
If you want reliable math, build a Solver.
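To make the split concrete, here is a small sketch on a toy problem chosen for illustration (it is not from any benchmark): solve sqrt(2x + 3) = x over the integers. Squaring both sides gives x^2 - 2x - 3 = 0, so a Poet announces "x = 3 or x = -1" and stops. The hypothetical `check_candidate` helper below does what a Solver does next: it substitutes each candidate back in, using integer arithmetic only.

```python
from math import isqrt


def check_candidate(x: int) -> dict:
    """Substitute x back into sqrt(2x + 3) = x without touching floats."""
    radicand = 2 * x + 3
    receipts = {
        "radicand_nonnegative": radicand >= 0,
        "rhs_nonnegative": x >= 0,  # a real square root is never negative
    }
    if receipts["radicand_nonnegative"]:
        root = isqrt(radicand)  # exact integer square root (floor)
        is_square = root * root == radicand
        receipts["radicand_is_perfect_square"] = is_square
        receipts["lhs_equals_rhs"] = is_square and root == x
    verdict = "PASS" if all(receipts.values()) else "FAIL"
    return {"candidate": x, "verdict": verdict, "receipts": receipts}


# Candidates produced by squaring both sides: x^2 - 2x - 3 = 0.
results = [check_candidate(x) for x in (3, -1)]
```

The candidate 3 passes every check. The candidate -1 fails `rhs_nonnegative` and `lhs_equals_rhs`: it is the extraneous root that squaring introduced, and the receipts say exactly why it was rejected.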
A concrete template (copy/paste into any math workflow)
Use this every time:
- Claim: what are we solving?
- Given: list constraints + domain (integers? reals? nonnegative?)
- Plan: the smallest method that could work
- Witnesses: intermediate results you can verify
- Checks: substitute back / boundary conditions / dimensional (unit) sanity
- Falsifiers: what would disprove this solution
- Verdict: PASS / FAIL / NEED_INFO
This turns “reasoning” into something you can audit.
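One way to make the template machine-checkable is to store it as data. This is a sketch, not a standard: the `SolutionCard` class, its field types, and the rule that a failed check outranks a missing one are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SolutionCard:
    claim: str
    given: List[str]
    plan: str
    witnesses: Dict[str, int] = field(default_factory=dict)
    # Each check returns True (passed), False (failed), or None (could not run).
    checks: Dict[str, Callable[[], Optional[bool]]] = field(default_factory=dict)
    falsifiers: List[str] = field(default_factory=list)

    def verdict(self) -> Tuple[str, Dict[str, Optional[bool]]]:
        """Run every check and return (verdict, receipts)."""
        receipts = {name: check() for name, check in self.checks.items()}
        if any(result is False for result in receipts.values()):
            return "FAIL", receipts
        # No checks at all, or a check that could not run, is not a pass.
        if not receipts or any(result is None for result in receipts.values()):
            return "NEED_INFO", receipts
        return "PASS", receipts


n = 100
closed_form = n * (n + 1) // 2
card = SolutionCard(
    claim="1 + 2 + ... + 100 = 5050",
    given=["terms are the integers 1 through 100"],
    plan="closed form n(n+1)/2, then recompute by brute force",
    witnesses={"n": n, "closed_form": closed_form},
    checks={
        "brute_force_agrees": lambda: sum(range(1, n + 1)) == closed_form,
        "boundary_n_equals_1": lambda: 1 * (1 + 1) // 2 == 1,
    },
    falsifiers=["any n where the brute-force sum differs from n(n+1)/2"],
)
status, receipts = card.verdict()
```

The design choice worth noticing is that a card with no checks returns NEED_INFO rather than PASS. An answer that was never tested has not earned a verdict.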
Why this wins on Olympiad-style tasks
Olympiad problems punish:
- missing cases
- arithmetic slips
- hand-wavy leaps
- “it seems” arguments
Targeted checks catch each of those: enumerate the cases, recompute the arithmetic, demand a witness for every leap.
And checks are cheap.
That’s the punchline.
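How cheap? Here is a sketch on a toy example picked for illustration: which residues x satisfy x^2 ≡ 1 (mod 8)? The hand-wavy answer is "x ≡ ±1", which is true modulo a prime but not here. Eight multiplications settle it. The variable names are hypothetical.

```python
MODULUS = 8

# The "it seems" answer: only x = 1 and x = -1 (which is 7 mod 8).
claimed = {1, 7}

# The check: try every residue. The search space has just eight elements.
actual = {x for x in range(MODULUS) if (x * x) % MODULUS == 1}

missing = sorted(actual - claimed)   # solutions the claim forgot
spurious = sorted(claimed - actual)  # claimed values that are not solutions
verdict = "PASS" if not missing and not spurious else "FAIL"
```

The enumeration finds that 3 and 5 are solutions too (9 and 25 both leave remainder 1), so the claimed answer fails on missing cases. Exhaustive search only works when the domain is small and finite, but many contest claims can at least be spot-checked this way on small inputs before anyone trusts the general argument.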
MrBeast-style challenge (participation loop)
Give me any hard problem you care about (contest / interview / real project).
I’ll reply with:
- a checks-first solution structure
- the exact witnesses to compute
- the sanity checks to run
- and what would falsify the answer
Comment CHECKS and I’ll drop a one-page “Witness-First + Checks” card you can reuse forever.
Tower placement (why this is Floor 3)
In the Stillwater Tower, this is Floor 3: Proving.
Because the goal isn’t to “sound like you reasoned.”
The goal is to produce an answer that survives a skeptic.
The point (one line)
Math agents don’t win by talking smarter. They win by checking harder.
— Phuc Vinh Truong