Published February 23, 2026

If Your Math Agent Can't Check Itself, It's a Poet

Most math “reasoning” demos are storytelling.

They sound convincing. They look sophisticated.

And then you do the one thing Olympiads—and reality—require:

you check the result.

If your math agent can’t do exact arithmetic, validate constraints, and run sanity checks…

it’s not a solver.

It’s a poet wearing a lab coat.


Math Olympiad “Reasoning” Isn’t the Secret. Checks Are.

🐉 The Reasoning Dragon (what it really is)

The “reasoning” failure mode isn’t that models can’t think.

It’s that they produce un-auditable chains with no witnesses:

Humans get fooled by long explanations. Math doesn’t care.


The Stillwater rule

Reasoning must be checkable.

Not just plausible.

In practice, that means:

  1. Exact computation where possible (integers/fractions, not floating vibes)
  2. Multiple test cases (don’t trust one example)
  3. Witness-first steps (intermediate claims you can verify)
  4. Falsifiers (what would break this?)
  5. Receipts (the checks you ran)
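Here is a minimal sketch of what the five points can look like in code. The problem (a telescoping sum), the function names, and the `receipts` dictionary are illustrative assumptions, not part of any particular agent. It uses exact rationals from Python's standard `fractions` module, tests many cases instead of one, and keeps a record of what was checked.

```python
from fractions import Fraction


def telescoping_sum(n: int) -> Fraction:
    """Exact value of 1/(1*2) + 1/(2*3) + ... + 1/(n*(n+1)) as a rational."""
    return sum((Fraction(1, k * (k + 1)) for k in range(1, n + 1)), Fraction(0))


def closed_form(n: int) -> Fraction:
    """The claim under test: the sum equals n/(n+1)."""
    return Fraction(n, n + 1)


def find_counterexample(limit: int):
    """Falsifier: return the first n where the claim fails, or None if none is found."""
    for n in range(1, limit + 1):
        if telescoping_sum(n) != closed_form(n):
            return n
    return None


# Receipts: a record of the checks that were actually run (hypothetical format).
receipts = {
    "cases_checked": 200,
    "counterexample": find_counterexample(200),
    "witness_n3": telescoping_sum(3),  # an intermediate value anyone can verify by hand
}
print(receipts)

# Why exact arithmetic matters: floats fail an identity that rationals satisfy.
print(0.1 + 0.2 == 0.3)                                      # False
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```

Passing 200 cases is evidence, not a proof. The point is that a wrong closed form would very likely be caught here, and the receipts show exactly what was tested.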

Poet vs Solver (the only split that matters)

Poet (looks smart)

Solver (is correct)

If you want reliable math, build a Solver.


A concrete template (copy/paste into any math workflow)

Use this every time:

  1. Claim: what are we solving?
  2. Given: list constraints + domain (integers? reals? nonnegative?)
  3. Plan: the smallest method that could work
  4. Witnesses: intermediate results you can verify
  5. Checks: substitute back / boundary conditions / dimension sanity
  6. Falsifiers: what would disprove this solution
  7. Verdict: PASS / FAIL / NEED_INFO

This turns “reasoning” into something you can audit.
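One possible way to make the template machine-checkable is sketched below. The `SolutionRecord` class, its field names, and the rule that missing constraints or missing checks yield NEED_INFO are assumptions made for illustration; the seven fields mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SolutionRecord:
    """Hypothetical container for the seven-part template."""
    claim: str
    given: list[str]
    plan: str
    witnesses: dict[str, object] = field(default_factory=dict)
    checks: dict[str, Callable[[], bool]] = field(default_factory=dict)
    falsifiers: list[str] = field(default_factory=list)

    def verdict(self) -> str:
        # Assumption: without stated constraints or any checks, we cannot judge.
        if not self.given or not self.checks:
            return "NEED_INFO"
        results = {name: check() for name, check in self.checks.items()}
        return "PASS" if all(results.values()) else "FAIL"


roots = [2, 3]
record = SolutionRecord(
    claim="Find all integers x with x^2 - 5x + 6 = 0",
    given=["x is an integer"],
    plan="Factor as (x - 2)(x - 3)",
    witnesses={"roots": roots, "sum_of_roots": 5, "product_of_roots": 6},
    checks={
        # Substitute each claimed root back into the equation.
        "substitute_back": lambda: all(x * x - 5 * x + 6 == 0 for x in roots),
        # Vieta's formulas: sum and product must match the coefficients.
        "vieta": lambda: sum(roots) == 5 and roots[0] * roots[1] == 6,
        # Brute-force scan of a finite range for any root we missed.
        "no_missed_roots": lambda: [
            x for x in range(-100, 101) if x * x - 5 * x + 6 == 0
        ] == roots,
    },
    falsifiers=["Any integer outside {2, 3} that satisfies the equation"],
)
print(record.verdict())  # PASS
```

Storing the checks as callables means the verdict is recomputed from the checks themselves instead of being asserted by whoever wrote the explanation.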


Why this wins on Olympiad-style tasks

Olympiad problems punish:

Checks catch all of those.

And checks are cheap.

That’s the punchline.
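To show how cheap a check can be, here is a small sketch built on a classic slip chosen for illustration: squaring both sides of sqrt(x + 3) = x - 3 gives x^2 - 7x + 6 = 0, with candidate roots 1 and 6, but only one of them satisfies the original equation. The function name `satisfies_original` is hypothetical, and the check assumes integer candidates so it can stay exact by using `math.isqrt`.

```python
from math import isqrt


def satisfies_original(x: int) -> bool:
    """Check sqrt(x + 3) == x - 3 exactly for an integer candidate x."""
    # Domain check: the radicand and the right-hand side must be nonnegative.
    if x + 3 < 0 or x - 3 < 0:
        return False
    r = isqrt(x + 3)
    # The radicand must be a perfect square whose root equals x - 3.
    return r * r == x + 3 and r == x - 3


candidates = [1, 6]  # roots of x^2 - 7x + 6 = 0, obtained by squaring both sides
verified = [x for x in candidates if satisfies_original(x)]
print(verified)  # [6]
```

Substituting back takes a few lines and rejects the extraneous root x = 1, which the algebra alone would have reported as a solution.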


MrBeast-style challenge (participation loop)

Give me any hard problem you care about (contest / interview / real project).

I’ll reply with:

Comment CHECKS and I’ll drop a one-page “Witness-First + Checks” card you can reuse forever.


Tower placement (why this is Floor 3)

In the Stillwater Tower, this is Floor 3: Proving.

Because the goal isn’t to “sound like you reasoned.”

The goal is to produce an answer that survives a skeptic.


The point (one line)

Math agents don’t win by talking smarter. They win by checking harder.

— Phuc Vinh Truong