PG-010: Don’t Trust What the AI Says About Its Own Work

PG-010 May 17, 2026 Thomas W. Gantz

This guide expands practice #6 of PG-000: 10 Things Every AI User Should Do.

A practitioner guide for replacing AI self-report with verifiable evidence

Why this guide exists

Ask an AI whether it completed the task you gave it, and it will say yes. Ask whether it read the whole document, and it will say yes. Ask whether it applied the rule you set five turns ago, and it will say yes. Ask whether it double-checked its citations, and it will say yes.

None of these are reports of what actually happened. They are predictions of what a good response sounds like.

This is the most under-appreciated failure mode in everyday AI use. It underlies several others: the AI does not flag that it skipped a section, that it stopped applying an earlier constraint, or that its diff was incomplete. The failure is not concealment. It is that self-report and actual behavior are not tightly coupled to begin with.

The core failure mode: confident self-report without evidence

When an AI tells you it did something, it is producing a sentence that fits the conversational context. The sentence is shaped by what a competent assistant would say in that moment, not by what the system actually did.

The principle to internalize: AI self-report is not a record of work performed. It is a prediction of what a satisfied user expects to hear. Treat every claim about the AI’s own process as a hypothesis until evidence arrives.

This is not a flaw the AI can fix by trying harder. Self-report and output come from the same generative process, so asking the system to be more careful about self-report changes the wording but not the underlying reliability.

The fix is not to demand better self-report. The fix is to stop relying on self-report at all.

When the AI says “I did X,” do not ask whether it is being honest. Ask what evidence would prove it.

When you must use this procedure

Use this procedure whenever:

The AI claims it completed a task you cannot directly observe
The AI claims it read, summarized, or processed a document
The AI claims it followed an instruction or applied a rule you gave earlier
The AI claims it checked, reviewed, or verified its own output
The cost of the claim being false is meaningful — you will act on it, repeat it, or build further work on top of it

The threshold is not "high-stakes work." It is "any claim about the AI’s own process that you would otherwise take on faith."

The evidence-substitution procedure

The principle is straightforward: replace every self-report claim with a request for evidence that could only be produced if the claim were true.

Three steps.

Step 1 — Identify the self-report claim

Notice when the AI has made a claim about its own process. Common forms include:

"I’ve reviewed the document."
"I checked all the citations."
"I’ve applied the constraints you gave me earlier."
"I went through this carefully."
"I made sure to…"
"This is my best work on the question."

Every one of these is a claim about something you cannot observe directly. Every one of these is a candidate for evidence substitution.

Step 2 — Ask for evidence that could only exist if the claim were true

For each self-report claim, ask for a concrete artifact that would be present only if the work were genuinely done. The artifact has to be specific and checkable.

Recommended instruction: "You said you did X. Show me concrete evidence that you did X — a specific quote, a verbatim passage, a structured list, or a precise reference. General reassurance is not evidence. If you cannot produce concrete evidence, say so."

Examples of what evidence looks like for common claims:

"I read the whole document" → Quote the last paragraph of the final section verbatim, and list the section headers in order.
"I checked the citations" → For each citation, produce the exact passage from the source that supports the claim attributed to it.
"I applied the constraint you set earlier" → Restate the constraint verbatim, then point to the specific place in your response where it changed your output.
"I reviewed my work" → List three specific weaknesses you considered in this draft and the change you made for each, or state that you found none and explain why.

The pattern is the same in every case: a concrete artifact replaces a vague assertion. Evidence that survives a check against the source is hard to produce without actually doing the work, or at least more of it. Evidence that merely looks convincing is not, which is why the next step exists.
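The claim-to-evidence pairing above can be kept as a small lookup so the request is worded the same way every time. This is a minimal sketch, not part of the guide's procedure: the function name, the dictionary, and its keys are hypothetical, and the request wording is taken from the examples above.

```python
# Hypothetical helper. The keys and names are illustrative assumptions;
# the request text comes from the examples in this guide.
EVIDENCE_REQUESTS = {
    "read_document": (
        "Quote the last paragraph of the final section verbatim, "
        "and list the section headers in order."
    ),
    "checked_citations": (
        "For each citation, produce the exact passage from the source "
        "that supports the claim attributed to it."
    ),
    "applied_constraint": (
        "Restate the constraint verbatim, then point to the specific place "
        "in your response where it changed your output."
    ),
    "reviewed_work": (
        "List three specific weaknesses you considered in this draft and "
        "the change you made for each, or state that you found none and explain why."
    ),
}


def build_evidence_request(claim_kind: str, claim_text: str) -> str:
    """Turn a self-report claim into a request for a checkable artifact."""
    if claim_kind not in EVIDENCE_REQUESTS:
        raise ValueError(f"No evidence request defined for claim kind: {claim_kind!r}")
    return (
        f'You said: "{claim_text}" '
        f"{EVIDENCE_REQUESTS[claim_kind]} "
        "General reassurance is not evidence. "
        "If you cannot produce concrete evidence, say so."
    )


if __name__ == "__main__":
    print(build_evidence_request("read_document", "I read the whole document."))
```

The helper only produces the request. Whatever the AI sends back still has to be checked against the source.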

Step 3 — Verify the evidence itself

An AI under pressure to produce evidence may produce plausible-looking evidence rather than real evidence. The third step is to check the evidence against the actual source whenever possible.

If the AI quotes the last paragraph of a section, open the document and compare. If the AI quotes a citation, open the source and check. If the AI restates the constraint you set, check that it matches what you actually said.

The first two steps make the AI produce something checkable. The third step is the check.

Working rule: Concrete evidence that is itself unverified is not evidence. It is a more elaborate form of reassurance.
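When the source is available as plain text, part of this check can be done mechanically. The sketch below assumes you have already loaded the document into a string. The function names are hypothetical, and the normalization (collapsing whitespace, unifying curly and straight quotes) is an assumption about what should count as "verbatim" once formatting differences are set aside.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and unify quote characters so that line wrapping
    and typographic quotes do not cause false mismatches."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip()


def quote_is_verbatim(quote: str, source: str) -> bool:
    """True only if the non-empty quote appears word for word in the source."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(source)


def headers_in_order(claimed_headers: list[str], source: str) -> bool:
    """True if every claimed header occurs in the source in the claimed order."""
    haystack = normalize(source)
    position = 0
    for header in claimed_headers:
        needle = normalize(header)
        found = haystack.find(needle, position)
        if not needle or found == -1:
            return False
        position = found + len(needle)
    return True


if __name__ == "__main__":
    document = "Intro\n\nThe cat sat   on the mat.\n\nMethods\n\nWe measured it."
    print(quote_is_verbatim("The cat sat on the mat.", document))
    print(quote_is_verbatim("The cat sat on a mat.", document))
    print(headers_in_order(["Intro", "Methods"], document))
```

A passing check shows only that the quoted text exists in the source. Whether that passage actually supports the claim attributed to it is still a judgment you have to make by reading it.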

What good AI responses look like

An AI that is being honest about its own work will produce one of these:

The specific concrete evidence you asked for, in a form you can check
A clear statement that it cannot produce the evidence, with the implication that the original claim should not be trusted
Partial evidence with an explicit acknowledgment of what it does and does not cover

An AI that is failing the verification will produce one of these instead:

A repeat of the original claim in slightly different words
Generic reassurance ("yes, I was thorough")
Evidence that on inspection turns out to be invented or paraphrased rather than quoted
A pivot to a different topic, often framed as helpful elaboration

If the response falls into the second group, the claim has not been verified. Repeat the request more precisely, or treat the original claim as unsupported.

Why this is the meta-practice

Several other practices in the Practitioner Guide series are specific applications of this one. Verifying that the AI read your document, verifying that an edited file preserved its other sections, verifying that a cited source actually supports a claim — each of these is a specific case of substituting evidence for self-report.

If you internalize this practice in general, the others follow naturally. You stop asking "did you?" and start asking "show me."

Key rules

Never accept "I did X" as evidence that X was done
Never accept generic reassurance as verification
Never assume the AI has a faithful model of its own process
Always require concrete, checkable artifacts for any claim that matters

The cost of asking for evidence is one extra turn. The cost of building work on top of an unsupported self-report compounds quietly until it is too late to fix cheaply.

What this procedure protects

Following this method protects against acting on false claims of task completion, building further work on top of incomplete document processing, assuming earlier constraints are still being applied when they are not, and treating fluent self-assessment as actual review. It also makes the line between "the AI did the work" and "the AI says it did the work" visible — which is the first step to managing it.

What this procedure does not do

This method does not detect lies, because the AI is not telling any: the system has no intent to deceive, so that framing is wrong. It does not catch every failure mode; some work cannot be reduced to a checkable artifact. And it does not eliminate human responsibility: someone still has to decide which claims warrant evidence and which can be taken at face value.

When in doubt

If you are unsure whether a self-report claim is safe to rely on, ask for the evidence. If the evidence does not arrive in a form you can check, treat the claim as unsupported. The default is verification, not faith.

Core Practitioner Guides

Guides covering the foundational skills for working reliably with any AI system.

PG-000: 10 Things Every AI User Should Do
PG-001: How to Work Reliably With Conversational AI Over Time
PG-002: AI-Assisted Editing Without Silent Loss
PG-003: Verify Before You Work
PG-004: You Are Accepting the First Adequate Answer
PG-005: Your AI Updated the File. Did It Preserve What It Didn’t Touch?
PG-009: Make the AI Show You the Source
PG-010: Don’t Trust What the AI Says About Its Own Work (this guide)

Synthience Institute