Don’t Trust What the AI Says About Its Own Work

PG-010 May 17, 2026 Thomas W. Gantz

This guide expands practice #6 of PG-000: 10 Things Every AI User Should Do.

A practitioner guide for replacing AI self-report with verifiable evidence

Why this guide exists

Ask an AI whether it completed the task you gave it, and it will say yes. Ask whether it read the whole document, and it will say yes. Ask whether it applied the rule you set five turns ago, and it will say yes. Ask whether it double-checked its citations, and it will say yes.

None of these are reports of what actually happened. They are predictions of what a good response sounds like.

This is the most under-appreciated failure mode in everyday AI use. It underlies several others: AI does not flag that it skipped a section, AI does not flag that it stopped applying an earlier constraint, AI does not flag that its diff was incomplete. The failure is not concealment. It is that self-report and actual behavior are not tightly coupled to begin with.

The core failure mode: confident self-report without evidence

When an AI tells you it did something, it is producing a sentence that fits the conversational context. The sentence is shaped by what a competent assistant would say in that moment, not by what the system actually did.

The principle to internalize: AI self-report is not a record of work performed. It is a prediction of what a satisfied user expects to hear. Treat every claim about the AI’s own process as a hypothesis until evidence arrives.

This is not a flaw the AI can fix by trying harder. Self-report and output come from the same generative process, so asking the system to be more careful about self-report changes the wording but not the underlying reliability.

The fix is not to demand better self-report. The fix is to stop relying on self-report at all.

When the AI says “I did X,” do not ask whether it is being honest. Ask what evidence would prove it.

When you must use this procedure

Use this procedure whenever:

The threshold is not "high-stakes work." It is "any claim about the AI’s own process that you would otherwise take on faith."

The evidence-substitution procedure

The principle is straightforward: replace every self-report claim with a request for evidence that could only be produced if the claim were true.

Three steps.

Step 1 — Identify the self-report claim

Notice when the AI has made a claim about its own process. Common forms include:

Every one of these is a claim about something you cannot observe directly. Every one of these is a candidate for evidence substitution.

Step 2 — Ask for evidence that could only exist if the claim were true

For each self-report claim, ask for a concrete artifact that would be present only if the work were genuinely done. The artifact has to be specific and checkable.

Recommended instruction: "You said you did X. Show me concrete evidence that you did X — a specific quote, a verbatim passage, a structured list, or a precise reference. General reassurance is not evidence. If you cannot produce concrete evidence, say so."

Examples of what evidence looks like for common claims:

The pattern is the same in every case: a concrete artifact replaces a vague assertion. A concrete artifact is far harder to produce convincingly without actually doing the work, or at least more of it, and unlike a general assurance it can be checked.
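For readers who send these requests through a script rather than by hand, the recommended instruction can be kept as a reusable template. The following is a minimal sketch in Python, not part of the original procedure: the function name build_evidence_request and its parameters are hypothetical, and the wording is adapted from the recommended instruction above.

```python
from typing import Optional


def build_evidence_request(claim: str, artifact: Optional[str] = None) -> str:
    """Turn a self-report claim into a request for checkable evidence.

    `claim` is a past-tense phrase such as "read the whole document".
    `artifact` optionally names the specific thing to produce.
    Both names are illustrative assumptions, not a fixed interface.
    """
    claim = claim.strip().rstrip(".")
    if not claim:
        raise ValueError("claim must not be empty")

    request = (
        f"You said you {claim}. Show me concrete evidence that you {claim}: "
        "a specific quote, a verbatim passage, a structured list, "
        "or a precise reference. General reassurance is not evidence. "
        "If you cannot produce concrete evidence, say so."
    )
    if artifact:
        # Naming the artifact makes the reply easier to check in Step 3.
        request += f" Specifically, provide {artifact.strip().rstrip('.')}."
    return request


if __name__ == "__main__":
    print(build_evidence_request(
        "read the whole document",
        "the last paragraph of the final section, verbatim",
    ))
```

Naming a specific artifact is optional, but a request that names one usually produces a reply that is easier to compare against the source.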

Step 3 — Verify the evidence itself

An AI under pressure to produce evidence may produce plausible-looking evidence rather than real evidence. The third step is to check the evidence against the actual source whenever possible.

If the AI quotes the last paragraph of a section, open the document and compare. If the AI quotes a citation, open the source and check. If the AI restates the constraint you set, check that it matches what you actually said.

The first two steps make the AI produce something checkable. The third step is the check.

Working rule: Concrete evidence that is itself unverified is not evidence. It is a more elaborate form of reassurance.
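When the source is available as plain text, part of this check can be automated. The sketch below, in Python, tests whether a passage the AI presented as a verbatim quote actually appears in the source, after smoothing over differences in whitespace and quotation-mark style. The function names normalize and quote_in_source are hypothetical, and the normalization choices are assumptions; a stricter or looser comparison may suit other material.

```python
import re
import unicodedata

# Map typographic quotes to plain ones so styling differences do not
# cause a false mismatch. (An assumption; drop this for strict matching.)
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize(text: str) -> str:
    """Unify Unicode forms and quote styles, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).translate(_QUOTE_MAP)
    return re.sub(r"\s+", " ", text).strip()


def quote_in_source(quote: str, source: str) -> bool:
    """Return True only if the quote occurs verbatim in the source.

    The comparison is case-sensitive. An empty quote never counts
    as evidence.
    """
    needle = normalize(quote)
    return bool(needle) and needle in normalize(source)


if __name__ == "__main__":
    source = (
        "The committee met on Tuesday.\n\n"
        "It approved   the budget \u201cwithout amendment\u201d."
    )
    print(quote_in_source('It approved the budget "without amendment".', source))
    print(quote_in_source("It rejected the budget.", source))
```

A match shows only that the quoted text exists in the source. It does not show that the quote came from the place the AI said it did, or that the rest of the document was processed, so the comparison supports the human check rather than replacing it.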

What good AI responses look like

An AI that is being honest about its own work will produce one of these:

An AI that is failing the verification will produce one of these instead:

If the response falls into the second group, the claim has not been verified. Repeat the request more precisely, or treat the original claim as unsupported.

Why this is the meta-practice

Several other practices in the Practitioner Guide series are specific applications of this one. Verifying that the AI read your document, verifying that an edited file preserved its other sections, verifying that a cited source actually supports a claim — each of these is a specific case of substituting evidence for self-report.

If you internalize this practice in general, the others follow naturally. You stop asking "did you?" and start asking "show me."

Key rules

The cost of asking for evidence is one extra turn. The cost of building work on top of an unsupported self-report compounds quietly until it is too late to fix cheaply.

What this procedure protects

Following this method protects against acting on false claims of task completion, building further work on top of incomplete document processing, assuming earlier constraints are still being applied when they are not, and treating fluent self-assessment as actual review. It also makes the line between "the AI did the work" and "the AI says it did the work" visible — which is the first step to managing it.

What this procedure does not do

This method does not detect lies, because the AI is not "telling" any: the system has no intent to deceive, so that framing is wrong. It does not catch every failure mode; some work cannot be reduced to a checkable artifact. And it does not eliminate human responsibility: someone still has to decide which claims warrant evidence and which can be taken at face value.

When in doubt

If you are unsure whether a self-report claim is safe to rely on, ask for the evidence. If the evidence does not arrive in a form you can check, treat the claim as unsupported. The default is verification, not faith.
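The default described here can be written down as a small record-keeping rule. The following Python sketch is illustrative only: the class names Status and Claim and the field names are hypothetical. A claim starts as unsupported, stays unverified when evidence has been offered but not checked, and counts as verified only after the evidence has been compared against the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    UNSUPPORTED = "unsupported"                  # self-report only
    EVIDENCE_UNVERIFIED = "evidence_unverified"  # artifact offered, not checked
    VERIFIED = "verified"                        # artifact checked against source


@dataclass
class Claim:
    text: str                               # what the AI said it did
    evidence: Optional[str] = None          # the concrete artifact, if any
    evidence_matches_source: bool = False   # set by the person who checks it

    @property
    def status(self) -> Status:
        if not self.evidence:
            return Status.UNSUPPORTED
        if not self.evidence_matches_source:
            # Unchecked evidence is only a more elaborate reassurance.
            return Status.EVIDENCE_UNVERIFIED
        return Status.VERIFIED

    def safe_to_rely_on(self) -> bool:
        return self.status is Status.VERIFIED


if __name__ == "__main__":
    claim = Claim("I applied the rule you set earlier.")
    print(claim.status.value)       # default before any evidence arrives
    claim.evidence = "Restated rule plus the passages where it was applied"
    print(claim.status.value)       # evidence offered but not yet checked
    claim.evidence_matches_source = True
    print(claim.safe_to_rely_on())
```

The design choice worth noting is that nothing in the record can move a claim to verified except a check performed outside the AI's own reply.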

Further reading

The self-report failure mode described here connects to the broader question of what AI systems can and cannot reliably report about their own behavior. For the formal verification protocol used when AI systems must demonstrate that they have actually processed a document, see the Ingestion Verification Protocol. Adjacent companion guides apply the same evidence-substitution principle to specific kinds of self-report.

Full framework documentation available at the Synthience Institute community on Zenodo.

Document: PG-010 Practitioner Guide
Version: 1.0
Author: Thomas W. Gantz
Affiliation: The Synthience Institute
Date: May 17, 2026
License: CC-BY 4.0