Don’t Trust What the AI Says About Its Own Work
This guide expands practice #6 of PG-000: 10 Things Every AI User Should Do.
A practitioner guide for replacing AI self-report with verifiable evidence
Why this guide exists
Ask an AI whether it completed the task you gave it, and it will say yes. Ask whether it read the whole document, and it will say yes. Ask whether it applied the rule you set five turns ago, and it will say yes. Ask whether it double-checked its citations, and it will say yes.
None of these are reports of what actually happened. They are predictions of what a good response sounds like.
This is the most under-appreciated failure mode in everyday AI use. It underlies several others: the AI does not flag that it skipped a section, that it stopped applying an earlier constraint, or that its diff was incomplete. The failure is not concealment. It is that self-report and actual behavior are not tightly coupled to begin with.
The core failure mode: confident self-report without evidence
When an AI tells you it did something, it is producing a sentence that fits the conversational context. The sentence is shaped by what a competent assistant would say in that moment, not by what the system actually did.
This is not a flaw the AI can fix by trying harder. Self-report and output come from the same generative process, so asking the system to be more careful about self-report changes the wording but not the underlying reliability.
The fix is not to demand better self-report. The fix is to stop relying on self-report at all.
When you must use this procedure
Use this procedure whenever:
- The AI claims it completed a task you cannot directly observe
- The AI claims it read, summarized, or processed a document
- The AI claims it followed an instruction or applied a rule you gave earlier
- The AI claims it checked, reviewed, or verified its own output
- The cost of the claim being false is meaningful — you will act on it, repeat it, or build further work on top of it
The threshold is not "high-stakes work." It is "any claim about the AI’s own process that you would otherwise take on faith."
The evidence-substitution procedure
The principle is straightforward: replace every self-report claim with a request for evidence that could only be produced if the claim were true.
Three steps.
Step 1 — Identify the self-report claim
Notice when the AI has made a claim about its own process. Common forms include:
- "I’ve reviewed the document."
- "I checked all the citations."
- "I’ve applied the constraints you gave me earlier."
- "I went through this carefully."
- "I made sure to…"
- "This is my best work on the question."
Every one of these is a claim about something you cannot observe directly. Every one of these is a candidate for evidence substitution.
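For readers who work with AI output programmatically, the noticing step can be roughly approximated in code. The following is a minimal Python sketch, not a definitive detector: the pattern list and the function name find_self_report_claims are illustrative assumptions built from the example phrasings above, and a keyword match will both miss claims and flag harmless sentences.

```python
import re

# Hypothetical patterns based on the example phrasings in this guide.
# They are assumptions for illustration; extend them for your own usage.
SELF_REPORT_PATTERNS = [
    r"\bI['’]ve (?:reviewed|checked|applied|verified|read)\b",
    r"\bI (?:reviewed|checked|applied|verified|read|went through)\b",
    r"\bI made sure\b",
    r"\bmy best work\b",
]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in SELF_REPORT_PATTERNS]


def find_self_report_claims(response: str) -> list[str]:
    """Return the sentences that look like claims about the AI's own process."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return [s for s in sentences if any(p.search(s) for p in _COMPILED)]


if __name__ == "__main__":
    reply = "I’ve reviewed the document. The summary is below. I made sure to include every section."
    for claim in find_self_report_claims(reply):
        print("Candidate for evidence substitution:", claim)
```

Each sentence this returns is only a candidate. Deciding whether the claim matters enough to need evidence is still a human judgment.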
Step 2 — Ask for evidence that could only exist if the claim were true
For each self-report claim, ask for a concrete artifact that would be present only if the work were genuinely done. The artifact has to be specific and checkable.
Examples of what evidence looks like for common claims:
- "I read the whole document" → Quote the last paragraph of the final section verbatim, and list the section headers in order.
- "I checked the citations" → For each citation, produce the exact passage from the source that supports the claim attributed to it.
- "I applied the constraint you set earlier" → Restate the constraint verbatim, then point to the specific place in your response where it changed your output.
- "I reviewed my work" → List three specific weaknesses you considered in this draft and the change you made for each, or state that you found none and explain why.
The pattern is the same in every case: a concrete artifact replaces a vague assertion. Concrete evidence is much harder to produce convincingly without actually doing the work, or at least more of it, and unlike a bare assertion it can be checked.
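The mapping from claim to evidence request can be written down once and reused. The sketch below is one possible way to do that in Python. The request wording is taken from the examples above, while the claim labels (read_document and so on) and the function name evidence_request are hypothetical names introduced only for illustration.

```python
# Evidence requests copied from the examples in this guide.
# The dictionary keys are hypothetical labels, not part of any standard.
EVIDENCE_REQUESTS = {
    "read_document": (
        "Quote the last paragraph of the final section verbatim, "
        "and list the section headers in order."
    ),
    "checked_citations": (
        "For each citation, produce the exact passage from the source "
        "that supports the claim attributed to it."
    ),
    "applied_constraint": (
        "Restate the constraint verbatim, then point to the specific place "
        "in your response where it changed your output."
    ),
    "reviewed_work": (
        "List three specific weaknesses you considered in this draft and the "
        "change you made for each, or state that you found none and explain why."
    ),
}


def evidence_request(claim_type: str) -> str:
    """Return the follow-up request for a known kind of self-report claim."""
    try:
        return EVIDENCE_REQUESTS[claim_type]
    except KeyError:
        # No generic fallback on purpose: a vague request invites a vague answer.
        raise ValueError(
            f"No evidence request defined for claim type {claim_type!r}"
        ) from None


if __name__ == "__main__":
    print(evidence_request("read_document"))
```

The function raises an error for unknown claim types instead of returning a generic "please confirm" prompt. That choice is deliberate in this sketch: a new kind of claim should get its own specific, checkable artifact.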
Step 3 — Verify the evidence itself
An AI under pressure to produce evidence may produce plausible-looking evidence rather than real evidence. The third step is to check the evidence against the actual source whenever possible.
If the AI quotes the last paragraph of a section, open the document and compare. If the AI quotes a citation, open the source and check. If the AI restates the constraint you set, check that it matches what you actually said.
The first two steps make the AI produce something checkable. The third step is the check.
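When the source is available as text, part of the check can be automated. The following Python sketch assumes you have the document as a plain string. The function names normalize, quote_appears_in_source, and headers_in_order are hypothetical helpers written for this illustration. The sketch tests only whether quoted text and claimed headers actually occur in the source; it does not judge whether a passage supports a claim, and it does not confirm that the header list is complete.

```python
import re


def normalize(text: str) -> str:
    """Unify curly quotes and collapse whitespace so layout differences do not cause false mismatches."""
    for curly, straight in (("’", "'"), ("‘", "'"), ("“", '"'), ("”", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip()


def quote_appears_in_source(quote: str, source: str) -> bool:
    """True only if the quoted text occurs verbatim (after normalization) in the source."""
    q = normalize(quote)
    return bool(q) and q in normalize(source)


def headers_in_order(claimed_headers: list[str], source: str) -> bool:
    """True if every claimed header occurs in the source, each one after the previous."""
    text = normalize(source)
    position = 0
    for header in claimed_headers:
        h = normalize(header)
        if not h:
            return False
        index = text.find(h, position)
        if index == -1:
            return False
        position = index + len(h)
    return True


if __name__ == "__main__":
    source = (
        "Introduction\nThis guide covers setup.\n\n"
        "Methods\nWe  measured latency twice.\n\n"
        "Results\nLatency fell by half."
    )
    print(quote_appears_in_source("We measured latency twice.", source))
    print(quote_appears_in_source("Latency was measured two times.", source))
    print(headers_in_order(["Introduction", "Methods", "Results"], source))
```

A paraphrase fails the quote check, which is the point: evidence that was requested verbatim should match verbatim. A passing result still leaves the reading to you, since a real quote can be attached to a claim it does not support.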
What good AI responses look like
An AI that is being honest about its own work will produce one of these:
- The specific concrete evidence you asked for, in a form you can check
- A clear statement that it cannot produce the evidence, with the implication that the original claim should not be trusted
- Partial evidence with an explicit acknowledgment of what it does and does not cover
An AI that is failing the verification will produce one of these instead:
- A repeat of the original claim in slightly different words
- Generic reassurance ("yes, I was thorough")
- Evidence that on inspection turns out to be invented or paraphrased rather than quoted
- A pivot to a different topic, often framed as helpful elaboration
If the response falls into the second group, the claim has not been verified. Repeat the request more precisely, or treat the original claim as unsupported.
Why this is the meta-practice
Several other practices in the Practitioner Guide series are specific applications of this one. Verifying that the AI read your document, verifying that an edited file preserved its other sections, verifying that a cited source actually supports a claim — each of these is a specific case of substituting evidence for self-report.
If you internalize this practice in general, the others follow naturally. You stop asking "did you?" and start asking "show me."
Key rules
- Never accept "I did X" as evidence that X was done
- Never accept generic reassurance as verification
- Never assume the AI has a faithful model of its own process
- Always require concrete, checkable artifacts for any claim that matters
The cost of asking for evidence is one extra turn. The cost of building work on top of an unsupported self-report compounds quietly until it is too late to fix cheaply.
What this procedure protects
Following this method protects against acting on false claims of task completion, building further work on top of incomplete document processing, assuming earlier constraints are still being applied when they are not, and treating fluent self-assessment as actual review. It also makes the line between "the AI did the work" and "the AI says it did the work" visible — which is the first step to managing it.
What this procedure does not do
This method is not lie detection. The system has no intent to deceive, so treating an unsupported self-report as a lie is the wrong framing. It does not catch every failure mode; some work cannot be reduced to a checkable artifact. And it does not eliminate human responsibility: someone still has to decide which claims warrant evidence and which can be taken at face value.
When in doubt
If you are unsure whether a self-report claim is safe to rely on, ask for the evidence. If the evidence does not arrive in a form you can check, treat the claim as unsupported. The default is verification, not faith.
Guides covering the foundational skills for working reliably with any AI system.
- PG-000: 10 Things Every AI User Should Do
- PG-001: How to Work Reliably With Conversational AI Over Time
- PG-002: AI-Assisted Editing Without Silent Loss
- PG-003: Verify Before You Work
- PG-004: You Are Accepting the First Adequate Answer
- PG-005: Your AI Updated the File. Did It Preserve What It Didn’t Touch?
- PG-009: Make the AI Show You the Source
- PG-010: Don’t Trust What the AI Says About Its Own Work (this guide)
Further reading
The self-report failure mode described here connects to the broader question of what AI systems can and cannot reliably report about their own behavior. For the formal verification protocol used when AI systems must demonstrate that they have actually processed a document, see the Ingestion Verification Protocol. Adjacent companion guides apply the same evidence-substitution principle to specific kinds of self-report.
- SF0038: Ingestion Verification Protocol (IVP) — the formal protocol for verifying AI document processing rather than trusting the system’s claim to have read the document. DOI: 10.5281/zenodo.18289047
- PG-003: Verify Before You Work — specific procedure for evidence-substitution in document ingestion, the most common application of the principle in this guide
- PG-009: Make the AI Show You the Source — specific procedure for evidence-substitution in citation handling, another common application of the principle in this guide
- PG-002: AI-Assisted Editing Without Silent Loss — specific procedure for evidence-substitution in document editing, where AI self-report about preservation is particularly unreliable
Full framework documentation available at the Synthience Institute community on Zenodo.