The Cross-AI Adversarial Review Protocol
This guide expands practice #10 of PG-000: 10 Things Every AI User Should Do.
A practitioner guide for using a second AI system as a structured review pass on the first one’s output
Why this guide exists
One AI system reviewing its own output rarely catches its own systematic mistakes. The same model that produced the mistake is the model being asked to spot it. Self-review can clean up surface problems — clarity, organization, minor errors — but it tends to be blind to the kinds of failures that come from how the system reasons.
A different AI system, especially one from a different developer, is often better at catching these. It is not a perfect oracle. It makes its own mistakes. But it has a different set of priors, a different training history, and a different way of weighing evidence, so its blind spots overlap only partially with the first system’s.
Used well, a second AI review is one of the most effective lightweight review passes available to most practitioners. Used poorly, it is theater that gives false confidence. This guide explains what separates the two.
The core failure mode this addresses
When a single AI produces an output that contains systematic problems — a fabricated citation that follows a plausible pattern, a misrepresented source, an overconfident factual claim, an argument with a hidden weakness — the same AI re-reading its own work usually misses them. The problems are not surface-level slips; they are baked into how the response was generated.
A second AI looking at the same output as a fresh input is reading it without that generative history. It does not have a stake in the conclusion. It sees the words on the page, not the reasoning that produced them.
This is the key practical point: a second AI review is a strong filter, not a verification. It catches a significant class of errors and misses a different class. Knowing which class it catches is what makes the protocol useful instead of misleading.
When you must use this procedure
Use this procedure whenever:
- The output will be used in a real decision, public communication, or formal record
- The output makes factual claims, attributes claims to sources, or relies on specific data
- The output is long enough that a single read by you might miss a buried problem
- You will repeat or stand behind the output once it leaves your hands
You do not need this for casual conversation, exploratory thinking, or first drafts that you will iterate on heavily yourself. The protocol is for finished work.
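The criteria above can be read as a short checklist. The sketch below is one way to encode it, assuming the list is any-of (a single trigger is enough) and that the exemption for casual or early-stage work overrides the triggers. The `OutputProfile` class, its field names, and `needs_adversarial_review` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class OutputProfile:
    """Hypothetical description of one piece of AI output (illustrative field names)."""
    used_in_decision_or_record: bool = False       # real decision, public communication, formal record
    makes_factual_or_sourced_claims: bool = False  # facts, source attributions, specific data
    too_long_for_single_read: bool = False         # one read by you might miss a buried problem
    you_will_stand_behind_it: bool = False         # you repeat or vouch for it once it leaves your hands
    is_finished_work: bool = True                  # False for casual chat, exploration, early drafts


def needs_adversarial_review(profile: OutputProfile) -> bool:
    """Return True when at least one trigger applies to finished work."""
    if not profile.is_finished_work:
        return False  # the protocol is for finished work only
    return any([
        profile.used_in_decision_or_record,
        profile.makes_factual_or_sourced_claims,
        profile.too_long_for_single_read,
        profile.you_will_stand_behind_it,
    ])
```

For example, `needs_adversarial_review(OutputProfile(makes_factual_or_sourced_claims=True))` returns `True`, while the same profile with `is_finished_work=False` returns `False`.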
The protocol
Four steps.
Step 1 — Use a genuinely different system
The point of the second review is that the reviewer’s failure modes overlap less with the first system’s. The further apart the two systems, the more independent the review.
Options, from most to least independent:
- An AI from a different developer (Claude reviewing GPT, Gemini reviewing Claude, Grok reviewing Gemini, and so on)
- The same developer but a different model family or version, if a different developer is not available
- The same model in a fresh session as a last resort
A fresh session of the same model catches some self-review blind spots but leaves the largest one in place: the systematic priors of that specific model. Use a cross-developer pairing when the stakes justify it.
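The ordering above can be expressed as a simple ranking. This is a minimal sketch, assuming each system is identified only by a developer name and a model name; `System`, `independence_tier`, and `pick_reviewer` are hypothetical helpers, and the three tiers mirror the three options listed in this step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class System:
    developer: str
    model: str


def independence_tier(author: System, reviewer: System) -> int:
    """Lower is more independent.

    0 = different developer
    1 = same developer, different model family or version
    2 = same model (fresh session, last resort)
    """
    if reviewer.developer != author.developer:
        return 0
    if reviewer.model != author.model:
        return 1
    return 2


def pick_reviewer(author: System, available: list[System]) -> Optional[System]:
    """Return the most independent available reviewer, or None if none is available."""
    if not available:
        return None
    return min(available, key=lambda s: independence_tier(author, s))
```

The ranking says nothing about how good each reviewer is. It only encodes the preference for less overlap in failure modes.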
Step 2 — Paste the output as input, with no preamble
The second AI should receive the first AI’s output as something it is reviewing — not as a continuation of a conversation, and not framed as “please confirm this is good.” The framing matters.
Three phrases matter in the instruction you give the reviewer:
- "Produced by a different AI system." Tells the reviewer not to defer.
- "Review it adversarially." Sets the frame. Default review tends to be supportive.
- "Do not summarize or rewrite." Reviewers often drift into rewriting, which buries the critique and dilutes the signal.
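As a rough sketch, a review prompt that carries all three phrases could be assembled like this. Only the three quoted phrases come from this guide. The rest of the wording, the delimiter lines, and the `build_review_prompt` name are assumptions made for illustration, so adapt them to your own material.

```python
def build_review_prompt(output_text: str) -> str:
    """Wrap the first system's output in an adversarial review instruction.

    The instruction wording is illustrative; only the three key phrases
    ("produced by a different AI system", "Review it adversarially",
    "Do not summarize or rewrite") are taken from the guide.
    """
    instruction = (
        "The text below was produced by a different AI system. "
        "Review it adversarially: look for fabricated or misrepresented citations, "
        "factual errors, internal contradictions, and weak arguments. "
        "Do not summarize or rewrite it. "
        "List specific problems and quote the passage each one refers to."
    )
    # Delimiters make it clear the text is material under review, not a conversation turn.
    return (
        f"{instruction}\n\n"
        "--- BEGIN TEXT UNDER REVIEW ---\n"
        f"{output_text.strip()}\n"
        "--- END TEXT UNDER REVIEW ---"
    )
```

Send the returned string as the first message of a new session with the reviewing system, with no earlier conversation attached.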
Step 3 — Treat the second AI’s findings as leads, not verdicts
The second AI’s output is information to investigate, not a final ruling. Two failure modes are common at this step:
- Overweighting the reviewer. If the reviewer flags a citation as suspect, that is a reason to check the citation — not proof that it is wrong. The reviewer can be confidently mistaken in exactly the same way the original system can.
- Underweighting the reviewer. If the reviewer flags a problem and you do not see it, the temptation is to dismiss. Resist. The reviewer is reading without your investment in the original output. That is the entire reason it has any value.
The right move for each flag is the same: investigate the specific passage, decide for yourself, and make the change or not. The second AI is doing the work of pointing your attention at things worth examining. You are doing the work of deciding.
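One way to keep flags from turning into verdicts is to record each one as an open lead that can only be closed with a note on what you checked. The sketch below assumes a simple three-state model; `Lead`, `Status`, and `unresolved` are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"            # flagged by the reviewer, not yet checked by you
    CONFIRMED = "confirmed"  # you checked and the problem is real
    REJECTED = "rejected"    # you checked and the flag was wrong


@dataclass
class Lead:
    passage: str          # the specific passage the reviewer pointed at
    reviewer_claim: str   # what the reviewer said is wrong with it
    status: Status = Status.OPEN
    evidence: str = ""    # what you checked before deciding

    def resolve(self, status: Status, evidence: str) -> None:
        """Close a lead; neither confirming nor rejecting is allowed without evidence."""
        if status is Status.OPEN:
            raise ValueError("resolve() needs CONFIRMED or REJECTED")
        if not evidence.strip():
            raise ValueError("record what you checked before resolving a lead")
        self.status = status
        self.evidence = evidence


def unresolved(leads: list[Lead]) -> list[Lead]:
    """Leads you have not yet investigated yourself."""
    return [lead for lead in leads if lead.status is Status.OPEN]
```

Requiring evidence for both outcomes guards against the two failure modes above. A flag cannot be accepted on the reviewer's say-so, and it cannot be dismissed without a look.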
Step 4 — Stop after one good review
Running the same output through three or four different AIs does not produce three or four times the value. Each additional reviewer flags fewer new problems and contributes more noise. The marginal review past the first one is usually wasted effort.
If the first review surfaces enough issues that the output needs substantial rework, fix it and then run one more review on the revision. If the first review surfaces few or no issues, the output is good enough for its purpose. You are not looking for unanimous AI consensus. You are looking for the failures that the first system was structurally blind to.
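The stopping rule can be written down as a small decision function. This is a sketch under stated assumptions. Whether an output needs substantial rework is a judgment you supply. The cap of two reviews is one reading of "one more review on the revision", not something the guide states as a hard limit. `next_action` and its return labels are hypothetical.

```python
def next_action(needs_substantial_rework: bool, reviews_run: int, max_reviews: int = 2) -> str:
    """Decide what to do after the most recent review pass.

    needs_substantial_rework: your own judgment after investigating the flags.
    reviews_run: how many review passes have been completed so far.
    max_reviews: assumed cap (first review plus one re-review of the revision).
    """
    if reviews_run < 1:
        return "run_review"
    if needs_substantial_rework and reviews_run < max_reviews:
        return "revise_then_review_once_more"
    # Few or no issues, or the cap is reached: fix what was confirmed and stop.
    return "fix_remaining_flags_and_finish"
```

For example, `next_action(True, 1)` returns `"revise_then_review_once_more"`, and `next_action(False, 1)` returns `"fix_remaining_flags_and_finish"`.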
What good adversarial review looks like
A second AI doing real work will produce findings like these:
- "Citation 3 attributes a specific statistical claim to a paper that, based on the title and venue, almost certainly does not contain that statistic. Check the actual source."
- "The argument in paragraph 4 depends on a premise stated only in paragraph 2 that the author has not defended. The conclusion is vulnerable if a reader rejects the premise."
- "This claim and this claim cannot both be true as stated. One needs to be qualified."
- "This section relies heavily on the word ‘significant’ without specifying significance threshold or effect size."
A second AI doing surface work, or simulating review without performing it, will produce:
- "This is a well-structured and thoughtful piece. A few minor suggestions…"
- Generic style notes that could apply to any document
- Praise dressed as critique ("you could strengthen this by…" with no specifics)
- Rewriting the text instead of reviewing it
If the review reads like the second group, repeat the request with a sharper instruction or move to a different system. A weak review is worse than no review because it produces false confidence.
Key rules
- Use a genuinely different system when possible — cross-developer when stakes are real
- Frame the input as something to review adversarially, not a finished piece to validate
- Treat findings as leads to investigate, never as ground truth
- Stop after one good review; piling on more reviewers does not improve the signal
What this procedure protects
Following this method protects against the systematic blind spots of any single AI system — especially around fabricated or misrepresented citations, factual errors, internal contradictions, and weak argument structure. It also makes one of the most useful skills in multi-system AI work explicit and repeatable, rather than an ad hoc thing some practitioners do well and most do not do at all.
What this procedure does not do
This method does not verify the output. The reviewer can be confidently wrong about the same things the original system was wrong about, especially in contested or technical domains where both systems share field-level priors. It does not catch deep conceptual errors that are not surface-detectable. And it is not a substitute for verifying specific claims against actual sources — for citations especially, the reviewer’s flag is a prompt to check the source, not a substitute for checking it.
When in doubt
If you are uncertain whether a piece of work needs adversarial review, the cost of running one is small. The cost of standing behind work that contains a problem the original AI was structurally blind to is much larger and lasts much longer. The default for finished work that will leave your hands is one review pass, not zero.
Guides covering the foundational skills for working reliably with any AI system.
- PG-000: 10 Things Every AI User Should Do
- PG-001: How to Work Reliably With Conversational AI Over Time
- PG-002: AI-Assisted Editing Without Silent Loss
- PG-003: Verify Before You Work
- PG-004: You Are Accepting the First Adequate Answer
- PG-005: Your AI Updated the File. Did It Preserve What It Didn’t Touch?
- PG-009: Make the AI Show You the Source
- PG-010: Don’t Trust What the AI Says About Its Own Work
- PG-011: The Cross-AI Adversarial Review Protocol (this guide)
Further reading
The cross-AI review protocol is most valuable when paired with the upstream verification practices that catch problems before review is needed. It is not a substitute for those practices — it is the final filter after them. The companion guides below cover the specific verifications that should happen before adversarial review, not in place of it.
- PG-009: Make the AI Show You the Source. Verify citations before review rather than relying on the reviewer to flag them. Citation problems are the single most common reason for cross-AI review and the single most common thing reviewers actually catch.
- PG-010: Don’t Trust What the AI Says About Its Own Work. The underlying principle that motivates cross-AI review: AI self-report about quality is unreliable, and substituting a second system’s judgment is one way to get around it.
- PG-004: You Are Accepting the First Adequate Answer. The upstream practice of having the first AI do an internal review pass before final output. Intra-AI review and cross-AI review are complementary; the internal pass reduces what the cross-AI review has to catch.
- SF0040: Theoretical Coherence Assurance Protocol (TCAP). The formal protocol used internally at the Synthience Institute for cross-system coherence verification on research documents.
Full framework documentation available at the Synthience Institute community on Zenodo.