Citation Verification Protocol (CVP)
This document establishes the Citation Verification Protocol (CVP) for Synthience Institute publications. CVP ensures that every citation is (1) real and retrievable and (2) genuinely supports the specific claim it is used to justify. The protocol is designed to be executed using AI systems with live browsing, while producing an auditable verification log that can be independently reviewed.
1. Purpose and Scope
This protocol provides a systematic method for verifying citations used in Synthience Institute manuscripts and research artifacts.
CVP addresses two verification requirements:
- Existence verification: Confirming that cited works exist and match the cited bibliographic identity.
- Support verification: Confirming that cited works actually support the specific claims they are cited to justify.
Critical requirement: This protocol is only effective when using live browsing against real public sources. Prior-based assessments (model memory, training priors, general plausibility judgments) are explicitly insufficient.
1.1 Public Verifiability Constraint (PVC): Mandatory Exclusion Gate
Synthience Institute publications may cite only sources whose full text is publicly accessible for free at the time of verification.
This is a hard exclusion rule. If a source is paywalled, requires authentication, is only partially viewable, or cannot be retrieved in full, it must not appear in the manuscript references.
Access Class (required for every citation)
PASS
- FREE-FULLTEXT: Complete text accessible without payment or authentication at the time of verification.
FAIL
- PAYWALLED: Payment required.
- AUTH-REQUIRED: Institutional login or account required.
- PARTIAL-ONLY: Abstract only, preview only, limited pages, “snippet view,” or any form of incomplete access.
- UNAVAILABLE: Link rot, removed content, dead URL, or otherwise not retrievable.
Clarification: What counts as FREE-FULLTEXT
FREE-FULLTEXT includes any format that allows verification of the full work, including:
- Publisher-hosted HTML full text (even if PDF is blocked), provided the full text is complete.
- Preprint repositories or archives (arXiv, institutional repositories) with full text.
- Author-accepted manuscripts (AAM) or equivalent full manuscripts, provided they are complete and untruncated.
If the verifier cannot access the complete text needed to check claim support, the citation fails PVC.
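The Access Class rules above can be expressed as a small classification step. This is a minimal Python sketch. It assumes the verifier has already opened the source and recorded four observations, using the hypothetical flag names `retrieved`, `requires_payment`, `requires_login`, and `full_text_complete`. The order of the checks is also an assumption: the protocol does not rank overlapping FAIL classes, and any of them excludes the source.

```python
from enum import Enum


class AccessClass(Enum):
    """Access Class values defined by the Public Verifiability Constraint (PVC)."""
    FREE_FULLTEXT = "FREE-FULLTEXT"
    PAYWALLED = "PAYWALLED"
    AUTH_REQUIRED = "AUTH-REQUIRED"
    PARTIAL_ONLY = "PARTIAL-ONLY"
    UNAVAILABLE = "UNAVAILABLE"


def classify_access(retrieved: bool, requires_payment: bool,
                    requires_login: bool, full_text_complete: bool) -> AccessClass:
    """Map a verifier's observations of one source to an Access Class.

    The flags are hypothetical inputs recorded after actually opening the source.
    The check order is an assumption; any non-FREE-FULLTEXT result fails PVC.
    """
    if not retrieved:
        return AccessClass.UNAVAILABLE
    if requires_payment:
        return AccessClass.PAYWALLED
    if requires_login:
        return AccessClass.AUTH_REQUIRED
    if not full_text_complete:
        return AccessClass.PARTIAL_ONLY
    return AccessClass.FREE_FULLTEXT


def passes_pvc(access: AccessClass) -> bool:
    """Only FREE-FULLTEXT passes; every other class is a hard exclusion."""
    return access is AccessClass.FREE_FULLTEXT
```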
1.2 Persistence and Archiving Requirement (PAR)
PVC ensures free public access at verification time. Sources can later disappear or change. Therefore, CVP requires a persistence record for each verified citation.
For every citation that passes PVC, the verifier must record the following in the verification log:
- A stable canonical identifier (DOI, arXiv ID, ACL Anthology ID, OpenReview ID, PubMed ID, etc.) plus the canonical landing page URL; and
- At least one of: an archival reference (public web archive snapshot URL), or a local snapshot record (internal copy) with a content hash, stored for audit continuity.
Local snapshots do not replace PVC. They exist to preserve auditability if the public source later changes or disappears.
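One way to keep the persistence record checkable is to store it as a structured entry and compute the local snapshot hash directly from the stored bytes. The sketch below assumes SHA-256 as the hash function. It reads the requirement as follows: the canonical identifier and landing page are always recorded, and at least one of an archival URL or a hashed local snapshot (with its storage location) accompanies them. The class and field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def snapshot_hash(content: bytes) -> str:
    """SHA-256 hex digest of a locally stored copy of the full text."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class PersistenceRecord:
    canonical_id: str                  # e.g. a DOI or arXiv ID
    canonical_url: str                 # canonical landing page URL
    archive_url: Optional[str] = None  # public web archive snapshot, if any
    local_hash: Optional[str] = None   # hash of the internal snapshot, if stored
    storage_ref: Optional[str] = None  # where the internal snapshot is kept

    def is_complete(self) -> bool:
        """Identity is always required, plus an archival or local snapshot."""
        has_identity = bool(self.canonical_id and self.canonical_url)
        has_archive = bool(self.archive_url)
        has_local = bool(self.local_hash and self.storage_ref)
        return has_identity and (has_archive or has_local)
```

Hashing the exact bytes that were reviewed lets an auditor later confirm that an internal copy has not changed since verification.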
2. The Problem This Protocol Addresses
AI systems routinely fail citation verification in predictable ways. CVP is designed to prevent these failures.
2.1 Tool Invocation Failures
- Simulated browsing: the system claims it “checked” a source but did not open it.
- URL-free verification: the system provides conclusions without evidence links.
2.2 Completeness Failures
- Partial verification: verifying only a subset of citations while implying the entire reference list was checked.
- Denominator loss: failure to report how many citations exist versus how many were verified.
2.3 Support Verification Failures
- Existence-only validation: confirming the paper exists but not checking whether it supports the claim.
- Abstract-only validation: using abstracts as if they prove support.
- Prior-based rejection: dismissing valid citations as “fake” based on model priors rather than evidence.
3. Platform Compatibility
CVP may be executed on any AI system that can:
- Open live web pages,
- Retrieve full-text sources,
- Report the accessed URLs,
- Produce a structured verification log.
If live browsing is unavailable or blocked, CVP cannot be performed.
4. Pre-Verification Requirements
4.1 Confirm AI Platform Has Live Browsing Access
Before verifying any manuscript, the verifier must demonstrate live access by opening at least one known public page and recording:
- The URL
- The access date and time
- A short content confirmation (one sentence) that could only be produced by actually opening the page
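The live-access demonstration can be recorded as a small structured entry. The hypothetical `LiveAccessProof` record below checks only that the entry is well-formed: an absolute http(s) URL, a timezone-aware timestamp, and a non-empty confirmation. No code can prove the page was actually opened, so the confirmation sentence still needs human review.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlparse


@dataclass
class LiveAccessProof:
    url: str
    accessed_at: datetime  # expected to be timezone-aware
    confirmation: str      # one sentence only obtainable by opening the page

    def problems(self) -> list[str]:
        """Return formal defects in the record; an empty list means well-formed."""
        issues = []
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            issues.append("URL is not an absolute http(s) URL")
        if self.accessed_at.tzinfo is None:
            issues.append("access timestamp has no timezone")
        if not self.confirmation.strip():
            issues.append("content confirmation is empty")
        return issues
```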
4.2 Create a Complete Citation Inventory
The verifier must produce an inventory of all citations used in the manuscript, including:
- Total number of citations (denominator)
- Unique identifier per citation (C1, C2, …, CN)
- Full reference text as written
- The manuscript claim(s) each citation supports (Claim IDs)
CVP is incomplete until all citations in the inventory are processed.
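A minimal sketch of the inventory, assuming references arrive as a list of strings in manuscript order. `CitationEntry`, `build_inventory`, and `inventory_complete` are illustrative names, not part of any existing tool. The denominator N is simply the length of the inventory.

```python
from dataclasses import dataclass, field


@dataclass
class CitationEntry:
    cid: str                                  # C1, C2, ..., CN
    reference_text: str                       # full reference exactly as written
    claim_ids: list[str] = field(default_factory=list)
    processed: bool = False                   # set once Parts A-C are done


def build_inventory(references: list[str]) -> list[CitationEntry]:
    """Assign sequential IDs C1..CN in manuscript order."""
    return [CitationEntry(f"C{i}", ref) for i, ref in enumerate(references, start=1)]


def inventory_complete(inventory: list[CitationEntry]) -> bool:
    """CVP is incomplete until every inventoried citation is processed."""
    return all(entry.processed for entry in inventory)
```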
4.3 Enforce PVC Before Verification
For each citation, determine Access Class immediately. Any FAIL citation must be removed or replaced before the manuscript is eligible for publication.
4.4 Claim Map Requirement (No “floating citations”)
Every citation must be tied to a specific claim (or set of claims) in the manuscript. Citations not mapped to claims are invalid and must be removed or reattached with explicit claim linkage.
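Sections 4.3 and 4.4 both act as gates before the three-part method begins. The sketch below assumes that Access Class labels and claim mappings are kept in plain dictionaries keyed by citation ID. It lists the citations that must be removed, replaced, or reattached, and it treats any label other than FREE-FULLTEXT as a PVC failure.

```python
def pre_verification_gate(access: dict[str, str],
                          claim_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the citation IDs that block publication eligibility.

    access maps citation ID -> Access Class label; claim_map maps
    citation ID -> Claim IDs. Both input shapes are hypothetical.
    """
    # Any label other than FREE-FULLTEXT (including an empty one) fails PVC.
    pvc_fail = [cid for cid, cls in access.items() if cls != "FREE-FULLTEXT"]
    # A citation with no Claim IDs is a floating citation.
    floating = [cid for cid in access if not claim_map.get(cid)]
    return {"pvc_fail": pvc_fail, "floating": floating}
```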
5. Three-Part Verification Method
Each citation must pass all three parts to be retained.
Part A: Identity and Existence Verification
Confirm the cited work’s identity matches the reference:
- Title, authors, year, venue
- Canonical identifier (preferred): DOI / arXiv / PubMed / ACL / OpenReview / ISBN
- Canonical landing page URL
Record any mismatches. If the cited identity cannot be confirmed, the citation fails.
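Part A involves comparing bibliographic fields and checking that identifiers are well-formed. This sketch assumes a simple normalization (case-folding, accent and punctuation stripping) before comparison. It uses loose shape patterns for DOIs and post-2007 arXiv IDs. A shape match does not show that an identifier resolves; that still has to be confirmed by opening the landing page. The patterns and function names are illustrative assumptions.

```python
import re
import unicodedata

# Loose shape checks only (assumed patterns); they do not prove resolution.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")  # post-2007 arXiv IDs only


def normalize(text: str) -> str:
    """Strip accents and punctuation, case-fold, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def identity_mismatches(cited: dict[str, str], found: dict[str, str]) -> list[str]:
    """List bibliographic fields whose cited value differs from the source."""
    fields = ("title", "authors", "year", "venue")
    return [f for f in fields
            if normalize(cited.get(f, "")) != normalize(found.get(f, ""))]
```

Each reported mismatch is recorded in the log; whether it is a harmless variant or a failed identity still requires the verifier's judgment.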
Part B: Access Verification and Canonical Locator Priority
Determine Access Class under PVC.
Prefer canonical locators in this order (when available):
- DOI landing page that provides FREE-FULLTEXT, or links to a FREE-FULLTEXT version
- First-party repository (arXiv, ACL Anthology, PubMed Central, institutional repository)
- Author-hosted full manuscript (stable URL)
If only secondary mirrors exist, the verifier must justify their use in the log and still meet FREE-FULLTEXT.
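The locator priority in Part B amounts to picking the highest-ranked candidate that still meets FREE-FULLTEXT. In this sketch the `kind` labels for the four locator categories are hypothetical. A DOI landing page that links to a FREE-FULLTEXT version is assumed to be recorded with `free_fulltext=True`.

```python
from typing import NamedTuple, Optional

# Lower rank = preferred, following the order listed in Part B.
PRIORITY = {"doi": 0, "first_party_repo": 1, "author_hosted": 2, "secondary_mirror": 3}


class Locator(NamedTuple):
    url: str
    kind: str            # one of the PRIORITY keys (hypothetical labels)
    free_fulltext: bool  # True only if complete text is freely accessible


def choose_locator(candidates: list[Locator]) -> Optional[Locator]:
    """Pick the highest-priority locator that provides FREE-FULLTEXT."""
    eligible = [c for c in candidates if c.free_fulltext]
    if not eligible:
        return None  # no FREE-FULLTEXT locator: the citation fails PVC
    return min(eligible, key=lambda c: PRIORITY[c.kind])


def needs_justification(chosen: Locator) -> bool:
    """A secondary mirror may be used only with a written justification in the log."""
    return chosen.kind == "secondary_mirror"
```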
Part C: Claim Support Verification (Evidence Standard)
For each Claim ID linked to the citation:
- Extract the claim text from the manuscript (verbatim).
- Open the full text of the source (not abstract-only).
- Locate evidence anchors in the source that support the claim.
Evidence anchors must include:
- A pointer: page number(s) if PDF, or section heading(s) if HTML
- A short excerpt (maximum 25 words) or a precise paraphrase if quoting is not feasible
- A one-sentence explanation of why this evidence supports the claim
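The structural parts of an evidence anchor can be checked mechanically before a reviewer judges whether the evidence actually supports the claim. This sketch uses a hypothetical `EvidenceAnchor` record. The 25-word limit comes from the protocol; counting words by whitespace splitting is an assumption.

```python
from dataclasses import dataclass

MAX_EXCERPT_WORDS = 25


@dataclass
class EvidenceAnchor:
    pointer: str      # page number(s) for PDF, section heading(s) for HTML
    excerpt: str      # short quotation, or empty if a paraphrase is used
    paraphrase: str   # precise paraphrase when quoting is not feasible
    explanation: str  # one sentence: why this evidence supports the claim


def anchor_problems(a: EvidenceAnchor) -> list[str]:
    """Return structural defects; an empty list means the anchor is well-formed."""
    issues = []
    if not a.pointer.strip():
        issues.append("missing page/section pointer")
    if not (a.excerpt.strip() or a.paraphrase.strip()):
        issues.append("missing excerpt or paraphrase")
    if len(a.excerpt.split()) > MAX_EXCERPT_WORDS:
        issues.append(f"excerpt exceeds {MAX_EXCERPT_WORDS} words")
    if not a.explanation.strip():
        issues.append("missing explanation")
    return issues
```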
Support Rating (required):
- FULL SUPPORT: Evidence directly supports the claim as written.
- PARTIAL SUPPORT: Evidence supports only a weaker or narrower version of the claim.
- NO SUPPORT: Evidence does not support the claim.
- AMBIGUOUS: Cannot confidently determine support from the source text (treated as failure).
Disposition Rules:
- Only FULL SUPPORT may pass.
- PARTIAL SUPPORT requires manuscript revision (narrow the claim) or source replacement.
- NO SUPPORT requires source removal or claim removal.
- AMBIGUOUS requires source replacement or claim rework until verifiable.
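The support ratings and disposition rules form a small lookup table. This sketch uses the disposition labels from the Verification Log (Section 9). It assumes that "claim rework" corresponds to REVISE CLAIM and that REMOVE covers removing either the source or the claim.

```python
from enum import Enum


class Rating(Enum):
    FULL = "FULL SUPPORT"
    PARTIAL = "PARTIAL SUPPORT"
    NO = "NO SUPPORT"
    AMBIGUOUS = "AMBIGUOUS"


# Allowed dispositions per rating (label mapping is an assumption, see above).
ALLOWED = {
    Rating.FULL: {"KEEP"},
    Rating.PARTIAL: {"REVISE CLAIM", "REPLACE SOURCE"},
    Rating.NO: {"REMOVE"},  # removes the source or the claim
    Rating.AMBIGUOUS: {"REPLACE SOURCE", "REVISE CLAIM"},
}


def passes(rating: Rating) -> bool:
    """Only FULL SUPPORT may pass."""
    return rating is Rating.FULL


def disposition_allowed(rating: Rating, disposition: str) -> bool:
    """Check a logged disposition against the rules for its rating."""
    return disposition in ALLOWED[rating]
```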
6. Implementation Procedure
- Confirm live browsing access (Section 4.1).
- Generate complete citation inventory with denominator (Section 4.2).
- For each citation C1…CN:
  - Part A: verify identity and capture canonical ID + URL
  - Part B: assign Access Class; enforce PVC; apply canonical locator priority
  - Part C: verify claim support with evidence anchors and rating
- Produce a Verification Log (Section 9).
- Produce a Certification Statement indicating:
  - Total citations: N
  - Verified citations: N (must match)
  - Passed citations: P
  - Removed/replaced citations: R
  - Date/time of verification run
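The certification step can refuse to produce a statement when the processed count does not match the denominator. This guards against the partial-verification failure described in Section 2.2. The sketch uses a hypothetical per-citation `CitationResult` record:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CitationResult:
    cid: str
    processed: bool            # Parts A, B and C all completed
    passed: bool               # identity confirmed, FREE-FULLTEXT, FULL SUPPORT
    removed_or_replaced: bool


def certification(results: list[CitationResult], run_at: datetime) -> dict:
    """Build the certification figures, refusing if any citation is unprocessed."""
    n = len(results)
    processed = sum(r.processed for r in results)
    if processed != n:
        raise ValueError(f"only {processed} of {n} citations processed; CVP is incomplete")
    return {
        "total": n,
        "verified": processed,
        "passed": sum(r.passed for r in results),
        "removed_or_replaced": sum(r.removed_or_replaced for r in results),
        "run_at": run_at.isoformat(),
    }
```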
7. Verification Tiers
Tiering determines how defensible the verification is under scrutiny.
- Tier 0 (Not Verified): No CVP log. Not publishable under Synthience Institute standards.
- Tier 1 (Single-Platform Verified): Each citation verified on one platform with evidence anchors.
- Tier 2 (Dual-Platform Verified): Each citation verified on two independent platforms or repositories when feasible.
- Tier 3 (Audit-Ready): Tier 2 plus persistence record (PAR) and a complete, reviewer-readable log.
Default for public release: Tier 2 minimum. Tier 3 recommended for flagship documents.
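The tier definitions can be read as a cascade of conditions. The sketch below is one interpretation, under three stated assumptions. First, `min_platforms` is the fewest independent platforms or repositories on which any citation was verified. Second, a log lacking evidence anchors is treated as Tier 0. Third, the "when feasible" allowance for Tier 2 is not modeled.

```python
def verification_tier(has_log: bool, anchors_complete: bool, min_platforms: int,
                      persistence_complete: bool, log_reviewer_readable: bool) -> int:
    """Return the tier (0-3) a verification run achieves under this interpretation."""
    if not has_log or not anchors_complete:
        return 0  # assumption: no log, or no evidence anchors, means not verified
    if min_platforms < 2:
        return 1  # single-platform verified
    if persistence_complete and log_reviewer_readable:
        return 3  # Tier 2 plus PAR and a reviewer-readable log
    return 2


def publishable(tier: int) -> bool:
    """Default for public release is Tier 2 minimum."""
    return tier >= 2
```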
8. Failure Mode Recognition Guide
If any of the following occur, verification must be treated as failed and rerun:
- The verifier provides conclusions without reporting the URLs it opened.
- The verifier cites abstracts as if they prove claim support.
- The verifier dismisses a source as “fake” without browsing evidence.
- The verifier verifies fewer than N citations but reports completion.
- The verifier cannot provide evidence anchors (page/section pointers) for support claims.
- The verifier cannot classify Access Class for a citation.
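Several of these failure modes can be detected automatically from the log itself. This sketch assumes each log entry is a dictionary with hypothetical keys (`cid`, `urls_opened`, `access_class`, `rating`, `fulltext_opened`, `anchors`). Dismissing a source as "fake" without evidence shows up here as an entry with no opened URLs. Any non-empty result means the run must be treated as failed and rerun.

```python
SUPPORT_RATINGS = ("FULL SUPPORT", "PARTIAL SUPPORT")


def failure_modes(n_inventoried: int, entries: list[dict]) -> list[str]:
    """Return the Section 8 failure modes detectable from a run's log entries."""
    found = []
    if len(entries) < n_inventoried:
        found.append(f"only {len(entries)} of {n_inventoried} citations verified")
    for e in entries:
        cid = e["cid"]
        if not e.get("urls_opened"):
            found.append(f"{cid}: conclusion without opened URLs")
        claims_support = e.get("rating") in SUPPORT_RATINGS
        if claims_support and not e.get("fulltext_opened"):
            found.append(f"{cid}: support rated without full text")
        if claims_support and not e.get("anchors"):
            found.append(f"{cid}: no evidence anchors for support rating")
        if not e.get("access_class"):
            found.append(f"{cid}: Access Class not classified")
    return found
```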
9. Documentation Requirements (Verification Log)
A CVP Verification Log must be produced for every manuscript and must include:
Manuscript metadata
- Manuscript title, version, date
- Verifier identity (human or AI platform name); verification date/time
For each citation (C#)
- Full reference text
- Canonical identifier(s) and canonical URL
- Access Class (PVC PASS/FAIL)
- Retrieval URL(s) actually opened
- Claim IDs supported
For each claim
- Claim text (verbatim)
- Evidence anchor pointer (page/section)
- Short excerpt (≤ 25 words) or precise paraphrase
- Support Rating (FULL SUPPORT / PARTIAL SUPPORT / NO SUPPORT / AMBIGUOUS)
- Disposition (KEEP / REVISE CLAIM / REPLACE SOURCE / REMOVE)
Persistence and Archiving Requirement (PAR) fields
- Access date/time for the full text
- Archival URL if created or found
- Local snapshot hash (if stored internally), plus storage location reference
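A machine-readable form of the log makes it easier for reviewers to audit. The sketch below mirrors the fields listed above as Python dataclasses and serializes one entry to JSON. The field names and example values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ClaimRecord:
    claim_id: str
    claim_text: str   # verbatim from the manuscript
    anchor: str       # page number(s) or section heading(s)
    excerpt: str      # <= 25 words, or a precise paraphrase
    rating: str       # FULL SUPPORT / PARTIAL SUPPORT / NO SUPPORT / AMBIGUOUS
    disposition: str  # KEEP / REVISE CLAIM / REPLACE SOURCE / REMOVE


@dataclass
class CitationLog:
    cid: str
    reference_text: str
    canonical_ids: list[str]
    canonical_url: str
    access_class: str
    retrieval_urls: list[str]          # URLs actually opened
    claims: list[ClaimRecord] = field(default_factory=list)
    # PAR fields
    fulltext_accessed_at: Optional[str] = None  # ISO 8601 timestamp
    archive_url: Optional[str] = None
    snapshot_sha256: Optional[str] = None
    snapshot_location: Optional[str] = None


# Hypothetical values for illustration only.
entry = CitationLog(
    cid="C1",
    reference_text="Doe, J. (2021). A Study of Example Systems. Example Conf.",
    canonical_ids=["10.1000/example"],
    canonical_url="https://doi.org/10.1000/example",
    access_class="FREE-FULLTEXT",
    retrieval_urls=["https://doi.org/10.1000/example"],
    claims=[ClaimRecord("K1", "Example claim text.", "Section 3",
                        "example excerpt", "FULL SUPPORT", "KEEP")],
    fulltext_accessed_at="2025-01-01T12:00:00+00:00",
)
print(json.dumps(asdict(entry), indent=2))
```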
Certification Statement Template
Total citations inventoried: N = ___
Citations fully processed: ___ / ___
Citations retained after PVC and support verification: ___
Citations removed or replaced: ___
Verification tier achieved: Tier ___
Verification completed on: _______________
Appendix A: Quick Reference Checklist
- Live browsing confirmed (URL + timestamp recorded)
- Citation inventory complete (denominator N present)
- PVC enforced for every citation (Access Class recorded)
- Canonical identifier recorded (DOI/arXiv/etc.)
- Full text opened for every citation
- Each citation mapped to Claim ID(s)
- Evidence anchors recorded for every claim (page/section + excerpt/paraphrase)
- Support rating applied (FULL required to pass)
- Persistence record captured (archival link and/or local hash)
- Certification statement completed