How AI Models Choose What to Cite
You're creating content that AI models quietly skip over, and without knowing why, you can't change the outcome. Alex walks you through the exact gates your content must pass to earn a citation, so you can diagnose what's failing and fix it at the source.
What You'll Learn
AI models don't browse the web the way humans do. They build a consensus view of your brand by pulling from every available source: on-site pages, off-site mentions, owned content, and earned coverage.
Your content gets filtered and cut at every step of that process. Most of it never reaches the final answer.
Earning citations depends on building systems that keep your content ecosystem consistent and that produce high-information-gain content, meaning content that stands out from what's already indexed.
How query fan-out works
When a model needs fresh or deeper context to answer a question, it generates real-time sub-queries behind the scenes. This is called query fan-out.
- ChatGPT averages about 3 sub-queries per fan-out. Google AI Mode averages around 9.
- Fan-out happens roughly 40% of the time, and questions that include specific dates, numbers, or details trigger it more often.
- Even when pages are fetched, only about 15% earn a citation in the final answer.
- The top 10 most-cited domains account for less than 12% of all citations. This is a long-tail game, which means smaller and mid-size sites have real opportunity.
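To make fan-out concrete, here is a toy sketch of one question being expanded into several narrower sub-queries. It is only an illustration: real models write their sub-queries with the language model itself, and the `FACETS` list and `fan_out` function below are hypothetical stand-ins, not how ChatGPT or Google AI Mode actually work.

```python
# Toy illustration of query fan-out: one question becomes several narrower
# sub-queries. The facet list is a hypothetical stand-in; real models generate
# sub-queries with the LLM itself.
FACETS = ["pricing", "reviews", "alternatives", "features", "integrations"]


def fan_out(question: str, max_subqueries: int = 3) -> list[str]:
    """Expand a question into at most max_subqueries facet-specific sub-queries."""
    base = question.rstrip("?").strip()
    return [f"{base} {facet}" for facet in FACETS[:max_subqueries]]


if __name__ == "__main__":
    # The default of 3 mirrors the ChatGPT average mentioned above.
    for sub_query in fan_out("best CRM for startups?"):
        print(sub_query)
```

Each sub-query is a separate chance for your page to be retrieved, which is why pages that answer one narrow facet well can still surface for a broad question.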
The FACT framework
Alex introduces the FACT framework: four sequential gates your content must pass to earn a citation. FACT stands for Findable, Agent Aligned, Citable, and Trusted.
- Each gate is pass/fail. If you fail at Gate 1, Gates 2 through 4 are irrelevant.
- Each gate maps to a different type of fix:
  - Gate 1 (Findable): Technical fixes
  - Gate 2 (Agent Aligned): Pre-click signal optimization
  - Gate 3 (Citable): Content quality improvements
  - Gate 4 (Trusted): Ecosystem and off-site strategy
Keeping your brand information accurate and up to date is a baseline requirement across all four gates.
Gate 1: Findable
This is the technical floor. Can AI models discover and read your content?
Check these fundamentals:
- Your pages are indexed by Google
- Your robots.txt isn't blocking AI crawlers like GPTBot or ClaudeBot
- No paywalls or login walls preventing access
- Pages render quickly and reliably
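The robots.txt check can be automated with Python's standard-library `urllib.robotparser`. This minimal sketch assumes you already have the robots.txt text in hand; the helper name `blocked_ai_crawlers` and the crawler list are illustrative, and the check only tells you what your rules say, not how any crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens to test; extend the list with any other AI crawlers you care about.
AI_CRAWLERS = ["GPTBot", "ClaudeBot"]


def blocked_ai_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the AI crawlers that these robots.txt rules block from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_ai_crawlers(robots, "https://example.com/blog/post"))
```

Here GPTBot matches its own `Disallow: /` group and is blocked, while ClaudeBot falls through to the `*` group and is allowed. To check a live site, `RobotFileParser` can instead be pointed at the file with `set_url()` followed by `read()`.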
Retrieval is text-first. Content locked inside images, charts, or video without supporting text copy can't be extracted by AI models. If your key information lives only in visual formats, it's effectively invisible.
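One rough way to spot pages whose information may live mostly in visuals is to measure how much readable text the HTML carries and to list images that have no alt text. The sketch below uses the standard-library `html.parser`. The `TextAudit` class and `audit_html` helper are hypothetical names, and alt text is only one form of supporting copy: a page can pass this check and still hide key data inside a chart.

```python
from html.parser import HTMLParser


class TextAudit(HTMLParser):
    """Collects visible text and records <img> tags that have no alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.images_without_alt: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            attrs_dict = dict(attrs)
            # A bare `alt` attribute arrives as None, so treat it like a missing one.
            if not (attrs_dict.get("alt") or "").strip():
                self.images_without_alt.append(attrs_dict.get("src") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def audit_html(html: str) -> tuple[int, list[str]]:
    """Return (visible word count, src of each image without alt text)."""
    audit = TextAudit()
    audit.feed(html)
    audit.close()
    word_count = len(" ".join(audit.text_parts).split())
    return word_count, audit.images_without_alt
```

A low word count combined with several unlabeled images is a signal to add text copy that states the same facts the visuals show.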
Gate 2: Agent aligned
Before a model opens your page, it decides whether to click based on pre-click signals only: URL, title tag, snippet, and freshness date.
- For informational queries, precision matters. Alex shares data showing that titles with around 65% cosine similarity to the query, URL slugs with around 67% similarity, and matching phrasing such as "how to" all increase the likelihood of earning a citation.
- For commercial queries, the pattern flips. Semantic breadth beats exact matching. About 25% of cited pages use synonyms rather than the exact query terms, and 90% of cited pages had less than 30% word overlap with the query.
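The lesson doesn't specify how similarity or overlap were measured, and the cited figures may well come from embedding-based similarity. As a rough, hedged stand-in for auditing your own titles, the sketch below computes a bag-of-words cosine similarity and a Jaccard word-overlap ratio between a title and a query. Both metric choices and the function names are assumptions, so the scores are not directly comparable to the 65%, 67%, or 30% figures above.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap: shared distinct words divided by all distinct words."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    query = "how to fix a leaking faucet"
    title = "How to Fix a Leaking Faucet in 5 Steps"
    print(round(cosine_similarity(title, query), 3), round(word_overlap(title, query), 3))
```

Under these definitions, an informational title scores high on both measures, while a commercial page written with synonyms would score low on word overlap even when it covers the same concept.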
Gate 3: Citable
This is where most content fails. Only 15% of pages that get fetched actually earn a citation.
- Five signals determine whether your content gets cited:
  - Information gain: Unique data, original research, proprietary benchmarks, or expert perspective. Content that adds something new to the conversation.
  - Freshness: Content updated within the last 3 months is 3x more likely to be cited than older content.
  - Answer shape: Clear, structured, extractable passages that a model can pull directly into a response.
  - Authority: Named expert bylines correlate with higher citation rates. Attributing content to a real person matters.
  - Depth over breadth: Specificity wins. Going deep on a focused topic beats shallow coverage of many topics.
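Freshness is the easiest of these signals to audit in bulk. This sketch flags pages whose last-updated date falls outside the window; approximating "3 months" as 90 days is an assumption, and the helper names and page paths are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "within the last 3 months" approximated as 90 days.
FRESHNESS_WINDOW = timedelta(days=90)


def is_fresh(last_updated: date, today: date) -> bool:
    """True if the page was updated within the freshness window."""
    return today - last_updated <= FRESHNESS_WINDOW


def stale_pages(pages: dict[str, date], today: date) -> list[str]:
    """Return the paths of pages that fall outside the window, sorted."""
    return sorted(path for path, updated in pages.items() if not is_fresh(updated, today))


if __name__ == "__main__":
    pages = {"/pricing": date(2025, 6, 1), "/guide": date(2025, 1, 15)}
    print(stale_pages(pages, today=date(2025, 6, 30)))
```

The output is a refresh queue. A meaningful update (new data, revised guidance) is what adds information gain; changing only the date does not.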
Gate 4: Trusted
Alex flags this as the most commonly missed gate. Even if your on-site content is perfect, the broader web has to back you up.
- AI models form consensus across sources. If third-party content doesn't support your brand claims, you lose credibility in the model's answer.
- For top-of-funnel B2B brand mentions in AI search, 85% come from third-party content: review sites, affiliates, publishers, and community discussions.
- To strengthen this gate, audit your third-party mentions, build off-site presence, maintain entity consistency (how your brand is described across the web), and monitor the quality of how you're represented.
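Entity consistency can be checked by collecting how each source describes your brand and then looking for fields that disagree. The sketch below assumes you have already gathered those descriptions into a dictionary by hand or with your own tooling. The source names, field names, and `inconsistent_fields` helper are all hypothetical.

```python
def inconsistent_fields(profiles: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Return each field whose normalized value differs across sources."""
    values: dict[str, set[str]] = {}
    for facts in profiles.values():
        for field, value in facts.items():
            # Normalize case and whitespace so trivial differences don't count.
            values.setdefault(field, set()).add(" ".join(value.split()).lower())
    return {field: vals for field, vals in values.items() if len(vals) > 1}


if __name__ == "__main__":
    profiles = {
        "website": {"category": "Project management software", "founded": "2016"},
        "review_site": {"category": "project management software", "founded": "2015"},
        "directory": {"category": "Project Management  Software "},
    }
    print(inconsistent_fields(profiles))
```

In this example, the category matches everywhere once normalized, but the founding year conflicts, which is the kind of discrepancy that weakens a model's consensus about your brand.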
Diagnosing and fixing citation gaps
For any page that isn't earning citations, start by identifying which FACT gate fails first.
Fix that gate before moving to the next one. There's no value in optimizing Gate 3 content quality if Gate 1 technical access is broken.
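Because the gates are sequential and pass/fail, the diagnosis reduces to finding the first gate that fails. This is a minimal sketch, assuming you record each gate's result as a boolean after your own audit; treating an unchecked gate as a failure is a deliberate, conservative assumption, and the function name is hypothetical.

```python
from typing import Optional

# FACT gates in order, each mapped to the type of fix described above.
GATES = ["Findable", "Agent Aligned", "Citable", "Trusted"]
FIXES = {
    "Findable": "Technical fixes",
    "Agent Aligned": "Pre-click signal optimization",
    "Citable": "Content quality improvements",
    "Trusted": "Ecosystem and off-site strategy",
}


def first_failing_gate(results: dict[str, bool]) -> Optional[str]:
    """Return the first gate that fails, or None if all four pass.

    A gate missing from results counts as failed, since it hasn't been verified.
    """
    for gate in GATES:
        if not results.get(gate, False):
            return gate
    return None


if __name__ == "__main__":
    audit = {"Findable": True, "Agent Aligned": False, "Citable": False, "Trusted": True}
    gate = first_failing_gate(audit)
    if gate:
        print(f"Fix {gate} first: {FIXES[gate]}")
```

Note that the Citable failure in the example is ignored until Agent Aligned passes, matching the rule that later gates don't matter while an earlier one is broken.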
The next lesson covers the CITED framework, which turns this diagnosis into an execution plan.
Key takeaways
- Commercial queries reward semantic breadth over exact matching: For commercial queries, 90% of cited pages shared less than 30% of their words with the original question. AI models reward semantic breadth over exact-match phrasing, so synonym-rich, concept-driven content outperforms keyword-stuffed pages.
- E-E-A-T signals are still important: Named expert bylines correlate with higher citation rates across AI models. Attributing your content to a specific person with visible expertise signals authority at the exact moment a model decides whether your page is worth quoting.
- Stale content is invisible content: Pages updated within the last three months are 3x more likely to earn a citation. Freshness strongly influences whether your page enters the candidate pool at all.
- Third-party sources drive 85% of top-of-funnel brand mentions: For top-of-funnel B2B brand mentions in AI search, 85% come from third-party sources like review sites, publishers, and community discussions. Aligning your off-site presence with your brand positioning strengthens trust signals for AI models.
FAQs
What is query fan-out?
Query fan-out is the process where an AI model breaks a single user question into multiple focused sub-queries to gather better context before generating an answer. ChatGPT averages about 3 sub-queries per fan-out, while Google AI Mode runs around 9. This happens roughly 40% of the time, with questions that include specific dates or details triggering it more frequently.