How AI Models Choose What to Cite

You're creating content that AI models quietly skip over, and without knowing why, you can't change the outcome. Alex walks you through the exact gates your content must pass to earn a citation, so you can diagnose what's failing and fix it at the source.

What You'll Learn

AI models don't browse the web the way humans do. They form a brand narrative consensus by pulling from everything available: on-site pages, off-site mentions, owned content, and earned coverage.

Your content gets filtered and cut at every step of that process. Most of it never reaches the final answer.

Earning citations comes from building systems that keep your content ecosystem consistent and produce high-information-gain content that stands out from what's already indexed.

How query fan-out works

When a model needs fresh or deeper context to answer a question, it generates real-time sub-queries behind the scenes. This is called query fan-out.

ChatGPT averages about 3 sub-queries per fan-out. Google AI Mode averages around 9.
Fan-out happens roughly 40% of the time, and questions that include specific dates, numbers, or details trigger it more often.
Even when pages are fetched, only about 15% earn a citation in the final answer.
The top 10 most-cited domains account for less than 12% of all citations. This is a long-tail game, which means smaller and mid-size sites have a real opportunity.

The FACT framework

Alex introduces the FACT framework: four sequential gates your content must pass to earn a citation. FACT stands for Findable, Agent Aligned, Citable, and Trusted.

Each gate is pass/fail. If you fail at Gate 1, Gates 2 through 4 are irrelevant.
Each gate maps to a different type of fix:
- Gate 1 (Findable): Technical fixes
- Gate 2 (Agent Aligned): Pre-click signal optimization
- Gate 3 (Citable): Content quality improvements
- Gate 4 (Trusted): Ecosystem and off-site strategy

Keeping your brand information accurate and up to date is a baseline requirement across all four gates.

Gate 1: Findable

This is the technical floor. Can AI models discover and read your content?

Check these fundamentals:

- Your pages are indexed by Google
- Your robots.txt isn't blocking AI crawlers like GPTBot or ClaudeBot
- No paywalls or login walls prevent access
- Pages render quickly and reliably
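
The robots.txt item in the checklist above can be tested with Python's standard library. This is a minimal sketch, not a full crawl audit: the sample robots.txt and the example.com URL are hypothetical, and the crawler list only contains the two names mentioned in this lesson.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the checklist above; extend the list as needed.
AI_CRAWLERS = ["GPTBot", "ClaudeBot"]


def blocked_crawlers(robots_txt: str, url: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt: blocks GPTBot everywhere, allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    # Lists the crawlers that cannot read the (hypothetical) page.
    print(blocked_crawlers(SAMPLE_ROBOTS, "https://example.com/guide"))
```

To check a live site instead of a pasted file, `RobotFileParser` also offers `set_url()` and `read()`, which download the robots.txt before you call `can_fetch()`. A passing result here only covers the robots.txt item; indexing, paywalls, and rendering still need their own checks.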

Retrieval is text-first. Content locked inside images, charts, or video without supporting text copy can't be extracted by AI models. If your key information lives only in visual formats, it's effectively invisible.

Gate 2: Agent aligned

Before a model opens your page, it decides whether to click based on pre-click signals only: URL, title tag, snippet, and freshness date.

For informational queries, precision matters. Alex shares data showing that titles with a cosine similarity of around 65% to the query, URL slugs at around 67% similarity, and matching phrases such as "how-to" all increase the likelihood of earning a citation.
For commercial queries, the pattern flips. Semantic breadth beats exact matching. About 25% of cited pages use synonyms rather than the exact query terms, and 90% of cited pages had less than 30% word overlap with the query.
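
To get a rough feel for these two measures on your own titles, the sketch below computes a cosine similarity and a word-overlap score. It rests on assumptions: the lesson doesn't say how the similarity figures were calculated, so this version uses simple word counts as a stand-in (production systems more likely use embeddings), and "word overlap" is defined here as the share of distinct query words that also appear in the page text. Scores from this sketch are not comparable to the percentages quoted above.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (assumed method)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_overlap(query: str, page_text: str) -> float:
    """Share of distinct query words that also appear in page_text (assumed definition)."""
    q, p = set(tokens(query)), set(tokens(page_text))
    return len(q & p) / len(q) if q else 0.0


if __name__ == "__main__":
    # Hypothetical informational query and title: close phrasing scores high.
    print(cosine_similarity("how to clean a cast iron pan",
                            "How to Clean a Cast Iron Skillet"))
    # Hypothetical commercial query and title: synonyms, so no shared words.
    print(word_overlap("best running shoes", "top sneakers for runners"))
```

The second example illustrates the commercial pattern: a page can be relevant to the query while sharing none of its exact words, which a pure overlap score cannot capture.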

Gate 3: Citable

This is where most content fails. Only 15% of pages that get fetched actually earn a citation.

Five signals determine whether your content gets cited:
- Information gain: Unique data, original research, proprietary benchmarks, or expert perspective. Content that adds something new to the conversation.
- Freshness: Content updated within the last 3 months is 3x more likely to be cited than older content.
- Answer shape: Clear, structured, extractable passages that a model can pull directly into a response.
- Authority: Named expert bylines correlate with higher citation rates. Attributing content to a real person matters.
- Depth over breadth: Specificity wins. Going deep on a focused topic beats shallow coverage of many topics.
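
The freshness signal is the easiest of the five to audit mechanically. Here is a minimal sketch, assuming you can export a last-updated date per page from your CMS; the URLs and dates are hypothetical, and "3 months" is approximated as 90 days.

```python
from datetime import date, timedelta

# Assumption: the "last 3 months" window is approximated as 90 days.
FRESHNESS_WINDOW = timedelta(days=90)


def stale_pages(last_updated: dict[str, date], today: date) -> list[str]:
    """Return URLs last updated before the freshness window, oldest first."""
    cutoff = today - FRESHNESS_WINDOW
    stale = [url for url, updated in last_updated.items() if updated < cutoff]
    return sorted(stale, key=lambda url: last_updated[url])


if __name__ == "__main__":
    # Hypothetical export of page paths and their last-updated dates.
    pages = {
        "/pricing-guide": date(2025, 1, 10),
        "/benchmarks": date(2025, 5, 20),
        "/glossary": date(2024, 11, 1),
    }
    print(stale_pages(pages, today=date(2025, 6, 1)))
```

Sorting oldest first gives you a refresh queue, though a date bump alone doesn't add information gain; the update should change something substantive.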

Gate 4: Trusted

Alex flags this as the most commonly missed gate. Even if your on-site content is perfect, the broader web has to back you up.

AI models form consensus across sources. If third-party content doesn't support your brand claims, you lose credibility in the model's answer.
For top-of-funnel B2B brand mentions in AI search, 85% come from third-party content: review sites, affiliates, publishers, and community discussions.
To strengthen this gate, audit your third-party mentions, build off-site presence, maintain entity consistency (how your brand is described across the web), and monitor the quality of how you're represented.

Diagnosing and fixing citation gaps

For any page that isn't earning citations, start by identifying which FACT gate fails first.

Fix that gate before moving to the next one. There's no value in optimizing Gate 3 content quality if Gate 1 technical access is broken.
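
Because the gates are sequential, the diagnosis reduces to finding the first failure in FACT order. The sketch below shows that logic, using the gate names and fix types from this lesson; the function name and the pass/fail dictionary are hypothetical, and deciding whether a gate actually passes is still a manual judgment.

```python
from typing import Optional

# Gate order and fix types as listed in the FACT framework section.
GATES = ["Findable", "Agent Aligned", "Citable", "Trusted"]

FIX_TYPE = {
    "Findable": "Technical fixes",
    "Agent Aligned": "Pre-click signal optimization",
    "Citable": "Content quality improvements",
    "Trusted": "Ecosystem and off-site strategy",
}


def first_failing_gate(results: dict[str, bool]) -> Optional[str]:
    """Return the first gate, in FACT order, that did not pass; None if all pass.

    A gate missing from results is treated as failing (not yet verified).
    """
    for gate in GATES:
        if not results.get(gate, False):
            return gate
    return None


if __name__ == "__main__":
    # Hypothetical audit of one page: later results are ignored once a gate fails.
    audit = {"Findable": True, "Agent Aligned": False, "Citable": False, "Trusted": True}
    gate = first_failing_gate(audit)
    if gate is None:
        print("All four gates pass")
    else:
        print(f"Fix first: {gate} ({FIX_TYPE[gate]})")
```

Treating an unchecked gate as a failure is a deliberate choice here: it stops you from working on a later gate before the earlier ones have been verified.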

The next lesson covers the CITED framework, which turns this diagnosis into an execution plan.

Key takeaways

Commercial queries reward semantic breadth over exact matching
For commercial queries, 90% of cited pages shared less than 30% of their words with the original question. AI models reward semantic breadth over exact-match phrasing, so synonym-rich, concept-driven content outperforms keyword-stuffed pages.

E-E-A-T signals are still important
Named expert bylines correlate with higher citation rates across AI models. Attributing your content to a specific person with visible expertise signals authority at the exact moment a model decides whether your page is worth quoting.

Stale content is invisible content
Pages updated within the last three months are 3x more likely to earn a citation. Freshness acts as a hard filter that determines whether your page even enters the candidate pool.

Third-party sources drive 85% of top-of-funnel brand mentions
For top-of-funnel B2B brand mentions in AI search, 85% come from third-party sources like review sites, publishers, and community discussions. Aligning your off-site presence with your brand positioning strengthens trust signals for AI models.

FAQs

What is query fan-out in AI search?

Query fan-out is the process where an AI model breaks a single user question into multiple focused sub-queries to gather better context before generating an answer. ChatGPT averages about 3 sub-queries per fan-out, while Google AI Mode runs around 9. This happens roughly 40% of the time, with questions that include specific dates or details triggering it more frequently.
