The Science of AI Search: What Princeton's Research Reveals About Getting Cited by ChatGPT
A data-driven analysis of what actually makes websites appear in AI-generated answers, based on peer-reviewed research from Princeton and Georgia Tech.
Until recently, most advice about getting cited by AI was guesswork. People took what they knew about Google SEO, assumed it applied to ChatGPT, and called it a strategy. There was no real data behind any of it.
That changed. We now have peer-reviewed experiments, million-query analyses, and controlled tests that show what actually moves the needle. Some of the results are predictable. Others are genuinely surprising. Keyword stuffing, for instance, does absolutely nothing for AI visibility. Zero. And the sites that gain the most from optimization? The ones that rank worst in traditional search.
I went through every major study published through late 2025 and pulled out the numbers. Here is what the research actually says.
The Princeton/Georgia Tech GEO Study (KDD 2024)
This is the gold standard so far. Published at KDD 2024 (one of the top data science conferences), researchers from Princeton and Georgia Tech ran controlled experiments on how content optimization affects whether AI cites your page. The paper is called "GEO: Generative Engine Optimization," and it introduced both a benchmark and a testing framework.
What They Studied
They built something called GEO-BENCH: roughly 10,000 real user queries covering all kinds of topics and intents. Then they took actual website content and modified it using nine different optimization strategies, measuring each against unmodified baselines.
This was not a survey. It was not correlational. They changed the content, ran it through AI systems, and measured what happened. Clean experimental design.
The Nine Strategies and Their Results
| Optimization Strategy | Visibility Impact | Notes |
|---|---|---|
| Adding statistics with credible sources | +41% average improvement | Strongest single-strategy improvement across all query types |
| Citing authoritative sources | +115% for lower-ranked sites | The most dramatic finding — disproportionately benefits sites outside the top-10 |
| Expert quotations | +28% improvement | Adding named expert quotes with credentials |
| Fluency optimization | +15-30% improvement | Improving readability, sentence structure, and coherence |
| Authoritative tone | Strong effect for historical/factual content | Less effective for subjective or opinion-based queries |
| Keyword stuffing | 0% improvement | No measurable effect on AI visibility whatsoever |
| Technical jargon addition | Minimal to no effect | Unlike traditional SEO, adding keywords without substance is ignored |
| Language simplification | Marginal effect | Minor improvements in some query categories |
| Unique terminology/wording | Mixed results | Context-dependent; not a reliable strategy |
The Big One: Lower-Ranked Sites Win More
Here is the finding I keep coming back to: sites ranked lower in traditional organic search got the biggest gains from content optimization. Citing authoritative sources gave lower-ranked sites a 115% visibility boost, while sites already in the top positions saw only modest improvements.
In traditional SEO, big sites get bigger. High domain authority compounds. It is a rich-get-richer system. AI search works differently. A well-optimized page from a site nobody has heard of can outperform a lazy page from a household brand. The gap between big and small is still there, but it is much narrower than in Google's organic results.
BrightEdge Industry Data (2024-2025)
BrightEdge is one of the biggest enterprise SEO platforms, and they have been tracking how traditional rankings relate to AI citations across millions of queries. Their data gives us a wide-angle view of what AI search actually pulls from.
Key Metrics
- AI Overview citations overlapping with organic top-10 results grew from 32.3% to 54.5% between early and late 2024. So Google's AI Overviews are increasingly pulling from organically strong pages, but close to half of citations still come from outside the top-10.
- The overlap varies wildly by industry. Healthcare, insurance, and education show 68-75% overlap between organic rankings and AI citations. Local services and e-commerce? Much lower.
- YouTube gets about 200 times more AI citations than any other video platform. If you are doing video for AI visibility, YouTube is not one option among many. It is the only option that matters.
- About 10% of AI citations come from social platforms, mostly LinkedIn and Reddit. Not peer-reviewed, but it lines up with what you see when you look at AI responses.
Note: BrightEdge data is proprietary industry research, not peer-reviewed. Their methodology tracks patterns across enterprise clients, but the raw data is not publicly auditable.
Ahrefs: 56 Million AI Overviews Analyzed
Ahrefs looked at roughly 56 million Google AI Overviews to figure out where AI is pulling its sources from. The numbers are hard to ignore.
The Data
| Finding | Data Point | Implication |
|---|---|---|
| Sources NOT ranking organically for the query | 80% | The vast majority of AI-cited pages would not appear in traditional search results for that query |
| Chance of being cited even at organic position #1-3 | ~8% | Ranking at the top of Google gives you less than a 1-in-10 chance of being cited by AI |
| Average number of sources cited per AI response | 6-14 sources | AI casts a much wider net than traditional search, pulling from many pages |
| Sources from outside the top-100 organic results | Significant portion | AI finds and cites content that traditional search does not surface at all |
What This Means
The takeaway is simple and uncomfortable: traditional SEO rankings tell you very little about AI visibility. Four out of five sources cited by AI do not rank organically for that query. And even if you hold the #1 position on Google, your odds of being cited in the AI response are about 8%.
That does not make traditional SEO irrelevant. Organic ranking is one of several signals. But if your entire strategy is built around Google rankings and you are using that as a proxy for AI visibility, you are probably wrong about where you stand.
SearchVIU Study: Schema Markup and AI (October 2025)
A lot of SEO practitioners assumed structured data (JSON-LD schema markup) would directly improve AI visibility. It makes sense on paper. SearchVIU actually tested it.
Methodology and Findings
They looked at whether five AI systems, including ChatGPT, Google's Gemini, and Perplexity, could read and use data from JSON-LD schema markup embedded in web pages.
The answer: none of them could. Zero out of five AI systems pulled data directly from schema markup in the page source.
But that does not mean schema is a waste of time. The study found it works through a side door: schema helps Google understand and index your content, and AI systems that source from Google's index then benefit from that improved categorization. The effect is real, just indirect.
Worth noting: you should still implement schema. But the reason it helps with AI is different from what most people think. It is a second-order effect.
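To make the indirect mechanism concrete, here is a minimal sketch of the kind of JSON-LD block the SearchVIU study tested. The property names come from the schema.org `Article` vocabulary; the headline, dates, and author are placeholder values, not taken from any real page.

```python
import json

# Minimal illustrative Article schema using schema.org vocabulary.
# Per the SearchVIU finding, AI systems do not read this block directly;
# it improves how search engines categorize the page, and AI systems
# sourcing from those indexes inherit the benefit.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The Science of AI Search",      # placeholder headline
    "datePublished": "2025-11-01",               # placeholder dates
    "dateModified": "2025-11-20",                # freshness signal crawlers can parse
    "author": {"@type": "Person", "name": "Jane Doe"},  # hypothetical author
}

# This string would be embedded in the page inside a
# <script type="application/ld+json"> tag.
json_ld = json.dumps(article_schema, indent=2)
print(json_ld)
```

Note the `dateModified` field: it is one of the few machine-readable places to express the content freshness that the Digital Bloom data (below) suggests matters.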
Digital Bloom: Multi-Platform Presence and Content Freshness
Digital Bloom looked at two factors that do not get enough attention in the AI visibility conversation: how many platforms you are on, and how recently you have updated your content.
Platform Presence
Businesses active on 4 or more platforms (website, LinkedIn, YouTube, industry directories, Reddit, etc.) got 2.8 times more AI citations than businesses on fewer platforms. This is industry data, not peer-reviewed, but a 2.8x difference is hard to dismiss.
Why would this matter? AI systems check information across multiple sources. If your business shows up consistently on several platforms with matching details, the AI can cross-reference and verify you more easily. It is more likely to cite you with confidence.
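The cross-referencing idea can be sketched in a few lines. This is not how any AI system actually implements verification; it just illustrates the consistency check the hypothesis describes. All listing data below is hypothetical.

```python
# Sketch: cross-checking business details across platform listings,
# mimicking the kind of consistency check an AI system might perform
# before citing a business with confidence. Hypothetical data.
listings = {
    "website":   {"name": "Acme Plumbing",    "phone": "+1-555-0100", "city": "Denver"},
    "linkedin":  {"name": "Acme Plumbing",    "phone": "+1-555-0100", "city": "Denver"},
    "youtube":   {"name": "Acme Plumbing",    "phone": "+1-555-0100", "city": "Denver"},
    "directory": {"name": "Acme Plumbing Co", "phone": "+1-555-0100", "city": "Denver"},
}

def consistency_report(listings):
    """Return each field whose value disagrees across platforms."""
    fields = {field for listing in listings.values() for field in listing}
    return {
        field: {platform: listing.get(field) for platform, listing in listings.items()}
        for field in fields
        if len({listing.get(field) for listing in listings.values()}) > 1
    }

mismatches = consistency_report(listings)
print(mismatches)  # only "name" disagrees: the directory uses "Acme Plumbing Co"
```

In this toy example, the phone number and city match everywhere, but the business name differs on the directory listing, exactly the kind of inconsistency that would make cross-referencing harder.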
Content Freshness
76.4% of top-cited pages had been updated within the past 30 days. AI appears to strongly prefer current content. Pages that were authoritative two years ago but have not been touched since? They get passed over.
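If you track last-updated dates for your own pages, flagging anything outside that 30-day window is trivial. A minimal sketch, with hypothetical paths and dates:

```python
from datetime import date

# Sketch: flag pages whose last update falls outside the 30-day
# freshness window highlighted by the Digital Bloom data.
# Paths and dates are hypothetical.
pages = {
    "/guide-ai-search": date(2025, 11, 10),
    "/pricing":         date(2024, 3, 2),
}

def stale_pages(pages, today, window_days=30):
    """Return paths not updated within `window_days` of `today`."""
    return [path for path, updated in pages.items()
            if (today - updated).days > window_days]

print(stale_pages(pages, today=date(2025, 11, 20)))  # ['/pricing']
```

Nothing in the research says a mechanical re-date helps; the pages in the data were substantively updated, so treat the window as a prompt to refresh content, not timestamps.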
Original Research and Data
Pages with original research, proprietary data, or unique datasets had 30-40% higher visibility in AI responses than pages that just repackaged information from other sources. AI appears to prefer primary sources over content that summarizes or rephrases what someone else already said.
A caveat on Digital Bloom's data: it comes from their client portfolio and their own industry analysis. The sample is smaller than what BrightEdge or Ahrefs work with, and none of it is peer-reviewed. The direction of the findings is consistent with the other research, but treat the exact percentages as rough rather than precise.
What This Means in Practice
After going through all of this research, a few things are clear. Not as a neat package of conclusions, but as general patterns that hold up across independent data sources.
1. Traditional SEO and AI visibility are different games.
There is overlap. Good content helps in both. But the mechanics are different enough that an 80% non-overlap rate (Ahrefs) is not some minor gap. If you are good at SEO, that does not mean you are visible to AI. They are separate problems.
2. Smaller sites have a real shot in AI search.
The Princeton GEO finding on this is striking: lower-ranked sites gained 115% more visibility just by citing authoritative sources. In traditional SEO, competing against a big established site is brutal. Domain authority, backlink profiles, brand recognition, they all compound against you. In AI search, the content on the page matters more than who published it. Small sites do not win automatically, but they are not locked out either.
3. What you say on the page matters more than who you are.
Every study pointed in the same direction: the factors that most reliably improved AI visibility were things you put in your content. Statistics with sources. Expert quotes. Authoritative citations. Fresh information. Original data. The domain-level stuff (backlinks, domain age, how much traffic you already get) had weaker and less consistent effects.
The 80/20 of AI Visibility: The Factors That Actually Matter
Pulling from all the research, roughly 10 factors appear to drive most AI visibility outcomes. Here is each one with how strong the evidence is:
| # | Factor | Evidence Level | Source(s) |
|---|---|---|---|
| 1 | Statistics with credible source citations in content | Peer-reviewed (+41%) | Princeton GEO (KDD 2024) |
| 2 | Citing authoritative external sources | Peer-reviewed (+115% for lower-ranked sites) | Princeton GEO (KDD 2024) |
| 3 | Content freshness (updated within 30 days) | Industry data (76.4% of top-cited pages) | Digital Bloom |
| 4 | Multi-platform presence (4+ platforms) | Industry data (2.8x more citations) | Digital Bloom |
| 5 | Expert quotations with credentials | Peer-reviewed (+28%) | Princeton GEO (KDD 2024) |
| 6 | Original research/proprietary data | Industry data (30-40% higher visibility) | Digital Bloom |
| 7 | Content fluency and readability | Peer-reviewed (+15-30%) | Princeton GEO (KDD 2024) |
| 8 | YouTube video presence for relevant topics | Industry data (200x more citations than other video) | BrightEdge |
| 9 | Structured data/schema (indirect effect) | Partially confirmed (works via index, not directly) | SearchVIU (Oct 2025) |
| 10 | Authoritative tone for factual content | Peer-reviewed (strong for historical/factual) | Princeton GEO (KDD 2024) |
What Does NOT Work
- Keyword stuffing: 0% improvement (Princeton GEO, peer-reviewed)
- Technical jargon for its own sake: minimal effect (Princeton GEO)
- Schema markup as a direct signal: not read by AI systems (SearchVIU)
- High organic ranking alone: only 8% citation chance even at position #1-3 (Ahrefs)
What We Still Do Not Know
There are real gaps in this research, and it would be dishonest to skip over them.
- The GEO paper tested against specific AI systems at a specific point in time. These models change fast, and what worked in early 2024 testing might work differently now. The paper gives us the best evidence we have, not a permanent playbook.
- The industry data from BrightEdge, Ahrefs, and Digital Bloom is observational. They are seeing correlations, not running experiments. Strong correlations, but still.
- Everything is moving. Google's AI Overviews, ChatGPT's browsing, Perplexity's citation logic, all of it is in active development. The specifics could look different a year from now.
- There is likely sample bias in the industry studies. Enterprise SEO platforms mostly track bigger sites, so the data might not perfectly represent what happens for smaller businesses.
None of that invalidates the findings. The consistent direction across multiple independent sources is meaningful. But the specific numbers will shift as the field develops, and anyone telling you otherwise is selling certainty they do not have.
Key Findings Summary
- AI search is a fundamentally different system from traditional organic search, with only ~20% overlap in sourced content (Ahrefs, 56M AI Overviews).
- Content-level optimization (statistics, citations, expert quotes, freshness) produces measurable gains in AI visibility, backed by peer-reviewed research (Princeton GEO, KDD 2024).
- Smaller, lower-ranked sites benefit disproportionately from AI content optimization, with up to +115% visibility gains (Princeton GEO).
- Keyword stuffing has zero effect on AI visibility. Not weak. Not marginal. Zero.
- Schema markup does not directly influence AI systems but works indirectly through improved search engine indexing (SearchVIU, October 2025).
- Multi-platform presence (4+ platforms) correlates with 2.8x more AI citations (Digital Bloom, industry data).
- Content freshness is critical: 76.4% of top-cited pages were updated within 30 days (Digital Bloom).
- Original research and proprietary data yield 30-40% higher AI visibility than derivative content (Digital Bloom).
Check Your AI Visibility Score
We built Klyva Audit to check all of these research-backed factors automatically. In under 60 seconds, you get a scored breakdown of your site's AI visibility across every category in this article, from content optimization signals to multi-platform presence to structured data.
No guesswork. Just the factors that peer-reviewed research and large-scale industry data say actually matter.
Sources: Aggarwal et al., "GEO: Generative Engine Optimization," KDD 2024; BrightEdge Generative AI Search Research, 2024-2025; Ahrefs AI Overviews Study (56M queries); SearchVIU Schema Markup and AI Study, October 2025; Digital Bloom AI Citation Analysis, 2025.