Ordinary RAG embeds your corpus, then grabs the top-k chunks nearest to a query. Perfect for "what does clause 7 say." Useless for "what are the main themes across 1,500 documents" — no single chunk holds that answer. GraphRAG rebuilds retrieval as graph + summarization, so every corner of the corpus gets a vote.
A local question lands on a few chunks — vector RAG nails it. A global question has its answer smeared across the whole corpus; top-k can only ever see a sliver. GraphRAG answers the second kind by pre-summarizing clusters of the corpus and merging every summary's partial answer.
Vector RAG retrieves the k chunks nearest your query and stuffs them in the prompt. When the answer lives in a handful of chunks, that's perfect. When it's spread across all of them, top-k retrieves a few and misses the rest. The difference shows in which chunks each method actually sees.
● chunk seen by the method · ○ chunk never retrieved. A local question concentrates the answer in a few chunks (top-k wins on cost). A global question scatters it — top-k sees only its lucky 4, GraphRAG's summaries cover all 16.
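To make the coverage gap concrete, here is a minimal sketch of top-k retrieval over a toy corpus. The 16 two-dimensional "embeddings" are made-up data, and `cosine` and `top_k` are illustrative helpers rather than any library's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy corpus (made-up): 16 chunk vectors fanned out over a quarter circle.
chunks = [(math.cos(i * math.pi / 30), math.sin(i * math.pi / 30)) for i in range(16)]
query = (1.0, 0.0)

seen = top_k(query, chunks, k=4)      # the only chunks the prompt will contain
coverage = len(seen) / len(chunks)    # fraction of the corpus the model sees
```

However the query is phrased, `coverage` is capped at k / N. That is fine when the answer lives in those k chunks and a structural limit when it does not.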
Local question. The answer sits in one or two chunks. Nearest-neighbour retrieval lands right on them. Cheap, fast, accurate: vector RAG's home turf.
Global question. No chunk contains "the themes." The answer is an emergent property of the whole corpus, which is exactly what top-k structurally cannot assemble.
Sensemaking question. "Reasoning over connections to anticipate trajectories." The paper's framing: an analyst's question, not a fact lookup. Structure (who relates to whom) becomes the retrieval signal.
Before any question is asked, GraphRAG does the costly work: chunk the corpus, have an LLM extract entities + relationships from each chunk, then aggregate duplicate mentions into one graph. Chunk size sets the recall / cost tradeoff that drives the whole design.
Bigger chunks mean fewer LLM calls — cheaper — but the model "forgets" to extract toward the end of a long chunk, so recall drops. The paper found 2,400-token chunks extracted roughly half the entities of 600-token ones.
Gleaning is the fix: after the first pass, ask the model "did you miss any entities?" and re-extract a few rounds. Self-reflection lets you use cheap big chunks and keep recall.
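The gleaning loop can be sketched as follows. This is a simplified illustration, not the reference implementation: `extract` is a hypothetical stand-in for an LLM call, `fake_extract` is a toy that deliberately misses entities, and the stopping rule (stop when a pass finds nothing new) is an assumption.

```python
from typing import Callable, Set

def extract_with_gleaning(
    chunk: str,
    extract: Callable[[str, Set[str]], Set[str]],
    max_gleanings: int = 2,
) -> Set[str]:
    """Run one extraction pass, then up to max_gleanings follow-up passes.

    `extract(chunk, already_found)` is a hypothetical LLM call that returns
    entities it can still find, given the ones already extracted.
    """
    found = set(extract(chunk, set()))
    for _ in range(max_gleanings):
        missed = extract(chunk, found) - found
        if not missed:          # nothing new reported: stop early
            break
        found |= missed
    return found

def fake_extract(chunk: str, already_found: Set[str]) -> Set[str]:
    """Toy 'LLM': notices only the first two unseen capitalised words per call."""
    candidates = [w.strip(".,") for w in chunk.split() if w[:1].isupper()]
    fresh = [w for w in dict.fromkeys(candidates) if w not in already_found]
    return set(fresh[:2])

text = "Alice met Bob in Paris, then Carol joined them in Berlin."
one_pass = extract_with_gleaning(text, fake_extract, max_gleanings=0)
gleaned = extract_with_gleaning(text, fake_extract, max_gleanings=3)
```

With the toy extractor, a single pass finds two of the five entities and the gleaning rounds recover the rest. Each round is another LLM call, so `max_gleanings` is the knob that trades cost for recall.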
Note: the reference pipeline merges duplicate entities by normalized name match — basically string match, not semantic resolution. "MS" / "Microsoft" / "Microsoft Corp." can fragment into separate nodes. Real deployments bolt on semantic dedup.
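A minimal sketch of name-based merging shows how the fragmentation happens. The normalization rule here (uppercase, strip punctuation, collapse whitespace) is an assumption for illustration, and the mentions are made-up data.

```python
import re
from collections import defaultdict

def normalize(name: str) -> str:
    """Uppercase, drop punctuation, collapse whitespace (assumed rule)."""
    cleaned = re.sub(r"[^\w\s]", "", name.upper())
    return re.sub(r"\s+", " ", cleaned).strip()

def merge_mentions(mentions):
    """Group (name, description) mentions under their normalized name."""
    nodes = defaultdict(list)
    for name, description in mentions:
        nodes[normalize(name)].append(description)
    return dict(nodes)

# Made-up mentions of one real-world entity under four surface forms.
mentions = [
    ("Microsoft", "description from chunk 1"),
    ("microsoft ", "description from chunk 2"),
    ("Microsoft Corp.", "description from chunk 3"),
    ("MS", "description from chunk 4"),
]
graph_nodes = merge_mentions(mentions)
```

Case and whitespace variants collapse into one node, but "MICROSOFT CORP" and "MS" stay separate. Closing that gap needs something beyond string matching, such as alias tables or embedding-based resolution.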
A graph alone isn't an answer. GraphRAG runs the Leiden algorithm to partition entities into communities of densely related nodes, and it does so hierarchically, giving nested levels. Then it summarizes each community bottom-up.
Every node lands in exactly one community at each level, and the levels nest. That multi-resolution hierarchy — coarse summaries on top of fine ones — is the product. Each summary is a pre-computed "what this cluster of the corpus is about."
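Bottom-up summarization over a nested hierarchy can be sketched like this. The hierarchy is hard-coded as if a community-detection step had already produced it (the sketch does not run Leiden), the community ids and members are toy data, and `fake_summarize` stands in for an LLM call. Building each parent purely from child summaries is a simplification.

```python
from typing import Callable, Dict, List

# Hypothetical nesting: each coarse community lists its finer children.
children: Dict[str, List[str]] = {
    "C0.0": ["C1.0", "C1.1"],
    "C0.1": ["C1.2"],
}
# Finest-level communities hold entity descriptions (toy data).
leaf_members: Dict[str, List[str]] = {
    "C1.0": ["auth service", "token store"],
    "C1.1": ["login UI"],
    "C1.2": ["billing API", "invoice job"],
}

def summarize_bottom_up(
    roots: List[str],
    children: Dict[str, List[str]],
    leaf_members: Dict[str, List[str]],
    summarize: Callable[[List[str]], str],
) -> Dict[str, str]:
    """Summarize leaves from their members, then parents from child summaries."""
    summaries: Dict[str, str] = {}

    def visit(cid: str) -> str:
        if cid in leaf_members:
            parts = leaf_members[cid]
        else:
            parts = [visit(kid) for kid in children[cid]]
        summaries[cid] = summarize(parts)
        return summaries[cid]

    for root in roots:
        visit(root)
    return summaries

def fake_summarize(parts: List[str]) -> str:
    """Stand-in for an LLM summary: just brackets the inputs."""
    return "[" + " + ".join(parts) + "]"

summaries = summarize_bottom_up(["C0.0", "C0.1"], children, leaf_members, fake_summarize)
```

The result holds one summary per community at every level, so a query can later pick whichever level it wants without recomputing anything.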
Query time is cheaper because it only touches summaries, never raw text. Prepare: shuffle the community summaries and split them into context-sized chunks. Map: ask the question of each chunk and have the model score its partial answer 0–100. Reduce: drop the zeros, sort by score, pack the best into one window, synthesize.
Why shuffle the summaries before chunking them for the map step? To spread relevant info across chunks instead of letting it concentrate in one and get crowded out. Every community gets to vote, which is why global search can answer "what are the themes" when top-k can't.
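The shuffle, map, and reduce steps above fit in a few lines. This is a sketch under stated assumptions: `map_fn` and `reduce_fn` are hypothetical stand-ins for LLM calls, `chunk_size` counts summaries rather than tokens, `budget` stands in for the context-window limit, and the summaries are toy data.

```python
import random
from typing import Callable, List, Tuple

def global_search(
    question: str,
    summaries: List[str],
    map_fn: Callable[[str, List[str]], Tuple[str, int]],
    reduce_fn: Callable[[str, List[str]], str],
    chunk_size: int = 2,   # summaries per map call (stands in for a token limit)
    budget: int = 3,       # partial answers that fit in the reduce window
    seed: int = 0,
) -> str:
    # Prepare: shuffle so relevant summaries do not pile up in one chunk.
    shuffled = summaries[:]
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[i:i + chunk_size] for i in range(0, len(shuffled), chunk_size)]
    # Map: one (partial answer, helpfulness score 0-100) per chunk.
    partials = [map_fn(question, chunk) for chunk in chunks]
    # Reduce: drop zeros, keep the highest-scoring answers, synthesize.
    useful = sorted((p for p in partials if p[1] > 0), key=lambda p: p[1], reverse=True)
    return reduce_fn(question, [text for text, _ in useful[:budget]])

def fake_map(question: str, chunk: List[str]) -> Tuple[str, int]:
    """Toy scorer: 50 points per summary that mentions a theme."""
    hits = [s for s in chunk if s.startswith("theme")]
    return ", ".join(hits), min(100, 50 * len(hits))

def fake_reduce(question: str, partial_answers: List[str]) -> str:
    """Toy synthesis: concatenate the surviving partial answers."""
    return "; ".join(partial_answers)

community_summaries = [
    "theme: pricing", "noise a", "theme: security",
    "noise b", "theme: hiring", "noise c",
]
answer = global_search("What are the main themes?", community_summaries, fake_map, fake_reduce)
```

Whatever order the shuffle produces, every summary is read by exactly one map call, so all three themes reach the final answer while chunks that score zero are discarded.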
Leiden's levels are a resolution dial. C0 is a few coarse summaries: cheap, broad, loses detail. C3 is many fine ones: expensive, detailed. Moving between levels trades cost against detail. The surprise from the paper: even the coarse top level is a strong default.
The paper's headline result: intermediate and low-level summaries beat a vector-RAG baseline on comprehensiveness and diversity. Even the coarse root level C0 still won about 72% of head-to-head comprehensiveness comparisons, while using 97% fewer context tokens than summarizing the raw source text directly.
That's the lever: the cheap coarse summaries are a remarkably strong default. You only pay for the fine levels when the question actually needs the detail. Match the zoom to the question.
They also measured directness ("how concise and on-point"), and GraphRAG loses on it to vector RAG. That is by design: directness is there as a sanity check, because comprehensiveness and conciseness trade off and no method should win all four metrics. Reporting a metric you lose is a credibility signal.
GraphRAG is not a free upgrade over vector RAG. The win is task-dependent, and the index bill is real. Here's the honest ledger and the family of variants that grew up to fix it.
| Variant | Index cost | Key idea | Best for |
|---|---|---|---|
| GraphRAG | high (LLM extract + summarize) | entity graph + Leiden + community summaries | global sensemaking on a stable corpus |
| LazyGraphRAG | ~vector RAG | defer all LLM use to query time; cheap NLP graph | comparable quality, >700× lower query cost |
| DRIFT search | = GraphRAG | blend local + community info | questions between local & global |
| Local search | = GraphRAG | walk entity neighbourhood (graph-guided RAG) | specific, entity-anchored questions |
Multi-pass extraction + gleaning + per-community summaries is a big LLM bill and ~hours of wall-clock. Microsoft's own LazyGraphRAG defers it — a tacit admission the original is too expensive for most use.
Answers are bounded by what got extracted. NLP graph-building has ~15–20% error rates; if a fact never made it into the graph, retrieval can't find it. GraphRAG can underperform plain RAG on precise QA.
Win rates come from an LLM judge comparing pairs of answers to LLM-generated questions, with no ground truth. Read "72–83%" as relative preference, not measured correctness.
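To see what such a number does and does not say, here is how a win rate falls out of pairwise verdicts. The verdict list is made-up data, and excluding ties from the denominator is an assumption for this sketch, not necessarily the paper's convention.

```python
from collections import Counter

def win_rate(verdicts, method="graphrag"):
    """Share of decided pairwise comparisons won by `method` (ties excluded)."""
    counts = Counter(verdicts)
    decided = sum(n for winner, n in counts.items() if winner != "tie")
    return counts[method] / decided if decided else float("nan")

# Made-up judge verdicts, one per question: which answer was preferred.
verdicts = ["graphrag", "vector", "graphrag", "tie",
            "graphrag", "vector", "graphrag", "graphrag"]
rate = win_rate(verdicts)   # 5 wins out of 7 decided comparisons
```

The metric only records which answer the judge preferred. Both answers could be wrong and the rate would not change.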
Atlas already builds GraphRAG's first half, for code. Graphify extracts an entity/relation graph (symbols, files, callers/callees, imports) with deterministic AST parsing — steps 1–3 of indexing, already done and cron-fresh, skipping GraphRAG's biggest cost and biggest error source.
The missing pieces are Leiden communities + community summaries over the org graph. Add them and you get global search over the codebase: "what are the architectural themes across the 60 repos," "what does the auth subsystem do as a whole" — org-level sensemaking that today needs a human reading many files.
Reach for GraphRAG when many broad questions hit one stable corpus and you can amortize the index. For one-off queries on fresh data, or precise "what's value X" lookups, plain vector RAG is often as good and far cheaper.
Cost lesson, applied: the LazyGraphRAG stance fits us — keep the cheap deterministic AST graph, compute community reports lazily / incrementally instead of re-summarizing 60 repos on every cron tick.
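One way to make reports lazy and incremental is to cache them by a fingerprint of each community's members. In this sketch `LazyReports`, its method names, and the `summarize` callable are all hypothetical, and fingerprinting the sorted member names is an assumed invalidation rule.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

class LazyReports:
    """Cache community reports, keyed by a hash of the community's members."""

    def __init__(self, summarize: Callable[[List[str]], str]):
        self._summarize = summarize                   # stand-in for an LLM call
        self._cache: Dict[str, Tuple[str, str]] = {}  # id -> (fingerprint, report)
        self.llm_calls = 0

    @staticmethod
    def _fingerprint(members: List[str]) -> str:
        joined = "\n".join(sorted(members))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def report(self, community_id: str, members: List[str]) -> str:
        """Return the cached report unless the community's members changed."""
        fp = self._fingerprint(members)
        cached = self._cache.get(community_id)
        if cached is not None and cached[0] == fp:
            return cached[1]
        self.llm_calls += 1
        text = self._summarize(members)
        self._cache[community_id] = (fp, text)
        return text

reports = LazyReports(lambda members: f"{len(members)} symbols")
first = reports.report("auth", ["login", "verify_token"])
again = reports.report("auth", ["verify_token", "login"])             # same set: cache hit
changed = reports.report("auth", ["login", "verify_token", "logout"]) # changed: recompute
```

A report is generated only when someone asks for it, and regenerated only when its community's membership changes, so an unchanged repo costs nothing on the next cron tick.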
Local vs global. The entity-graph index and the chunk/gleaning tradeoff. Leiden communities and bottom-up summaries. Map-reduce global search, the zoom dial, and the honest cost ledger. You now know why "retrieve over the graph" answers the question top-k never could — and when not to bother.
The same "many partial views, one router that picks" pattern, but inside the network — sparse experts and a tiny gate.