openalicelabs / academy
COURSE: RAG-04 · LESSON: 04 · 03 · TOPIC: GRAPHRAG · EST. READ: ~13 MIN
OPENALICE LABORATORIES · EDUCATION PATH · RETRIEVAL 04 · 03

Retrieve over
the graph.

Ordinary RAG embeds your corpus, then grabs the top-k chunks nearest to a query. Perfect for "what does clause 7 say." Useless for "what are the main themes across 1,500 documents" — no single chunk holds that answer. GraphRAG rebuilds retrieval as graph + summarization, so every corner of the corpus gets a vote.

FIG.00 — CORPUS → GRAPH → SUMMARIES
FIG.0A — TWO QUESTION CLASSES · LOCAL lookup · GLOBAL sensemaking

A local question lands on a few chunks — vector RAG nails it. A global question has its answer smeared across the whole corpus; top-k can only ever see a sliver. GraphRAG answers the second kind by pre-summarizing clusters of the corpus and merging every summary's partial answer.

INPUT: a private/narrative corpus, too big for the context window
INDEX: an LLM-extracted entity–relationship knowledge graph
CLUSTERING: Leiden communities, nested C0 → C1 → C2 → C3
QUERY (GLOBAL): map-reduce over pre-computed community summaries
BEST AT: "what are the themes / overall story", not "what is value X"
01 / 06
The failure vector RAG can't fix

Top-k can't see the whole corpus.

Vector RAG retrieves the k chunks nearest your query and stuffs them in the prompt. When the answer lives in a handful of chunks, that's perfect. When it's spread across all of them, top-k retrieves a few and misses the rest. Pick a question and watch which chunks each method actually sees.

FIG.01 — SAME 16-CHUNK CORPUS · VECTOR top-k vs GRAPH global

Panels: Vector RAG (top-4) and GraphRAG (global), each marking the chunks seen by the method versus the chunks never retrieved. A local question concentrates the answer in a few chunks (top-k wins on cost). A global question scatters it: top-k sees only its lucky 4, while GraphRAG's summaries cover all 16.
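To make the coverage gap concrete, here is a minimal sketch of top-k retrieval over a 16-chunk corpus. The 2-d embeddings and the query vector are made up for illustration; a real system would use a learned embedding model and a vector index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, chunk_vecs, k=4):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy corpus: 16 chunks with invented 2-d embeddings spread around an arc.
chunk_vecs = [(math.cos(i * 0.3), math.sin(i * 0.3)) for i in range(16)]
query_vec = (1.0, 0.0)

seen = top_k(query_vec, chunk_vecs, k=4)
coverage = len(seen) / len(chunk_vecs)
print(seen, coverage)
```

However the query is phrased, the prompt only ever contains k of the 16 chunks. That is fine when the answer lives in those k and structurally insufficient when it is spread over all of them.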

LOCAL QUESTION

"what's clause 7?"

The answer sits in one or two chunks. Nearest-neighbour retrieval lands right on them. Cheap, fast, accurate — vector RAG's home turf.

GLOBAL QUESTION ★

"what are the themes?"

No chunk contains "the themes." The answer is an emergent property of the whole corpus — exactly what top-k structurally cannot assemble.

SENSEMAKING

connect the dots

"Reasoning over connections to anticipate trajectories." The paper's framing: an analyst's question, not a fact lookup. Structure — who relates to whom — becomes the retrieval signal.

02 / 06
Phase 1 · build the graph (offline, expensive)

From text to an entity graph.

Before any question is asked, GraphRAG does the costly work: chunk the corpus, have an LLM extract entities + relationships from each chunk, then aggregate duplicate mentions into one graph. Drag the chunk size and watch the recall / cost tradeoff that drives the whole design.

FIG.02 — CHUNK SIZE → ENTITIES EXTRACTED · the gleaning knob
Interactive figure: a chunk-size slider from 600 to 2,400 tokens, with readouts for relative entity recall (one bar per pass), LLM calls (relative), entities extracted (relative), and gleaning on/off.
// phase 1: indexing
1. chunk      600-tok units, 100 overlap
2. extract    entities · relations · claims (LLM)
3. glean      "did you miss any?" → re-extract
4. aggregate  duplicate mentions → nodes/edges
              edge weight = times the relation was seen

Bigger chunks mean fewer LLM calls, so indexing is cheaper, but the model tends to "forget" to extract toward the end of a long chunk, so recall drops. In the paper's sample, a single extraction pass over 600-token chunks found almost twice as many entity references as a pass over 2,400-token chunks.
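A minimal sketch of the chunking step and its effect on call count. It assumes one extraction call per chunk and uses a hypothetical 12,000-token corpus; `chunk_tokens` is an illustrative helper, not the reference pipeline's function.

```python
def chunk_tokens(tokens, size=600, overlap=100):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the corpus
    return chunks

# Hypothetical corpus: integers stand in for real tokens.
corpus = list(range(12_000))
small = chunk_tokens(corpus, size=600, overlap=100)
large = chunk_tokens(corpus, size=2400, overlap=100)

# One extraction call per chunk, so chunk count is a proxy for LLM calls.
print(len(small), len(large))  # prints "24 6"
```

Here the larger chunk size cuts the number of first-pass calls by 4×, which is the cost side of the tradeoff. The recall side is what gleaning tries to recover.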

Gleaning is the fix: after the first pass, ask the model "did you miss any entities?" and re-extract for a few more rounds. This self-reflection step lets you use cheap, large chunks while recovering much of the lost recall.
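The gleaning loop can be sketched as below. The `extract` callable is a stand-in for an LLM call and its signature is an assumption; the real prompts and stopping check in the reference pipeline differ. `fake_extract` is a deterministic stub that deliberately finds only two new entities per pass, so the effect of extra rounds is visible.

```python
def extract_with_gleaning(chunk, extract, max_gleanings=2):
    """Run one extraction pass, then up to `max_gleanings` follow-up passes.

    `extract(chunk, already_found)` is a hypothetical LLM wrapper that
    returns a set of entity names not yet in `already_found`.
    """
    found = set(extract(chunk, frozenset()))
    for _ in range(max_gleanings):
        missed = set(extract(chunk, frozenset(found))) - found
        if not missed:
            break  # the model reports nothing new, so stop early
        found |= missed
    return found

def fake_extract(chunk, already_found):
    """Stub extractor: capitalized words are 'entities', at most 2 per pass."""
    candidates = [w for w in chunk.split()
                  if w[0].isupper() and w not in already_found]
    return set(candidates[:2])

text = "Alice met Bob at Contoso before Dana joined Fabrikam"
no_gleaning = extract_with_gleaning(text, fake_extract, max_gleanings=0)
with_gleaning = extract_with_gleaning(text, fake_extract, max_gleanings=2)
print(sorted(no_gleaning), sorted(with_gleaning))
```

Each gleaning round is an extra LLM call per chunk, so the design choice is fewer, larger chunks plus a small number of follow-up passes versus many small chunks with a single pass each.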

Note: the reference pipeline merges duplicate entities by normalized name match — basically string match, not semantic resolution. "MS" / "Microsoft" / "Microsoft Corp." can fragment into separate nodes. Real deployments bolt on semantic dedup.
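A minimal sketch of name-based aggregation shows why aliases fragment. The `normalize` rule here (trim, collapse whitespace, uppercase) is an assumption for illustration, not the reference pipeline's exact rule; edges are keyed by the (source, target) pair and weighted by how often the pair was seen.

```python
from collections import Counter

def normalize(name):
    """Illustrative normalization: trim, collapse whitespace, uppercase."""
    return " ".join(name.strip().upper().split())

def build_graph(mentions):
    """Aggregate (source, target) mentions into nodes and weighted edges."""
    nodes = set()
    edge_weights = Counter()
    for src, dst in mentions:
        s, d = normalize(src), normalize(dst)
        nodes.update((s, d))
        edge_weights[(s, d)] += 1  # weight = times the relation was seen
    return nodes, edge_weights

mentions = [
    ("Microsoft", "GitHub"),
    ("microsoft ", "GitHub"),      # merges: differs only by case and spacing
    ("MS", "GitHub"),              # does not merge: different string
    ("Microsoft Corp.", "GitHub"), # does not merge: different string
]
nodes, edge_weights = build_graph(mentions)
print(sorted(nodes))
print(edge_weights[("MICROSOFT", "GITHUB")])
```

String normalization catches casing and whitespace variants only. Mapping "MS" and "Microsoft Corp." onto the same node needs an alias table or embedding-based resolution layered on top.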

03 / 06
The trick · cluster, then summarize

Leiden carves the graph into communities.

A graph alone isn't an answer. GraphRAG runs the Leiden algorithm to partition entities into communities of densely-related nodes — and it's hierarchical, giving nested levels. Then it summarizes each community bottom-up. Press the buttons to detect communities and watch the summaries form.

FIG.03 — LIVE COMMUNITY DETECTION · color = community
Readouts: entities (nodes): 18 · relations (edges) · communities found
// phase 1: clustering + summarize
5. Leiden     partition graph → communities
              recursive: C0 coarse → C3 leaf, nested
6. summarize  each community bottom-up:
              leaf → report from its entities + relations
              parent → roll up child reports to fit window
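"Densely-related" needs a score to optimize. Leiden implementations typically maximize a quality function such as modularity; treating modularity as the objective here is an assumption about configuration. The sketch below does not implement Leiden itself. It only scores a given partition of a toy unweighted graph, so you can see why a split along the sparse bridge beats one big community.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / 2m)^2
    where L_c = edges inside c, d_c = total degree of c, m = edge count.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}

q_split = modularity(edges, two_groups)
q_whole = modularity(edges, one_group)
print(q_split > q_whole)
```

A community detector searches over partitions to push this score up. Running it recursively inside each community is what yields the nested levels.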

Every node lands in exactly one community at each level, and the levels nest. That multi-resolution hierarchy — coarse summaries on top of fine ones — is the product. Each summary is a pre-computed "what this cluster of the corpus is about."
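The bottom-up roll-up can be sketched as a recursive walk over the community tree. This is a simplified sketch under stated assumptions: `summarize` is a hypothetical LLM wrapper, the tree layout (`entities`, `children`, `report` keys) is invented for illustration, and word count stands in for a real tokenizer when fitting the window.

```python
def summarize_tree(node, summarize, budget_tokens=8000):
    """Fill in node["report"] for a community and all of its descendants.

    Leaves are summarized from their entity/relation descriptions;
    parents are summarized from their children's reports.
    """
    if node["children"]:
        inputs = [summarize_tree(child, summarize, budget_tokens)
                  for child in node["children"]]
    else:
        inputs = list(node["entities"])

    # Greedily pack inputs until the (approximate) token budget is reached.
    packed, used = [], 0
    for text in inputs:
        cost = len(text.split())  # crude token proxy
        if used + cost > budget_tokens:
            break
        packed.append(text)
        used += cost

    node["report"] = summarize(packed)
    return node["report"]

def fake_summarize(texts):
    """Stub summarizer: brackets the joined inputs so nesting stays visible."""
    return "[" + "; ".join(texts) + "]"

tree = {
    "entities": [],
    "children": [
        {"entities": ["Alice works at Contoso", "Contoso builds routers"],
         "children": []},
        {"entities": ["Bob leads Fabrikam"], "children": []},
    ],
}
root_report = summarize_tree(tree, fake_summarize)
print(root_report)
```

Because parents read child reports rather than raw descriptions, the cost of a coarse summary stays bounded by the window even when the community underneath it is large.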

04 / 06
Phase 2 · global search (the headline mode)

Ask every summary, then merge.

Query time is cheaper than rereading the corpus because it touches only summaries, never raw text. Map: pack the community summaries into chunks, answer the question from each chunk on its own, and score each partial answer 0–100 for helpfulness. Reduce: drop the zeros, sort by score, pack the best into one window, and synthesize. Step through it below.

FIG.04 — GLOBAL SEARCH · MAP → SCORE → FILTER → REDUCE
Interactive figure: community summaries with a helpfulness score (0–100) per chunk, stepped through the map phase. Readouts: summaries mapped · score-0 dropped · packed into answer.
// phase 2: global search
1. shuffle + pack  summaries into chunks
2. map             answer Q from each chunk alone
                   + emit a helpfulness 0–100
3. filter          drop score-0 partials
4. reduce          sort by score, fill an 8k window,
                   synthesize one final answer

Why shuffle? To spread relevant info across chunks instead of letting it concentrate in one and get crowded out. Every community gets to vote — that's why global search can answer "what are the themes" when top-k can't.
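The four steps can be sketched end to end as below. `answer_chunk` and `synthesize` are hypothetical stand-ins for LLM calls, word count is a crude token proxy, and the budgets are parameters rather than the paper's exact settings.

```python
import random

def count_tokens(text):
    return len(text.split())  # crude proxy for a real tokenizer

def global_search(question, summaries, answer_chunk, synthesize,
                  chunk_tokens=8000, window_tokens=8000, seed=0):
    # 1. shuffle + pack summaries into chunks
    shuffled = list(summaries)
    random.Random(seed).shuffle(shuffled)
    chunks, current, used = [], [], 0
    for summary in shuffled:
        cost = count_tokens(summary)
        if current and used + cost > chunk_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(summary)
        used += cost
    if current:
        chunks.append(current)

    # 2. map: answer from each chunk alone, as (text, helpfulness 0-100)
    partials = [answer_chunk(question, chunk) for chunk in chunks]

    # 3. filter: drop score-0 partial answers
    partials = [p for p in partials if p[1] > 0]

    # 4. reduce: best first, fill the window, synthesize
    partials.sort(key=lambda p: p[1], reverse=True)
    packed, used = [], 0
    for text, _score in partials:
        cost = count_tokens(text)
        if used + cost > window_tokens:
            break
        packed.append(text)
        used += cost
    return synthesize(question, packed)

def fake_answer(question, chunk):
    """Stub map step: score by how often the word 'theme' appears."""
    text = " ".join(chunk)
    return text, min(100, 40 * text.count("theme"))

def fake_synthesize(question, packed):
    """Stub reduce step: return the packed partials unchanged."""
    return packed

summaries = ["theme theme logistics", "theme pricing shifts",
             "unrelated cafeteria menu"]
result = global_search("what are the themes?", summaries,
                       fake_answer, fake_synthesize, chunk_tokens=3)
print(result)
```

With `chunk_tokens=3` each toy summary becomes its own chunk, so the zero-scoring one is dropped and the rest arrive in score order regardless of the shuffle.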

05 / 06
The hierarchy is the dial

Pick the zoom level for the question.

Leiden's levels are a resolution dial. C0 is a few coarse summaries — cheap, broad, loses detail. C3 is many fine ones — expensive, detailed. Slide it and watch the cost / breadth tradeoff. The surprise from the paper: even the coarse top level is a strong default.

FIG.05 — COMMUNITY LEVEL · summaries vs context cost
Interactive figure: a level slider from C0 (coarse root) to C3 (leaf). Readouts: communities at level · context tokens (relative) · resolution.

The paper's headline result: intermediate and low-level summaries beat a vector-RAG baseline on comprehensiveness and diversity. Even the coarse root level C0 still won about 72% of head-to-head comparisons on comprehensiveness, while using over 97% fewer context tokens per query than map-reduce summarization over the raw source text.

That's the lever: the cheap coarse summaries are a remarkably strong default. You only pay for the fine levels when the question actually needs the detail. Match the zoom to the question.
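One way to turn "match the zoom to the question" into code is a budget rule: use the finest level whose summaries fit the number of tokens you are willing to process per query, and fall back to C0 otherwise. This heuristic and the token totals below are illustrative assumptions, not something the paper prescribes or measures.

```python
def pick_level(level_tokens, budget_tokens):
    """Return the index of the finest level that fits the token budget.

    `level_tokens[i]` is the total summary size at level Ci, ordered
    coarse (C0) to fine. Falls back to C0 when nothing fits.
    """
    chosen = 0
    for level, tokens in enumerate(level_tokens):
        if tokens <= budget_tokens:
            chosen = level
    return chosen

# Hypothetical totals for C0..C3; real numbers depend on the corpus.
level_tokens = [3_000, 20_000, 70_000, 120_000]

broad = pick_level(level_tokens, budget_tokens=5_000)
detailed = pick_level(level_tokens, budget_tokens=80_000)
print(broad, detailed)  # prints "0 2"
```

A router that knows the question type could set the budget: small for "what is this corpus about", larger for questions that need named specifics.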

THE HONEST CONTROL

They also measured directness ("how concise and on-point") as a control, and GraphRAG loses it to vector RAG, as expected. Comprehensiveness and conciseness trade off, so no method should win all four metrics (comprehensiveness, diversity, empowerment, directness). Reporting a metric you lose is a credibility signal.

06 / 06
When NOT to reach for it · and the zoo

The cost, the cousins, the caveats.

GraphRAG is not a free upgrade over vector RAG. The win is task-dependent, and the index bill is real. Here's the honest ledger and the family of variants that grew up to fix it.

Variant | Index cost | Key idea | Best for
GraphRAG | high (LLM extract + summarize) | entity graph + Leiden + community summaries | global sensemaking on a stable corpus
LazyGraphRAG | ~vector RAG | defer all LLM use to query time; cheap NLP graph | comparable quality, >700× lower query cost
DRIFT search | = GraphRAG | blend local + community info | questions between local & global
Local search | = GraphRAG | walk entity neighbourhood (graph-guided RAG) | specific, entity-anchored questions

COST IS THE DEALBREAKER

hours per million tokens

Multi-pass extraction + gleaning + per-community summaries is a big LLM bill and ~hours of wall-clock. Microsoft's own LazyGraphRAG defers it — a tacit admission the original is too expensive for most use.

COVERAGE GAP

only as good as extraction

Answers are bounded by what got extracted. NLP graph-building has ~15–20% error rates; if a fact never made it into the graph, retrieval can't find it. GraphRAG can underperform plain RAG on precise QA.

JUDGE THE EVAL

LLM-judged, no gold

Win rates come from an LLM judge comparing answers to LLM-synthesized questions with no ground-truth. Read "72–83%" as relative preference, not measured correctness.

HOW IT CONNECTS TO OPENALICE

Atlas already builds GraphRAG's first half, for code. Graphify extracts an entity/relation graph (symbols, files, callers/callees, imports) with deterministic AST parsing — steps 1–3 of indexing, already done and cron-fresh, skipping GraphRAG's biggest cost and biggest error source.

The missing pieces are Leiden communities + community summaries over the org graph. Add them and you get global search over the codebase: "what are the architectural themes across the 60 repos," "what does the auth subsystem do as a whole" — org-level sensemaking that today needs a human reading many files.

WHEN TO ACTUALLY USE IT

Reach for GraphRAG when many broad questions hit one stable corpus and you can amortize the index. For one-off queries on fresh data, or precise "what's value X" lookups, plain vector RAG is often as good and far cheaper.

Cost lesson, applied: the LazyGraphRAG stance fits us — keep the cheap deterministic AST graph, compute community reports lazily / incrementally instead of re-summarizing 60 repos on every cron tick.
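A minimal sketch of the lazy, incremental stance: cache each community report under a fingerprint of its membership and call the summarizer only when that fingerprint changes. `LazyReports`, its method names, and the `summarize` callable are hypothetical; a real version would also need to fingerprint the content of each member, not just its name.

```python
import hashlib

class LazyReports:
    """Compute community reports on demand and reuse them until membership changes."""

    def __init__(self, summarize):
        self._summarize = summarize  # hypothetical LLM wrapper
        self._cache = {}             # community id -> (fingerprint, report)

    @staticmethod
    def _fingerprint(members):
        joined = "\n".join(sorted(members))  # order-independent
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def report(self, community_id, members):
        fingerprint = self._fingerprint(members)
        cached = self._cache.get(community_id)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]         # unchanged: no LLM call
        text = self._summarize(members)
        self._cache[community_id] = (fingerprint, text)
        return text

calls = []

def fake_summarize(members):
    """Stub summarizer that records each invocation."""
    calls.append(tuple(members))
    return f"{len(members)} symbols"

reports = LazyReports(fake_summarize)
first = reports.report("auth", ["login()", "verify_token()"])
second = reports.report("auth", ["verify_token()", "login()"])  # same set: cache hit
third = reports.report("auth", ["login()", "verify_token()", "refresh()"])
print(len(calls), first, third)
```

On a cron tick where most repos did not change, most communities hit the cache, so the summarization bill scales with the size of the diff rather than the size of the org.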

04 · 03 — you made it

You queried
the graph.

Local vs global. The entity-graph index and the chunk/gleaning tradeoff. Leiden communities and bottom-up summaries. Map-reduce global search, the zoom dial, and the honest cost ledger. You now know why "retrieve over the graph" answers the question top-k never could — and when not to bother.

04·01 Embeddings & vector RAG · nearest-neighbour retrieval over chunks ✓ done
04·03 GraphRAG · graph + community summaries · global sensemaking ✓ complete
04·04 Agent memory systems · graph + summary recall over long histories next
04·05 Model routing · send broad asks to graph-global, precise asks to vector locked
