OpenAlice Academy — 04 · 03 / GraphRAG

01 / 06

The failure vector RAG can't fix

Top-k can't see the whole corpus.

Vector RAG retrieves the k chunks nearest your query and stuffs them in the prompt. When the answer lives in a handful of chunks, that's perfect. When it's spread across all of them, top-k retrieves a few and misses the rest. Pick a question and watch which chunks each method actually sees.

FIG.01 — SAME 16-CHUNK CORPUS · VECTOR top-k vs GRAPH global

Panels: Vector RAG · top-4 and GraphRAG · global, each marking the chunks it has seen.

● chunk seen by the method · ○ chunk never retrieved. A local question concentrates the answer in a few chunks (top-k wins on cost). A global question scatters it — top-k sees only its lucky 4, GraphRAG's summaries cover all 16.
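The coverage gap in the figure can be reproduced with a toy retriever. This is a minimal sketch, assuming made-up 2-d vectors in place of real embeddings; `cosine` and `top_k` are illustrative helpers, not part of any GraphRAG library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=4):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 16-chunk corpus: unit vectors fanned out over a quarter circle (assumed data).
chunks = [(math.cos(i * math.pi / 30), math.sin(i * math.pi / 30)) for i in range(16)]
query = (1.0, 0.0)  # points straight at chunk 0

seen = top_k(query, chunks, k=4)
coverage = len(seen) / len(chunks)  # fraction of the corpus the prompt ever sees
```

However the query is phrased, `coverage` stays at k divided by the corpus size. That is fine for a local question and structurally insufficient for a global one.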

LOCAL QUESTION

"what's clause 7?"

The answer sits in one or two chunks. Nearest-neighbour retrieval lands right on them. Cheap, fast, accurate — vector RAG's home turf.

GLOBAL QUESTION ★

"what are the themes?"

No chunk contains "the themes." The answer is an emergent property of the whole corpus — exactly what top-k structurally cannot assemble.

SENSEMAKING

connect the dots

The paper frames sensemaking as understanding connections among people, places, and events in order to anticipate their trajectories: an analyst's question, not a fact lookup. Structure (who relates to whom) becomes the retrieval signal.

02 / 06

Phase 1 · build the graph (offline, expensive)

From text to an entity graph.

Before any question is asked, GraphRAG does the costly work: chunk the corpus, have an LLM extract entities + relationships from each chunk, then aggregate duplicate mentions into one graph. Drag the chunk size and watch the recall / cost tradeoff that drives the whole design.

FIG.02 — CHUNK SIZE → ENTITIES EXTRACTED · the gleaning knob

Controls and readouts: chunk-size slider from 600 tok to 2400 tok; relative entity recall (one bar per pass); LLM calls (relative); entities extracted (relative); gleaning on/off.

// phase 1: indexing
1. chunk      600-tok units, 100 overlap
2. extract    entities · relations · claims (LLM)
3. glean      "did you miss any?" then re-extract
4. aggregate  duplicate mentions → nodes/edges
              edge weight = times the relation was seen

Bigger chunks mean fewer LLM calls, so indexing is cheaper, but recall drops: the model tends to "forget" to extract toward the end of a long chunk. On the paper's sample dataset, 600-token chunks yielded almost twice as many entity references as 2,400-token chunks.
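The cost side of that tradeoff is simple arithmetic: one extraction call per chunk. Here is a minimal sketch of overlapping chunking, assuming the corpus is already tokenized into a list; `chunk_tokens` is an illustrative helper, not the reference pipeline's function.

```python
def chunk_tokens(tokens, size=600, overlap=100):
    """Split a token list into windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the corpus
    return chunks

corpus = list(range(10_000))  # stand-in for 10,000 token ids
small = chunk_tokens(corpus, size=600, overlap=100)
large = chunk_tokens(corpus, size=2400, overlap=100)

# One extraction call per chunk: 20 calls at 600 tokens, 5 calls at 2,400.
calls_small, calls_large = len(small), len(large)
```

On this toy corpus the larger chunk size cuts extraction calls by a factor of four, which is the saving the recall loss has to be weighed against.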

Gleaning is the fix: after the first pass, ask the model "did you miss any entities?" and re-extract for a few rounds. This self-check lets you use cheap big chunks and recover much of the lost recall.
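The gleaning loop can be sketched as follows. The `extract` callable is a hypothetical wrapper around the LLM, and `fake_extract` is a stub that deliberately under-extracts so the effect is visible; neither is the reference pipeline's prompt or API.

```python
def extract_with_gleaning(chunk, extract, max_gleanings=2):
    """Run one extraction pass, then up to `max_gleanings` follow-up passes.

    `extract(chunk, already_found)` is a hypothetical LLM wrapper that
    returns a set of entity names not yet in `already_found`.
    """
    found = set(extract(chunk, frozenset()))
    for _ in range(max_gleanings):
        missed = set(extract(chunk, frozenset(found))) - found
        if not missed:
            break  # the model reports nothing new, so stop early
        found |= missed
    return found

def fake_extract(chunk, already_found):
    """Stub 'LLM': notices at most two new capitalized words per pass."""
    new = [w for w in chunk.split() if w.istitle() and w not in already_found]
    return set(new[:2])

text = "Alice met Bob in Paris before Carol flew to Tokyo"
one_pass = extract_with_gleaning(text, fake_extract, max_gleanings=0)
gleaned = extract_with_gleaning(text, fake_extract, max_gleanings=2)
```

With the stub, a single pass finds two of the five entities and two gleaning rounds recover the rest. Each round is an extra LLM call per chunk, so gleaning trades some of the big-chunk saving back for recall.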

Note: the reference pipeline merges duplicate entities by normalized name match — basically string match, not semantic resolution. "MS" / "Microsoft" / "Microsoft Corp." can fragment into separate nodes. Real deployments bolt on semantic dedup.
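A small sketch makes the fragmentation concrete. It assumes a simple normalization (lowercase, strip punctuation) as a stand-in for the pipeline's name matching; `normalize` and `aggregate` are illustrative names, and the mentions are toy data.

```python
from collections import Counter

def normalize(name):
    """Lowercase, strip punctuation and extra whitespace. No semantic matching."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def aggregate(mentions):
    """Merge (source, relation, target) mentions into nodes and weighted edges."""
    nodes = set()
    edges = Counter()
    for source, relation, target in mentions:
        s, t = normalize(source), normalize(target)
        nodes.update((s, t))
        edges[(s, relation, t)] += 1  # weight = times the relation was seen
    return nodes, edges

mentions = [
    ("Microsoft", "acquired", "GitHub"),
    ("microsoft", "acquired", "GitHub"),
    ("Microsoft Corp.", "acquired", "GitHub"),
    ("MS", "acquired", "GitHub"),
]
nodes, edges = aggregate(mentions)
```

Case differences merge, so "Microsoft" and "microsoft" become one node with an edge of weight 2. "Microsoft Corp." and "MS" stay separate nodes, each with its own weight-1 edge: the same fact, split three ways.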

03 / 06

The trick · cluster, then summarize

Leiden carves the graph into communities.

A graph alone isn't an answer. GraphRAG runs the Leiden algorithm to partition entities into communities of densely-related nodes — and it's hierarchical, giving nested levels. Then it summarizes each community bottom-up. Press the buttons to detect communities and watch the summaries form.

FIG.03 — LIVE COMMUNITY DETECTION · color = community

Readouts: entities (nodes): 18; relations (edges); communities found.

// phase 1: clustering + summarize
5. Leiden     partition graph → communities
              recursive: C0 coarse → C3 leaf, nested
6. summarize  each community bottom-up:
              leaf   → report from its entities + relations
              parent → roll up child reports to fit window

Every node lands in exactly one community at each level, and the levels nest. That multi-resolution hierarchy — coarse summaries on top of fine ones — is the product. Each summary is a pre-computed "what this cluster of the corpus is about."
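The two structural properties (one community per node at each level, and nested levels) and the bottom-up roll-up can be sketched without running Leiden itself. The sketch below assumes the community assignments are already computed and shows only two levels; `summarize` is a hypothetical stand-in for the LLM call, and the window-fitting step is omitted.

```python
def is_partition(communities, nodes):
    """True if every node appears in exactly one community of this level."""
    members = [n for c in communities.values() for n in c]
    return len(members) == len(set(members)) and set(members) == set(nodes)

def nests(child_level, parent_level):
    """True if each child community sits wholly inside one parent community."""
    return all(any(child <= parent for parent in parent_level.values())
               for child in child_level.values())

def summarize_bottom_up(leaf_level, parent_level, summarize):
    """Summarize leaves from their members, then parents from child reports."""
    leaf_reports = {cid: summarize(sorted(m)) for cid, m in leaf_level.items()}
    parent_reports = {}
    for pid, parent in parent_level.items():
        children = [leaf_reports[cid] for cid, m in leaf_level.items() if m <= parent]
        parent_reports[pid] = summarize(children)
    return leaf_reports, parent_reports

# Assumed toy hierarchy: six entities, three leaf communities, two parents.
nodes = {"a", "b", "c", "d", "e", "f"}
leaf = {"x": {"a", "b"}, "y": {"c"}, "z": {"d", "e", "f"}}
root = {"p": {"a", "b", "c"}, "q": {"d", "e", "f"}}

fake_summarize = lambda texts: "[" + " + ".join(texts) + "]"  # stub, not an LLM
leaf_reports, root_reports = summarize_bottom_up(leaf, root, fake_summarize)
```

Parent reports are built from child reports rather than raw entities, which is what keeps the upper levels within a context window as communities grow.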

04 / 06

Phase 2 · global search (the headline mode)

Ask every summary, then merge.

Query time is cheap because it only touches summaries, never raw text. Map: ask the question of every community summary and score each partial answer 0–100. Reduce: drop the zeros, sort by score, pack the best into one window, synthesize. Step through it below.

FIG.04 — GLOBAL SEARCH · MAP → SCORE → FILTER → REDUCE

Readouts: community summaries with helpfulness 0–100 per chunk; current phase (map); summaries mapped; score-0 dropped; packed into answer.

// phase 2: global search
1. shuffle + pack  summaries into chunks
2. map             answer Q from each chunk alone + emit a helpfulness 0–100
3. filter          drop score-0 partials
4. reduce          sort by score, fill an 8k window, synthesize one final answer

Why shuffle? To spread relevant info across chunks instead of letting it concentrate in one and get crowded out. Every community gets to vote — that's why global search can answer "what are the themes" when top-k can't.
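The four global-search steps can be sketched end to end. This is a minimal illustration under stated assumptions: `map_fn` and `reduce_fn` are hypothetical LLM wrappers (stubbed here), token counts are approximated by whitespace word counts, and the budgets are toy values rather than the 8k window.

```python
import random

def pack(items, budget):
    """Greedily group strings into chunks of at most `budget` words each."""
    chunks, current, used = [], [], 0
    for item in items:
        n = len(item.split())
        if current and used + n > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(item)
        used += n
    if current:
        chunks.append(current)
    return chunks

def global_search(question, summaries, map_fn, reduce_fn,
                  chunk_budget=200, window_budget=400, seed=0):
    pool = list(summaries)
    random.Random(seed).shuffle(pool)                      # 1. shuffle
    partials = [map_fn(question, chunk)                    # 2. map each chunk alone
                for chunk in pack(pool, chunk_budget)]
    partials = [p for p in partials if p[1] > 0]           # 3. drop score-0 partials
    partials.sort(key=lambda p: p[1], reverse=True)        # 4. best first
    packed, used = [], 0
    for answer, _score in partials:                        # 4. fill the window
        n = len(answer.split())
        if used + n > window_budget:
            break
        packed.append(answer)
        used += n
    return reduce_fn(question, packed)

def fake_map(question, chunk):
    """Stub scorer: keyword match instead of an LLM helpfulness rating."""
    text = " ".join(chunk)
    score = 80 if "pricing" in text else 30 if "billing" in text else 0
    return text, score

def fake_reduce(question, partials):
    return " | ".join(partials)

summaries = ["pricing changed twice", "billing moved regions", "team went hiking"]
answer = global_search("what are the themes?", summaries,
                       fake_map, fake_reduce, chunk_budget=3)
```

Every summary is mapped, so cost scales with the number of communities at the chosen level rather than with k.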

05 / 06

The hierarchy is the dial

Pick the zoom level for the question.

Leiden's levels are a resolution dial. C0 is a few coarse summaries — cheap, broad, loses detail. C3 is many fine ones — expensive, detailed. Slide it and watch the cost / breadth tradeoff. The surprise from the paper: even the coarse top level is a strong default.

FIG.05 — COMMUNITY LEVEL · summaries vs context cost

Slider: community level from C0 (coarse, root) to C3 (leaf). Readouts: communities at level; context tokens (relative); resolution.

The paper's headline result: intermediate and low-level summaries beat a vector-RAG baseline on comprehensiveness and diversity — and even the coarse root C0 still won ~72% comprehensiveness head-to-head, while using 97% fewer context tokens than feeding raw source text.

That's the lever: the cheap coarse summaries are a remarkably strong default. You only pay for the fine levels when the question actually needs the detail. Match the zoom to the question.
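One way to make "match the zoom to the question" operational is a token budget: take the finest level whose summaries still fit. This policy is an assumption of the sketch, not something the paper prescribes, and the token totals are made up for illustration.

```python
def pick_level(tokens_per_level, budget):
    """Return the finest level whose summaries fit the token budget.

    `tokens_per_level` maps level name -> total summary tokens, ordered
    coarse to fine. Falls back to the coarsest level if nothing fits.
    """
    levels = list(tokens_per_level)
    chosen = levels[0]
    for level in levels:
        if tokens_per_level[level] <= budget:
            chosen = level
    return chosen

# Hypothetical totals; real values depend on the corpus and the index.
totals = {"C0": 3_000, "C1": 12_000, "C2": 40_000, "C3": 90_000}

broad = pick_level(totals, budget=15_000)
tight = pick_level(totals, budget=1_000)
deep = pick_level(totals, budget=1_000_000)
```

A budget is only a proxy. A question that genuinely needs detail may justify paying for C3 even when C0 would fit.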

THE HONEST CONTROL

The authors also measured directness ("how concise and on-point") as a control, and GraphRAG loses it to vector RAG, by design. Comprehensiveness and conciseness trade off, so no method should win all four metrics (comprehensiveness, diversity, empowerment, directness). Reporting a metric you lose is a credibility signal.

06 / 06

When NOT to reach for it · and the zoo

The cost, the cousins, the caveats.

GraphRAG is not a free upgrade over vector RAG. The win is task-dependent, and the index bill is real. Here's the honest ledger and the family of variants that grew up to fix it.

Variant	Index cost	Key idea	Best for
GraphRAG	high (LLM extract + summarize)	entity graph + Leiden + community summaries	global sensemaking on a stable corpus
LazyGraphRAG	~vector RAG	defer all LLM use to query time; cheap NLP graph	one-off and exploratory queries; quality comparable to GraphRAG global search at >700× lower query cost
DRIFT search	= GraphRAG	blend local + community info	questions between local & global
Local search	= GraphRAG	walk entity neighbourhood (graph-guided RAG)	specific, entity-anchored questions
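The "walk entity neighbourhood" idea in the Local search row can be sketched as a bounded breadth-first walk. This shows only the graph traversal, assuming an adjacency-dict graph with made-up entities; the real local search mode combines more signals than this.

```python
from collections import deque

def neighbourhood(graph, start, max_hops=1):
    """Breadth-first walk: entities within `max_hops` edges of `start`.

    Returns a dict mapping each reached entity to its hop distance.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen

# Hypothetical entity graph.
graph = {
    "auth": ["tokens", "users"],
    "tokens": ["auth", "crypto"],
    "users": ["auth"],
    "crypto": ["tokens"],
    "billing": [],
}
one_hop = neighbourhood(graph, "auth", max_hops=1)
two_hop = neighbourhood(graph, "auth", max_hops=2)
```

The hop limit plays the role that k plays in vector retrieval: it bounds how much context an entity-anchored question pulls in.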

COST IS THE DEALBREAKER

hours per million tokens

Multi-pass extraction + gleaning + per-community summaries is a big LLM bill and ~hours of wall-clock. Microsoft's own LazyGraphRAG defers it — a tacit admission the original is too expensive for most use.

COVERAGE GAP

only as good as extraction

Answers are bounded by what got extracted. NLP graph-building has ~15–20% error rates; if a fact never made it into the graph, retrieval can't find it. GraphRAG can underperform plain RAG on precise QA.

JUDGE THE EVAL

LLM-judged, no gold

Win rates come from an LLM judge comparing answers to LLM-generated questions, with no ground truth. Read "72–83%" as relative preference, not measured correctness.
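To see why a win rate is a preference measure, it helps to look at how one is computed from pairwise verdicts. In this sketch a tie counts as half a win, which is an assumption rather than a confirmed detail of the paper's protocol, and the verdict counts are made up.

```python
def win_rate(verdicts):
    """Percentage preference for method A over a list of pairwise verdicts.

    Each verdict is "A", "B", or "tie". Ties count as half a win
    (an assumption of this sketch).
    """
    if not verdicts:
        raise ValueError("no verdicts")
    points = sum(1.0 if v == "A" else 0.5 if v == "tie" else 0.0 for v in verdicts)
    return 100.0 * points / len(verdicts)

# Hypothetical judge output over 100 question/answer pairs.
verdicts = ["A"] * 70 + ["B"] * 26 + ["tie"] * 4
rate = win_rate(verdicts)
```

Nothing in the calculation checks whether either answer is correct. It records only which answer the judge preferred.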

HOW IT CONNECTS TO OPENALICE

Atlas already builds GraphRAG's first half, for code. Graphify extracts an entity/relation graph (symbols, files, callers/callees, imports) with deterministic AST parsing — steps 1–3 of indexing, already done and cron-fresh, skipping GraphRAG's biggest cost and biggest error source.

The missing pieces are Leiden communities + community summaries over the org graph. Add them and you get global search over the codebase: "what are the architectural themes across the 60 repos," "what does the auth subsystem do as a whole" — org-level sensemaking that today needs a human reading many files.

WHEN TO ACTUALLY USE IT

Reach for GraphRAG when many broad questions hit one stable corpus and you can amortize the index. For one-off queries on fresh data, or precise "what's value X" lookups, plain vector RAG is often as good and far cheaper.

Cost lesson, applied: the LazyGraphRAG stance fits us — keep the cheap deterministic AST graph, compute community reports lazily / incrementally instead of re-summarizing 60 repos on every cron tick.
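A lazy, incremental report store could look like the sketch below: a report is computed on first request and reused until the community's membership changes. `LazyReports` and its `summarize` callable are hypothetical names for illustration, not existing Atlas or Graphify APIs.

```python
import hashlib

class LazyReports:
    """Compute a community report on first request, reuse it until members change."""

    def __init__(self, summarize):
        self._summarize = summarize   # hypothetical LLM-backed callable
        self._cache = {}              # community id -> (fingerprint, report)
        self.calls = 0                # number of summarize calls actually made

    @staticmethod
    def _fingerprint(members):
        joined = "\n".join(sorted(members)).encode("utf-8")
        return hashlib.sha256(joined).hexdigest()

    def report(self, community_id, members):
        fp = self._fingerprint(members)
        cached = self._cache.get(community_id)
        if cached and cached[0] == fp:
            return cached[1]          # unchanged since last time: no LLM call
        self.calls += 1
        text = self._summarize(sorted(members))
        self._cache[community_id] = (fp, text)
        return text

reports = LazyReports(lambda members: ", ".join(members))  # stub summarizer
first = reports.report("auth", {"login", "token"})
again = reports.report("auth", {"token", "login"})         # same members, cache hit
changed = reports.report("auth", {"login", "token", "sso"})
```

Communities nobody asks about are never summarized, and a cron tick that changes nothing in a community costs nothing for it.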

04 · 03 — you made it

You queried the graph.

Local vs global. The entity-graph index and the chunk/gleaning tradeoff. Leiden communities and bottom-up summaries. Map-reduce global search, the zoom dial, and the honest cost ledger. You now know why "retrieve over the graph" answers the question top-k never could — and when not to bother.

