OpenAlice Academy — 03 · 07 / Graphify

01 / 06

The shape of the thing · nodes & edges

Two artifacts: nodes and edges.

The whole graph is just a list of nodes (the things) and edges (the relationships between them). The pipeline is a flat, stateless chain — each stage hands plain data to the next. Hover the stages to see what each one adds.

FIG.01 — THE STATELESS STAGE CHAIN

// node: a thing in the codebase
{ id: "auth.charge_card", label: "charge_card", source_file: "auth.py", source_location: "L42" }

// edge: a relationship
{ source: "billing.checkout", target: "auth.charge_card", relation: "calls", confidence: "EXTRACTED" }

That's it. Everything else — the interactive viz, the Neo4j export, the MCP server a team queries over HTTP — is downstream of this minimal schema. The one field that makes Graphify honest is confidence, which rides on every edge.
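As a minimal sketch, the two records could be modeled in Python as follows. The field names come from the example above; the class names, the `Confidence` alias, and the location given for `billing.checkout` are illustrative assumptions, not Graphify's actual types.

```python
from dataclasses import dataclass, asdict
from typing import Literal

# The three tiers named in this lesson; the alias itself is an assumption.
Confidence = Literal["EXTRACTED", "INFERRED", "AMBIGUOUS"]


@dataclass(frozen=True)
class Node:
    id: str               # e.g. "auth.charge_card"
    label: str            # short display name
    source_file: str      # file the node was found in
    source_location: str  # e.g. "L42"


@dataclass(frozen=True)
class Edge:
    source: str           # id of the node the relationship starts from
    target: str           # id of the node it points at
    relation: str         # e.g. "calls"
    confidence: Confidence


# The whole graph is two flat lists of plain records.
nodes = [
    Node("auth.charge_card", "charge_card", "auth.py", "L42"),
    Node("billing.checkout", "checkout", "billing.py", "L3"),  # hypothetical location
]
edges = [
    Edge("billing.checkout", "auth.charge_card", "calls", "EXTRACTED"),
]

# Plain dicts are the kind of data one stateless stage could hand to the next.
graph = {
    "nodes": [asdict(n) for n in nodes],
    "edges": [asdict(e) for e in edges],
}
print(graph["edges"][0]["confidence"])  # EXTRACTED
```

Keeping the records immutable and serializable to plain dicts matches the "flat, stateless chain" description: no stage needs anything but the two lists.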

02 / 06

The crucial distinction · what tree-sitter sees

A syntax tree knows "there's a call."

tree-sitter is a grammar-based parser. It does not resolve names or types, and it does not build a symbol table; it captures structure exactly as written. So it can tell you "there is a call expression whose callee is the identifier foo at line 42", but never which foo. Click a line and watch its syntax tree light up.

FIG.02 — CLICK A LINE · ITS SYNTAX TREE

SOURCE · billing.py — click any line

tree-sitter parse · concrete syntax tree

Notice line 4: the tree records a call node with callee charge_card (tree-sitter's Python grammar names this node call; grammars such as JavaScript's name it call_expression). The import on line 1 means the callee might be the imported function, or a local name that shadows it. The tree can't say. That unresolved question is exactly where the next stage earns the INFERRED tag.
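The same limitation can be shown with a short sketch. It uses Python's standard-library `ast` module as a stand-in for tree-sitter (an assumption made so the example runs without extra packages); both give syntax only. The contents of `billing.py` are hypothetical, written to match the description above.

```python
import ast

# Hypothetical contents of billing.py: import on line 1, call on line 4.
SOURCE = '''\
from auth import charge_card

def checkout(cart):
    return charge_card(cart.total)
'''


def call_sites(source: str) -> list[tuple[str, int]]:
    """Return (callee_name, line) for every call whose callee is a bare name."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            sites.append((node.func.id, node.lineno))
    return sites


sites = call_sites(SOURCE)
print(sites)  # [('charge_card', 4)]
```

The result is a name and a line number, nothing more. Nothing in the tree links `charge_card` on line 4 to the import on line 1; making that link is the job of a later pass.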

03 / 06

The call-graph second pass · edge by edge

Resolve each call site to a definition.

The first pass collects the easy facts (imports and same-scope calls) as EXTRACTED edges. The second pass walks every call site and tries to connect it to a definition by name-matching. Step through the code and watch each edge get drawn and tagged.

FIG.03 — SOURCE · stepping a call site at a time


EXTRACTED: explicit (import / direct call)
INFERRED: name-matched heuristic
AMBIGUOUS: flagged for a human

live call graph · nodes + resolved edges

The second pass is a heuristic, not a proof. A unique name match → INFERRED. A method call on an object whose type is unknown, or a name with two matching definitions → AMBIGUOUS, listed in GRAPH_REPORT.md for a human.
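The tagging rule above can be sketched as a small function. This is an illustration under stated assumptions, not Graphify's implementation: the input shapes, the `unknown_receiver` flag, the `models.*` definitions, and the choice to emit one AMBIGUOUS edge per candidate are all hypothetical.

```python
from collections import defaultdict

# Hypothetical first-pass output: ids of every definition in the codebase.
definitions = [
    "auth.charge_card",
    "billing.checkout",
    "models.User.save",
    "models.Order.save",
]

# Hypothetical call sites: (caller_id, callee_name, unknown_receiver).
# unknown_receiver is True for a method call on an object of unknown type.
call_sites = [
    ("billing.checkout", "charge_card", False),  # bare name, one definition
    ("billing.checkout", "save", True),          # obj.save(), type of obj unknown
    ("billing.checkout", "send_email", False),   # no definition in this codebase
]


def resolve(definitions, call_sites):
    """Name-match each call site against known definitions and tag the edge."""
    by_name = defaultdict(list)
    for def_id in definitions:
        by_name[def_id.rsplit(".", 1)[-1]].append(def_id)

    edges = []
    for caller, name, unknown_receiver in call_sites:
        matches = by_name.get(name, [])
        if not matches:
            continue  # external or unresolved: the graph stays silent
        unique = len(matches) == 1 and not unknown_receiver
        tier = "INFERRED" if unique else "AMBIGUOUS"
        for target in matches:
            edges.append({"source": caller, "target": target,
                          "relation": "calls", "confidence": tier})
    return edges


edges = resolve(definitions, call_sites)
for edge in edges:
    print(edge["target"], edge["confidence"])
```

Here `charge_card` has a single match and becomes INFERRED, `save` matches two definitions and yields two AMBIGUOUS candidates, and `send_email` produces no edge at all.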

04 / 06

Confidence as a first-class property · filter the graph

Every edge carries provenance.

Instead of pretending the graph is ground truth, each edge says how it was found. That lets a consumer filter by confidence — a "prove it" view shows only EXTRACTED; turn on INFERRED for recall. Toggle the tiers and watch a real graph thin out and fill in.

FIG.04 — LIVE GRAPH · toggle which confidence tiers are shown

Toggles: EXTRACTED · INFERRED · AMBIGUOUS

Counters: nodes · EXTRACTED edges (shown) · INFERRED edges (shown) · AMBIGUOUS edges (shown) · resolved share

Drop INFERRED + AMBIGUOUS and you're left with the provable skeleton: imports and unmistakable same-scope calls. Add them back and the graph fills in, but now some edges are candidates. Same graph, two honesty settings.
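Because the tier is plain data on each edge, the two views reduce to a filter. A minimal sketch, assuming edges are dicts with the fields shown in section 01; the sample edges other than the first are made up for illustration.

```python
from collections import Counter

# Sample edges; only the first comes from the lesson, the rest are hypothetical.
edges = [
    {"source": "billing.checkout", "target": "auth.charge_card", "relation": "calls", "confidence": "EXTRACTED"},
    {"source": "billing.checkout", "target": "billing.apply_tax", "relation": "calls", "confidence": "INFERRED"},
    {"source": "billing.checkout", "target": "models.User.save", "relation": "calls", "confidence": "AMBIGUOUS"},
    {"source": "billing.checkout", "target": "models.Order.save", "relation": "calls", "confidence": "AMBIGUOUS"},
]


def filter_edges(edges, tiers):
    """Keep only the edges whose confidence tier is switched on."""
    return [e for e in edges if e["confidence"] in tiers]


prove_it = filter_edges(edges, {"EXTRACTED"})                 # provable skeleton
recall_view = filter_edges(edges, {"EXTRACTED", "INFERRED"})  # facts plus candidates
tier_counts = Counter(e["confidence"] for e in edges)

print(len(prove_it), len(recall_view), dict(tier_counts))
```

The graph itself never changes; only the set of tiers passed to the filter does.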

The non-code layer (PDFs, images, video) has no AST, so an LLM extracts its edges — those are INFERRED by construction. The deterministic and probabilistic layers never get confused, because the tag travels on every edge.

Tier	Means	Comes from	How to read it
EXTRACTED	explicitly in the source	an import statement, a direct call	trust it — it's a parse fact
INFERRED	a reasonable deduction	call-graph 2nd pass, context co-occurrence, the LLM media layer	treat as a candidate; grep to confirm
AMBIGUOUS	uncertain	multiple matches, dynamic dispatch, unknown type	flagged for human review

05 / 06

Honest caveats · this is fundamental, not a bug

Why static call-graphs are candidate-grade.

Resolving a call site to its definition is a hard static-analysis problem, and in dynamic languages it is often impossible from syntax alone. The literature is blunt about it. The cases below are ones a name-match pass simply can't crack.

DUCK TYPING

obj.method()

Which class's .method()? Without knowing obj's type, syntax can't say. The call becomes AMBIGUOUS or is missed.

DYNAMIC DISPATCH

getattr · reflection

Function pointers, getattr(x, name)(), monkeypatching, dependency injection — the callee isn't even a literal name in the source.

GHOST DUPLICATES

node identity

The same entity can appear as two nodes ("ghost duplicates"), auto-merged at build time — so even node dedup is heuristic.
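The first two cases are visible in the syntax itself. The sketch below, again using Python's standard-library `ast` as a stand-in for tree-sitter, sorts call sites by the shape of their callee; the snippet and the shape labels are illustrative assumptions.

```python
import ast

# Four top-level calls, each with a different callee shape.
SNIPPET = '''\
charge_card(total)
obj.save()
getattr(x, name)()
handlers[kind](event)
'''


def callee_shape(call: ast.Call) -> str:
    """Describe what the syntax offers a name-match pass for this call."""
    func = call.func
    if isinstance(func, ast.Name):
        return f"name:{func.id}"         # a literal name to match on
    if isinstance(func, ast.Attribute):
        return f"attribute:.{func.attr}"  # method name only, receiver type unknown
    return "computed"                     # callee is an expression, no name at all


# Each statement in SNIPPET is an expression statement wrapping one call.
shapes = [callee_shape(stmt.value) for stmt in ast.parse(SNIPPET).body]
print(shapes)
# ['name:charge_card', 'attribute:.save', 'computed', 'computed']
```

Only the first shape gives a name-match pass a full name to work with. The attribute call gives a method name without a class, and the last two give nothing to match.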

FIG.05 — THE PRECISION / RECALL FRONTIER · what tools actually achieve

PyCG, the precision-leading academic Python tool, hits ~99% precision but only ~70% recall, and runs out of memory or time on programs past ~2,000 lines. JARVIS buys more recall by paying for heavyweight flow-sensitive type inference.

A tree-sitter name-match pass is far simpler than PyCG. So its resolved edges are best read as candidates — which is precisely why Graphify tags them INFERRED, not EXTRACTED. Honesty over false precision.
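For reference, precision is the share of reported edges that are real, and recall is the share of real edges that were reported. A toy calculation, with made-up edge sets that do not correspond to any tool's published results:

```python
def precision_recall(reported: set, truth: set) -> tuple[float, float]:
    """Precision and recall of a reported edge set; assumes both sets are non-empty."""
    true_positives = len(reported & truth)
    return true_positives / len(reported), true_positives / len(truth)


# Hypothetical ground-truth call edges as (caller, callee) pairs.
truth = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}

cautious = {("a", "b"), ("a", "c"), ("b", "c")}  # reports only what it is sure of
eager = truth | {("a", "d"), ("b", "d")}         # finds everything, plus two false edges

cautious_scores = precision_recall(cautious, truth)  # (1.0, 0.75)
eager_scores = precision_recall(eager, truth)        # precision 4/6, recall 1.0
print(cautious_scores, eager_scores)
```

The cautious set is never wrong but misses an edge; the eager set misses nothing but a third of what it reports is false. Confidence tiers let one graph serve both readings.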

06 / 06

In production · the Atlas connection

This is the step Atlas borrowed.

Atlas — OpenAlice's cron-refreshed code & knowledge graph — is exactly this idea at org scale. Its call-graph hook is, verbatim, the plan to "steal Graphify's AST call-graph". The mapping is one-to-one.

OUR CANDIDATE-GRADE = THEIR "INFERRED"

Atlas resolves only ~15–19% of call edges; the rest are external/ambiguous. That isn't a defect to hide — it's the same reality PyCG's 70% recall and Graphify's three-tier model describe. The standing rule: filter call-graph queries by confidence, then grep to confirm the gaps.

The determinism boundary matches too: Atlas's symbol / import index is the EXTRACTED layer (exact, from parsing); semantic search (pgvector) is the INFERRED / concept layer — the same split Graphify draws between tree-sitter and its LLM pass.

THE GREP→GRAPH HOOK

Because resolution is candidate-grade, both tools converge on the same advice: query the graph first, fall back to grep for the gaps. A query returns EXTRACTED edges instantly and INFERRED edges as leads — and where the graph is silent, you grep. The graph doesn't replace search; it tells you where search is and isn't needed.
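A minimal sketch of that query order, assuming edges are dicts as in section 01 and source files are held in memory. The function name `callers_of`, the `auth.refund` target, and the file contents are hypothetical; neither Graphify nor Atlas is confirmed to expose this interface.

```python
import re

# Hypothetical graph and in-memory source files.
edges = [
    {"source": "billing.checkout", "target": "auth.charge_card",
     "relation": "calls", "confidence": "EXTRACTED"},
]
sources = {
    "billing.py": "from auth import charge_card\n\ndef checkout(cart):\n    return charge_card(cart.total)\n",
    "jobs.py": "def nightly(orders):\n    for o in orders:\n        handler = pick(o)\n        handler.refund(o)\n",
}


def callers_of(target_id, edges, sources):
    """Query the graph first; fall back to a text search where it is silent."""
    hits = [e for e in edges if e["target"] == target_id and e["relation"] == "calls"]
    facts = [e["source"] for e in hits if e["confidence"] == "EXTRACTED"]
    leads = [e["source"] for e in hits if e["confidence"] != "EXTRACTED"]

    grep_hits = []
    if not hits:
        # Search for "<name>(" in every file, like a word-bounded grep.
        name = target_id.rsplit(".", 1)[-1]
        pattern = re.compile(rf"\b{re.escape(name)}\s*\(")
        for path, text in sources.items():
            for lineno, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    grep_hits.append((path, lineno))
    return {"facts": facts, "leads": leads, "grep": grep_hits}


known = callers_of("auth.charge_card", edges, sources)
silent = callers_of("auth.refund", edges, sources)
print(known)
print(silent)
```

The first query is answered from the graph alone. The second finds no edges, so the text search surfaces `handler.refund(o)` in `jobs.py` as a place to look by hand.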

The upgrade path, if you ever harden resolution beyond name-matching, is the PyCG / JARVIS direction: per-function flow-sensitive type inference. It buys recall at a steep scaling cost — worth knowing before you promise anyone a "full call graph."

03 · 07 — you made it

You built a code graph.

Parse to a syntax tree, walk it into nodes and edges, resolve the call sites you can, and tag every edge by how sure you are. You saw the second pass run live, watched the confidence tiers filter a real graph, and learned why "static call-graph" and "candidate-grade" mean the same thing. You now read a code graph the way Atlas does.

03·05 MemPalace · a structured, navigable long-term memory ✓ done

03·06 Model Routing · send each query to the cheapest model that can answer ✓ done

03·07 Graphify · source → AST → call-graph · the confidence tiers ✓ complete

03·08 The LLM-maintained Wiki · a living knowledge base an LLM keeps fresh next

Next · 03 · 08

The LLM-maintained Wiki →

A graph is structured memory about code. Now let an LLM keep a whole knowledge base fresh — the same provenance discipline, applied to prose.


openalicelabs