From RAG to GraphRAG: Why Context Structure Matters

What broke when we used vector search for contract analysis, and what we replaced it with.


60-Second Summary

Vector search finds semantically similar text. That is necessary but not sufficient for enterprise knowledge retrieval. When a user asks “Which obligations in the SOW are affected by Amendment 2?”, vector search returns relevant-looking chunks. But it cannot traverse the relationship between documents. It does not know that Amendment 2 modifies Section 4.3 of the MSA which is referenced by the SOW. GraphRAG adds structure: entities, relationships, and traversal. We use a hybrid approach — vector for semantic matching, graph for structural reasoning — and this is what we have learned about when each helps and when each hurts.

Where Vector Search Fails

We started with a standard RAG pipeline for contract analysis: chunk documents, embed them, retrieve by similarity, pass to LLM. It worked for simple questions — “What is the payment term?” retrieves the right clause and the model extracts the answer.
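A pipeline of that shape can be sketched in a few lines. This is a toy illustration, not our production code: the bag-of-words similarity below stands in for a real embedding model, and the chunks are invented examples.

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query; return the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Payment terms: net 30 days from invoice date.",
    "Termination requires 60 days written notice.",
    "Liability is capped at fees paid in the prior 12 months.",
]
top = retrieve("What are the payment terms?", chunks, k=1)
```

For a lookup like this, nearest-neighbor similarity is all the machinery you need; the failures below start when the answer spans more than one chunk.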

It broke on questions that require structural reasoning:

  • “Which obligations changed after Amendment 2?” — Vector search cannot traverse the amendment-to-clause relationship. It finds chunks that mention amendments and chunks that mention obligations, but it does not know which amendment modified which obligation.
  • “Are there conflicting liability terms across our contract set?” — This requires cross-document comparison. Vector search retrieves similar chunks, but “similar” and “conflicting” are different relationships.
  • “What is the effective indemnification cap considering all amendments?” — This requires following a chain: original cap → Amendment 1 modification → Amendment 2 override → side letter exception. Vector search gives you fragments, not the chain.

Here is the pattern: vector search handles “find me content about X” well. It fails at “show me the relationship between X and Y” and “trace the chain from X to Z through Y.”

What GraphRAG Adds

GraphRAG builds a knowledge graph over your documents: entities (clauses, parties, obligations, dates, amounts) become nodes, and relationships (modifies, supersedes, references, constrains) become edges. When a query arrives, you do two things in parallel: vector search for semantic relevance, and graph traversal for structural context. Then you merge and rerank.

What This Looks Like in Practice

For our contract intelligence system, the graph contains:

  • Document nodes: MSA, SOW, Amendments, Side Letters — each with metadata (date, parties, status)
  • Clause nodes: Individual clauses extracted and classified (indemnification, liability, payment, termination)
  • Relationship edges: “Amendment 2 modifies MSA Section 4.3”, “Side Letter overrides Amendment 1 Section 2”, “SOW references MSA Section 7”
  • Entity nodes: Parties, dates, amounts, obligations — linked to their source clauses

When someone asks “What is the current liability cap?”, the graph traversal finds the original cap in the MSA, follows modification edges through amendments, and returns the chain. The vector search simultaneously retrieves relevant text. The LLM receives both: the structured chain for accuracy, the raw text for grounding.
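That traversal can be sketched over a toy edge list. The node names, clause texts, and the `modified_by` relation below are invented for illustration; a production graph store would replace the linear scan with an indexed lookup.

```python
# Edges stored as (source, relation, target) triples. "modified_by" edges
# run from a clause to the amendment text that superseded it.
edges = [
    ("MSA §4.3", "modified_by", "Amendment 1 §2"),
    ("Amendment 1 §2", "modified_by", "Amendment 2 §1"),
]
text = {
    "MSA §4.3": "Liability capped at $1M.",
    "Amendment 1 §2": "Cap raised to $2M.",
    "Amendment 2 §1": "Cap raised to $5M.",
}

def modification_chain(start: str) -> list[str]:
    """Follow 'modified_by' edges from a clause to its current version."""
    chain = [start]
    node = start
    while True:
        nxt = [t for (s, r, t) in edges if s == node and r == "modified_by"]
        if not nxt:
            return chain
        node = nxt[0]
        chain.append(node)

chain = modification_chain("MSA §4.3")
current = text[chain[-1]]
```

The chain itself (original cap, each modification, the final value) is what reaches the model, alongside the raw clause text from vector retrieval.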

Hybrid Retrieval Architecture
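The two retrieval paths meet in a merge-and-rerank step. A minimal sketch, assuming each path has already normalized its scores to [0, 1]; the weights and chunk IDs are illustrative, not tuned values.

```python
def merge_and_rerank(vector_hits, graph_hits, w_vec=0.6, w_graph=0.4, k=5):
    """Merge two retrieval paths into one ranked context list.

    vector_hits / graph_hits: {chunk_id: score}, scores in [0, 1].
    A chunk found by both paths gets a combined score; a chunk found
    by only one path is scored on that path alone.
    """
    combined = {}
    for cid, score in vector_hits.items():
        combined[cid] = combined.get(cid, 0.0) + w_vec * score
    for cid, score in graph_hits.items():
        combined[cid] = combined.get(cid, 0.0) + w_graph * score
    return sorted(combined, key=combined.get, reverse=True)[:k]

vector_hits = {"msa_4_3": 0.82, "sow_2_1": 0.75}
graph_hits = {"amendment2_1_1": 1.0, "msa_4_3": 0.9}
ranked = merge_and_rerank(vector_hits, graph_hits)
```

A chunk surfaced by both paths (here `msa_4_3`) naturally rises to the top, which is usually the behavior you want: semantic relevance confirmed by structural relevance.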

When Graph Helps, When It Hurts

  • Simple factual lookup. Vector only: sufficient. Graph: overkill. One chunk answers the question; no traversal needed.
  • Multi-hop questions. Vector only: fails. Graph: essential. The answer requires following relationships across documents.
  • Cross-document comparison. Vector only: partial. Graph: strong. The graph connects related clauses across documents explicitly.
  • Rapidly changing corpus. Vector only: good. Graph: expensive. Graph extraction and updates add latency to ingestion.
  • Small corpus (<100 docs). Vector only: sufficient. Graph: over-engineering. Vector search coverage is good enough at small scale.
  • Compliance / audit trail. Vector only: weak. Graph: strong. The graph provides citation chains: which clause, which version, which modification.

Our honest assessment: if your queries are all simple lookups and your corpus is under 200 documents, vector search with good chunking and metadata filtering is probably enough. Graph adds value when you have relational questions, cross-document dependencies, or compliance requirements that demand citation chains.

Where Retrieval Fails

Two Failures That Shaped Our Approach

Failure 1: The entity extraction cascade. Our initial graph was built with aggressive entity extraction. Every noun phrase became a node. The graph was huge, noisy, and slow. Queries that should have returned 3 relevant clauses were returning 40+ nodes. The model drowned in context. The fix: we restricted entity extraction to a curated taxonomy — document types, clause types, party names, dates, monetary amounts, and obligation types. Everything else is handled by vector search. Less graph, better graph.
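The curated-taxonomy fix amounts to a whitelist over the extractor's output. A sketch, assuming a hypothetical upstream extractor that tags each candidate with a type; the type names mirror the taxonomy described above.

```python
# Curated taxonomy: only these types become graph nodes.
ALLOWED_TYPES = {"document", "clause", "party", "date", "amount", "obligation"}

def filter_entities(candidates):
    """Keep only entities whose type is in the curated taxonomy.

    candidates: list of (text, entity_type) pairs from an upstream
    extractor. Everything outside the taxonomy is left to vector
    search instead of becoming a graph node.
    """
    return [(text, etype) for (text, etype) in candidates if etype in ALLOWED_TYPES]

candidates = [
    ("Amendment 2", "document"),
    ("Acme Corp", "party"),
    ("reasonable efforts", "noun_phrase"),  # noisy candidate: rejected
    ("$5,000,000", "amount"),
]
kept = filter_entities(candidates)
```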

Failure 2: The amendment that did not link. A master agreement was modified by three amendments. Our entity extraction correctly identified all four documents. But it missed the relationship between Amendment 3 and the specific clause it modified, because the amendment text said “Section 4.3 is hereby replaced” without repeating the original clause text. The embedding similarity between the amendment and the original clause was low. Our fix: we added explicit document-structure parsing that detects modification language (“is hereby replaced,” “is amended to read,” “notwithstanding Section X”) and creates graph edges from these patterns, not just from embedding similarity.
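A sketch of that pattern-based edge creation. The three regexes below cover only the phrasings quoted above and would need extending for a real contract corpus; the document ID and edge format are illustrative.

```python
import re

# Explicit modification language; the section reference is captured so an
# edge can be created even when embedding similarity is low.
MOD_PATTERNS = [
    re.compile(r"Section\s+([\d.]+)\s+is\s+hereby\s+replaced", re.I),
    re.compile(r"Section\s+([\d.]+)\s+is\s+amended\s+to\s+read", re.I),
    re.compile(r"notwithstanding\s+Section\s+([\d.]+)", re.I),
]

def modification_edges(doc_id: str, text: str):
    """Emit (amendment, 'modifies', target section) edges from pattern hits."""
    found = []
    for pattern in MOD_PATTERNS:
        for match in pattern.finditer(text):
            found.append((doc_id, "modifies", f"Section {match.group(1)}"))
    return found

edges = modification_edges(
    "Amendment 3",
    "Section 4.3 is hereby replaced in its entirety with the following...",
)
```

These edges come from document structure, not semantics, which is exactly why they survive the low-similarity case that broke the embedding-only approach.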

Decisions and Trade-offs

  • Graph scope. Chose: curated taxonomy (limited entity types). Gave up: coverage of edge-case entities. But noise dropped 80% and query latency dropped 60%.
  • Retrieval strategy. Chose: hybrid (vector + graph + rerank). Gave up: simplicity; two retrieval paths to maintain, merge, and test. But accuracy on relational queries went from ~55% to ~85%.
  • Graph update strategy. Chose: re-extract on document change, with a nightly batch full refresh. Gave up: real-time freshness; a document change takes 5-10 minutes to reflect in the graph. Acceptable for our use case, since contracts change infrequently.
  • Chunking. Chose: section-aware chunking that respects document structure and keeps clause boundaries intact. Gave up: uniform chunk sizes; some chunks are 50 tokens, some are 500. But clause boundaries are never broken.
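The section-aware chunking decision can be illustrated with a splitter keyed to clause headings. The heading pattern (a dotted section number followed by a capitalized title) is an assumption about how these contracts are formatted, not a universal rule.

```python
import re

def chunk_by_clause(text: str) -> list[str]:
    """Split at clause headings (e.g. '4.3 Liability') rather than fixed
    token windows, so a clause is never broken across chunks."""
    # Zero-width split: cut immediately before each heading line.
    parts = re.split(r"(?m)^(?=\d+(?:\.\d+)*\s+[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]

doc = """4.2 Payment
Invoices are due net 30.

4.3 Liability
Liability is capped at fees paid.
"""
chunks = chunk_by_clause(doc)
```

The resulting chunks vary in length, exactly as the trade-off above describes, but each one is a complete clause with its heading attached.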

Your Retrieval Architecture Checklist

  • Categorize your queries: what percentage require relational reasoning vs simple lookup? If under 20%, vector-only may be enough.
  • If using a graph, restrict entity extraction to a curated taxonomy. More entities is not better.
  • Chunk at semantic boundaries (section, clause, paragraph), not arbitrary token counts.
  • Set retrieval score thresholds: below threshold means “I don’t know”, not “here’s my best guess”.
  • Test retrieval independently from generation: does the right content reach the model?
  • Monitor retrieval score distributions over time — drift indicates corpus or query distribution changes.
  • Measure graph freshness: how long after a document change does the graph reflect it?
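The threshold item on the checklist can be made concrete with a small guard in front of generation. The 0.35 cutoff below is illustrative, not a tuned value.

```python
def answer_or_abstain(hits, threshold=0.35):
    """Pass context to the model only when retrieval clears the threshold.

    hits: list of (chunk, score) pairs, scores in [0, 1].
    Returns None when nothing clears the bar, so the caller can answer
    "I don't know" instead of generating from weak context.
    """
    passing = [chunk for chunk, score in hits if score >= threshold]
    return passing or None

strong = answer_or_abstain([("clause A", 0.82), ("clause B", 0.20)])
weak = answer_or_abstain([("clause C", 0.12)])
```

The important design choice is that abstention is decided before the model sees anything: weak context never reaches generation, so the model is never tempted to guess from it.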

Key Takeaways

  • Vector search handles “find content about X.” It fails at “show me the relationship between X and Y.” Know which queries you have.
  • GraphRAG is not a universal upgrade. It adds value for multi-hop queries, cross-document reasoning, and compliance trails. For simple lookups, it is over-engineering.
  • Hybrid retrieval (vector + graph + rerank) is the practical pattern for enterprise knowledge. Neither alone is sufficient.
  • Entity extraction quality is the bottleneck. A noisy graph is worse than no graph. Curate your taxonomy.
  • Section-aware chunking that respects document structure is more important than any embedding model upgrade.

References

  1. Microsoft GraphRAG — Graph-based retrieval augmented generation
  2. GraphRAG Paper (Microsoft Research) — Original research paper
  3. Neo4j — Knowledge Graphs + LLMs
  4. LlamaIndex — Framework we evaluated for hybrid retrieval
  5. Pinecone — Chunking Strategies
  6. Weaviate — Hybrid Search Explained
  7. Jason Liu — When RAG Isn’t Enough
