Where Vector Search Fails
We started with a standard RAG pipeline for contract analysis: chunk documents, embed them, retrieve by similarity, pass to LLM. It worked for simple questions — “What is the payment term?” retrieves the right clause and the model extracts the answer.
It broke on questions that require structural reasoning:
- “Which obligations changed after Amendment 2?” — Vector search cannot traverse the amendment-to-clause relationship. It finds chunks that mention amendments and chunks that mention obligations, but it does not know which amendment modified which obligation.
- “Are there conflicting liability terms across our contract set?” — This requires cross-document comparison. Vector search retrieves similar chunks, but “similar” and “conflicting” are different relationships.
- “What is the effective indemnification cap considering all amendments?” — This requires following a chain: original cap → Amendment 1 modification → Amendment 2 override → side letter exception. Vector search gives you fragments. Not the chain.
Here is the pattern: vector search handles “find me content about X” well. It fails at “show me the relationship between X and Y” and “trace the chain from X to Z through Y.”
What GraphRAG Adds
GraphRAG builds a knowledge graph over your documents: entities (clauses, parties, obligations, dates, amounts) become nodes, and relationships (modifies, supersedes, references, constrains) become edges. When a query arrives, you do two things in parallel: vector search for semantic relevance, and graph traversal for structural context. Then you merge and rerank.
What This Looks Like in Practice
For our contract intelligence system, the graph contains:
- Document nodes: MSA, SOW, Amendments, Side Letters — each with metadata (date, parties, status)
- Clause nodes: Individual clauses extracted and classified (indemnification, liability, payment, termination)
- Relationship edges: “Amendment 2 modifies MSA Section 4.3”, “Side Letter overrides Amendment 1 Section 2”, “SOW references MSA Section 7”
- Entity nodes: Parties, dates, amounts, obligations — linked to their source clauses
When someone asks “What is the current liability cap?”, the graph traversal finds the original cap in the MSA, follows modification edges through amendments, and returns the chain. The vector search simultaneously retrieves relevant text. The LLM receives both: the structured chain for accuracy, the raw text for grounding.
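To make that traversal concrete, here is a minimal sketch using networkx. The node names, dates, and dollar figures are illustrative rather than our production schema; the point is that explicit "modifies" edges let you walk from the original clause to the latest amendment and return the whole chain.

```python
# Minimal sketch of the modification-chain traversal (illustrative data).
import networkx as nx

g = nx.DiGraph()
# Clause nodes carry the text of the version they introduce.
g.add_node("MSA §9.2", text="Liability capped at $1,000,000.", date="2022-01-15")
g.add_node("Amd1 §3", text="Cap raised to $2,000,000.", date="2022-09-01")
g.add_node("Amd2 §1", text="Cap raised to $3,000,000 for data breaches.", date="2023-04-10")
# "modifies" edges point from the amending clause to the clause it changes.
g.add_edge("Amd1 §3", "MSA §9.2", relation="modifies")
g.add_edge("Amd2 §1", "Amd1 §3", relation="modifies")

def modification_chain(g: nx.DiGraph, clause: str) -> list[str]:
    """Walk 'modifies' edges backward from the original clause to the
    latest amendment, returning the chain oldest-first."""
    chain = [clause]
    while True:
        preds = [u for u, _, d in g.in_edges(chain[-1], data=True)
                 if d["relation"] == "modifies"]
        if not preds:
            return chain
        # If several amendments touch the same clause, follow the most recent.
        chain.append(max(preds, key=lambda n: g.nodes[n]["date"]))

print(" -> ".join(modification_chain(g, "MSA §9.2")))
# MSA §9.2 -> Amd1 §3 -> Amd2 §1
```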
Hybrid Retrieval Architecture

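A simplified sketch of the two-path flow. The `vector_search`, `graph_neighborhood`, and `rerank` stubs below stand in for whatever vector store, graph store, and reranker you actually run; the part worth copying is the merge, which keeps one entry per chunk ID and records which paths surfaced it so the reranker can boost double hits.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for a real vector store, graph store, and reranker.
def vector_search(query: str, k: int) -> list[dict]:
    return [{"id": "msa-9.2", "text": "...", "score": 0.82, "source": "vector"}]

def graph_neighborhood(query: str, k: int) -> list[dict]:
    return [{"id": "msa-9.2", "text": "...", "score": 0.70, "source": "graph"},
            {"id": "amd2-1", "text": "...", "score": 0.65, "source": "graph"}]

def rerank(query: str, hits: list[dict]) -> list[dict]:
    # Simplest useful policy: prefer hits found by both paths, then by score.
    return sorted(hits, key=lambda h: (len(h["sources"]), h["score"]), reverse=True)

def hybrid_retrieve(query: str, k: int = 10) -> list[dict]:
    # Run both retrieval paths in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec = pool.submit(vector_search, query, k)
        gra = pool.submit(graph_neighborhood, query, k)
        hits = vec.result() + gra.result()

    # Merge on ID: one entry per chunk/node, tracking which paths found it.
    merged: dict[str, dict] = {}
    for h in hits:
        entry = merged.setdefault(h["id"], {**h, "sources": set()})
        entry["sources"].add(h["source"])
        entry["score"] = max(entry["score"], h["score"])

    return rerank(query, list(merged.values()))[:k]

print([h["id"] for h in hybrid_retrieve("What is the current liability cap?")])
# ['msa-9.2', 'amd2-1']
```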
When Graph Helps, When It Hurts
| Use Case | Vector Only | Graph Helps | Why |
| --- | --- | --- | --- |
| Simple factual lookup | Sufficient | Overkill | One chunk answers the question. No traversal needed. |
| Multi-hop questions | Fails | Essential | Answer requires following relationships across documents. |
| Cross-document comparison | Partial | Strong | Graph connects related clauses across documents explicitly. |
| Rapidly changing corpus | Good | Expensive | Graph extraction and updates add latency to ingestion. |
| Small corpus (<100 docs) | Sufficient | Over-engineering | Vector search coverage is good enough at small scale. |
| Compliance / audit trail | Weak | Strong | Graph provides citation chains: which clause, which version, which modification. |
Our honest assessment: if your queries are all simple lookups and your corpus is under 200 documents, vector search with good chunking and metadata filtering is probably enough. Graph adds value when you have relational questions, cross-document dependencies, or compliance requirements that demand citation chains.
Two Failures That Shaped Our Approach
Failure 1: The entity extraction cascade. Our initial graph was built with aggressive entity extraction. Every noun phrase became a node. The graph was huge, noisy, and slow. Queries that should have returned 3 relevant clauses were returning 40+ nodes. The model drowned in context. The fix: we restricted entity extraction to a curated taxonomy — document types, clause types, party names, dates, monetary amounts, and obligation types. Everything else is handled by vector search. Less graph, better graph.
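In code, the fix amounts to a hard gate between extraction and the graph. This is a sketch; `extract_entities` stands in for our real NER/LLM extraction step, and the taxonomy mirrors the types listed above.

```python
# Curated taxonomy: only these entity types may become graph nodes.
ENTITY_TAXONOMY = {
    "document_type",    # MSA, SOW, amendment, side letter
    "clause_type",      # indemnification, liability, payment, termination
    "party",
    "date",
    "monetary_amount",
    "obligation",
}

def extract_entities(text: str) -> list[dict]:
    # Stand-in for the real extraction step (NER model or LLM pass).
    return [
        {"type": "monetary_amount", "value": "$1,000,000"},
        {"type": "noun_phrase", "value": "reasonable efforts"},  # noise
    ]

def graph_candidates(chunk_text: str) -> list[dict]:
    # The gate: anything outside the taxonomy never becomes a node;
    # it is left to vector search instead.
    return [e for e in extract_entities(chunk_text) if e["type"] in ENTITY_TAXONOMY]
```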
Failure 2: The amendment that did not link. A master agreement was modified by three amendments. Our entity extraction correctly identified all four documents. But it missed the relationship between Amendment 3 and the specific clause it modified, because the amendment text said “Section 4.3 is hereby replaced” without repeating the original clause text. The embedding similarity between the amendment and the original clause was low. Our fix: we added explicit document-structure parsing that detects modification language (“is hereby replaced,” “is amended to read,” “notwithstanding Section X”) and creates graph edges from these patterns, not just from embedding similarity.
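A condensed version of that parsing pass. The patterns below are a subset of what we match, and resolving a section number to the correct clause node in the governing document is a separate linking step, but the shape is:

```python
# Detect modification language in amendment text and emit explicit
# graph edges, independent of embedding similarity.
import re

MODIFICATION_PATTERNS = [
    # e.g. "Section 4.3 is hereby replaced ..."
    (re.compile(r"Section\s+([\d.]+)\s+is\s+hereby\s+(?:replaced|deleted)", re.I), "replaces"),
    # e.g. "Section 7.1 is amended to read ..."
    (re.compile(r"Section\s+([\d.]+)\s+is\s+amended\s+to\s+read", re.I), "amends"),
    # e.g. "Notwithstanding Section 9.2 ..."
    (re.compile(r"notwithstanding\s+Section\s+([\d.]+)", re.I), "overrides"),
]

def modification_edges(amendment_id: str, amendment_text: str):
    """Yield (source, relation, target) triples for the graph."""
    for pattern, relation in MODIFICATION_PATTERNS:
        for match in pattern.finditer(amendment_text):
            yield (amendment_id, relation, f"Section {match.group(1)}")

text = "Section 4.3 is hereby replaced in its entirety with the following: ..."
print(list(modification_edges("Amendment 3", text)))
# [('Amendment 3', 'replaces', 'Section 4.3')]
```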
Decisions and Trade-offs
| Decision | What We Chose | What We Gave Up |
| --- | --- | --- |
| Graph scope | Curated taxonomy (limited entity types) | Coverage of edge-case entities. But noise dropped 80% and query latency dropped 60%. |
| Retrieval strategy | Hybrid: vector + graph + rerank | Simplicity. Two retrieval paths to maintain, merge, and test. But accuracy on relational queries went from ~55% to ~85%. |
| Graph update strategy | Re-extract on document change, batch nightly full refresh | Real-time freshness. A document change takes 5-10 minutes to reflect in the graph. Acceptable for our use case (contracts change infrequently). |
| Chunking | Section-aware: respect document structure, keep clause boundaries intact | Uniform chunk sizes. Some chunks are 50 tokens, some are 500. But clause boundaries are never broken. |
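The section-aware chunking in the last row is worth showing. A minimal sketch, assuming numbered clause headings like "4.3 Limitation of Liability"; real contracts need per-format heading rules:

```python
import re

# Matches numbered clause headings at the start of a line. Illustrative;
# real contracts also need rules for exhibits, schedules, and lettered
# subsections.
HEADING = re.compile(r"^\s*\d+(?:\.\d+)*\s+[A-Z]", re.M)

def chunk_by_clause(document: str) -> list[str]:
    starts = [m.start() for m in HEADING.finditer(document)]
    if not starts or starts[0] != 0:
        starts = [0] + starts            # keep any preamble as its own chunk
    bounds = starts + [len(document)]
    # Chunks vary widely in size, but a clause is never split in half.
    return [document[a:b].strip() for a, b in zip(bounds, bounds[1:])]
```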
Your Retrieval Architecture Checklist
- Categorize your queries: what percentage require relational reasoning vs simple lookup? If under 20%, vector-only may be enough.
- If using a graph, restrict entity extraction to a curated taxonomy. More entities is not better.
- Chunk at semantic boundaries (section, clause, paragraph), not arbitrary token counts.
- Set retrieval score thresholds: below threshold means “I don’t know”, not “here’s my best guess” (see the sketch after this list).
- Test retrieval independently from generation: does the right content reach the model?
- Monitor retrieval score distributions over time — drift indicates corpus or query distribution changes.
- Measure graph freshness: how long after a document change does the graph reflect it?
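The threshold gate from the checklist is only a few lines. The floor below is an illustrative value, not a recommendation; calibrate it on your own retrieval eval set:

```python
MIN_RETRIEVAL_SCORE = 0.35  # illustrative; calibrate on your eval set

def gated_context(hits: list[dict]) -> list[dict] | None:
    """Return usable context, or None to answer "I don't know"
    instead of generating from weak retrievals."""
    confident = [h for h in hits if h["score"] >= MIN_RETRIEVAL_SCORE]
    return confident or None
```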


