Beyond Vector Search: Building Production Knowledge Graphs with LLMs
Vector embeddings are hitting a wall. Learn how to build a robust, queryable knowledge graph from unstructured text using LLMs, Pydantic, and Graph databases for true multi-hop reasoning.

Last month, I watched a $10,000/month RAG pipeline fail because it couldn't answer a simple question: 'How did the CEO's 2023 strategy shift affect the Q1 2024 supply chain logistics?' The data was all there in the vector store, but the semantic relationship was buried across twelve different PDF fragments. If your system can't connect a legal clause on page 50 to a financial risk on page 200, you don't need a better embedding model; you need a Knowledge Graph (KG).
In 2026, the industry has finally moved past 'naive RAG.' We've realized that while vector databases are excellent for finding similar text, they are fundamentally incapable of structural reasoning or multi-hop traversal. Knowledge Graphs provide the deterministic scaffolding that LLMs need to navigate complex domains. By converting unstructured text into a series of directed triples—(Subject, Predicate, Object)—we create a machine-readable map of human knowledge. This shift from 'searching' to 'traversing' is what separates a toy demo from a production-grade AI system.
The Extraction Pipeline: From Text to Triples
The first mistake engineers make is asking an LLM to 'extract everything important.' You'll end up with a mess of inconsistent nodes. To build a production-grade graph, you must use structured extraction with a strict schema. I use the instructor library (v2.4.0) paired with Pydantic to force the LLM into a specific output format.
We don't just want nodes; we want typed entities and defined relationships. Here is the pattern I use for building a financial intelligence graph.
import instructor
from pydantic import BaseModel, Field
from typing import List, Literal
from openai import OpenAI
client = instructor.patch(OpenAI())
class Node(BaseModel):
id: str = Field(..., description="Unique identifier like 'Apple_Inc' or 'Tim_Cook'")
label: Literal["ORGANIZATION", "PERSON", "STRATEGY", "EVENT"]
properties: dict = Field(default_factory=dict)
class Edge(BaseModel):
source: str
target: str
relationship: str = Field(..., description="The action or link, e.g., 'IMPLEMENTED', 'ACQUIRED'")
context: str = Field(..., description="Evidence from the text")
class KnowledgeGraph(BaseModel):
nodes: List[Node]
edges: List[Edge]
def extract_graph(text: str) -> KnowledgeGraph:
return client.chat.completions.create(
model="gpt-5-turbo", # Using the 2026 frontier model
response_model=KnowledgeGraph,
messages=[{"role": "user", "content": f"Extract entities and relations from: {text}"}]
)
Example usage
raw_data = "In late 2023, Apple Inc. pivoted to Project Titan's AI focus, led by Kevin Lynch." graph_output = extract_graph(raw_data)
The Entity Resolution Problem
This is where the 'happy path' ends. If you process 1,000 documents, the LLM will extract 'Apple', 'Apple Inc.', and 'Apple, Inc.' as three different nodes. Your graph will be fragmented.
To solve this, we implement a two-step resolution process. First, we use a fast vector search (HNSW) to find candidate nodes for a new extraction. Then, we use a smaller, cheaper LLM (like Llama 4 8B) to decide if 'Entity A' and 'Entity B' are the same. Never trust the LLM to generate unique IDs on its own; always provide a reference list of existing IDs in the prompt context if possible.
Schema-First vs. Schema-Less Extraction
I’ve tried both. Schema-less extraction (letting the LLM define predicates) leads to 'predicate explosion.' You'll end up with relationships like is_ceo_of, serves_as_ceo, and leads_company. This makes Cypher queries impossible to write.
In production, I enforce a Strongly Typed Ontology. Define your predicates upfront. If the LLM finds a relationship that doesn't fit, it must map it to the closest existing type or flag it for human review. This ensures that when you query MATCH (p:Person)-[:WORKS_AT]->(o:Org), you actually get all the results.
Storage and Querying with Neo4j
Once you have your triples, you need to sink them into a graph database. Neo4j remains the gold standard for this. When inserting, use MERGE instead of CREATE to handle deduplication at the database level.
from neo4j import GraphDatabase
class GraphStore:
def __init__(self, uri, user, password):
self.driver = GraphDatabase.driver(uri, auth=(user, password))
def upsert_graph(self, kg: KnowledgeGraph):
with self.driver.session() as session:
# Upsert Nodes
for node in kg.nodes:
session.run("""
MERGE (n:Entity {id: $id})
SET n.label = $label, n += $props
""", id=node.id, label=node.label, props=node.properties)
# Upsert Edges
for edge in kg.edges:
session.run(f"""
MATCH (a:Entity {{id: $source}})
MATCH (b:Entity {{id: $target}})
MERGE (a)-[r:{edge.relationship.upper()}]->(b)
SET r.context = $context
""", source=edge.source, target=edge.target, context=edge.context)
Pro-tip: Sanitize relationship strings to be valid Cypher types (UPPER_SNAKE_CASE)
Gotchas: What the Docs Don't Tell You
- The Context Window Trap: If you feed a 100k token document into an LLM and ask for a graph, it will miss 80% of the relationships in the middle. You must chunk the text (2k-4k tokens), extract local graphs, and then merge them. Use overlapping chunks to ensure relationships crossing the boundaries are captured.
- Graph Density: Too many edges are as bad as too few. If every node is connected to a 'Date' node, that node becomes a 'supernode' that slows down traversals and adds zero semantic value. Filter out high-cardinality, low-value connections.
- Cost: Extraction is expensive. For a 1-million-page corpus, GPT-5 costs will destroy your budget. Use a 'distilled' model for extraction. I’ve had great success fine-tuning a Llama-3-70B specifically on our domain's ontology to achieve 95% of GPT-5's accuracy at 1/10th the cost.
Takeaway
Don't build another basic RAG app. Start by defining a small ontology of 5 entity types and 10 relationship types relevant to your business. Use the instructor pattern above to extract a mini-graph from your most valuable 100 documents. Once you see a multi-hop query actually work, you'll never go back to pure vector search.