Knowledge Graph
File Palaces builds a lightweight temporal knowledge graph alongside the vector index. This graph lets the LLM answer time-scoped and entity-scoped questions that would be difficult to answer from chunk retrieval alone, such as "what changed between March and June?" or "list everything related to Project Atlas".
What is stored
During mining, the File Palaces NLP pipeline extracts named entities and relationships from each text chunk:
| Entity type | Examples |
|---|---|
| Person | Alice Chen, Bob Martinez |
| Organisation | Acme Corp, Engineering Team |
| Date / time | 2024-03-15, Q2 2024, last Tuesday |
| Location | London office, AWS us-east-1 |
| Concept | budget approval, deployment pipeline |
Extracted entities and their co-occurrence relationships are stored as a graph in a local SQLite database (alongside the ChromaDB palace). Dates are normalised to ISO-8601 when possible, enabling temporal range queries.
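As a rough illustration of how such a store could support temporal range queries, the sketch below uses Python's built-in `sqlite3` module. The table names, column names, and the `drawers_in_range` helper are assumptions made for this example, not the actual File Palaces schema. It relies on the fact that ISO-8601 dates in `YYYY-MM-DD` form sort correctly as plain strings.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE entities (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL,
    UNIQUE (name, type)
);
CREATE TABLE relationships (
    id         INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES entities(id),
    target_id  INTEGER NOT NULL REFERENCES entities(id),
    label      TEXT NOT NULL,
    drawer_id  TEXT NOT NULL,
    date       TEXT,   -- ISO-8601 (YYYY-MM-DD), or NULL if no date was found
    confidence REAL
);
CREATE INDEX idx_relationships_date ON relationships(date);
"""


def drawers_in_range(conn, start, end):
    """Return distinct Drawer IDs whose relationships are dated within [start, end]."""
    rows = conn.execute(
        "SELECT DISTINCT drawer_id FROM relationships "
        "WHERE date BETWEEN ? AND ? ORDER BY drawer_id",
        (start, end),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO entities (id, name, type) VALUES (?, ?, ?)",
    [
        (1, "Alice Chen", "Person"),
        (2, "budget approval", "Concept"),
        (3, "Acme Corp", "Organisation"),
    ],
)
conn.executemany(
    "INSERT INTO relationships "
    "(source_id, target_id, label, drawer_id, date, confidence) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 2, "approved", "drawer-001", "2024-03-15", 0.9),
        (1, 3, "mentioned_with", "drawer-002", "2024-07-02", 0.6),
        (3, 2, "mentioned_with", "drawer-003", None, 0.5),  # undated relationship
    ],
)

print(drawers_in_range(conn, "2024-03-01", "2024-06-30"))  # ['drawer-001']
```

Relationships without a normalised date are stored with `NULL` and never match a range filter, which is why un-normalised dates reduce recall on temporal questions.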
Graph structure
[Entity A] ──(relationship)──► [Entity B]
metadata:
- source Drawer ID
- Wing / Room
- date (if present)
- confidence score
Relationships are labelled (e.g. mentioned_with, reports_to, approved, created) and carry a reference back to the source Drawer so the LLM can retrieve the original context.
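One way to picture a single edge and its metadata is as a small record type. The sketch below is an assumption for illustration: the `Relationship` class, its field names, the `source_drawers` helper, and the Wing and Room values are all hypothetical and do not describe the internal representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    """Hypothetical in-memory view of one labelled edge in the graph."""
    source: str                  # Entity A
    target: str                  # Entity B
    label: str                   # e.g. "mentioned_with", "reports_to"
    drawer_id: str               # reference back to the source Drawer
    wing: str
    room: str
    date: Optional[str] = None   # ISO-8601 string, if a date was present
    confidence: float = 1.0


def source_drawers(edges, entity):
    """Return Drawer IDs of every edge touching `entity`, in first-seen order."""
    seen = []
    for edge in edges:
        if entity in (edge.source, edge.target) and edge.drawer_id not in seen:
            seen.append(edge.drawer_id)
    return seen


edges = [
    Relationship("Alice Chen", "budget approval", "approved",
                 "drawer-001", "Finance", "Q2 planning", "2024-03-15", 0.9),
    Relationship("Alice Chen", "Acme Corp", "mentioned_with",
                 "drawer-002", "Finance", "Vendors"),
    Relationship("Bob Martinez", "Engineering Team", "reports_to",
                 "drawer-003", "Engineering", "Org chart", confidence=0.7),
    Relationship("Acme Corp", "Alice Chen", "mentioned_with",
                 "drawer-002", "Finance", "Vendors"),
]

print(source_drawers(edges, "Alice Chen"))  # ['drawer-001', 'drawer-002']
```

Because every edge carries its Drawer ID, following an entity's edges is enough to recover the original passages for the LLM.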
How queries use the graph
When you ask a question, the query planner checks whether it contains temporal or entity-scoped signals:
- Date ranges: "what happened in Q3" → graph query filters relationships by date range
- Entity focus: "everything about Alice" → graph query retrieves all entities co-occurring with Alice Chen
- Change detection: "what changed between versions" → diff between entity states at two time points
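The signal check listed above could be approximated with a few simple heuristics. The sketch below is not the actual query planner: the regular expressions, the signal names, and the `detect_signals` function are assumptions chosen for illustration, and a real planner would likely handle month names, aliases, and more varied phrasing.

```python
import re

# Hypothetical heuristics: quarters (Q1..Q4), ISO dates, and four-digit years.
DATE_PATTERN = re.compile(
    r"\b(?:Q[1-4]|\d{4}-\d{2}(?:-\d{2})?|(?:19|20)\d{2})\b",
    re.IGNORECASE,
)
# Words that suggest the user wants a comparison between two states.
CHANGE_PATTERN = re.compile(
    r"\b(?:chang(?:e|es|ed)|diff(?:erence)?s?)\b",
    re.IGNORECASE,
)


def detect_signals(question, known_entities):
    """Return the set of graph signals found in a question.

    `known_entities` is a list of entity names already in the graph.
    Matching is a case-insensitive substring test on the full name,
    so aliases such as a bare first name are not resolved here.
    """
    signals = set()
    if DATE_PATTERN.search(question):
        signals.add("date_range")
    lowered = question.lower()
    if any(name.lower() in lowered for name in known_entities):
        signals.add("entity_focus")
    if CHANGE_PATTERN.search(question):
        signals.add("change_detection")
    return signals


entities = ["Alice Chen", "Project Atlas"]
print(detect_signals("what happened in Q3", entities))
print(detect_signals("everything about Alice Chen", entities))
print(detect_signals("what changed between versions", entities))
```

A question with no signals returns an empty set, in which case only hybrid search would run.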
The graph query returns relevant Drawer IDs which are merged with the hybrid search results before being passed to the LLM.
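The merge strategy is not specified here. A minimal sketch, assuming a simple order-preserving union in which hybrid search results keep their ranking and graph-only Drawers are appended afterwards, might look like this. The `merge_drawer_ids` function and its `limit` parameter are hypothetical.

```python
def merge_drawer_ids(hybrid_ids, graph_ids, limit=None):
    """Combine two ranked lists of Drawer IDs without duplicates.

    Hybrid results come first and keep their order; Drawer IDs that
    only the graph query found are appended in graph order.
    """
    # dict.fromkeys preserves insertion order and drops repeated IDs.
    merged = list(dict.fromkeys([*hybrid_ids, *graph_ids]))
    return merged if limit is None else merged[:limit]


hybrid = ["drawer-007", "drawer-001", "drawer-004"]
graph = ["drawer-001", "drawer-009"]

print(merge_drawer_ids(hybrid, graph))
# ['drawer-007', 'drawer-001', 'drawer-004', 'drawer-009']
```

Putting hybrid results first matches the idea that the graph is a supplementary layer; a score-based fusion would be an alternative if graph hits should be able to outrank vector hits.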
The knowledge graph is a supplementary retrieval layer — it does not replace vector search. For most queries, hybrid search dominates. The graph contributes most on temporal and entity-centric questions.
Limitations
- Extraction quality depends on the NLP model (currently spaCy en_core_web_sm). Domain-specific jargon may not be recognised.
- Dates expressed informally ("next quarter", "a few weeks ago") may not normalise correctly relative to file creation date.
- The graph is rebuilt from scratch on a full re-mine. Incremental updates are handled by the watchdog.
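To make the date-normalisation limitation concrete, the sketch below shows a deliberately small normaliser that resolves expressions to an ISO-8601 date range. It is an assumption for illustration only: the `normalise_date` function, its `file_created` parameter, and the handful of supported patterns are hypothetical, and anything outside those patterns returns `None` rather than a guess.

```python
import re
from datetime import date, timedelta

# Matches expressions such as "Q2 2024".
QUARTER = re.compile(r"^Q([1-4])\s+(\d{4})$", re.IGNORECASE)


def normalise_date(text, file_created):
    """Return an ISO-8601 (start, end) pair, or None if not recognised.

    `file_created` anchors relative expressions to the file's creation date.
    """
    text = text.strip()

    # Case 1: the text is already an ISO-8601 calendar date.
    try:
        day = date.fromisoformat(text)
        return day.isoformat(), day.isoformat()
    except ValueError:
        pass

    # Case 2: a quarter and a year, expanded to its first and last day.
    match = QUARTER.match(text)
    if match:
        quarter, year = int(match.group(1)), int(match.group(2))
        start = date(year, 3 * quarter - 2, 1)
        if quarter == 4:
            end = date(year, 12, 31)
        else:
            # Day before the first day of the next quarter.
            end = date(year, 3 * quarter + 1, 1) - timedelta(days=1)
        return start.isoformat(), end.isoformat()

    # Case 3: one relative expression, resolved against the creation date.
    if text.lower() == "yesterday":
        day = file_created - timedelta(days=1)
        return day.isoformat(), day.isoformat()

    # Vague expressions ("a few weeks ago") have no reliable range.
    return None


created = date(2024, 7, 1)
print(normalise_date("2024-03-15", created))       # ('2024-03-15', '2024-03-15')
print(normalise_date("Q2 2024", created))          # ('2024-04-01', '2024-06-30')
print(normalise_date("yesterday", created))        # ('2024-06-30', '2024-06-30')
print(normalise_date("a few weeks ago", created))  # None
```

Returning `None` for vague expressions keeps wrong dates out of the graph at the cost of leaving those relationships undated.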
Future improvements
- Swap spaCy for a fine-tuned NER model for higher entity extraction accuracy
- Add cross-document entity resolution (de-duplication of aliases like "Alice" and "Alice Chen")
- Expose a graph visualisation panel in the Palace Map view
- Support user-defined relationship types and entity tags