Knowledge Graph

File Palaces builds a lightweight temporal knowledge graph alongside the vector index. This graph lets the LLM answer time-scoped and entity-scoped questions that would be difficult to answer from chunk retrieval alone, such as "what changed between March and June?" or "list everything related to Project Atlas".

What is stored

During mining, the File Palaces NLP pipeline extracts named entities and relationships from each text chunk:

Entity type     Examples
Person          Alice Chen, Bob Martinez
Organisation    Acme Corp, Engineering Team
Date / time     2024-03-15, Q2 2024, last Tuesday
Location        London office, AWS us-east-1
Concept         budget approval, deployment pipeline

Extracted entities and their co-occurrence relationships are stored as a graph in a local SQLite database (alongside the ChromaDB palace). Dates are normalised to ISO-8601 when possible, enabling temporal range queries.
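
As a rough illustration of the date normalisation described above, the sketch below maps two of the formats from the table (a calendar day and a quarter) to an inclusive ISO-8601 range. The function name normalise_date and the choice to represent every date as a (start, end) pair are assumptions made for this example, not the actual File Palaces implementation.

```python
import re
from datetime import date, timedelta

# Hypothetical helper: these names are illustrative, not File Palaces APIs.
_ISO_DAY = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")
_QUARTER = re.compile(r"^Q([1-4])\s+(\d{4})$", re.IGNORECASE)


def normalise_date(text):
    """Return an inclusive (start, end) pair of ISO-8601 strings, or None."""
    text = text.strip()

    match = _ISO_DAY.match(text)
    if match:
        try:
            day = date(*(int(part) for part in match.groups()))
        except ValueError:  # syntactically valid but impossible, e.g. 2024-02-31
            return None
        return day.isoformat(), day.isoformat()

    match = _QUARTER.match(text)
    if match:
        quarter, year = int(match.group(1)), int(match.group(2))
        first_month = 3 * (quarter - 1) + 1
        start = date(year, first_month, 1)
        # The quarter ends one day before the next quarter starts.
        if quarter == 4:
            next_start = date(year + 1, 1, 1)
        else:
            next_start = date(year, first_month + 3, 1)
        end = next_start - timedelta(days=1)
        return start.isoformat(), end.isoformat()

    # Informal expressions such as "last Tuesday" are left unresolved here.
    return None
```

Storing a range rather than a single day lets "Q2 2024" and "2024-03-15" be compared with the same range logic; expressions that cannot be resolved simply carry no date.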

Graph structure

[Entity A] ──(relationship)──► [Entity B]
              metadata:
              - source Drawer ID
              - Wing / Room
              - date (if present)
              - confidence score

Relationships are labelled (e.g. mentioned_with, reports_to, approved, created) and carry a reference back to the source Drawer so the LLM can retrieve the original context.
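
One way to picture this structure is as two SQLite tables, one for entities and one for labelled relationships carrying the metadata listed above. The table names, column names, and helper functions below are assumptions chosen for illustration; the real schema may differ.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entity (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL,
    UNIQUE (name, type)
);
CREATE TABLE IF NOT EXISTS relationship (
    id         INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES entity(id),
    target_id  INTEGER NOT NULL REFERENCES entity(id),
    label      TEXT NOT NULL,   -- e.g. mentioned_with, approved
    drawer_id  TEXT NOT NULL,   -- reference back to the source Drawer
    wing       TEXT,
    room       TEXT,
    date       TEXT,            -- ISO-8601, NULL if no date was found
    confidence REAL
);
"""


def open_graph(path=":memory:"):
    """Open (or create) the graph database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_entity(conn, name, type_):
    """Insert the entity if it is new and return its row id."""
    conn.execute(
        "INSERT OR IGNORE INTO entity (name, type) VALUES (?, ?)", (name, type_)
    )
    row = conn.execute(
        "SELECT id FROM entity WHERE name = ? AND type = ?", (name, type_)
    ).fetchone()
    return row[0]


def add_relationship(conn, source_id, target_id, label, drawer_id,
                     wing=None, room=None, date=None, confidence=None):
    """Store one labelled edge together with its source metadata."""
    conn.execute(
        "INSERT INTO relationship (source_id, target_id, label, drawer_id,"
        " wing, room, date, confidence) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (source_id, target_id, label, drawer_id, wing, room, date, confidence),
    )


def drawers_in_range(conn, start, end):
    """Return Drawer IDs whose relationships fall in an inclusive date range."""
    rows = conn.execute(
        "SELECT DISTINCT drawer_id FROM relationship"
        " WHERE date BETWEEN ? AND ? ORDER BY drawer_id",
        (start, end),
    )
    return [row[0] for row in rows]
```

Because ISO-8601 dates in the same format sort lexicographically, a plain text comparison is enough for the range filter, and relationships without a date are excluded automatically.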

How queries use the graph

When you ask a question, the query planner checks whether it contains temporal or entity-scoped signals:

  • Date ranges: "what happened in Q3" → graph query filters relationships by date range
  • Entity focus: "everything about Alice" → graph query retrieves all entities co-occurring with Alice Chen
  • Change detection: "what changed between versions" → diff between entity states at two time points
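
The three signal types above could be detected with something as simple as the following sketch. The patterns, the detect_signals function, and the signal names are all hypothetical; the actual query planner may use a different approach entirely.

```python
import re

# Hypothetical patterns: a deliberately crude stand-in for the query planner.
# The month "may" is omitted because it collides with the modal verb.
_DATE_SIGNAL = re.compile(
    r"\bQ[1-4]\b|\b\d{4}-\d{2}-\d{2}\b|"
    r"\b(?:january|february|march|april|june|july|august|"
    r"september|october|november|december)\b",
    re.IGNORECASE,
)
_CHANGE_SIGNAL = re.compile(r"\b(?:changed?|differences?|diff)\b", re.IGNORECASE)


def detect_signals(question, known_entities):
    """Return the set of graph-relevant signals found in a question."""
    signals = set()
    if _DATE_SIGNAL.search(question):
        signals.add("date_range")
    if _CHANGE_SIGNAL.search(question):
        signals.add("change_detection")
    lowered = question.lower()
    # Case-insensitive substring match against entity names already in the graph.
    if any(name.lower() in lowered for name in known_entities):
        signals.add("entity_focus")
    return signals
```

A question with no signals would skip the graph entirely and rely on hybrid search alone.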

The graph query returns relevant Drawer IDs which are merged with the hybrid search results before being passed to the LLM.
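
The text does not specify how the two result lists are merged. One common option is reciprocal rank fusion, sketched below as an assumption rather than a description of what File Palaces does; merge_drawer_ids and the constant k=60 are illustrative choices.

```python
def merge_drawer_ids(hybrid_ids, graph_ids, k=60):
    """Fuse two ranked lists of Drawer IDs with reciprocal rank fusion.

    Each ID scores 1 / (k + rank) per list it appears in, so IDs found by
    both retrievers rise to the top. Ties keep first-seen order.
    """
    scores = {}
    for ranking in (hybrid_ids, graph_ids):
        for rank, drawer_id in enumerate(ranking, start=1):
            scores[drawer_id] = scores.get(drawer_id, 0.0) + 1.0 / (k + rank)
    # sorted() is stable, so equal scores stay in insertion order.
    return sorted(scores, key=lambda drawer_id: -scores[drawer_id])
```

With hybrid results ["d1", "d2", "d3"] and graph results ["d3", "d4"], the Drawer found by both retrievers ("d3") is ranked first, and the graph-only Drawer ("d4") is still included at the end.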

NOTE

The knowledge graph is a supplementary retrieval layer — it does not replace vector search. For most queries, hybrid search dominates. The graph contributes most on temporal and entity-centric questions.

Limitations

  • Extraction quality depends on the NLP model (currently spaCy en_core_web_sm). Domain-specific jargon may not be recognised.
  • Dates expressed informally ("next quarter", "a few weeks ago") may not normalise correctly relative to file creation date.
  • The graph is rebuilt from scratch on a full re-mine. Incremental updates are handled by the watchdog.

Future improvements

  • Swap spaCy for a fine-tuned NER model for higher entity extraction accuracy
  • Add cross-document entity resolution (de-duplication of aliases like "Alice" and "Alice Chen")
  • Expose a graph visualisation panel in the Palace Map view
  • Support user-defined relationship types and entity tags