# Hybrid Search
File Palaces does not rely on vector search alone. Every query runs through a hybrid retrieval pipeline that combines sparse keyword matching with dense semantic search, then fuses the results using Reciprocal Rank Fusion. This produces more robust results than either method alone.
## Why hybrid?
| Method | Strengths | Weaknesses |
|---|---|---|
| Vector (dense) | Understands meaning, handles paraphrasing, works across languages | Struggles with exact terms, numbers, proper nouns, rare words |
| BM25 (sparse) | Exact keyword match, works great for named entities and jargon | No semantic understanding, sensitive to phrasing |
| Hybrid | Combines the strengths of both | Slightly more compute per query |
A query like "what did the Q3 2023 report say about EBITDA margins?" benefits enormously from hybrid search: the vector component handles semantic similarity to financial content, while BM25 ensures exact matches on "Q3 2023" and "EBITDA" rank highly.
## Pipeline

```
Query
  │
  ├─► BM25 index ─────────────────────────────► sparse scores ─┐
  │                                                            ├─► Reciprocal Rank Fusion ─► merged ranked list ─► Top-K chunks ─► LLM
  └─► sentence-transformer ─► ChromaDB (HNSW) ─► dense scores ─┘
```
### 1. BM25 (sparse)

File Palaces maintains an in-memory BM25 index alongside ChromaDB. BM25 is the ranking function used by Elasticsearch and many other full-text search engines. It scores documents by term frequency and inverse document frequency, with term-frequency saturation and document-length normalisation.
The BM25 index is rebuilt on sidecar startup from the current palace contents and updated incrementally as new Drawers are added.
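The index internals are not shown here, but the scoring formula itself fits in a few lines. A minimal pure-Python sketch (function name, corpus, and tokenisation are illustrative, not File Palaces' actual implementation):

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25.

    k1 controls term-frequency saturation; b controls length normalisation.
    """
    n = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for doc in corpus if t in doc) for t in set(query_tokens)}
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
            )
        scores.append(score)
    return scores

corpus = [
    "q3 2023 report ebitda margins improved".split(),
    "holiday party planning notes".split(),
]
scores = bm25_scores("ebitda margins".split(), corpus)
# The first document contains both query terms, so it scores higher;
# the second contains neither and scores zero.
```

Note how a document with no query terms at all gets a score of exactly zero — this is why BM25 alone fails on paraphrased queries, and why the dense stage exists.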
### 2. Dense vector search (ChromaDB)
The query is embedded with the same local sentence-transformer model used during mining (default: all-MiniLM-L6-v2). ChromaDB retrieves the nearest neighbours from its HNSW index using cosine similarity.
Both retrieval steps run independently and can return different (overlapping) result sets.
### 3. Reciprocal Rank Fusion (RRF)

RRF is a robust rank-merging algorithm that requires no score normalisation between the two lists. Each candidate chunk's RRF score is the sum, over the two rankers, of 1 / (k + rank), where rank is the chunk's 1-based position in that ranker's list and k = 60 is a smoothing constant. The final list is sorted by descending RRF score, and the top-K entries are passed to the LLM as context.
RRF is preferred over simple score averaging because BM25 and cosine similarity operate on incompatible scales.
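The fusion step is small enough to show in full. A sketch of the algorithm as described above (chunk IDs are illustrative):

```python
def rrf_merge(rankings, k=60):
    """Fuse ranked lists with Reciprocal Rank Fusion.

    rankings: a list of ranked lists of chunk IDs, best first.
    Each list contributes 1 / (k + rank) per chunk, with rank 1-based;
    chunks appearing in several lists accumulate score from each.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["a", "b", "c"]   # BM25 order
dense = ["c", "a", "d"]    # vector order
merged = rrf_merge([sparse, dense])
# "a" ranks highly in both lists, so it wins even though
# neither ranker placed it first everywhere.
```

Because only ranks enter the formula, the raw BM25 and cosine scores never need to be compared directly — which is exactly why RRF beats naive score averaging here.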
## Re-ranking (optional)
Future versions of File Palaces will optionally run a cross-encoder re-ranker over the RRF top-K before passing results to the LLM. Cross-encoders evaluate query–chunk pairs jointly and produce highly accurate relevance scores, but they are slower than bi-encoder vector search.
## Tuning search behaviour
| Setting | Effect |
|---|---|
| Top-K | Number of chunks after RRF. Higher = more context. Default: 8. |
| Similarity threshold | Minimum cosine similarity from the vector stage. Chunks below the threshold are excluded before RRF. Default: 0.3. |
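The similarity threshold acts as a pre-filter on the dense stage only: low-similarity vector candidates are dropped before fusion, so they cannot accumulate RRF score. A sketch of that filter (function name and data are illustrative):

```python
def filter_by_threshold(dense_results, threshold=0.3):
    """Drop vector-stage candidates below the cosine-similarity threshold.

    dense_results: list of (chunk_id, cosine_similarity) pairs, best first.
    Only the surviving chunks' ranking is passed on to RRF; the BM25
    list is unaffected by this setting.
    """
    return [chunk_id for chunk_id, sim in dense_results if sim >= threshold]

dense = [("a", 0.82), ("b", 0.41), ("c", 0.18)]
kept = filter_by_threshold(dense)
# "c" falls below the default 0.3 threshold and is excluded before fusion.
```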
For documents with lots of technical jargon or product names, BM25 does a lot of the heavy lifting. For conceptual, conversational queries ("summarise the key themes"), the vector component dominates.