File Formats

File Palaces supports a wide range of file types. Text is extracted automatically during mining — you don't need to convert files before adding them to a Wing.

Supported formats

| Format | Extensions | Library | Notes |
| --- | --- | --- | --- |
| PDF | .pdf | pypdf | Text-layer PDFs only. Scanned PDFs require OCR (not yet built-in). |
| Word | .docx | python-docx | Full text + tables extracted. Headers and footers included. |
| Excel | .xlsx | openpyxl | Each sheet is extracted separately as a Room. |
| Excel (legacy) | .xls | xlrd | Older binary format. Read-only extraction. |
| CSV | .csv | stdlib | Each row treated as a document unit. |
| Plain text | .txt | stdlib | UTF-8 and common encodings auto-detected. |
| Markdown | .md, .mdx | stdlib | Markdown syntax preserved in extracted text. |
| Audio | .mp3, .wav, .ogg, .flac, .m4a | OpenAI Whisper | Transcribed to text locally using the Whisper model. |
| Video | .mp4, .mov, .mkv | OpenAI Whisper | Audio track extracted, then transcribed. |
| Email | .eml, .msg | stdlib / extract-msg | Subject, sender, date, and body indexed. |
| ZIP archive | .zip | stdlib | Contents extracted and each file processed individually. |
| Web URL | URL input | httpx + BeautifulSoup | Page text scraped and indexed. JavaScript-heavy pages may be incomplete. |

WHISPER TRANSCRIPTION

Audio and video transcription uses OpenAI Whisper running locally — no API key required and no audio leaves your machine. The default model is base (~74 MB). For better accuracy on noisy recordings, switch to small or medium in Settings → Transcription.

Transcription is significantly slower than text extraction: expect roughly 1× real time for base on a modern CPU, so a 10-minute recording takes about 10 minutes to transcribe.
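For orientation, local transcription with the open-source openai-whisper package looks roughly like the sketch below. This is not File Palaces' internal code: the function names `transcribe_locally` and `estimated_transcription_minutes` are hypothetical, and the real-time factor is only the rough figure quoted above.

```python
def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one audio file with a locally running Whisper model.

    Hypothetical helper for illustration; File Palaces may call Whisper differently.
    """
    # Imported lazily so the rest of this sketch works without the package.
    # Requires: pip install openai-whisper, plus ffmpeg on the PATH.
    import whisper

    model = whisper.load_model(model_name)  # weights are downloaded on first use
    result = model.transcribe(audio_path)   # runs entirely on the local machine
    return result["text"]


def estimated_transcription_minutes(recording_minutes: float,
                                    realtime_factor: float = 1.0) -> float:
    """Estimate processing time; a factor of 1.0 means 1x real time."""
    return recording_minutes * realtime_factor
```

With the default factor of 1.0, `estimated_transcription_minutes(10)` gives 10.0 minutes. Measure the factor on your own hardware before relying on the estimate for other models.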

Planned formats

The following formats are on the roadmap but not yet supported:

| Format | Status |
| --- | --- |
| PowerPoint (.pptx) | Planned for v0.6 |
| EPUB | Planned |
| HTML (local files) | Planned |
| RTF | Planned |
| Scanned PDF (OCR) | Planned; requires Tesseract or a cloud OCR option |
| Notion export | Planned |

Unsupported files

Files with unsupported extensions are skipped during mining and do not block other files from being indexed. No warning is shown, but each skipped file is recorded in the mining error log if that log is enabled.

Binary files (images, executables, compiled code) are always skipped.
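To check a folder before mining, the skip rule can be approximated with an extension filter. The sketch below is an assumption-laden illustration: `SUPPORTED_EXTENSIONS` is copied from the supported formats list on this page, `partition_files` is a hypothetical helper, and case-insensitive matching is assumed rather than documented.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions taken from the supported formats list on this page.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".xls", ".csv", ".txt", ".md", ".mdx",
    ".mp3", ".wav", ".ogg", ".flac", ".m4a", ".mp4", ".mov", ".mkv",
    ".eml", ".msg", ".zip",
}


def partition_files(paths: Iterable[str]) -> Tuple[List[Path], List[Path]]:
    """Split paths into (supported, skipped) based on file extension."""
    supported: List[Path] = []
    skipped: List[Path] = []
    for raw in paths:
        path = Path(raw)
        # Assumption: matching ignores case, so ".PDF" counts as ".pdf".
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped
```

Running this over a directory listing shows in advance which files would be left out of a Wing.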

Encoding detection

For plain-text files, File Palaces uses chardet to detect encoding before decoding. Files that cannot be decoded as any known encoding are skipped.
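The detect-then-decode step can be sketched as follows. `decode_text` is a hypothetical name, and returning `None` stands in for "the file is skipped". The UTF-8 fallback used when chardet is not installed exists only to keep the sketch runnable and is not File Palaces behavior.

```python
from typing import List, Optional

try:
    import chardet  # third-party: pip install chardet
except ImportError:  # fallback for this sketch only
    chardet = None


def decode_text(raw: bytes) -> Optional[str]:
    """Return the decoded text, or None if no usable encoding is found."""
    if chardet is not None:
        # chardet.detect returns a dict; "encoding" may be None.
        guess = chardet.detect(raw)["encoding"]
        candidates: List[str] = [guess] if guess else []
    else:
        candidates = ["utf-8"]

    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an encoding name Python does not know.
            continue
    return None  # caller treats this as "skip the file"
```

Detection is statistical, so very short files can be misidentified. Saving files as UTF-8 is the most reliable way to avoid surprises.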

Large files

There is no hard file size limit, but very large files (>50 MB of text) can slow down mining significantly. Consider splitting large documents if mining speed is a concern.
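If you do decide to split a large plain-text file, one simple approach is to cut it on line boundaries. The helper below is a sketch, not a File Palaces feature: `split_text_file`, the `.partN` naming scheme, and the 50 MB default (taken from the threshold above) are all assumptions.

```python
from pathlib import Path
from typing import List


def split_text_file(path: str, max_bytes: int = 50 * 1024 * 1024) -> List[Path]:
    """Split a text file into parts of at most max_bytes, cutting between lines.

    A single line longer than max_bytes is kept whole in its own part.
    """
    source = Path(path)
    parts: List[Path] = []
    buffer: List[bytes] = []
    size = 0

    def flush() -> None:
        nonlocal buffer, size
        if not buffer:
            return
        # Hypothetical naming scheme: report.txt -> report.part1.txt, ...
        part = source.with_name(f"{source.stem}.part{len(parts) + 1}{source.suffix}")
        part.write_bytes(b"".join(buffer))
        parts.append(part)
        buffer, size = [], 0

    with source.open("rb") as handle:
        for line in handle:
            # Start a new part before this line would exceed the limit.
            if buffer and size + len(line) > max_bytes:
                flush()
            buffer.append(line)
            size += len(line)
    flush()
    return parts
```

The parts are written next to the original file, and the original is left untouched.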

The chunker caps individual chunks at 512 tokens, so even very long files are ingested correctly — they just produce more Drawers.
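The effect of the 512-token cap can be illustrated with a minimal fixed-size chunker. This is a simplification: the real chunker's tokenizer and boundary rules are not described here, so the sketch takes an already tokenized list, and `chunk_tokens` is a hypothetical name.

```python
from typing import List

MAX_CHUNK_TOKENS = 512  # cap stated in the documentation


def chunk_tokens(tokens: List[str],
                 max_tokens: int = MAX_CHUNK_TOKENS) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most max_tokens."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# Stand-in tokenization: whitespace splitting, not the real tokenizer.
tokens = ("word " * 1300).split()
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])  # prints [512, 512, 276]
```

A longer file simply yields more chunks, which is why it produces more Drawers rather than failing.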