keep put

Add or update a document in the store.

Usage

Four input modes, auto-detected:

keep put "my note"                    # Text mode (inline content)
keep put file:///path/to/doc.pdf      # URI mode (fetch and index)
keep put https://example.com/page     # URI mode (web content)
keep put /path/to/folder/             # Directory mode (index all files)
keep put -                            # Stdin mode (explicit)
echo "piped content" | keep put       # Stdin mode (detected)

Directory mode indexes all regular files in the folder (non-recursive by default).

Options

OptionDescription
-t, --tag KEY=VALUETag as key=value (repeatable)
-i, --id IDCustom document ID (auto-generated for text/stdin)
--summary TEXTUser-provided summary (skips auto-summarization)
-r, --recurseRecurse into subdirectories (directory mode)
-x, --exclude PATTERNGlob pattern to exclude (repeatable, directory mode)
--watchSet up a daemon watch — re-index automatically on file changes
--unwatchRemove an existing watch
--interval DURATIONPolling interval for watches (ISO 8601 duration, e.g. PT5M)
-f, --forceForce re-index even if content is unchanged

Directory mode

Index a folder of files. By default non-recursive; use -r to include subdirectories:

keep put ./docs/                           # All files in docs/ (flat)
keep put ./docs/ -r                        # All files in docs/ (recursive)
keep put ./src/ -r -x "*.pyc" -x "__pycache__"  # Recursive with excludes

Excludes use glob patterns matched against the relative path from the directory root. Hidden files and symlinks are always skipped.

Watching for changes

The --watch flag sets up a daemon-driven watch that re-indexes files automatically when they change:

keep put ./notes/ -r --watch               # Index + watch for changes
keep put ./notes/ -r --watch -x "*.log"    # With exclude patterns
keep put https://example.com/doc --watch   # Watch a URL for changes

Watches persist across sessions — the daemon polls for changes in the background. Use keep pending to see active watches. Excludes are captured at watch-creation time.

Global ignore patterns

The .ignore system doc contains glob patterns that are automatically excluded from all directory walks and watches — in addition to .gitignore and per-watch --exclude patterns.

keep get .ignore                           # View current patterns
keep edit .ignore                          # Edit in $EDITOR

Updating .ignore retroactively purges matching file:// items from the store and cancels their pending work. Ships with sensible defaults for build artifacts, lock files, bytecode, and binaries.

Text mode and content-addressed IDs

Text mode uses content-addressed IDs for automatic versioning:

keep put "my note"              # Creates %a1b2c3d4e5f6
keep put "my note" -t done      # Same ID, new version (tag change)
keep put "different note"       # Different ID (new document)

Same content = same ID = enables versioning through tag changes.

Smart summary behavior

The LLM prompt used for summarization is configurable. Create a .prompt/summarize/* document whose match rules target specific tags, and its ## Prompt section replaces the default summarization prompt for matching documents. See PROMPTS.md for details.

Update behavior

When updating an existing document (same ID):

Contextual summarization

When you provide tags during indexing, the summarizer uses context from related items to produce more relevant summaries.

  1. System finds similar items sharing your tags
  2. Items with more matching tags rank higher (+20% score boost per tag)
  3. Top related summaries are passed as context to the LLM
  4. Summary highlights relevance to that context

Tag changes trigger re-summarization:

keep put doc.pdf                       # Generic summary
keep put doc.pdf -t topic=auth         # Re-queued for contextual summary

Git changelog

When a directory is a git repository, put -r queues the commit history for background indexing:

keep put ./myproject/ -r
# 42 indexed, 0 errors from myproject/
# git: changelog ingest queued

Each commit becomes a searchable item (ID: git://repo#sha) with the commit message as its summary. Files get a git\_commit edge tag linking to their last commit. Git tags and releases are indexed as separate items (ID: git://repo@tag).

Incremental: On re-scan (or via a watch), only new commits since the last ingest are processed. A git_watermark tag on the directory tracks the last ingested SHA.

Querying git history:

keep find "why was the auth flow changed"     # Finds commit messages by meaning
keep find "auth" --deep                       # File results + linked commit context
keep list 'git://myproject#*'                 # All indexed commits
keep list 'git://myproject@*'                 # All indexed tags/releases
keep get 'git://myproject@v1.0'               # A specific release

Supported formats

FormatExtensionsContent extractedAuto-tags
Text.md, .txt, .py, .js, .json, .yaml, ...Full text
PDF.pdfText from all pages; scanned pages OCR'd in background†
HTML.html, .htmText (scripts/styles removed)
DOCX.docxParagraphs + tablesauthor, title
PPTX.pptxSlides + notesauthor, title
Audio.mp3, .flac, .ogg, .wav, .aiff, .m4a, .wmaStructured metadata (+ transcription\*)artist, album, genre, year, title
Images.jpg, .png, .tiff, .webpEXIF metadata + OCR text† (+ description\*)dimensions, camera, date

\* When a media description provider is configured ([media] in keep.toml), images get vision-model descriptions and audio files get speech-to-text transcription, appended to the extracted metadata. See QUICKSTART.md for setup.

OCR (optical character recognition): Scanned PDF pages (pages with no extractable text) and all image files are automatically queued for background OCR when an OCR provider is available. Keep auto-detects Ollama (using glm-ocr, pulled automatically on first use) or MLX (mlx-vlm on Apple Silicon). A placeholder is stored immediately so the item is indexed right away; the full OCR text replaces it once background processing completes via keep pending. No configuration needed — if Ollama is running, OCR just works.

Auto-extracted tags merge with user-provided tags. User tags win on collision:

keep put file:///path/to/song.mp3                    # Auto-tags: artist, album, genre, year
keep put file:///path/to/song.mp3 -t genre="Nu Jazz" # Overrides auto-extracted genre
keep put file:///path/to/photo.jpg -t topic=vacation  # Adds topic alongside auto camera/date

Indexing documents

Index important documents encountered during work:

keep put "https://docs.example.com/auth" -t topic=auth -t project=myapp
keep put "file:///path/to/design.pdf" -t type=reference -t topic=architecture

See Also