Personal Knowledge Base

This workflow captures YouTube transcripts, indexes them with qmd, searches the index with BM25 or hybrid search, and extends the same index with your own documents.

Initialize

bgng init
bgng init --api-key sd_live_xxxxx

init validates the SupaData key, creates the workspace directories, writes default transcripts and notes collections, initializes qmd, and creates an empty Markdown queue.
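If you script the setup, one option is to read the key from the environment instead of hard-coding it. The wrapper below is only a sketch: the variable names SUPADATA_API_KEY and BGNG are invented for this example and are not something bgng is known to read itself. Only the documented `init --api-key` form is used.

```shell
# Hypothetical wrapper: SUPADATA_API_KEY and BGNG are names invented for this sketch.
bgng_init_from_env() {
  if [ -z "${SUPADATA_API_KEY:-}" ]; then
    echo "SUPADATA_API_KEY is not set" >&2
    return 1
  fi
  # BGNG lets you substitute another command (for example `echo`) for a dry run.
  "${BGNG:-bgng}" init --api-key "$SUPADATA_API_KEY"
}
```

Calling `BGNG=echo bgng_init_from_env` prints the arguments that would be passed, which is a cheap way to check the script before running it for real.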

Capture transcripts

One at a time:

bgng url "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

In bulk:

bgng queue add "https://www.youtube.com/watch?v=A"
bgng queue add "https://youtu.be/B"
cat urls.txt | bgng queue add -
bgng queue list
bgng url batch-process

The batch processor rewrites the queue after each completed URL. If a run is interrupted, completed items are already recorded under Processed and the in-flight item remains pending for the next run.
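The rewrite-after-each-item pattern can be sketched in a few lines of shell. This is not bgng's implementation and does not use its Markdown queue format: it assumes two plain text files (one URL per line) and a caller-supplied handler, all hypothetical, to show why an interruption leaves the current item pending.

```shell
# Hypothetical sketch of a resumable queue; file layout and names are assumptions.
# Usage: process_queue PENDING_FILE PROCESSED_FILE HANDLER_COMMAND
process_queue() {
  local pending="$1" processed="$2" handler="$3"
  local url tmp
  while [ -s "$pending" ]; do
    url="$(head -n 1 "$pending")"
    # Do the work first. On failure the URL is still the first pending line.
    "$handler" "$url" || return 1
    # Record the completed item.
    printf '%s\n' "$url" >> "$processed"
    # Rewrite the pending file without its first line, then swap it into place.
    tmp="$(mktemp "${pending}.XXXXXX")" || return 1
    tail -n +2 "$pending" > "$tmp"
    mv "$tmp" "$pending"
  done
  return 0
}
```

Because the pending file is only rewritten after the handler succeeds, rerunning the function picks up at the item that was in flight.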

Search and retrieve

bgng search "rust ownership"
bgng query "how do borrow checkers work"
bgng get "2026-04-13/my-video"
bgng get "#a3f2c1" --from 20 --lines 50

Use search for exact terms and fast lookups. Use query for semantic questions, paraphrases, and ambiguous language.
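If you wrap bgng in your own scripts, the guidance above can be approximated with a small chooser. The heuristic here (a trailing question mark or more than three words means query) is an arbitrary assumption for illustration, and pick_subcommand is a hypothetical helper, not part of bgng.

```shell
# Hypothetical helper: prints "search" or "query" for a given input string.
pick_subcommand() {
  case "$1" in
    *\?) echo query; return 0 ;;   # ends with a question mark
  esac
  # Count whitespace-separated words; $(( )) strips padding some wc builds add.
  local words
  words=$(( $(printf '%s\n' "$1" | wc -w) ))
  if [ "$words" -gt 3 ]; then
    echo query
  else
    echo search
  fi
}
```

A call such as `bgng "$(pick_subcommand "$q")" "$q"` would then route short keyword lookups to search and longer questions to query.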

Import local content

Register a directory:

bgng import ~/Documents/Notes --collection notes
bgng import ~/Projects/docs --pattern "**/*.{md,txt}"

Import a single file:

bgng import ./meeting-notes.md --collection notes

Single files are copied to ~/.bgng/<collection>/<basename> before indexing. Directories are registered in place and are not copied.
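Since the copy destination depends only on the collection and the file's basename, two files with the same name from different directories map to the same path. How bgng handles that case is not stated here, so a cautious script might check first. Both helpers below are hypothetical and simply mirror the path layout described above.

```shell
# Destination path for a single-file import, following the
# ~/.bgng/<collection>/<basename> layout described above.
# Usage: import_destination COLLECTION FILE
import_destination() {
  printf '%s/.bgng/%s/%s\n' "$HOME" "$1" "$(basename "$2")"
}

# Hypothetical pre-check: succeeds only if nothing exists at that path yet.
safe_to_import() {
  [ ! -e "$(import_destination "$1" "$2")" ]
}
```

For example, `safe_to_import notes ./meeting-notes.md && bgng import ./meeting-notes.md --collection notes` would skip the import when a file of the same name is already present in the collection directory.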

Manage collections and context

bgng collection list
bgng collection add code ~/Projects/awesome --pattern "**/*.ts"
bgng collection add archive ~/Old --no-default
bgng context global "Personal knowledge base of transcripts, notes, and project docs"
bgng context set transcripts /ThePrimeagen "Primeagen videos about Rust, vim, and productivity"

Context descriptions travel with search results and help downstream LLMs interpret matches.

Maintain the index

bgng status
bgng reindex
bgng reindex --force
bgng reindex -c notes

Use --force when you suspect stale or corrupt embeddings. It re-embeds all matching documents and can be slow on large collections.
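For routine maintenance, a script can run the plain per-collection form for a list of collections and leave --force for the cases described above. The helper is a sketch: reindex_collections and the BGNG override are names invented here, and only the documented `reindex -c` form is used.

```shell
# Hypothetical helper: BGNG is a name invented here so the loop can be dry-run.
# Usage: reindex_collections COLLECTION...
reindex_collections() {
  local collection
  for collection in "$@"; do
    # Stop at the first failure so a broken collection is not silently skipped.
    "${BGNG:-bgng}" reindex -c "$collection" || return 1
  done
}
```

Running `BGNG=echo reindex_collections notes transcripts` prints the arguments for each call without touching the index.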