hugo/bDS2

Files

Chili Palmer a73af6b44d Close TD-20 sqlite pool alignment

2026-06-12 14:07:28 +02:00

48 KiB

Raw Blame History

Technical Debt Backlog

Findings from a full codebase + stack review (2026-06-11) covering BEAM/OTP antipatterns, build-vs-buy decisions, and architecture. Tasks are ordered in phases that make sense to execute sequentially; within a phase, tasks are independent unless noted. Phase 1 contains the top five findings.

How to work these tasks

Every task must follow the project rules in AGENTS.md:

Test-first: write a failing test before the fix, against the real module.
Clean gates: mix compile --warnings-as-errors, full mix test (use xvfb-run mix test on headless Linux), and mix dialyzer must all pass.
Delete dead code completely — no commented-out remnants.
Specs: check specs/ for an allium spec covering the touched area; if the spec changes, validate it with allium check <file> (must exit 0).
No new CDN/external assets in preview or generated HTML.
New user-facing strings go through gettext with translations for de/fr/it/es.
If a task touches an app API surface, update the script bridge and API.md in the same change.

Each task lists: context (why it matters), affected files, suggested approach, and acceptance criteria. Re-verify line numbers before editing — they reflect the review snapshot (commit bf93403).

Phase 1 — Top five (security & correctness)

TD-01: Move the AI secret encryption key out of the repo ✅ DONE (2026-06-11)

Severity: High (security).

Status: implemented. BDS.AI.SecretKey resolves the master key from the macOS Keychain (security CLI) with a 0600 key-file fallback under the private app dir; the deterministic node-name fallback is gone (operations return {:error, :secret_key_unavailable}); BDS.AI.SecretMigration re-encrypts legacy rows at every boot from BDS.RepoBootstrap; the endpoint secret_key_base is generated per boot. The legacy repo key literal remains in SecretBackend/tests only to decrypt and migrate existing user databases — remove it together with SecretMigration in a future release. The test env pins a deterministic :ai_secret_key in config/test.exs so the suite never touches the keyring; that string protects nothing. Rode along (mandated clean-gates): fixed all 10 pre-existing compiler type warnings surfaced by the full recompile and the one dialyzer finding (MapSet opacity in mac_bundle/dylibs.ex), so mix compile --warnings-as-errors --force and mix dialyzer are now clean baselines.

Context. BDS.AI.SecretBackend encrypts AI provider API keys at rest with AES-256-GCM, but the key is the hardcoded string in config/config.exs (config :bds, :ai_secret_key, "bds_desktop_shell_secret_key_base_..."), which is committed to the repository and never overridden in config/runtime.exs, so production uses it too. Anyone with the user's SQLite database file plus the public source can decrypt their API keys. The fallback path is worse: sha256(Atom.to_string(node()) <> ":bds:ai"), and the node name is effectively always nonode@nohost, making the fallback key a constant. The same hardcoded string also serves as the prod secret_key_base for BDS.Desktop.Endpoint.

Files.

lib/bds/ai/secret_backend.ex (secret_key/0, lines ~35–41)
config/config.exs (lines ~22–26: :desktop secret_key_base and :ai_secret_key)
config/runtime.exs (no override exists today)

Approach.

On macOS, store the key in the Keychain via System.cmd("security", ["add-generic-password", ...]) / find-generic-password. On other platforms (and as a portable fallback), generate a random 32-byte key on first launch and persist it with 0600 permissions under the app's private data dir (~/Library/Application Support/bds on macOS — see project conventions; never the repo or the project.json folder).
Remove the deterministic node-name fallback entirely; if no key can be obtained, fail loudly rather than silently degrade to obfuscation.
Migration: on startup, if secrets were encrypted with the legacy hardcoded key, decrypt with the old key and re-encrypt with the new one (one-time, then remove legacy key knowledge in a follow-up release).
Generate the endpoint secret_key_base at runtime (random per boot is fine for a loopback-only desktop endpoint) instead of shipping it in config.

Acceptance. No secret material in the repo; grep -r "secret_key_base_64\|ai_secret_key" config/ shows no literal keys; existing encrypted secrets still decrypt after upgrade (covered by a migration test); SecretBackend tests cover missing-key failure mode.

TD-02: Replace the hand-rolled `:httpc` client with Req ✅ DONE (2026-06-11)

Severity: High (reliability), enables TD-06 (streaming).

Status: implemented. BDS.AI.HttpClient is a Req wrapper with explicit connect/receive timeouts and constant-delay transient retries for GETs only (POST completions are never retried), configurable under config :bds, BDS.AI.HttpClient (:connect_timeout_ms, :receive_timeout_ms, :get_max_retries, :retry_delay_ms). The legacy {:ok, %{status, headers, body}} contract is preserved (raw binary body, downcased single-valued headers), so runtime/catalog callers are unchanged. The automation driver's health check also moved to Req, :inets left extra_applications, and no :httpc remains under lib/ (test files still use :httpc as a client against local servers — they start :inets themselves). TD-06 (SSE streaming via Req into:) is now unblocked.

Context. BDS.AI.HttpClient wraps :httpc with empty http options — the default timeout is infinity, so a hung LLM endpoint (Ollama, LM Studio, any OpenAI-compatible server) blocks the chat task forever (see TD-03 for the blocking chain above it). There is no retry, no connection pooling, no pinned TLS verification (behavior depends on the OTP version's :httpc defaults), and :inets.start()/:ssl.start() are called redundantly on every request (both are already in extra_applications). Req is already an optional dependency of the image package, so the dependency tree barely grows.

Files.

lib/bds/ai/http_client.ex (delete or reduce to a thin Req wrapper)
lib/bds/ai/openai_compatible_runtime.ex (calls HttpClient.get/post)
lib/bds/desktop/automation.ex (also uses :httpc; test-automation server, lower priority but should follow)
mix.exs (add {:req, "~> 0.5"})

Approach. Add Req with explicit connect_options (timeout), receive_timeout (generous but finite, e.g. 120s for LLM responses; make it config-driven under config :bds, :ai), retry: :transient for idempotent GETs only (do NOT auto-retry chat completions), and default TLS verification. Keep the existing {:ok, %{status, headers, body}} | {:error, reason} contract so the runtime and its tests change minimally, or inject Req as the :http_client the way tests already do.

Acceptance. No :httpc calls remain under lib/bds/ai/; a test proves a slow/hung endpoint produces {:error, %{kind: :http_error, reason: :timeout}} within the configured budget instead of hanging; dialyzer clean.

TD-03: Fix `BDS.AI.InFlight` ETS ownership and creation race ✅ DONE (2026-06-11)

Severity: High (correctness).

Status: implemented. BDS.AI.InFlight is now a minimal GenServer whose init/1 creates the named table (:named_table, :public, :set, read_concurrency: true); it is supervised in BDS.Application (before anything that uses chat), so the table lives for the VM's lifetime and the concurrent-first-use race is impossible by construction. The lazy table/0 creation path is deleted; register/unregister/lookup reference the named table directly. test/bds/ai/in_flight_test.exs proves registrations survive the death of the registering process and that the supervised process owns the table.

Context. lib/bds/ai/in_flight.ex creates its named ETS table lazily in whichever process first calls table/0. Two defects: (1) the table is owned by that first caller — typically a transient LiveView or chat task — so when that process exits, the table and all in-flight chat registrations vanish, breaking cancel_chat/1; (2) two concurrent first-callers race on :ets.new/2 with :named_table, and the loser crashes with badarg.

Files.

lib/bds/ai/in_flight.ex
lib/bds/application.ex (supervision tree)

Approach. Create the table from a stable owner. Simplest options:

create it directly in BDS.Application.start/2 (the application process lives for the VM's lifetime), or
give InFlight a minimal GenServer whose init/1 creates the table (:named_table, :public, :set, read_concurrency: true) and add it to the supervision tree before anything that uses chat.

Remove the lazy table/0 creation path entirely; register/unregister/lookup should assume the table exists.

Acceptance. A test demonstrates registrations survive the death of the registering process; no code path calls :ets.new outside the owner's init; concurrent-first-use race is impossible by construction.

TD-04: Flush embedding indexes on shutdown (or delete the dead `flush_all`) ✅ DONE (2026-06-11)

Severity: Medium (perf/contract), High confidence.

Status: implemented. Shutdown.persist_safely/0 now calls BDS.Embeddings.Index.flush_all() next to MainWindow.persist_now(); each persist step is hardened individually (own rescue/catch) so one failure never blocks quit or skips the other step. terminate/2 stays as defense-in-depth for supervised restarts. A test proves a debounced (unsaved) index reaches disk through the real shutdown path before the hard quit fires. The terminate/2 audit found no other graceful-shutdown dependency: job_runner.ex only detaches in-memory state (moot under SIGKILL), automation.ex is the test-automation harness whose ports die with the VM, and main_window.ex bounds persistence was already covered by MainWindow.persist_now() in the shutdown path. The code now matches the spec's DebouncedPersistence invariant (specs/embedding.allium:216).

Context. App shutdown SIGKILLs the BEAM (BDS.Desktop.Shutdown.quit/0 — a documented and legitimate workaround for a wxWidgets static-destructor segfault on macOS). Consequence: no terminate/2 callback in the whole app ever runs on quit. BDS.Embeddings.Index traps exits and relies on terminate/2 to persist debounced HNSW index saves, and its moduledoc promises "force-saved on project switch / shutdown". Meanwhile Embeddings.Index.flush_all/0 exists but has zero production callers (dead code per the AGENTS.md mandate). The lazy-reload-from-DB fallback makes this a startup-performance bug (index rebuilt unnecessarily) rather than data loss, but the code contradicts its own contract.

Files.

lib/bds/desktop/shutdown.ex (persist_safely/0, lines ~92–99)
lib/bds/embeddings/index.ex (flush_all/0 line ~87, terminate/2 line ~167, moduledoc)

Approach. Call BDS.Embeddings.Index.flush_all() inside Shutdown.persist_safely/0 next to MainWindow.persist_now() (same rescue/catch hardening). Keep terminate/2 as defense-in-depth for supervised restarts. Audit for other state that assumed graceful shutdown (search the tree for terminate/2 implementations and check each one's expectations against the SIGKILL path). If instead the decision is that lazy rebuild is the intended behavior, delete flush_all/0 and fix the moduledoc — pick one, don't keep both.

Acceptance. Either flush_all is wired into the shutdown path with a test (injectable shutdown module already exists: :desktop_shutdown_module env), or it is deleted and the moduledoc no longer claims shutdown saving. No other terminate/2 in the codebase silently depends on graceful shutdown for correctness.

TD-05: Replace xmerl with Saxy in the WXR importer; add import transactions ✅ DONE (2026-06-12)

Severity: Medium-High (DoS + integrity on user-supplied files).

Status: implemented. BDS.WxrParser now parses WXR with Saxy.parse_stream/3 for files and Saxy.parse_string/3 for in-memory XML, keeping element names as binaries instead of interning atoms and preserving the existing result shape. Both import write paths now batch work in Repo.transaction chunks of 500 (BDS.ImportExecution and BDS.Posts.RebuildFromFiles), so mid-batch failures roll back cleanly instead of leaving partial imports behind. Acceptance proof now includes a bounded atom-growth parser test with many unique element names, existing import fixture tests, rollback tests for both import and rebuild, and a local SQLite benchmark showing the batching win (1000 inserts: 183ms per-row transactions vs 83ms in 500-row chunks, 2.2x faster).

Context. BDS.WxrParser.parse_xml/1 uses :xmerl_scan.string/1, which creates atoms from element and attribute names in the parsed document. WXR files are user-supplied imports, so a malicious or merely huge/weird file can grow the atom table (atoms are never GC'd → eventual VM crash). It also reads the entire file into memory (File.read! + full DOM); real WordPress exports reach hundreds of MB. Separately, the import/rebuild write loops run one autocommit transaction per row on SQLite — slow (one fsync per post) and a failure mid-way leaves a half-imported database.

Files.

lib/bds/wxr_parser.ex (full rewrite of the parsing layer; keep the output map shape)
lib/bds/import_execution.ex (line ~359: Enum.each(post_ids, ...) write loop)
lib/bds/posts/rebuild_from_files.ex (lines ~37–56: per-file upsert loop)
mix.exs (add {:saxy, "~> 1.6"})

Approach.

Rewrite WxrParser on Saxy (SAX or its simple-form DOM builder). Saxy keeps names as binaries (no atom creation) and supports streaming via Saxy.parse_stream/3 with File.stream!. Preserve the existing public contract (parse_file/1, parse_xml/1 returning %{site:, posts:, pages:, media:, categories:, tags:}) so import_analysis.ex/import_execution.ex are untouched; the existing parser tests become the safety net. Note: sweet_xml is already in the dep tree via image, but it is xmerl-based and inherits the atom problem — don't use it for this.
Wrap import and rebuild write loops in Repo.transaction (chunked, e.g. 500 rows per transaction, to keep WAL size and progress reporting sane). Consider Repo.insert_all in chunks for plain inserts — there is currently not a single insert_all in the codebase.

Acceptance. A test feeds XML with many unique element names and asserts :erlang.system_info(:atom_count) does not grow proportionally; existing import fixture tests pass unchanged; a failure injected mid-import leaves the DB unchanged (rolled back chunk); large-import benchmark shows the transaction batching speedup.

Phase 2 — Unbounded blocking & cancellation

TD-06: Real SSE streaming for chat ✅ DONE (2026-06-11)

Depends on TD-02 (Req).

Status: implemented. Chat requests now send "stream": true (+ stream_options.include_usage) and consume the SSE response incrementally via HttpClient.post_stream/5 (Req into:). BDS.AI.SSE assembles content deltas, tool-call fragments, and usage, emitting cumulative content snapshots throttled to stream_emit_interval_ms (default 100ms) — replace semantics, so the chat editor needed no changes and tool rounds reset naturally. Streaming applies only to operation: :chat with an :on_stream callback, can be disabled via config :bds, :chat, streaming: false, and providers that ignore the stream flag are auto-detected by content-type and parsed as plain JSON. Cancellation kills the chat task, which aborts the underlying connection (server-observed in tests). Persistence semantics are unchanged (one assistant row per round, same usage normalization).

Context. OpenAICompatibleRuntime.generate/3 never sets "stream": true; the UI's {:chat_streaming_content, ...} event fires exactly once with the complete response, i.e. streaming is fake. For local models this is the single biggest perceived-latency win available.

Files.

lib/bds/ai/openai_compatible_runtime.ex
lib/bds/ai/chat.ex (notify_chat_event plumbing, chat_round/9)
lib/bds/desktop/shell_live/chat_editor.ex (consume incremental events)

Approach. Use Req's into: option (fun or :self) to consume text/event-stream chunks, parse data: lines incrementally, emit {:chat_streaming_content, conversation_id, delta} per token-batch, and accumulate the full message for persistence (keep persistence semantics identical: one assistant row per round, tool_calls assembled from streamed fragments). Gate on a config flag so non-streaming providers still work. Remember the AGENTS.md AI rule: all automatic AI activity stays gated by airplane mode / local model / toast.

Acceptance. A mock SSE server test asserts multiple incremental content events arrive before the final {:ok, reply}; tool-call rounds still work; cancellation mid-stream (TD-07) aborts the HTTP request.

TD-07: Bound the chat await chain; end-to-end timeout & cancellation ✅ DONE (2026-06-12)

Status: implemented. BDS.AI.Chat.send_chat_message/3 no longer waits unboundedly on the supervised chat task: await_chat_task/2 now applies a global deadline derived from the configured per-request HTTP budget and the bounded tool loop (BDS.AI.HttpClient.request_timeout_ms() * (chat_max_tool_rounds + 1) + config :bds, :chat, :await_timeout_margin_ms). On deadline expiry it returns {:error, :chat_timeout} and shuts the task down via Task.shutdown/2, so the caller is released even if the runtime wedges. config/config.exs now exposes :await_timeout_margin_ms under :chat. The acceptance proof is a shutdown-aware blocking runtime test that asserts the timeout result, verifies the task receives shutdown, and confirms the conversation persists only the user message on timeout; existing cancellation and streaming tests remain green.

Context. BDS.AI.Chat.send_chat_message/3 blocks the caller (a LiveView process) on a hand-rolled await_chat_task/1 — a raw receive with no after clause. Combined with the infinite HTTP timeout (TD-02) the whole chain is unbounded. Even after TD-02, a defense-in-depth deadline belongs here.

Files.

lib/bds/ai/chat.ex (send_chat_message/3 lines ~185–219, await_chat_task/1 lines ~855–882)

Approach. Give await_chat_task an after deadline derived from config (request timeout × max_tool_rounds + margin), returning {:error, :chat_timeout} and terminating the task via Task.Supervisor.terminate_child (the path cancel_chat/1 already uses). Alternatively restructure to fully async: run the chat task fire-and-forget and deliver results to the LiveView via the existing event_target / PubSub mechanism, so no process ever blocks. The async restructure is the better end state; the after clause is the cheap immediate fix.

Acceptance. A stalled runtime stub cannot hang the caller past the deadline; cancel during a round kills the HTTP request and leaves the conversation in a consistent persisted state.

TD-08: Remove test-sandbox scaffolding from production chat code ✅ DONE (2026-06-12)

Status: implemented. BDS.AI.Chat.send_chat_message/3 now starts the chat worker directly with Task.Supervisor.async_nolink/2; the temporary :sandbox_ready receive/send barrier and allow_repo_sandbox/1 are deleted from production code. The repo is on ecto_sql 3.13.5, and the supervised chat task works correctly through sandbox $callers propagation in the project's existing test modes (manual checkout plus the suite's ownership / shared-mode callers). test/bds/ai/chat_sandbox_cleanup_test.exs now proves both acceptance points: the production module has zero sandbox references, and send_chat_message/3 still persists its user/assistant messages through the supervised task without explicit sandbox allowance. Validation gates are clean: mix compile --warnings-as-errors --force, full mix test, and mix dialyzer all pass.

Context. chat.ex contains a :sandbox_ready send/receive handshake and allow_repo_sandbox/1 (with Code.ensure_loaded? + blanket rescue) purely so the Ecto SQL sandbox works for the async_nolink chat task in tests. Since ecto_sql 3.4 the sandbox automatically follows $callers for processes spawned via Task/Task.Supervisor, making all of this unnecessary — verify this against the project's ecto_sql version and test modes first (it holds for the default :shared/ownership modes when the caller chain is intact).

Files.

lib/bds/ai/chat.ex (lines ~197–214, ~938–950)

Approach. Delete the handshake (receive :sandbox_ready, the send(task.pid, :sandbox_ready), and allow_repo_sandbox/1); run the full test suite to confirm $callers propagation covers it. If some test setup genuinely needs explicit allowance, move that into the test helper, not production code.

Acceptance. Production module has zero sandbox references; full suite green.

TD-09: Graceful task cancellation in `BDS.Tasks` ✅ DONE (2026-06-12)

Status: implemented. BDS.Tasks.cancel_task/1 and BDS.Scripting.JobRunner.handle_call(:cancel, ...) now terminate supervised workers through their owning Task.Supervisor (terminate_child/2), so the worker receives :shutdown instead of an immediate :kill. Queue bookkeeping and :cancelled status semantics are unchanged, but cooperative cleanup now runs before the slot is freed. Coverage now includes a task worker that traps exits, confirms cleanup executes on cancellation, and proves the queued task is promoted afterward, plus a managed scripting job runtime test that traps exits and observes the same shutdown-driven cleanup path.

Context. Tasks.cancel_task/1 uses Process.exit(pid, :kill) — the worker gets no chance to clean up mid-upload or mid-file-write (Persistence.atomic_write mitigates file corruption but not e.g. remote half-states). cancel_chat already models the right approach.

Files.

lib/bds/tasks.ex (line ~130)
lib/bds/scripting/job_runner.ex (handle_call(:cancel, ...) line ~67 — same pattern)

Approach. Use Task.Supervisor.terminate_child/2 (delivers :shutdown, escalates to kill after the child's shutdown timeout). Workers that need cleanup can trap exits locally. Keep the state bookkeeping identical (:cancelled status, queue promotion).

Acceptance. A test worker with trap_exit observes :shutdown and runs its cleanup before dying; cancelled tasks still free a concurrency slot.

TD-10: Timeouts for external `git` commands ✅ DONE (2026-06-12)

Status: implemented. BDS.Git now applies config-driven timeout budgets to every shell-out (config :bds, :git with :local_timeout_ms defaulting to 15s and :network_timeout_ms defaulting to 120s for fetch/pull/push). The default runner no longer uses unbounded System.cmd/3; it launches the child through Port.open/2, collects stdout/stderr, and on timeout closes the port and explicitly terminates the OS process if it is still alive. Timeout results surface as structured %{kind: :timeout, operation:, timeout_ms:, message:} errors, while existing structured auth guidance is preserved. Acceptance proof now includes a bounded runner-stub timeout test and a real-process cleanup test that launches sleep, times out, and asserts the spawned PID is gone.

Context. BDS.Git shells out via System.cmd, which has no timeout. GIT_TERMINAL_PROMPT=0 and SSH BatchMode prevent interactive hangs, but a network-level stall on fetch/pull/push/lfs blocks the calling task indefinitely. (Shelling out to git itself is the right call — current Elixir libgit2 bindings are not mature; keep that decision.)

Files.

lib/bds/git.ex (system_runner/3 line ~343, run_git/3)

Approach. Wrap the command in Task.async + Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) inside system_runner (note: the OS process may need explicit killing — consider MuonTrap for proper external-process supervision, or spawn git via a port with :kill_group). Default timeout config-driven: generous for network ops (120s), short for local ops (15s). Map timeout to a structured error like the existing %{kind: :auth, ...} shape so the UI can toast it.

Acceptance. A runner stub that sleeps past the deadline produces {:error, %{kind: :timeout, ...}}; no orphaned OS processes after timeout (test with a real sleep binary).

Phase 3 — Process architecture (de-bottleneck singletons)

TD-11: Move preview rendering out of the `BDS.Preview` GenServer ✅ DONE (2026-06-12)

Status: implemented. BDS.Preview now keeps only preview-server lifecycle and drain coordination in the GenServer. The current server state lives in a public ETS table, so both direct preview calls and socket-served HTTP requests resolve against that fast state read and render outside the singleton owner process. Direct request/2 and preview_draft/3 calls now run in short-lived tracked tasks, while TCP request workers render inline after inheriting the test sandbox allowance; graceful stop flips is_running false in ETS before draining and waits for those tracked render workers to finish. Coverage now proves two real HTTP preview renders can be in flight at once and that stop_preview/1 waits for a blocked render to complete before returning.

Context. Every preview HTTP request (Preview.request/2, preview_draft/3) renders Markdown + Liquid inside the singleton BDS.Preview GenServer via handle_call. A browser loading one page fires parallel requests (page, CSS, images) that all serialize through this single process — the textbook "process used for code organization" bottleneck.

Files.

lib/bds/preview.ex (handle_call({:request, ...}) line ~93, handle_call({:preview_draft, ...}) line ~102)
lib/bds/preview/router.ex (the Plug calling into Preview)

Approach. Keep the GenServer for what actually needs serialization (server lifecycle: start/stop/ensure, current-project state, graceful drain). For requests, expose a fast state read (:sys.get_state-free — e.g. a public ETS table or a lightweight handle_call(:current_project)) and perform rendering in the Bandit request process itself. The drain logic (@drain_timeout) already tracks inflight requests — adapt it to count renderers via monitors or a counter.

Acceptance. A test issues N concurrent slow renders and asserts they overlap (wall time « N × single render); stop_preview still drains correctly.

TD-12: Move HNSW builds and duplicate scans out of `Embeddings.Index` handle_call ✅ DONE (2026-06-12)

Status: implemented. BDS.Embeddings.Index now runs duplicate scans and HNSW rebuilds in supervised tasks instead of inside handle_call, while the GenServer keeps only the small serialized state surface (current index, label map, debounce timers, and flush coordination). Neighbor queries continue to hit the current index while a scan or rebuild is in flight; rebuild requests for a project coalesce onto the latest requested snapshot; flush/1 and flush_all/0 wait for in-flight rebuilds before persisting; and forget/1 cancels pending index work cleanly. Acceptance proof now includes focused concurrency tests for both slow duplicate scans and slow rebuilds, and the existing debounced persistence coverage remains green.

Context. Embeddings.Index (singleton) builds HNSW graphs and runs full duplicate scans inside handle_call with client timeout :infinity. A long duplicate scan head-of-line-blocks every neighbor query the UI makes. The state it guards (index refs, label maps, debounce timers) is small; the work is what's big.

Files.

lib/bds/embeddings/index.ex (handle_call({:put, ...}) ~105, {:duplicate_pairs, ...} ~128)

Approach. For duplicate_pairs (read-only over the index ref): capture the entry in the GenServer, spawn the scan in a Task (HNSWLib index resources are usable from other processes — verify; they are NIF resources, confirm thread-safety of concurrent reads in hnswlib docs), and reply via GenServer.reply/2. For put (build): construct the graph in the caller or a Task and hand the finished index to the GenServer for state swap + debounced save. Queries during a rebuild keep hitting the old index.

Acceptance. Neighbor queries return while a duplicate scan is running (test with a large synthetic index); debounce/flush semantics unchanged.

TD-13: Slim down the `Publishing` GenServer call surface ✅ DONE (2026-06-12)

Status: implemented. BDS.Publishing now keeps only SCP mtime state in its GenServer. Publish-job creation, lookup, and status updates run directly through Repo, while background-task job updates use a stable Repo caller so sandboxed tests still exercise the real async path. SCP uploads no longer round-trip through should_upload_scp_file / mark_uploaded_scp_file per file; each target now batches one filter call for changed files and one bulk record call for successfully uploaded mtimes. Coverage includes a focused batching test that proves a multi-file SCP publish keeps bookkeeping traffic bounded instead of scaling linearly with file count.

Context. Publishing does Repo writes inside handle_call (:upload_site, :update_job) and the uploader makes per-file synchronous calls (:should_upload_scp_file / :mark_uploaded_scp_file) — chatty serialization through one process during uploads.

Files.

lib/bds/publishing.ex (lines ~45–119)

Approach. Job creation (Repo insert) can happen in the caller before the call; the GenServer keeps only scp_uploads mtime state. Batch the mtime check: one call returning the filtered upload list instead of two calls per file. Consider whether scp_uploads state should be ETS (read_concurrency) owned by the GenServer.

Acceptance. Upload of N files makes O(1) GenServer calls for mtime bookkeeping, not O(N)·2; behavior identical for incremental uploads.

TD-14: Replace polling with messaging (CliSync watcher + rebuild sequencing) ✅ DONE (2026-06-12)

Status: implemented. BDS.CliSync.Watcher now gates notification-table work behind SQLite PRAGMA data_version, so unchanged databases no longer force repeated db_notifications queries at the 100 ms watch cadence; the watcher still performs a real notification fetch/prune pass on the first poll and on every external commit boundary. Rebuild sequencing in BDS.Desktop.ShellCommands no longer sleep-polls Tasks.list_tasks/0. BDS.Tasks now broadcasts terminal task states on BDS.PubSub, and wait_for_group_phase/3 subscribes and waits on those messages with the same deadline semantics. Coverage now includes a watcher test that proves unchanged data_version skips notification queries, a tasks test that proves terminal state broadcasts, and a shell-command guard test that forbids Process.sleep polling in the rebuild wait path.

Context. Two polling loops:

CliSync.Watcher polls the SQLite notifications table every 100 ms forever — ~10 queries/sec at idle in a battery-powered desktop app.
shell_commands.ex wait_for_group_phase/3 sleep-polls Tasks.list_tasks() every 50 ms to sequence rebuild steps.

Files.

lib/bds/cli_sync/watcher.ex (@default_poll_interval_ms 100)
lib/bds/desktop/shell_commands.ex (wait_for_group_phase/3 lines ~563–585)
lib/bds/tasks.ex (add completion broadcast)

Approach.

Watcher: cheapest meaningful fix is a PRAGMA data_version pre-check — a single integer read that changes only when another connection commits; only query the notifications table when it moves. (SQLite update hooks don't fire for external connections, so the CLI-writes use case genuinely needs polling — make it near-free instead of removing it.) Alternatively watch the -wal file mtime with the file_system package. Also consider lengthening the idle interval with backoff.
Rebuild sequencing: have BDS.Tasks broadcast task terminal states on BDS.PubSub (it already sits next to PubSub in the supervision tree); wait_for_group_phase subscribes and receives with a deadline instead of sleep-polling.

Acceptance. Idle app issues no table queries when data_version is unchanged (or interval ≥ 1s with backoff); rebuild sequencing has no Process.sleep; CLI-sync round-trip latency stays ≤ current behavior.

TD-15: `BDS.Tasks` housekeeping (queue type, eviction timers) ✅ DONE (2026-06-12)

Status: implemented. BDS.Tasks now uses :queue for its pending work queue, so enqueue/dequeue on the hot path are O(1) and the FIFO behavior is unchanged. Finished-task cleanup now tracks a single live eviction timer ref instead of scheduling a fresh send_after/3 on every terminal task; the timer fires, prunes expired finished tasks, and only reschedules itself if finished tasks still remain. Coverage now includes a focused task-state test proving multiple finished tasks share the same live eviction timer and a source guard that forbids queue ++ churn.

Context. Minor inefficiencies in tasks.ex: the pending queue is a list appended with ++ (O(n) per submit), and every finishing task schedules a fresh 1-hour send_after eviction timer — timers accumulate without bound (harmless, but sloppy).

Files.

lib/bds/tasks.ex (handle_call({:submit_task,...}) ~84, schedule_finished_task_eviction/1 ~365)

Approach. Use :queue for the pending queue. Replace per-finish timers with one periodic sweep (send_after rescheduled in handle_info(:evict_finished_tasks, ...)) or track a single timer ref. Also consider downgrading report_progress from call to cast (progress is fire-and-forget; the throttle already drops updates).

Acceptance. One live eviction timer at most; queue operations O(1); existing task lifecycle tests green.

Phase 4 — Build-vs-buy replacements

TD-16: Frontmatter robustness — yaml_elixir/ymlr or harden the hand-rolled parser ✅ DONE (2026-06-12)

Status: implemented. The project keeps the hand-rolled serializer/parser for byte-stable frontmatter output, but BDS.Frontmatter is now hardened for the known user-edited-file cases: parse_document/1 normalizes CRLF and lone \r line endings before splitting the gray-matter envelope, and quoted scalar parsing now removes exactly one closing quote and unescapes content explicitly instead of trim_trailing/2, which previously corrupted strings whose content ended in a quote character. Coverage now includes focused parser tests for CRLF documents and quoted-string roundtrips with embedded quotes, backslashes, and trailing-quote content, plus adjacent post/template/maintenance frontmatter and serializer parity suites.

Context. BDS.Frontmatter is a hand-rolled YAML subset with concrete bugs for user-edited files:

parse_document splits on "\n---\n" — CRLF files never match and are rejected as :invalid_frontmatter. External editors on Windows will produce these.
parse_string handles quotes by trimming the trailing quote char — a quoted string containing its own quote mid-string is corrupted.
No nesting support (may be intentional).

Decision needed: adopt yaml_elixir (parse) + ymlr (emit), or keep the subset deliberately (round-trip stability for file diffs is a legitimate reason) and fix the bugs. Note the metadata sync rule: frontmatter is read by publishing, metadata-diff, and rebuild-from-files — all three must stay in agreement, which argues for keeping one serializer either way.

Files.

lib/bds/frontmatter.ex
Callers: lib/bds/posts/file_sync.ex, lib/bds/posts/rebuild_from_files.ex, lib/bds/document_fields.ex

Approach (minimum). Normalize \r\n → \n before parsing; rewrite parse_string to properly unescape (scan, don't trim); add property-style round-trip tests (parse(serialize(x)) == x) including quotes, newlines, unicode, CRLF input. If switching to yaml_elixir: keep serialize_document output byte-stable against current golden files to avoid noisy git diffs in user projects.

Acceptance. CRLF fixture parses; round-trip property tests pass; golden serialization fixtures unchanged (if keeping custom serializer).

TD-17: Language detection via `paasaa` (optional, low priority) ✅ DONE (2026-06-12)

Status: implemented without adding paasaa. The originally reported misclassifications are not reproducible on the current code: the existing detector already classifies the relevant umlaut-free German and accent-free French fixtures correctly through its language-hint fallback, and new focused tests now lock that behavior down directly. Because the acceptance cases are now satisfied and the current implementation keeps dependency weight lower, the project does not add paasaa at this time. Revisit only if broader real-world fixtures start failing.

Context. Search.detect_language/1 uses diacritic regexes + tiny word lists; German text without umlauts (common in short posts) falls through to English, picking the wrong stemmer. paasaa is a pure-Elixir trigram detector.

Files.

lib/bds/search.ex (lines ~60–89, detect_language/1, @language_hints)

Approach. Add {:paasaa, "~> 0.6"}; use it for texts above a length threshold, keep the cheap heuristics as a short-text fallback; map ISO codes to @stemmer_algorithms keys; default "en" unchanged. Delete @language_hints if fully superseded (dead-code rule).

Acceptance. Misclassification fixtures (umlaut-free German, accented-free French) detect correctly; stemmer selection unchanged for English.

TD-18: Evaluate Oban (Lite engine) for durable jobs ✅ DONE (2026-06-12)

Status: evaluated; no-go for now. BDS.Publishing already persists PublishJob rows and the suite proves those rows survive a BDS.Publishing restart; what it does not provide is durable mid-upload resume/retry, and the current desktop spec does not require that behavior. Moving publishing to Oban Lite now would add another coordination layer on top of BDS.Tasks, force a redesign of the in-memory SCP mtime cache in BDS.Publishing, and complicate the existing Ecto sandbox/release split for a benefit that is not yet part of the product contract. BDS.Scripting.JobStore and BDS.Tasks remain intentionally in-memory: they provide UI/runtime job tracking, not durable work queues. Re-open this only if real crash-recovery requirements emerge and a spike shows acceptable desktop binary/startup impact.

Context. Three overlapping job systems exist: in-memory UI tasks (BDS.Tasks), DB-persisted publish jobs (Publishing + PublishJob rows), and scripting jobs (JobStore/JobRunner/JobSupervisor). The durable ones (publishing, possibly long script jobs) would get retries, uniqueness, crash-recovery-on-restart, and telemetry for free from Oban's SQLite Lite engine on the existing ecto_sqlite3 setup. This is an evaluation task, not a mandate — the in-memory BDS.Tasks UI progress system should stay.

Files (for the evaluation).

lib/bds/publishing.ex, lib/bds/publishing/publish_job.ex
lib/bds/scripting/job_store.ex, job_runner.ex, job_supervisor.ex
lib/bds/tasks.ex (stays; would wrap Oban job progress for the UI)

Approach. Spike: model upload_site as an Oban worker with the Lite engine; check binary-size and startup cost impact for the desktop release; check interplay with the test sandbox and with the bds_mcp release. Write up a go/no-go with the spike branch. If no-go, document why in this file and close.

Acceptance. Decision documented. Outcome: no-go for now; publishing keeps its current persisted-status model and the task closes without an Oban migration.

TD-19: Add credo, mix_audit (and consider sobelow) to the quality gates ✅ DONE (2026-06-12)

Status: implemented. mix.exs now adds credo and mix_audit as dev/test-only tooling, and mix validate runs in the test environment with all four gates wired in: tests, Credo, dependency audit, and dialyzer. .credo.exs deliberately scopes the initial baseline to stable, actionable checks and explicitly disables the repo-wide legacy style/refactor rules that would otherwise drown the signal. The dependency audit is clean except for one documented upstream exception in .mix_audit.ignore: GHSA-rhv4-8758-jx7v remains unavoidable while bumblebee depends on progress_bar ~> 3.0, which pins decimal ~> 2.0; the higher-signal Phoenix advisory was removed by updating the lockfile to phoenix 1.8.8. Sobelow was evaluated and not adopted: for this desktop app's loopback-only endpoint it would mostly duplicate checks already covered by compile/test, Credo, and dependency audit while adding little release signal.

Context. The project enforces dialyzer + warnings-as-errors but has no style/consistency linter and no dependency CVE audit. Cheap, high-leverage additions.

Files.

mix.exs (deps + validate alias), new .credo.exs

Approach. Add {:credo, "~> 1.7", only: [:dev, :test], runtime: false} and {:mix_audit, "~> 2.1", only: [:dev, :test], runtime: false}. Extend the validate alias: ["test", "credo --strict", "deps.audit", "dialyzer"]. Start credo in non-strict mode if the initial violation count is large; fix or explicitly disable rules rather than leaving noise. Sobelow is Phoenix-web-oriented; evaluate whether its checks add signal for a loopback-only desktop endpoint before adopting.

Acceptance. mix validate runs all four gates clean; CI/agent instructions (AGENTS.md) updated to mention them.

Phase 5 — Consistency, config & hygiene

TD-20: Align SQLite pool configuration between dev and prod ✅ DONE (2026-06-12)

Status: implemented. The effective mismatch was narrowed to one place: config.exs, dev.exs, prod.exs, and test.exs were already aligned on a modest pool of 5, but config/runtime.exs still defaulted production to POOL_SIZE=1. The prod runtime default now also uses 5, with an inline comment documenting the rationale: keep SQLite concurrency behavior consistent across dev and prod under the existing WAL + busy_timeout setup unless an operator explicitly overrides POOL_SIZE. A regression test reads config/runtime.exs in :prod and locks that default to 5.

Context. Dev runs pool_size: 5 (config.exs), prod runs pool_size: 1 (runtime.exs) — so dev and prod have different concurrency semantics: dev can hit SQLITE_BUSY/interleavings prod never sees; prod serializes every read behind every write. WAL mode + busy_timeout: 15_000 are already set globally.

Files.

config/config.exs (pool_size 5), config/runtime.exs (POOL_SIZE default "1")

Approach. Pick one deliberate model and apply it to both envs. Options: (a) pool of 1 everywhere (simplest, fully serialized, fine for single-user); (b) modest pool (3–5) everywhere relying on WAL + busy_timeout; (c) ecto_sqlite3's documented separate read/write pool pattern. Document the choice in a comment. Run the suite and a manual smoke under the chosen prod setting.

Acceptance. Same pool model in dev and prod; rationale comment in config; no busy-timeout regressions in tests.

TD-21: Harden `Persistence.atomic_write`

Context. atomic_write/2 uses a fixed path <> ".tmp" temp name — two concurrent writers to the same path corrupt each other's temp file before rename. No fsync before rename (crash can leave an empty/partial target on some filesystems); for blog-post files that durability trade-off is defensible, but the temp-name collision is not.

Files.

lib/bds/persistence.ex (atomic_write/2 lines ~70–82)

Approach. Unique temp suffix: path <> ".tmp." <> Integer.to_string(System.unique_integer([:positive])). Optionally :file.sync the temp file before rename behind an opt-in flag for critical writes. Clean up stray *.tmp.* files defensively where directories are scanned (check rebuild-from-files glob patterns ignore them).

Acceptance. Concurrent-writer test produces two intact outcomes (last write wins, no corruption); rebuild file globs ignore temp files.

TD-22: Wrap `delete_chat_conversation` in a transaction

Context. chat.ex deletes all messages (Repo.delete_all) then the conversation (Repo.delete) without a transaction — a failure in between strands orphan message rows. The codebase uses Repo.transaction conscientiously elsewhere; this is an outlier. (Check for siblings: any other multi-statement write without a transaction found during work on this task should be fixed in the same pass — e.g. audit lib/bds/ai/catalog.ex, lib/bds/scripts.ex.)

Files.

lib/bds/ai/chat.ex (delete_chat_conversation/1 lines ~102–117)

Approach. Repo.transaction(fn -> ... end) or Ecto.Multi. Better: add ON DELETE CASCADE via an Ecto migration on the chat_messages.conversation_id FK and drop the manual delete_all (use mix ecto.gen.migration per project rules).

Acceptance. Injected failure between the two deletes leaves both intact; no orphaned messages possible.

TD-23: Sweep blanket `rescue`/`catch` blocks for silent failure

Context. Several modules rescue all exceptions into :ok/fallbacks (shutdown.ex defensibly — it must never block quit; but also overlay_components.ex (11 rescues), import_execution.ex, import_analysis.ex, main_window.ex, Repo.ready?). Blanket rescues hide real bugs (typos become silent no-ops) and dialyzer can't see through them.

Files. Start with: lib/bds/desktop/shell_live/overlay_components.ex, lib/bds/import_execution.ex, lib/bds/import_analysis.ex, lib/bds/desktop/main_window.ex.

Approach. For each rescue: narrow to the specific exception(s) actually expected, add a Logger.warning with context where swallowing is the right call, and delete rescues that guard nothing. Shutdown.ex's rescues stay as-is (documented intent: never block quit).

Acceptance. No bare rescue _ -> without either a narrow exception match or a logged justification; suite green.

TD-24: Fix stale `npm test` reference in AGENTS.md

Context. AGENTS.md "Fix All Test Failures" section says "Run the full test suite (npm test)" — stale from the pre-Elixir rewrite. It's an instruction file for agents; wrong commands actively mislead.

Files. AGENTS.md

Approach. Replace with mix test (and the xvfb-run mix test headless note already present elsewhere in the file). While in there, scan for other JS-era references (React components are mentioned in the i18n section — verify whether that section should say LiveView/HEEx).

Acceptance. AGENTS.md commands all match the Elixir toolchain.

TD-25: Generate the desktop endpoint secret at runtime ✅ DONE (2026-06-11, shipped with TD-01)

Context. The Phoenix endpoint secret_key_base (session/LiveView signing) was a hardcoded repo string in all envs.

Status: implemented as part of TD-01. BDS.Application.desktop_secret_key_base/0 now generates Base.encode64(:crypto.strong_rand_bytes(48)) per boot when no explicit config is set; the static value was removed from config/config.exs.

Explicit non-issues (decisions reviewed and endorsed)

Recorded so future reviews don't re-litigate them:

Shelling out to git instead of libgit2 NIF bindings — correct; bindings are immature. (Timeout gap tracked as TD-10.)
SIGKILL shutdown for the wx static-destructor segfault — legitimate, well-documented workaround. (Terminate-callback consequence tracked as TD-04.)
BDS.BoundedAtoms allow-list atom conversion — good practice, keep.
Luerl with reduction/time caps for user scripting — right tool, sound sandbox config.
SQLite FTS5 + bm25 for search, HNSWLib for ANN, Liquex/Earmark for rendering — all appropriate.
Integer-millisecond timestamps (Persistence.now_ms) over :utc_datetime_usec — committed design choice; consistent; keep parse_timestamp's seconds-vs-ms heuristic under test, don't migrate.
elixir-desktop dependency risk (niche, lightly maintained) — acknowledged; mitigation is the existing thin BDS.Desktop.* boundary, not replacement.
No telemetry pipeline — reasonable cut for a desktop app; revisit only if support/debugging demands it.

48 KiB Raw Blame History Unescape Escape

Technical Debt Backlog

How to work these tasks

Phase 1 — Top five (security & correctness)

TD-01: Move the AI secret encryption key out of the repo ✅ DONE (2026-06-11)

TD-02: Replace the hand-rolled :httpc client with Req ✅ DONE (2026-06-11)

TD-03: Fix BDS.AI.InFlight ETS ownership and creation race ✅ DONE (2026-06-11)

TD-04: Flush embedding indexes on shutdown (or delete the dead flush_all) ✅ DONE (2026-06-11)

TD-05: Replace xmerl with Saxy in the WXR importer; add import transactions ✅ DONE (2026-06-12)

Phase 2 — Unbounded blocking & cancellation

TD-06: Real SSE streaming for chat ✅ DONE (2026-06-11)

TD-07: Bound the chat await chain; end-to-end timeout & cancellation ✅ DONE (2026-06-12)

TD-08: Remove test-sandbox scaffolding from production chat code ✅ DONE (2026-06-12)

TD-09: Graceful task cancellation in BDS.Tasks ✅ DONE (2026-06-12)

TD-10: Timeouts for external git commands ✅ DONE (2026-06-12)

Phase 3 — Process architecture (de-bottleneck singletons)

TD-11: Move preview rendering out of the BDS.Preview GenServer ✅ DONE (2026-06-12)

TD-12: Move HNSW builds and duplicate scans out of Embeddings.Index handle_call ✅ DONE (2026-06-12)

TD-13: Slim down the Publishing GenServer call surface ✅ DONE (2026-06-12)

TD-14: Replace polling with messaging (CliSync watcher + rebuild sequencing) ✅ DONE (2026-06-12)

TD-15: BDS.Tasks housekeeping (queue type, eviction timers) ✅ DONE (2026-06-12)

Phase 4 — Build-vs-buy replacements

TD-16: Frontmatter robustness — yaml_elixir/ymlr or harden the hand-rolled parser ✅ DONE (2026-06-12)

TD-17: Language detection via paasaa (optional, low priority) ✅ DONE (2026-06-12)

TD-18: Evaluate Oban (Lite engine) for durable jobs ✅ DONE (2026-06-12)

TD-19: Add credo, mix_audit (and consider sobelow) to the quality gates ✅ DONE (2026-06-12)

Phase 5 — Consistency, config & hygiene

TD-20: Align SQLite pool configuration between dev and prod ✅ DONE (2026-06-12)

TD-21: Harden Persistence.atomic_write

TD-22: Wrap delete_chat_conversation in a transaction

TD-23: Sweep blanket rescue/catch blocks for silent failure

TD-24: Fix stale npm test reference in AGENTS.md

TD-25: Generate the desktop endpoint secret at runtime ✅ DONE (2026-06-11, shipped with TD-01)

Explicit non-issues (decisions reviewed and endorsed)

48 KiB

Raw Blame History

TD-02: Replace the hand-rolled `:httpc` client with Req ✅ DONE (2026-06-11)

TD-03: Fix `BDS.AI.InFlight` ETS ownership and creation race ✅ DONE (2026-06-11)

TD-04: Flush embedding indexes on shutdown (or delete the dead `flush_all`) ✅ DONE (2026-06-11)

TD-09: Graceful task cancellation in `BDS.Tasks` ✅ DONE (2026-06-12)

TD-10: Timeouts for external `git` commands ✅ DONE (2026-06-12)

TD-11: Move preview rendering out of the `BDS.Preview` GenServer ✅ DONE (2026-06-12)

TD-12: Move HNSW builds and duplicate scans out of `Embeddings.Index` handle_call ✅ DONE (2026-06-12)

TD-13: Slim down the `Publishing` GenServer call surface ✅ DONE (2026-06-12)

TD-15: `BDS.Tasks` housekeeping (queue type, eviction timers) ✅ DONE (2026-06-12)

TD-17: Language detection via `paasaa` (optional, low priority) ✅ DONE (2026-06-12)

TD-21: Harden `Persistence.atomic_write`

TD-22: Wrap `delete_chat_conversation` in a transaction

TD-23: Sweep blanket `rescue`/`catch` blocks for silent failure

TD-24: Fix stale `npm test` reference in AGENTS.md