fix: A1-16 keep public project content out of repo via per-user content location and machine-local project registry

2026-05-29 21:58:46 +02:00
parent 9d5764b251
commit cf8b0af15f
17 changed files with 148 additions and 408 deletions


@@ -27,7 +27,7 @@ Gap categories: **SC** = spec correct, fix code | **CS** = code correct, update
| A1-14b | ~~USearch HNSW ANN index + debounced persistence not implemented~~ | embedding.allium config/FindSimilar/DebouncedPersistence | `Embeddings.Index` is now an HNSW (hnswlib) ANN index with debounced persistence | **Resolved:** rewrote `Embeddings.Index` as a DB-free GenServer wrapping an hnswlib HNSW graph (cosine, M=16, efConstruction=128, efSearch=64) — O(n·log n) build, O(log n) queries, replacing the O(n²) JSON cosine snapshot; per-project in-memory index + `label→post_id` map; 5s debounced `save_index` + `.meta.json` sidecar, force-save on project switch (`set_active_project`) and shutdown (`terminate`), `forget/1` on project delete; lazy reload from disk with rebuild-from-DB self-heal on miss; `find_similar`/`find_duplicates`/`compute_similarities` rewired (no brute-force fallback); USearch has no Elixir binding so hnswlib provides the identical HNSW algorithm/params (spec reconciled); supervision + dialyzer PLT updated; tests updated for debounced/binary persistence + self-heal. Follow-up hardening: explicit rebuild now forces re-embedding regardless of content_hash (ReindexAll), and model-unavailable errors propagate cleanly (post saves degrade to unindexed + log; rebuild/index return `{:error, reason}` surfaced as a failed task with a user-facing message instead of crashing). |
| A1-14c | ~~Embedding model runs on CPU only; no Apple GPU acceleration~~ | embedding.allium invariant NativeAcceleratedExecution | `Backends.Neural` now selects the defn compiler at serving-build time: Apple GPU via EMLX (MLX/Metal) on arm64 macOS, EXLA-CPU elsewhere | **Resolved:** added `{:emlx, "~> 0.2.0"}` dep (ships precompiled MLX binaries; EMLX 0.2.0 implements both `EMLX.Backend` and the `Nx.Defn.Compiler` behaviour, GPU-default); `Backends.Neural` gained a pure `select_accelerator/3` policy (`:auto` prefers EMLX only when available **and** on Apple Silicon; explicit `:emlx`/`:exla` honoured; forced `:emlx` degrades to EXLA when unavailable so misconfigured hosts still run), `current_accelerator/0`, and `defn_options/1`; `build_serving` places params on `{EMLX.Backend, device: :gpu}` and compiles with `EMLX` for the EMLX path, keeps `EXLA` otherwise; new `accelerator: :auto` config key; spec `NativeAcceleratedExecution` + `EmbeddingModel` updated; PLT app added; 7 tests added (offline — test config still uses the InApp stub). |
| A1-15 | ~~Preview vs generation content source strategy undocumented~~ | preview.allium (no invariant), generation.allium (no invariant) | Generation uses only published .md file content (`Generation.Data` snapshots set `content: nil`); preview includes published+draft posts and prefers DB content over file (`Preview.Router` queries `:published`/`:draft`, uses `editor_body`) | **Resolved:** added `PreviewDraftOverlay` invariant to preview.allium and `GenerationPublishedOnly` invariant to generation.allium; both cross-reference each other; code already correct, 3 tests added for draft-in-preview behavior |
| A1-16 | Public project content + data_path discovery not compliant with storage-location spec | project.allium `PublicContentLivesInProjectFolder` / `PrivateArtifactsLiveInOsAppDir` / `DataPathNotPersistedInProjectJson` / `DiscoverProjectDataPath` (newly added) | **Private side done:** `Projects.project_cache_root/0` now falls back to the OS private app dir (`:filename.basedir(:user_config, "bds")` → macOS `~/Library/Application Support/bds`) instead of `priv/data`, so the embeddings index no longer lands in the repo. **Still non-compliant (public side):** `project_data_dir/0` (projects.ex:97-99) falls back to `priv/data/projects/<id>` when `data_path` is nil, so the default project's *public* content (posts, media, templates, scripts, `meta/`, generated `html/`) is written into the application repo; there is no discovery of `data_path` from the `meta/project.json` location, and the `default` project is created with `data_path: nil` (projects.ex:80). | Implement project-folder discovery: `data_path` := the folder containing `meta/project.json` (never stored in project.json, keeping projects movable — `DiscoverProjectDataPath`); create the default project's folder at a per-user default content location on first launch (never in repo/private_dir); drop the `priv/data/projects/<id>` fallback in `project_data_dir/0`; persist the current project-folder location as a machine-local pointer (project registry) under `private_dir`. Migrate the committed `priv/data/projects/default/` content out of the repo. |
| A1-16 | ~~Public project content + data_path discovery not compliant with storage-location spec~~ | project.allium `PublicContentLivesInProjectFolder` / `PrivateArtifactsLiveInOsAppDir` / `DataPathNotPersistedInProjectJson` / `DiscoverProjectDataPath` | Public content now lives under a per-user default content location, never the repo | **Resolved:** `project_data_dir/1` drops the `priv/data/projects/<id>` repo fallback: a project without an explicit `data_path` resolves to `default_content_root()/<id>` (configurable via `:default_content_root`, else `~/bds`), never the repo or `private_dir`; the `default` project is now created on first launch with an explicit `data_path` under that location and its folder is `mkdir`'d (`PublicContentLivesInProjectFolder`); added `Projects.private_dir/0`, `default_content_root/0`, and a machine-local project registry (`registry_path/0` → `project_registry.json` under `private_dir`, written on create/ensure-default, removed on delete) that remembers each project's folder without embedding it in `meta/project.json` (`DataPathNotPersistedInProjectJson`/`DiscoverProjectDataPath`, already satisfied since `project.json` never serializes `data_path`); `delete_project` removes app-managed folders (those under `default_content_root`) but preserves user-chosen external folders; committed `priv/data/projects/default/` content removed from the repo and `/priv/data/projects/` git-ignored; test config redirects `:default_content_root` to a temp dir; 4 tests added (default folder outside repo/private, no-repo fallback, registry round-trip, registry cleanup on delete). |
### A2. Spec Should Update (code is normative)
@@ -188,7 +188,7 @@ All reconciled to follow code. Specs must be self-consistent and match code.
## Priority Order for Resolution
1. ~~**A1-1 through A1-15**~~ — all resolved: auto-save, on-demand preview, template lookup, validation gates, real Pagefind, graceful shutdown, real embedding model, HNSW ANN index, Apple GPU/EMLX acceleration (A1-14c), and preview/generation content strategy (A1-15)
1b. **A1-16** — storage-location compliance: private side done (embeddings index → OS app dir); public side open (data_path discovery from meta/project.json, drop the `priv/data/projects/<id>` fallback, migrate committed default project out of repo)
1b. ~~**A1-16**~~ — storage-location compliance resolved: public content now lives under a per-user default content location (never the repo/private dir), `priv/data/projects/<id>` fallback dropped, machine-local project registry added, committed default project content removed from repo
2. **D1-1 through D1-18** — untested invariants/guarantees
3. **C-1 through C-3** — internal spec inconsistencies (reconcile to code)
4. **B1-1 through B1-6** — major code behaviors missing from spec
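
The A1-16 resolution rule (explicit `data_path` wins, otherwise `default_content_root()/<id>`, never the repo) can be sketched roughly as below. This is a minimal illustration, not the code from the commit: the module name `StorageSketch`, the optional arguments, and the `app_managed?/2` helper are hypothetical, and only the `~/bds` default and the "folders under `default_content_root` are app-managed" rule are taken from the table row above.

```elixir
defmodule StorageSketch do
  @moduledoc false

  # Hypothetical stand-in for default_content_root/0: a configured root wins,
  # otherwise fall back to ~/bds (the default named in the A1-16 row).
  def default_content_root(configured \\ nil) do
    configured || Path.join(System.user_home!(), "bds")
  end

  # Function head carrying the default for the content root.
  def project_data_dir(project, content_root \\ default_content_root())

  # An explicit data_path always wins.
  def project_data_dir(%{data_path: path}, _content_root) when is_binary(path), do: path

  # No data_path: resolve under the per-user content root.
  # Deliberately no priv/data/projects/<id> fallback.
  def project_data_dir(%{id: id}, content_root) do
    Path.join(content_root, to_string(id))
  end

  # Assumed deletion policy: only folders strictly inside the content root
  # count as app-managed; user-chosen external folders are left alone.
  def app_managed?(dir, content_root) do
    root = Path.expand(content_root)
    String.starts_with?(Path.expand(dir), root <> "/")
  end
end
```

Under these assumptions, `StorageSketch.project_data_dir(%{id: "default", data_path: nil}, "/tmp/bds")` resolves inside `/tmp/bds`, while a project carrying its own `data_path` is returned unchanged and would not be treated as app-managed on delete. Passing the content root as an argument mirrors how the test config is described as redirecting `:default_content_root` to a temp dir.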