fix: keep program-run artifacts (embeddings index) out of the project folder and repo

2026-05-29 21:45:15 +02:00
parent 3a77761f96
commit 9d5764b251
11 changed files with 122 additions and 23 deletions

.gitignore (3 changes)

@@ -10,4 +10,7 @@
 /priv/data/*.db
 /priv/data/*.db-shm
 /priv/data/*.db-wal
+# Embeddings index artifacts are per-project runtime caches, never committed.
+*.usearch
+*.usearch.meta.json
 *.eztmp/


@@ -27,6 +27,7 @@ Gap categories: **SC** = spec correct, fix code | **CS** = code correct, update
 | A1-14b | ~~USearch HNSW ANN index + debounced persistence not implemented~~ | embedding.allium config/FindSimilar/DebouncedPersistence | `Embeddings.Index` is now an HNSW (hnswlib) ANN index with debounced persistence | **Resolved:** rewrote `Embeddings.Index` as a DB-free GenServer wrapping an hnswlib HNSW graph (cosine, M=16, efConstruction=128, efSearch=64) — O(n·log n) build, O(log n) queries, replacing the O(n²) JSON cosine snapshot; per-project in-memory index + `label→post_id` map; 5s debounced `save_index` + `.meta.json` sidecar, force-save on project switch (`set_active_project`) and shutdown (`terminate`), `forget/1` on project delete; lazy reload from disk with rebuild-from-DB self-heal on miss; `find_similar`/`find_duplicates`/`compute_similarities` rewired (no brute-force fallback); USearch has no Elixir binding so hnswlib provides the identical HNSW algorithm/params (spec reconciled); supervision + dialyzer PLT updated; tests updated for debounced/binary persistence + self-heal. Follow-up hardening: explicit rebuild now forces re-embedding regardless of content_hash (ReindexAll), and model-unavailable errors propagate cleanly (post saves degrade to unindexed + log; rebuild/index return `{:error, reason}` surfaced as a failed task with a user-facing message instead of crashing). |
 | A1-14c | ~~Embedding model runs on CPU only; no Apple GPU acceleration~~ | embedding.allium invariant NativeAcceleratedExecution | `Backends.Neural` now selects the defn compiler at serving-build time: Apple GPU via EMLX (MLX/Metal) on arm64 macOS, EXLA-CPU elsewhere | **Resolved:** added `{:emlx, "~> 0.2.0"}` dep (ships precompiled MLX binaries; EMLX 0.2.0 implements both `EMLX.Backend` and the `Nx.Defn.Compiler` behaviour, GPU-default); `Backends.Neural` gained a pure `select_accelerator/3` policy (`:auto` prefers EMLX only when available **and** on Apple Silicon; explicit `:emlx`/`:exla` honoured; forced `:emlx` degrades to EXLA when unavailable so misconfigured hosts still run), `current_accelerator/0`, and `defn_options/1`; `build_serving` places params on `{EMLX.Backend, device: :gpu}` and compiles with `EMLX` for the EMLX path, keeps `EXLA` otherwise; new `accelerator: :auto` config key; spec `NativeAcceleratedExecution` + `EmbeddingModel` updated; PLT app added; 7 tests added (offline — test config still uses the InApp stub). |
 | A1-15 | ~~Preview vs generation content source strategy undocumented~~ | preview.allium (no invariant), generation.allium (no invariant) | Generation uses only published .md file content (`Generation.Data` snapshots set `content: nil`); preview includes published+draft posts and prefers DB content over file (`Preview.Router` queries `:published`/`:draft`, uses `editor_body`) | **Resolved:** added `PreviewDraftOverlay` invariant to preview.allium and `GenerationPublishedOnly` invariant to generation.allium; both cross-reference each other; code already correct, 3 tests added for draft-in-preview behavior |
+| A1-16 | Public project content + data_path discovery not compliant with storage-location spec | project.allium `PublicContentLivesInProjectFolder` / `PrivateArtifactsLiveInOsAppDir` / `DataPathNotPersistedInProjectJson` / `DiscoverProjectDataPath` (newly added) | **Private side done:** `Projects.project_cache_root/0` now falls back to the OS private app dir (`:filename.basedir(:user_config, "bds")` → macOS `~/Library/Application Support/bds`) instead of `priv/data`, so the embeddings index no longer lands in the repo. **Still non-compliant (public side):** `project_data_dir/0` (projects.ex:97-99) falls back to `priv/data/projects/<id>` when `data_path` is nil, so the default project's *public* content (posts, media, templates, scripts, `meta/`, generated `html/`) is written into the application repo; there is no discovery of `data_path` from the `meta/project.json` location, and the `default` project is created with `data_path: nil` (projects.ex:80). | Implement project-folder discovery: `data_path` := the folder containing `meta/project.json` (never stored in project.json, keeping projects movable — `DiscoverProjectDataPath`); create the default project's folder at a per-user default content location on first launch (never in repo/private_dir); drop the `priv/data/projects/<id>` fallback in `project_data_dir/0`; persist the current project-folder location as a machine-local pointer (project registry) under `private_dir`. Migrate the committed `priv/data/projects/default/` content out of the repo. |

 ### A2. Spec Should Update (code is normative)

@@ -186,7 +187,8 @@ All reconciled to follow code. Specs must be self-consistent and match code.

 ## Priority Order for Resolution

-1. ~~**A1-1 through A1-14c**~~ — all resolved: auto-save, on-demand preview, template lookup, validation gates, real Pagefind, graceful shutdown, real embedding model, HNSW ANN index, and Apple GPU/EMLX acceleration (A1-14c)
+1. ~~**A1-1 through A1-15**~~ — all resolved: auto-save, on-demand preview, template lookup, validation gates, real Pagefind, graceful shutdown, real embedding model, HNSW ANN index, Apple GPU/EMLX acceleration (A1-14c), and preview/generation content strategy (A1-15)
+1b. **A1-16** — storage-location compliance: private side done (embeddings index → OS app dir); public side open (data_path discovery from meta/project.json, drop the `priv/data/projects/<id>` fallback, migrate committed default project out of repo)
 2. **D1-1 through D1-18** — untested invariants/guarantees
 3. **C-1 through C-3** — internal spec inconsistencies (reconcile to code)
 4. **B1-1 through B1-6** — major code behaviors missing from spec


@@ -268,20 +268,25 @@ defmodule BDS.Projects do
     not Repo.exists?(from project in Project, where: project.slug == ^slug)
   end

-  defp repo_data_dir do
-    Application.fetch_env!(:bds, BDS.Repo)
-    |> Keyword.fetch!(:database)
-    |> Path.expand()
-    |> Path.dirname()
-  end
-
   defp project_cache_root do
     case Application.get_env(:bds, :project_cache_root) do
       root when is_binary(root) -> Path.expand(root)
-      _other -> repo_data_dir()
+      # Private app-internal artifacts (e.g. the embeddings index) live under the
+      # OS private app directory — on macOS ~/Library/Application Support/bds —
+      # never inside the repo or a project's public folder. Colocating them with
+      # project_data_dir would pollute (and historically committed to) the repo.
+      _other -> private_app_dir()
     end
   end

+  defp private_app_dir do
+    case :filename.basedir(:user_config, "bds") do
+      path when is_list(path) -> List.to_string(path)
+      path -> path
+    end
+    |> Path.expand()
+  end
+
   defp attr(attrs, key) do
     cond do
       Map.has_key?(attrs, key) -> Map.get(attrs, key)
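
The new fallback relies on Erlang's `:filename.basedir/2`, whose documented return type covers both charlists and binaries, which is why the result is normalised before `Path.expand/1`. A minimal standalone sketch of the same resolution follows. `PrivateDirSketch`, `to_binary/1`, and the two-argument `project_cache_dir/2` are hypothetical names used only for illustration, and the `projects/<id>` layout under the root is an assumption taken from the spec comment, not from `BDS.Projects` itself.

```elixir
defmodule PrivateDirSketch do
  @moduledoc false

  # Resolve the per-user config directory for `app_name` and always return an
  # expanded binary path, whichever string type :filename.basedir/2 hands back.
  def private_app_dir(app_name \\ "bds") do
    :user_config
    |> :filename.basedir(app_name)
    |> to_binary()
    |> Path.expand()
  end

  # Hypothetical helper: where one project's private cache could sit under the
  # root (assumed layout: <root>/projects/<project_id>).
  def project_cache_dir(project_id, root \\ private_app_dir()) do
    Path.join([root, "projects", project_id])
  end

  defp to_binary(path) when is_list(path), do: List.to_string(path)
  defp to_binary(path) when is_binary(path), do: path
end
```

Passing an explicit `root` mirrors the `:project_cache_root` override in the diff: tests can point the cache at a temporary directory without touching the real per-user location.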

File diff suppressed because one or more lines are too long


@@ -186,7 +186,7 @@ rule UpsertMediaTranslation {
 rule RebuildMediaFromFiles {
     when: RebuildMediaFromFilesRequested(project)
     -- Scans media directory for .meta sidecars, reimports to DB
-    for sidecar in scan_directory(project.effective_data_dir + "/media", "*.meta"):
+    for sidecar in scan_directory(project.public_dir + "/media", "*.meta"):
         let parsed = parse_sidecar(sidecar)
         ensures: Media.created(parsed)
         -- or updated if already exists


@@ -61,7 +61,7 @@ rule RunMetadataDiff {
     )

     -- Detect orphan files (on disk but not in DB)
-    for file in scan_directory(project.effective_data_dir + "/posts", "*.md"):
+    for file in scan_directory(project.public_dir + "/posts", "*.md"):
         let matching = Posts where file_path = file
         if matching.count = 0:
             ensures: OrphanReport.created(file_path: file)


@@ -8,6 +8,7 @@ surface ProjectControlSurface {

     provides:
         CreateProjectRequested(name, data_path)
+        OpenProjectRequested(folder_path)
         SetActiveProjectRequested(project)
         DeleteProjectRequested(project)
 }
@@ -27,11 +28,33 @@ entity Project {
     tags: Tag with project = this

     -- Derived
-    internal_base_dir: String
-        -- {user_data}/projects/{id}/
-        -- Contains: meta/, thumbnails/, tags.json
-    effective_data_dir: data_path ?? internal_base_dir
-        -- Custom data path overrides default
+    --
+    -- data_path is the project folder: the directory that CONTAINS
+    -- meta/project.json. It is DISCOVERED from the project.json location at
+    -- load time and is never written into project.json itself — so the folder
+    -- can be moved/renamed freely. The current location is remembered only as a
+    -- machine-local pointer (a project registry under private_dir), never
+    -- embedded in the portable project. See DataPathNotPersistedInProjectJson.
+    public_dir: data_path
+        -- All user-owned, portable, webserver-bound content lives here, under
+        -- the project folder:
+        -- posts (.md), media + thumbnails, templates/, scripts/,
+        -- meta/ (project.json, categories.json, category-meta.json,
+        -- publishing.json), tags.json, menu.opml, and generated html/ output.
+        -- See PublicContentLivesInProjectFolder.
+    private_dir: String
+        -- The OS per-user app-data directory. Holds app-internal,
+        -- machine-specific, regenerable artifacts ONLY — never inside the repo
+        -- or the project folder:
+        -- the SQLite database, the per-project embeddings index
+        -- (projects/{id}/embeddings.usearch + .meta.json sidecar), the
+        -- downloaded embedding-model cache, the project registry, and
+        -- window/UI state.
+        -- OS-specific base (resolved via :filename.basedir, app name "bds"):
+        -- macOS: ~/Library/Application Support/bds (:user_config)
+        -- Linux: $XDG_CONFIG_HOME/bds (default ~/.config/bds)
+        -- Windows: %APPDATA%\\bds
+        -- See PrivateArtifactsLiveInOsAppDir.
 }

 surface ProjectSurface {
@@ -48,8 +71,8 @@ surface ProjectSurface {
         project.posts.count
         project.media.count
         project.tags.count
-        project.internal_base_dir
-        project.effective_data_dir
+        project.public_dir
+        project.private_dir
 }

 invariant SingleActiveProject {
@@ -73,11 +96,52 @@ rule CreateProject {
         data_path: data_path,
         is_active: false
     )
+    @guidance
+        -- data_path is the chosen project folder. CreateProject writes
+        -- meta/project.json into {data_path}/meta/ but never records data_path
+        -- inside it (DataPathNotPersistedInProjectJson). The default project's
+        -- folder is created at a per-user default content location on first
+        -- launch — never inside the application repo or private_dir.
+}
+
+rule DiscoverProjectDataPath {
+    -- Opening or loading a project folder deduces its data_path from the
+    -- on-disk location of meta/project.json rather than from any stored value,
+    -- keeping projects movable (DataPathNotPersistedInProjectJson).
+    when: OpenProjectRequested(folder_path)
+    -- folder_path contains meta/project.json
+    let project = Projects where data_path = folder_path
+    ensures: project.data_path = folder_path
 }

 invariant ProjectTemplatesDirectoryReservedForUserTemplates {
     -- The project templates directory stores only user-managed templates.
-    -- Creating a project does not populate effective_data_dir/templates with bundled defaults.
+    -- Creating a project does not populate public_dir/templates with bundled defaults.
+}
+
+invariant PublicContentLivesInProjectFolder {
+    -- Every file the user owns or that must be served by the webserver lives
+    -- under public_dir (= the project folder containing meta/project.json):
+    -- posts, media + thumbnails, templates, scripts, the meta/ JSON files,
+    -- tags.json, menu.opml, and the generated html/ output. None of this
+    -- content is ever written to private_dir or into the application repo.
+}
+
+invariant PrivateArtifactsLiveInOsAppDir {
+    -- App-internal, machine-specific, regenerable artifacts — the SQLite
+    -- database, the per-project embeddings index and its sidecar, the
+    -- model cache, the project registry, and window/UI state — live ONLY
+    -- under private_dir (the OS per-user app-data directory). They are never
+    -- written into a project folder (public_dir) or into the application repo.
+    -- A regenerable artifact (e.g. the embeddings index) may be rebuilt from
+    -- the database when absent.
+}
+
+invariant DataPathNotPersistedInProjectJson {
+    -- meta/project.json never stores its own path or data_path. A project's
+    -- location is determined solely by where its meta/project.json file sits,
+    -- so the folder can be moved or renamed without editing any file. The app
+    -- remembers the current location as a machine-local pointer in private_dir.
 }

 rule SetActiveProject {
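
`DiscoverProjectDataPath` describes deriving `data_path` from where `meta/project.json` sits rather than from a stored value. The A1-16 row lists this discovery as not yet implemented, so the following is only a sketch of one way it could look in Elixir. `DiscoverSketch`, `discover_data_path/1`, and the `:project_json_not_found` error atom are hypothetical and do not appear in the codebase shown here.

```elixir
defmodule DiscoverSketch do
  @moduledoc false

  # Given a folder the user picked, return {:ok, data_path} when that folder
  # contains meta/project.json. The path is derived from the folder location
  # only; nothing is read from inside project.json.
  def discover_data_path(folder_path) do
    data_path = Path.expand(folder_path)
    marker = Path.join([data_path, "meta", "project.json"])

    if File.regular?(marker) do
      {:ok, data_path}
    else
      {:error, :project_json_not_found}
    end
  end
end
```

Because the result depends only on the folder's current location, moving or renaming the folder and opening it again yields the new path without editing any file, which is the property `DataPathNotPersistedInProjectJson` asks for.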


@@ -268,7 +268,7 @@ rule ExecuteTransform {

 rule RebuildScriptsFromFiles {
     when: RebuildScriptsFromFilesRequested(project)
-    for file in scan_directory(project.effective_data_dir + "/scripts", "*." + config.script_extension):
+    for file in scan_directory(project.public_dir + "/scripts", "*." + config.script_extension):
         let parsed = parse_script_file(file)
         ensures: Script.created(parsed)
 }


@@ -81,7 +81,7 @@ invariant UserTemplateDirectoryOverridesBundledDefaults {
 }

 invariant RebuildTemplatesIndexesOnlyProjectTemplates {
-    -- Rebuild-from-files scans only project.effective_data_dir/templates.
+    -- Rebuild-from-files scans only project.public_dir/templates.
     -- Bundled defaults are render-time fallbacks and are not indexed into Templates
     -- unless the user has created matching project files.
 }
@@ -171,7 +171,7 @@ rule CascadeSlugUpdate {

 rule RebuildTemplatesFromFiles {
     when: RebuildTemplatesFromFilesRequested(project)
-    for file in scan_directory(project.effective_data_dir + "/templates", "*.liquid"):
+    for file in scan_directory(project.public_dir + "/templates", "*.liquid"):
         let parsed = parse_template_file(file)
         ensures: Template.created(parsed)
         -- or updated if slug already exists


@@ -117,6 +117,32 @@ defmodule BDS.ProjectsTest do
     assert Enum.count(BDS.Projects.list_projects(), & &1.is_active) == 1
   end

+  test "project_cache_dir never falls back into the project data directory" do
+    # Private app-internal artifacts (the embeddings index) must live under the
+    # OS private app directory (macOS: ~/Library/Application Support/bds), never
+    # inside priv/data/projects/<id> — leaving them in the project tree pollutes
+    # the repository.
+    saved = Application.get_env(:bds, :project_cache_root)
+    Application.delete_env(:bds, :project_cache_root)
+    on_exit(fn -> Application.put_env(:bds, :project_cache_root, saved) end)
+
+    project_id = "fallback-#{System.unique_integer([:positive])}"
+    cache_dir = BDS.Projects.project_cache_dir(project_id)
+    data_dir = Path.expand("../../priv/data/projects/#{project_id}", __DIR__)
+
+    refute cache_dir == data_dir
+    refute String.starts_with?(cache_dir, Path.expand("../../priv/data", __DIR__))
+
+    private_app_dir =
+      case :filename.basedir(:user_config, "bds") do
+        path when is_list(path) -> List.to_string(path)
+        path -> path
+      end
+      |> Path.expand()
+
+    assert String.starts_with?(cache_dir, private_app_dir)
+  end
+
   test "ensure_default_project creates the default project once and keeps it active" do
     Repo.delete_all(Project)
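
A note on the environment handling in the new `project_cache_dir` test: when `:project_cache_root` was unset beforehand, `Application.put_env/3` in `on_exit` stores `nil` under the key rather than removing it. That is harmless for the `case` in `project_cache_root/0`, but it is not an exact restore. A small sketch of a helper that puts back exactly what was there follows; `EnvRestoreSketch` and `without_env/3` are hypothetical names, not part of the test suite shown.

```elixir
defmodule EnvRestoreSketch do
  @moduledoc false

  # Run `fun` with `key` removed from the application env, then restore the
  # previous state: the old value if one was set, or no key at all otherwise.
  def without_env(app, key, fun) when is_function(fun, 0) do
    previous = Application.fetch_env(app, key)
    Application.delete_env(app, key)

    try do
      fun.()
    after
      case previous do
        {:ok, value} -> Application.put_env(app, key, value)
        :error -> Application.delete_env(app, key)
      end
    end
  end
end
```

`Application.fetch_env/2` distinguishes a missing key (`:error`) from a key set to `nil` (`{:ok, nil}`), which `Application.get_env/2` cannot do.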