fix: added things around project folder pollution from program runs
.gitignore (vendored): 3 changes
```diff
@@ -10,4 +10,7 @@
 /priv/data/*.db
 /priv/data/*.db-shm
 /priv/data/*.db-wal
+# Embeddings index artifacts are per-project runtime caches, never committed.
+*.usearch
+*.usearch.meta.json
 *.eztmp/
```
```diff
@@ -27,6 +27,7 @@ Gap categories: **SC** = spec correct, fix code | **CS** = code correct, update
 | A1-14b | ~~USearch HNSW ANN index + debounced persistence not implemented~~ | embedding.allium config/FindSimilar/DebouncedPersistence | `Embeddings.Index` is now an HNSW (hnswlib) ANN index with debounced persistence | **Resolved:** rewrote `Embeddings.Index` as a DB-free GenServer wrapping an hnswlib HNSW graph (cosine, M=16, efConstruction=128, efSearch=64) — O(n·log n) build, O(log n) queries, replacing the O(n²) JSON cosine snapshot; per-project in-memory index + `label→post_id` map; 5s debounced `save_index` + `.meta.json` sidecar, force-save on project switch (`set_active_project`) and shutdown (`terminate`), `forget/1` on project delete; lazy reload from disk with rebuild-from-DB self-heal on miss; `find_similar`/`find_duplicates`/`compute_similarities` rewired (no brute-force fallback); USearch has no Elixir binding so hnswlib provides the identical HNSW algorithm/params (spec reconciled); supervision + dialyzer PLT updated; tests updated for debounced/binary persistence + self-heal. Follow-up hardening: explicit rebuild now forces re-embedding regardless of content_hash (ReindexAll), and model-unavailable errors propagate cleanly (post saves degrade to unindexed + log; rebuild/index return `{:error, reason}` surfaced as a failed task with a user-facing message instead of crashing). |
 | A1-14c | ~~Embedding model runs on CPU only; no Apple GPU acceleration~~ | embedding.allium invariant NativeAcceleratedExecution | `Backends.Neural` now selects the defn compiler at serving-build time: Apple GPU via EMLX (MLX/Metal) on arm64 macOS, EXLA-CPU elsewhere | **Resolved:** added `{:emlx, "~> 0.2.0"}` dep (ships precompiled MLX binaries; EMLX 0.2.0 implements both `EMLX.Backend` and the `Nx.Defn.Compiler` behaviour, GPU-default); `Backends.Neural` gained a pure `select_accelerator/3` policy (`:auto` prefers EMLX only when available **and** on Apple Silicon; explicit `:emlx`/`:exla` honoured; forced `:emlx` degrades to EXLA when unavailable so misconfigured hosts still run), `current_accelerator/0`, and `defn_options/1`; `build_serving` places params on `{EMLX.Backend, device: :gpu}` and compiles with `EMLX` for the EMLX path, keeps `EXLA` otherwise; new `accelerator: :auto` config key; spec `NativeAcceleratedExecution` + `EmbeddingModel` updated; PLT app added; 7 tests added (offline — test config still uses the InApp stub). |
 | A1-15 | ~~Preview vs generation content source strategy undocumented~~ | preview.allium (no invariant), generation.allium (no invariant) | Generation uses only published .md file content (`Generation.Data` snapshots set `content: nil`); preview includes published+draft posts and prefers DB content over file (`Preview.Router` queries `:published`/`:draft`, uses `editor_body`) | **Resolved:** added `PreviewDraftOverlay` invariant to preview.allium and `GenerationPublishedOnly` invariant to generation.allium; both cross-reference each other; code already correct, 3 tests added for draft-in-preview behavior |
+| A1-16 | Public project content + data_path discovery not compliant with storage-location spec | project.allium `PublicContentLivesInProjectFolder` / `PrivateArtifactsLiveInOsAppDir` / `DataPathNotPersistedInProjectJson` / `DiscoverProjectDataPath` (newly added) | **Private side done:** `Projects.project_cache_root/0` now falls back to the OS private app dir (`:filename.basedir(:user_config, "bds")` → macOS `~/Library/Application Support/bds`) instead of `priv/data`, so the embeddings index no longer lands in the repo. **Still non-compliant (public side):** `project_data_dir/0` (projects.ex:97-99) falls back to `priv/data/projects/<id>` when `data_path` is nil, so the default project's *public* content (posts, media, templates, scripts, `meta/`, generated `html/`) is written into the application repo; there is no discovery of `data_path` from the `meta/project.json` location, and the `default` project is created with `data_path: nil` (projects.ex:80). | Implement project-folder discovery: `data_path` := the folder containing `meta/project.json` (never stored in project.json, keeping projects movable — `DiscoverProjectDataPath`); create the default project's folder at a per-user default content location on first launch (never in repo/private_dir); drop the `priv/data/projects/<id>` fallback in `project_data_dir/0`; persist the current project-folder location as a machine-local pointer (project registry) under `private_dir`. Migrate the committed `priv/data/projects/default/` content out of the repo. |
 
 ### A2. Spec Should Update (code is normative)
 
```
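The A1-16 plan asks for the current project-folder location to be kept as a machine-local pointer (a project registry) under `private_dir`. Below is a minimal sketch of such a registry, assuming a single Erlang-term file. The module name `ProjectRegistrySketch`, the file name `registry.etf`, and the storage format are all hypothetical and are not part of this commit.

```elixir
defmodule ProjectRegistrySketch do
  # Hypothetical registry mapping project ids to project-folder paths.
  # It lives under private_dir, so meta/project.json never records its own
  # location and the project folder stays movable.

  # Assumed file name; the commit does not name one.
  @registry_file "registry.etf"

  def path(private_dir), do: Path.join(private_dir, @registry_file)

  # Returns %{project_id => folder_path}; a missing file is an empty registry.
  def load(private_dir) do
    case File.read(path(private_dir)) do
      {:ok, binary} -> :erlang.binary_to_term(binary, [:safe])
      {:error, :enoent} -> %{}
    end
  end

  # Records (or overwrites) where a project folder currently lives on this machine.
  def put(private_dir, project_id, folder_path) do
    registry = Map.put(load(private_dir), project_id, Path.expand(folder_path))
    File.mkdir_p!(private_dir)
    File.write!(path(private_dir), :erlang.term_to_binary(registry))
    registry
  end

  def lookup(private_dir, project_id), do: Map.fetch(load(private_dir), project_id)
end
```

Moving a project folder then only requires a new `put/3` for that id. Nothing inside the folder changes, which is the property `DataPathNotPersistedInProjectJson` is after.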
```diff
@@ -186,7 +187,8 @@ All reconciled to follow code. Specs must be self-consistent and match code.
 
 ## Priority Order for Resolution
 
-1. ~~**A1-1 through A1-14c**~~ — all resolved: auto-save, on-demand preview, template lookup, validation gates, real Pagefind, graceful shutdown, real embedding model, HNSW ANN index, and Apple GPU/EMLX acceleration (A1-14c)
+1. ~~**A1-1 through A1-15**~~ — all resolved: auto-save, on-demand preview, template lookup, validation gates, real Pagefind, graceful shutdown, real embedding model, HNSW ANN index, Apple GPU/EMLX acceleration (A1-14c), and preview/generation content strategy (A1-15)
+1b. **A1-16** — storage-location compliance: private side done (embeddings index → OS app dir); public side open (data_path discovery from meta/project.json, drop the `priv/data/projects/<id>` fallback, migrate committed default project out of repo)
 2. **D1-1 through D1-18** — untested invariants/guarantees
 3. **C-1 through C-3** — internal spec inconsistencies (reconcile to code)
 4. **B1-1 through B1-6** — major code behaviors missing from spec
```
```diff
@@ -268,20 +268,25 @@ defmodule BDS.Projects do
     not Repo.exists?(from project in Project, where: project.slug == ^slug)
   end
 
-  defp repo_data_dir do
-    Application.fetch_env!(:bds, BDS.Repo)
-    |> Keyword.fetch!(:database)
-    |> Path.expand()
-    |> Path.dirname()
-  end
-
   defp project_cache_root do
     case Application.get_env(:bds, :project_cache_root) do
       root when is_binary(root) -> Path.expand(root)
-      _other -> repo_data_dir()
+      # Private app-internal artifacts (e.g. the embeddings index) live under the
+      # OS private app directory — on macOS ~/Library/Application Support/bds —
+      # never inside the repo or a project's public folder. Colocating them with
+      # project_data_dir would pollute (and historically committed to) the repo.
+      _other -> private_app_dir()
     end
   end
 
+  defp private_app_dir do
+    case :filename.basedir(:user_config, "bds") do
+      path when is_list(path) -> List.to_string(path)
+      path -> path
+    end
+    |> Path.expand()
+  end
+
   defp attr(attrs, key) do
     cond do
       Map.has_key?(attrs, key) -> Map.get(attrs, key)
```
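The following sketch shows the on-disk layout that results from this change. It follows the resolution order of `project_cache_root/0` above: an explicit `:project_cache_root` setting first, then the OS app dir. It also assumes the `projects/{id}/embeddings.usearch` path and `.meta.json` sidecar named in the spec's `private_dir` comment. `CacheLayoutSketch` and its functions are hypothetical, and the real `project_cache_dir/1` body is not shown in this diff.

```elixir
defmodule CacheLayoutSketch do
  # Same order as project_cache_root/0: an explicit :project_cache_root
  # setting wins, otherwise the OS private app dir for "bds".
  def cache_root(configured \\ nil)
  def cache_root(root) when is_binary(root), do: Path.expand(root)

  def cache_root(_other) do
    :user_config
    |> :filename.basedir("bds")
    |> to_string()
    |> Path.expand()
  end

  # Assumed layout, following the spec comment projects/{id}/embeddings.usearch.
  def project_cache_dir(root, project_id), do: Path.join([root, "projects", project_id])

  # Index file plus its .meta.json sidecar for one project.
  def index_paths(root, project_id) do
    index = Path.join(project_cache_dir(root, project_id), "embeddings.usearch")
    {index, index <> ".meta.json"}
  end
end
```

Both derived file names match the `*.usearch` and `*.usearch.meta.json` patterns this commit adds to `.gitignore`. A stray index written under the repo would therefore still not be committed.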
Binary file not shown.
File diff suppressed because one or more lines are too long
```diff
@@ -186,7 +186,7 @@ rule UpsertMediaTranslation {
 rule RebuildMediaFromFiles {
     when: RebuildMediaFromFilesRequested(project)
     -- Scans media directory for .meta sidecars, reimports to DB
-    for sidecar in scan_directory(project.effective_data_dir + "/media", "*.meta"):
+    for sidecar in scan_directory(project.public_dir + "/media", "*.meta"):
         let parsed = parse_sidecar(sidecar)
         ensures: Media.created(parsed)
         -- or updated if already exists
```
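`scan_directory(dir, pattern)` is spec notation. The sketch below shows one way to implement it in Elixir for the `*.ext` patterns these rules use; `ScanSketch` is hypothetical. It lists the directory with `File.ls/1` rather than passing the whole path to `Path.wildcard/2`. This matters now that `public_dir` can be any user folder: a folder name containing glob characters such as `[` or `{` would otherwise be read as part of the pattern.

```elixir
defmodule ScanSketch do
  # Hypothetical stand-in for the spec's scan_directory(dir, "*.ext"):
  # non-recursive, regular files only, sorted for deterministic rebuilds.
  # A missing directory yields []; other errors (e.g. :eacces) raise a
  # CaseClauseError instead of being silently ignored.
  def scan_directory(dir, "*" <> suffix) do
    case File.ls(dir) do
      {:ok, names} ->
        names
        |> Enum.filter(&String.ends_with?(&1, suffix))
        |> Enum.map(&Path.join(dir, &1))
        |> Enum.filter(&File.regular?/1)
        |> Enum.sort()

      {:error, :enoent} ->
        []
    end
  end

  # RebuildMediaFromFiles after this commit: scan public_dir/media for sidecars.
  def media_sidecars(public_dir) do
    scan_directory(Path.join(public_dir, "media"), "*.meta")
  end
end
```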
```diff
@@ -61,7 +61,7 @@ rule RunMetadataDiff {
     )
 
     -- Detect orphan files (on disk but not in DB)
-    for file in scan_directory(project.effective_data_dir + "/posts", "*.md"):
+    for file in scan_directory(project.public_dir + "/posts", "*.md"):
         let matching = Posts where file_path = file
         if matching.count = 0:
             ensures: OrphanReport.created(file_path: file)
```
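The orphan check above performs one `Posts where file_path = file` lookup per file. The sketch below does the same check as a single set difference. The spec does not say whether `file_path` is stored absolute or relative to `public_dir`, so the sketch assumes either is possible and normalises both sides first. `OrphanSketch` is hypothetical.

```elixir
defmodule OrphanSketch do
  # Returns the on-disk files that have no Post whose file_path matches.
  # Both sides are normalised to paths relative to public_dir, so an absolute
  # scan result still matches a relative file_path stored in the DB.
  def orphans(public_dir, files_on_disk, db_file_paths) do
    root = Path.expand(public_dir)
    normalise = fn path -> path |> Path.expand(root) |> Path.relative_to(root) end

    known = MapSet.new(db_file_paths, normalise)
    Enum.reject(files_on_disk, fn file -> MapSet.member?(known, normalise.(file)) end)
  end
end
```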
```diff
@@ -8,6 +8,7 @@ surface ProjectControlSurface {
 
     provides:
         CreateProjectRequested(name, data_path)
+        OpenProjectRequested(folder_path)
         SetActiveProjectRequested(project)
         DeleteProjectRequested(project)
 }
```
```diff
@@ -27,11 +28,33 @@ entity Project {
     tags: Tag with project = this
 
     -- Derived
-    internal_base_dir: String
-        -- {user_data}/projects/{id}/
-        -- Contains: meta/, thumbnails/, tags.json
-    effective_data_dir: data_path ?? internal_base_dir
-        -- Custom data path overrides default
+    --
+    -- data_path is the project folder: the directory that CONTAINS
+    -- meta/project.json. It is DISCOVERED from the project.json location at
+    -- load time and is never written into project.json itself — so the folder
+    -- can be moved/renamed freely. The current location is remembered only as a
+    -- machine-local pointer (a project registry under private_dir), never
+    -- embedded in the portable project. See DataPathNotPersistedInProjectJson.
+    public_dir: data_path
+        -- All user-owned, portable, webserver-bound content lives here, under
+        -- the project folder:
+        -- posts (.md), media + thumbnails, templates/, scripts/,
+        -- meta/ (project.json, categories.json, category-meta.json,
+        -- publishing.json), tags.json, menu.opml, and generated html/ output.
+        -- See PublicContentLivesInProjectFolder.
+    private_dir: String
+        -- The OS per-user app-data directory. Holds app-internal,
+        -- machine-specific, regenerable artifacts ONLY — never inside the repo
+        -- or the project folder:
+        -- the SQLite database, the per-project embeddings index
+        -- (projects/{id}/embeddings.usearch + .meta.json sidecar), the
+        -- downloaded embedding-model cache, the project registry, and
+        -- window/UI state.
+        -- OS-specific base (resolved via :filename.basedir, app name "bds"):
+        -- macOS: ~/Library/Application Support/bds (:user_config)
+        -- Linux: $XDG_CONFIG_HOME/bds (default ~/.config/bds)
+        -- Windows: %APPDATA%\\bds
+        -- See PrivateArtifactsLiveInOsAppDir.
 }
 
 surface ProjectSurface {
```
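The OS-specific bases listed for `private_dir` come from `:filename.basedir/2`. Erlang's `:filename.basedir/3` also accepts an `:os` option, so the macOS and Linux variants can both be inspected from one machine. The module name in this small sketch is hypothetical.

```elixir
defmodule PrivateDirSketch do
  # private_dir as the spec describes it, resolved for a chosen OS family.
  # Windows is left out: its result depends on %APPDATA% being set.
  def private_dir(os) when os in [:darwin, :linux] do
    :user_config
    |> :filename.basedir("bds", %{os: os})
    |> to_string()
  end
end
```

On macOS, `:user_config` and `:user_data` resolve to the same `Library/Application Support` folder. On Linux they differ under the XDG defaults (`~/.config` versus `~/.local/share`), so the choice of `:user_config` matters mainly there.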
```diff
@@ -48,8 +71,8 @@ surface ProjectSurface {
         project.posts.count
         project.media.count
         project.tags.count
-        project.internal_base_dir
-        project.effective_data_dir
+        project.public_dir
+        project.private_dir
 }
 
 invariant SingleActiveProject {
```
```diff
@@ -73,11 +96,52 @@ rule CreateProject {
         data_path: data_path,
         is_active: false
     )
+    @guidance
+        -- data_path is the chosen project folder. CreateProject writes
+        -- meta/project.json into {data_path}/meta/ but never records data_path
+        -- inside it (DataPathNotPersistedInProjectJson). The default project's
+        -- folder is created at a per-user default content location on first
+        -- launch — never inside the application repo or private_dir.
+}
+
+rule DiscoverProjectDataPath {
+    -- Opening or loading a project folder deduces its data_path from the
+    -- on-disk location of meta/project.json rather than from any stored value,
+    -- keeping projects movable (DataPathNotPersistedInProjectJson).
+    when: OpenProjectRequested(folder_path)
+        -- folder_path contains meta/project.json
+    let project = Projects where data_path = folder_path
+    ensures: project.data_path = folder_path
 }
 
 invariant ProjectTemplatesDirectoryReservedForUserTemplates {
     -- The project templates directory stores only user-managed templates.
-    -- Creating a project does not populate effective_data_dir/templates with bundled defaults.
+    -- Creating a project does not populate public_dir/templates with bundled defaults.
+}
+
+invariant PublicContentLivesInProjectFolder {
+    -- Every file the user owns or that must be served by the webserver lives
+    -- under public_dir (= the project folder containing meta/project.json):
+    -- posts, media + thumbnails, templates, scripts, the meta/ JSON files,
+    -- tags.json, menu.opml, and the generated html/ output. None of this
+    -- content is ever written to private_dir or into the application repo.
+}
+
+invariant PrivateArtifactsLiveInOsAppDir {
+    -- App-internal, machine-specific, regenerable artifacts — the SQLite
+    -- database, the per-project embeddings index and its sidecar, the
+    -- model cache, the project registry, and window/UI state — live ONLY
+    -- under private_dir (the OS per-user app-data directory). They are never
+    -- written into a project folder (public_dir) or into the application repo.
+    -- A regenerable artifact (e.g. the embeddings index) may be rebuilt from
+    -- the database when absent.
+}
+
+invariant DataPathNotPersistedInProjectJson {
+    -- meta/project.json never stores its own path or data_path. A project's
+    -- location is determined solely by where its meta/project.json file sits,
+    -- so the folder can be moved or renamed without editing any file. The app
+    -- remembers the current location as a machine-local pointer in private_dir.
 }
 
 rule SetActiveProject {
```
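`DiscoverProjectDataPath` derives `data_path` from where `meta/project.json` sits. The minimal Elixir sketch below covers both entry points: starting from a `project.json` path, and starting from a folder chosen via `OpenProjectRequested(folder_path)`. `ProjectDiscoverySketch` and its function names are hypothetical, not code from this commit.

```elixir
defmodule ProjectDiscoverySketch do
  # data_path is the folder that contains meta/project.json, so it can be
  # computed from the file's location alone; nothing is read from the JSON.
  def data_path_from_project_json(json_path) do
    expanded = Path.expand(json_path)
    meta_dir = Path.dirname(expanded)

    if Path.basename(expanded) == "project.json" and Path.basename(meta_dir) == "meta" do
      {:ok, Path.dirname(meta_dir)}
    else
      {:error, :not_a_project_json}
    end
  end

  # OpenProjectRequested(folder_path): the folder is accepted as data_path
  # only if it actually contains meta/project.json.
  def open_folder(folder_path) do
    folder = Path.expand(folder_path)

    if File.regular?(Path.join([folder, "meta", "project.json"])) do
      {:ok, folder}
    else
      {:error, :no_project_json}
    end
  end
end
```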
```diff
@@ -268,7 +268,7 @@ rule ExecuteTransform {
 
 rule RebuildScriptsFromFiles {
     when: RebuildScriptsFromFilesRequested(project)
-    for file in scan_directory(project.effective_data_dir + "/scripts", "*." + config.script_extension):
+    for file in scan_directory(project.public_dir + "/scripts", "*." + config.script_extension):
         let parsed = parse_script_file(file)
         ensures: Script.created(parsed)
 }
```
```diff
@@ -81,7 +81,7 @@ invariant UserTemplateDirectoryOverridesBundledDefaults {
 }
 
 invariant RebuildTemplatesIndexesOnlyProjectTemplates {
-    -- Rebuild-from-files scans only project.effective_data_dir/templates.
+    -- Rebuild-from-files scans only project.public_dir/templates.
     -- Bundled defaults are render-time fallbacks and are not indexed into Templates
     -- unless the user has created matching project files.
 }
```
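These invariants split template handling in two. Rebuild-from-files indexes only `public_dir/templates`. Bundled defaults act as render-time fallbacks, and a user file with the same name overrides them. The minimal sketch below shows that lookup order; `TemplateLookupSketch` and its `bundled_dir` argument are hypothetical.

```elixir
defmodule TemplateLookupSketch do
  # Render-time lookup: a user template in public_dir/templates wins;
  # otherwise the bundled default is used.
  def resolve(public_dir, bundled_dir, file_name) do
    user = Path.join([public_dir, "templates", file_name])
    bundled = Path.join(bundled_dir, file_name)

    cond do
      File.regular?(user) -> {:user, user}
      File.regular?(bundled) -> {:bundled, bundled}
      true -> :not_found
    end
  end

  # Rebuild-from-files only ever lists the project's own *.liquid templates;
  # bundled files are never indexed.
  def indexable(public_dir) do
    case File.ls(Path.join(public_dir, "templates")) do
      {:ok, names} -> names |> Enum.filter(&String.ends_with?(&1, ".liquid")) |> Enum.sort()
      {:error, :enoent} -> []
    end
  end
end
```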
```diff
@@ -171,7 +171,7 @@ rule CascadeSlugUpdate {
 
 rule RebuildTemplatesFromFiles {
     when: RebuildTemplatesFromFilesRequested(project)
-    for file in scan_directory(project.effective_data_dir + "/templates", "*.liquid"):
+    for file in scan_directory(project.public_dir + "/templates", "*.liquid"):
         let parsed = parse_template_file(file)
         ensures: Template.created(parsed)
         -- or updated if slug already exists
```
```diff
@@ -117,6 +117,32 @@ defmodule BDS.ProjectsTest do
     assert Enum.count(BDS.Projects.list_projects(), & &1.is_active) == 1
   end
 
+  test "project_cache_dir never falls back into the project data directory" do
+    # Private app-internal artifacts (the embeddings index) must live under the
+    # OS private app directory (macOS: ~/Library/Application Support/bds), never
+    # inside priv/data/projects/<id> — leaving them in the project tree pollutes
+    # the repository.
+    saved = Application.get_env(:bds, :project_cache_root)
+    Application.delete_env(:bds, :project_cache_root)
+    on_exit(fn -> Application.put_env(:bds, :project_cache_root, saved) end)
+
+    project_id = "fallback-#{System.unique_integer([:positive])}"
+    cache_dir = BDS.Projects.project_cache_dir(project_id)
+
+    data_dir = Path.expand("../../priv/data/projects/#{project_id}", __DIR__)
+    refute cache_dir == data_dir
+    refute String.starts_with?(cache_dir, Path.expand("../../priv/data", __DIR__))
+
+    private_app_dir =
+      case :filename.basedir(:user_config, "bds") do
+        path when is_list(path) -> List.to_string(path)
+        path -> path
+      end
+      |> Path.expand()
+
+    assert String.starts_with?(cache_dir, private_app_dir)
+  end
+
   test "ensure_default_project creates the default project once and keeps it active" do
     Repo.delete_all(Project)
 
```
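The new test asserts `String.starts_with?(cache_dir, private_app_dir)`. A plain string-prefix check also accepts sibling directories that share a prefix; for example, `.../bds-old` starts with `.../bds`. The sketch below compares whole path components instead, which is slightly stricter. It is a hypothetical helper, not something this commit adds. Note that `Path.expand/1` does not resolve symlinks, so both paths must use the same spelling.

```elixir
defmodule PathContainmentSketch do
  # True when `path` equals `dir` or lies beneath it, comparing whole path
  # components rather than raw string prefixes. Symlinks are not resolved.
  def under?(path, dir) do
    path_parts = path |> Path.expand() |> Path.split()
    dir_parts = dir |> Path.expand() |> Path.split()
    List.starts_with?(path_parts, dir_parts)
  end
end
```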