diff --git a/SPECGAPS.md b/SPECGAPS.md
index 6765c75..d6eb9f6 100644
--- a/SPECGAPS.md
+++ b/SPECGAPS.md
@@ -23,8 +23,9 @@ Gap categories: **SC** = spec correct, fix code | **CS** = code correct, update
| A1-11 | ~~Graceful shutdown with inflight request tracking~~ | preview.allium:47-48 | `stop_preview` now closes the listener, parks the reply, and drains monitored inflight request tasks before reporting stopped | **Resolved:** acceptor transfers socket ownership to each request task; GenServer monitors inflight tasks, `begin_graceful_stop` stops accepting and finalizes via `:DOWN`/`:drain_timeout` (5s force-kill cap), 1 test added |
| A1-12 | ~~Real Pagefind integration for search~~ | generation.allium:208 | Functional client-side search: `PagefindUI` defined in bundled `pagefind-ui.js`, fragment index records url/title/body-scoped text per page, search-runtime wires it up | **Resolved:** bundled real `PagefindUI` (fetch index, ranked full-text match, highlighted excerpts) + `pagefind-ui.css` as local assets read into `Pagefind`; index scoped to `data-pagefind-body` (unmarked pages excluded per PagefindHtmlMarking), title from ``/``; localized "No results found" label via `data-search-no-results` (de/fr/it/es); 3 unit tests added |
| A1-13 | ~~Git sidebar shows only "Working tree" placeholder~~ | sidebar_views.allium:651-770 | `git_view/1` now builds a full `layout: "git"` view from `BDS.Git` (repository/remote_state/status/history); `SidebarComponents` renders active + not_a_repo states | **Resolved:** `git_view/1` in sidebar.ex assembles branch/upstream/ahead/behind, status files, paginated history (20/page); `render_git_sidebar` renders branch header, sync legend, fetch/pull/push/prune-lfs buttons, commit form, clickable status files (open git_diff), history entries; shell_live wires `git_commit` (closes git_diff tabs), `git_fetch`/`git_pull`/`git_push`/`git_prune_lfs`, `git_initialize`; `BDS.Git.history` enriched with author/date, `BDS.Git.set_remote/2` added; i18n for de/fr/it/es; 3 shell tests + git author/date assertions added |
-| A1-14 | ~~Embedding uses TF-IDF hash projection instead of real neural model~~ | embedding.allium:44-53, invariants RealNeuralModel/ModelCaching/VectorCacheInDb | `Backends.Neural` runs `intfloat/multilingual-e5-small` (e5 weights behind the Xenova id) via Bumblebee+EXLA | **Resolved (core):** added bumblebee/nx/exla deps; `Backends.Neural` is a lazily-loaded GenServer that builds the Bumblebee text-embedding serving on first request (`"query: "` prefix + mean pooling + L2 norm), downloads+caches the model under the app data dir (ModelCaching), and is wired into the supervision tree when configured; vectors now persisted as packed little-endian Float32 BLOB (384×4=1536 bytes) instead of JSON text (VectorCacheInDb) with migration recreating `embedding_keys.vector` as BLOB; `InApp` demoted to documented offline/test stub; test config uses the stub so the suite stays offline; spec EmbeddingModel clarified (Xenova id ↔ intfloat weights via Bumblebee); 3 tests added (BLOB round-trip + Neural model_info/behaviour). **Deferred to A1-14b:** USearch HNSW index. |
+| A1-14 | ~~Embedding uses TF-IDF hash projection instead of real neural model~~ | embedding.allium:44-53, invariants RealNeuralModel/ModelCaching/VectorCacheInDb | `Backends.Neural` runs `intfloat/multilingual-e5-small` (e5 weights behind the Xenova id) via Bumblebee+EXLA | **Resolved (core):** added bumblebee/nx/exla deps; `Backends.Neural` is a lazily-loaded GenServer that builds the Bumblebee text-embedding serving on first request (`"query: "` prefix + mean pooling + L2 norm), downloads+caches the model under the app data dir (ModelCaching), and is wired into the supervision tree when configured; vectors now persisted as packed little-endian Float32 BLOB (384×4=1536 bytes) instead of JSON text (VectorCacheInDb) with migration recreating `embedding_keys.vector` as BLOB; `InApp` demoted to documented offline/test stub; test config uses the stub so the suite stays offline; spec EmbeddingModel clarified (Xenova id ↔ intfloat weights via Bumblebee); batched inference via optional `embed_many/2` backend callback (configurable `batch_size`/`sequence_length`; rebuild/index/repair embed in chunks instead of one post at a time) + `NativeAcceleratedExecution` invariant added to spec; 4 tests added (BLOB round-trip, batched-rebuild, Neural model_info/behaviour). **Deferred:** A1-14b USearch HNSW index, A1-14c Apple GPU (EMLX). |
| A1-14b | USearch HNSW ANN index + debounced persistence not implemented | embedding.allium:75-87 (config), FindSimilar, invariant DebouncedPersistence | Neighbor lookup still uses the JSON cosine snapshot (`Embeddings.Index`), not a USearch HNSW index; no 5s debounced index persistence (snapshot rebuilt synchronously) | Fix code: replace JSON snapshot with USearch HNSW index file (`embeddings.usearch`, cosine, M=16, efConstruction=128, efSearch=64), label→post_id mapping, 5s debounced save + force-save on project switch/shutdown |
+| A1-14c | Embedding model runs on CPU only; no Apple GPU acceleration | embedding.allium invariant NativeAcceleratedExecution | `Backends.Neural` uses Bumblebee+EXLA; on Apple Silicon XLA has no Metal backend so inference is native CPU (batched). Apple GPU/Neural Engine unused | Fix code: spike an EMLX (Apple MLX) Nx backend so the model executes on the Apple Silicon GPU; gate by platform/availability with EXLA-CPU fallback; verify Bumblebee serving + defn compiler compatibility and benchmark vs CPU batching |
| A1-15 | ~~Preview vs generation content source strategy undocumented~~ | preview.allium (no invariant), generation.allium (no invariant) | Generation uses only published .md file content (`Generation.Data` snapshots set `content: nil`); preview includes published+draft posts and prefers DB content over file (`Preview.Router` queries `:published`/`:draft`, uses `editor_body`) | **Resolved:** added `PreviewDraftOverlay` invariant to preview.allium and `GenerationPublishedOnly` invariant to generation.allium; both cross-reference each other; code already correct, 3 tests added for draft-in-preview behavior |
### A2. Spec Should Update (code is normative)
@@ -185,7 +186,7 @@ All reconciled to follow code. Specs must be self-consistent and match code.
## Priority Order for Resolution
-1. **A1-1 through A1-14b** — code must follow spec (includes auto-save, on-demand preview, template lookup, validation gates, real Pagefind, graceful shutdown, real embedding model; A1-14b = USearch HNSW index still open)
+1. **A1-1 through A1-14c** — code must follow spec (includes auto-save, on-demand preview, template lookup, validation gates, real Pagefind, graceful shutdown, real embedding model; A1-14b = USearch HNSW index and A1-14c = Apple GPU/EMLX acceleration still open)
2. **D1-1 through D1-18** — untested invariants/guarantees
3. **C-1 through C-3** — internal spec inconsistencies (reconcile to code)
4. **B1-1 through B1-6** — major code behaviors missing from spec
diff --git a/config/config.exs b/config/config.exs
index c594b3f..b98c1ef 100644
--- a/config/config.exs
+++ b/config/config.exs
@@ -64,7 +64,11 @@ config :bds, :embeddings,
backend: BDS.Embeddings.Backends.Neural,
model_id: "Xenova/multilingual-e5-small",
model_repo: "intfloat/multilingual-e5-small",
- dimensions: 384
+ dimensions: 384,
+ # Inference is batched: batch_size texts per compiled run, truncated to
+ # sequence_length tokens. Tuning these trades throughput against memory.
+ batch_size: 16,
+ sequence_length: 256
# Cache downloaded model files under the app data directory so they persist
# across sessions (ModelCaching invariant). Overridden at runtime in prod.
diff --git a/config/test.exs b/config/test.exs
index ebafef0..495a503 100644
--- a/config/test.exs
+++ b/config/test.exs
@@ -15,4 +15,6 @@ config :bds, :embeddings,
backend: BDS.Embeddings.Backends.InApp,
model_id: "Xenova/multilingual-e5-small",
model_repo: "intfloat/multilingual-e5-small",
- dimensions: 384
+ dimensions: 384,
+ batch_size: 16,
+ sequence_length: 256
diff --git a/lib/bds/embeddings.ex b/lib/bds/embeddings.ex
index 30af2c1..0b0d546 100644
--- a/lib/bds/embeddings.ex
+++ b/lib/bds/embeddings.ex
@@ -75,21 +75,7 @@ defmodule BDS.Embeddings do
)
existing_keys = preload_keys_by_post_id(project_id, Enum.map(posts, & &1.id))
- base_label = max_label_value()
-
- {rows, _next_label} =
- Enum.reduce(posts, {[], base_label + 1}, fn post, {acc, next_label} ->
- existing_key = Map.get(existing_keys, post.id)
-
- case compute_key_data(post, existing_key, next_label) do
- :skip ->
- {acc, next_label}
-
- {:upsert, row} ->
- bump = if existing_key, do: 0, else: 1
- {[row | acc], next_label + bump}
- end
- end)
+ rows = build_key_rows(posts, existing_keys, max_label_value(), nil)
batch_upsert_keys(rows)
:ok = rebuild_snapshot(project_id)
@@ -113,9 +99,6 @@ defmodule BDS.Embeddings do
)
post_ids = Enum.map(posts, & &1.id)
- total_posts = length(posts)
-
- :ok = report_rebuild_started(on_progress, total_posts, "embedding entries")
Repo.delete_all(
from key in Key,
@@ -123,24 +106,7 @@ defmodule BDS.Embeddings do
)
existing_keys = preload_keys_by_post_id(project_id)
- base_label = max_label_value()
-
- {rows, _next_label} =
- posts
- |> Enum.with_index(1)
- |> Enum.reduce({[], base_label + 1}, fn {post, index}, {acc, next_label} ->
- :ok = report_rebuild_progress(on_progress, index, total_posts, "embedding entries")
- existing_key = Map.get(existing_keys, post.id)
-
- case compute_key_data(post, existing_key, next_label) do
- :skip ->
- {acc, next_label}
-
- {:upsert, row} ->
- bump = if existing_key, do: 0, else: 1
- {[row | acc], next_label + bump}
- end
- end)
+ rows = build_key_rows(posts, existing_keys, max_label_value(), on_progress)
batch_upsert_keys(rows)
@@ -246,18 +212,83 @@ defmodule BDS.Embeddings do
Repo.one(from key in Key, select: max(key.label)) || 0
end
- defp compute_key_data(%Post{} = post, existing_key, next_label) do
- body = resolve_post_body(post)
- raw_text = compose_embedding_source(post.title, body)
- content_hash = hash_text(raw_text)
+ # Builds the upsert rows for a batch of posts. Posts whose content_hash is
+ # unchanged are skipped (ContentHashSkipsUnchanged); the rest are embedded in
+ # batches (see embed_pending/2) so model inference is not serialised one post
+ # at a time. Labels keep their existing value or take the next free integer.
+ defp build_key_rows(posts, existing_keys, base_label, on_progress) do
+ prepared =
+ Enum.map(posts, fn post ->
+ raw_text = compose_embedding_source(post.title, resolve_post_body(post))
+ existing = Map.get(existing_keys, post.id)
+ content_hash = hash_text(raw_text)
- if existing_key && existing_key.content_hash == content_hash do
- :skip
- else
- {:ok, vector} = embed_text(raw_text, post.language)
- label = if existing_key, do: existing_key.label, else: next_label
- {:upsert, [label, post.id, post.project_id, content_hash, encode_vector(vector)]}
- end
+ %{
+ post: post,
+ existing: existing,
+ raw_text: raw_text,
+ content_hash: content_hash,
+ needs_embed?: is_nil(existing) or existing.content_hash != content_hash
+ }
+ end)
+
+ pending = Enum.filter(prepared, & &1.needs_embed?)
+ :ok = report_rebuild_started(on_progress, length(pending), "embedding entries")
+ vectors_by_post_id = embed_pending(pending, on_progress)
+
+ {rows, _next_label} =
+ Enum.reduce(prepared, {[], base_label + 1}, fn entry, {acc, next_label} ->
+ if entry.needs_embed? do
+ vector = Map.fetch!(vectors_by_post_id, entry.post.id)
+ label = if entry.existing, do: entry.existing.label, else: next_label
+ bump = if entry.existing, do: 0, else: 1
+
+ row = [
+ label,
+ entry.post.id,
+ entry.post.project_id,
+ entry.content_hash,
+ encode_vector(vector)
+ ]
+
+ {[row | acc], next_label + bump}
+ else
+ {acc, next_label}
+ end
+ end)
+
+ rows
+ end
+
+ defp embed_pending([], _on_progress), do: %{}
+
+ defp embed_pending(pending, on_progress) do
+ total = length(pending)
+ batch = batch_size()
+
+ pending
+ # Group by language so the lexical stub stems consistently; the neural
+ # backend is multilingual and ignores the language hint.
+ |> Enum.group_by(& &1.post.language)
+ |> Enum.reduce({%{}, 0}, fn {language, group}, acc ->
+ group
+ |> Enum.chunk_every(batch)
+ |> Enum.reduce(acc, fn chunk, {vectors, done} ->
+ {:ok, chunk_vectors} = embed_many(Enum.map(chunk, & &1.raw_text), language)
+
+ vectors =
+ chunk
+ |> Enum.zip(chunk_vectors)
+ |> Enum.reduce(vectors, fn {entry, vector}, acc ->
+ Map.put(acc, entry.post.id, vector)
+ end)
+
+ done = done + length(chunk)
+ :ok = report_rebuild_progress(on_progress, done, total, "embedding entries")
+ {vectors, done}
+ end)
+ end)
+ |> elem(0)
end
defp batch_upsert_keys([]), do: :ok
@@ -308,21 +339,7 @@ defmodule BDS.Embeddings do
)
existing_keys = preload_keys_by_post_id(project_id)
- base_label = max_label_value()
-
- {rows, _next_label} =
- Enum.reduce(posts, {[], base_label + 1}, fn post, {acc, next_label} ->
- existing_key = Map.get(existing_keys, post.id)
-
- case compute_key_data(post, existing_key, next_label) do
- :skip ->
- {acc, next_label}
-
- {:upsert, row} ->
- bump = if existing_key, do: 0, else: 1
- {[row | acc], next_label + bump}
- end
- end)
+ rows = build_key_rows(posts, existing_keys, max_label_value(), nil)
batch_upsert_keys(rows)
:ok = rebuild_snapshot(project_id)
@@ -660,6 +677,32 @@ defmodule BDS.Embeddings do
configured_backend().embed(raw_text, language: language)
end
+ # Embeds a batch of texts in one shot. Backends that implement the optional
+ # embed_many/2 callback (e.g. the neural backend, which feeds them through the
+ # model as a single batched inference run) handle the whole list; others fall
+ # back to sequential single embeds.
+ defp embed_many(texts, language) do
+ backend = configured_backend()
+
+    if Code.ensure_loaded?(backend) and function_exported?(backend, :embed_many, 2) do
+ backend.embed_many(texts, language: language)
+ else
+ vectors =
+ Enum.map(texts, fn text ->
+ {:ok, vector} = backend.embed(text, language: language)
+ vector
+ end)
+
+ {:ok, vectors}
+ end
+ end
+
+ defp batch_size do
+ Application.get_env(:bds, :embeddings, [])
+ |> Keyword.get(:batch_size, 16)
+ |> max(1)
+ end
+
defp rebuild_snapshot(project_id) do
Index.rebuild(project_id, model_id: model_id(), dimensions: dimensions())
end
diff --git a/lib/bds/embeddings/backend.ex b/lib/bds/embeddings/backend.ex
index b6471e3..7851275 100644
--- a/lib/bds/embeddings/backend.ex
+++ b/lib/bds/embeddings/backend.ex
@@ -3,4 +3,15 @@ defmodule BDS.Embeddings.Backend do
@callback model_info() :: %{model_id: String.t(), dimensions: pos_integer()}
@callback embed(String.t(), keyword()) :: {:ok, [number()]} | {:error, term()}
+
+ @doc """
+ Embeds a list of texts in a single call.
+
+ Backends that can amortise work across inputs (e.g. running the neural model
+ on a batched tensor) should implement this. The result list is aligned with
+ the input list. Optional — callers fall back to repeated `embed/2`.
+ """
+ @callback embed_many([String.t()], keyword()) :: {:ok, [[number()]]} | {:error, term()}
+
+ @optional_callbacks embed_many: 2
end
diff --git a/lib/bds/embeddings/backends/in_app.ex b/lib/bds/embeddings/backends/in_app.ex
index 0b5eb5f..9d6e9aa 100644
--- a/lib/bds/embeddings/backends/in_app.ex
+++ b/lib/bds/embeddings/backends/in_app.ex
@@ -37,6 +37,17 @@ defmodule BDS.Embeddings.Backends.InApp do
{:ok, vector}
end
+ @impl true
+ def embed_many(texts, opts) when is_list(texts) and is_list(opts) do
+ vectors =
+ Enum.map(texts, fn text ->
+ {:ok, vector} = embed(text, opts)
+ vector
+ end)
+
+ {:ok, vectors}
+ end
+
defp tokenize(text) do
Regex.scan(~r/[[:alnum:]]+/u, String.downcase(text))
|> List.flatten()
diff --git a/lib/bds/embeddings/backends/neural.ex b/lib/bds/embeddings/backends/neural.ex
index 767267a..fe5453c 100644
--- a/lib/bds/embeddings/backends/neural.ex
+++ b/lib/bds/embeddings/backends/neural.ex
@@ -17,6 +17,14 @@ defmodule BDS.Embeddings.Backends.Neural do
with `"query: "`, pooled with mean pooling over the attention mask, and
L2-normalised. This is what makes cross-language semantic similarity
work.
+ * Inference is batched. `embed_many/2` runs the model on `batch_size`
+ texts per compiled inference run instead of one at a time, which is the
+ dominant cost when (re)indexing large numbers of posts. The serving is
+ compiled for a fixed `batch_size`/`sequence_length` (configurable);
+ shorter sequences mean less wasted transformer compute.
+
+ EXLA on Apple Silicon runs on the CPU — XLA has no Metal/GPU backend. See
+ SPECGAPS A1-14c for the planned EMLX (Apple GPU via MLX) acceleration path.
"""
@behaviour BDS.Embeddings.Backend
@@ -24,11 +32,13 @@ defmodule BDS.Embeddings.Backends.Neural do
use GenServer
@query_prefix "query: "
- @embed_timeout :timer.minutes(2)
+ @embed_timeout :timer.minutes(10)
@default_model_id "Xenova/multilingual-e5-small"
@default_model_repo "intfloat/multilingual-e5-small"
@default_dimensions 384
+ @default_batch_size 16
+ @default_sequence_length 256
def child_spec(opts) do
%{id: __MODULE__, start: {__MODULE__, :start_link, [opts]}}
@@ -50,7 +60,22 @@ defmodule BDS.Embeddings.Backends.Neural do
@impl BDS.Embeddings.Backend
def embed(text, _opts) when is_binary(text) do
- GenServer.call(__MODULE__, {:embed, @query_prefix <> text}, @embed_timeout)
+ case run([@query_prefix <> text]) do
+ {:ok, [vector]} -> {:ok, vector}
+ {:ok, _other} -> {:error, :unexpected_embedding_result}
+ {:error, _reason} = error -> error
+ end
+ end
+
+ @impl BDS.Embeddings.Backend
+ def embed_many([], _opts), do: {:ok, []}
+
+ def embed_many(texts, _opts) when is_list(texts) do
+ run(Enum.map(texts, &(@query_prefix <> &1)))
+ end
+
+ defp run(prefixed_texts) do
+ GenServer.call(__MODULE__, {:embed, prefixed_texts}, @embed_timeout)
catch
:exit, reason -> {:error, {:embedding_backend_unavailable, reason}}
end
@@ -59,11 +84,15 @@ defmodule BDS.Embeddings.Backends.Neural do
def init(_opts), do: {:ok, %{serving: nil}}
@impl GenServer
- def handle_call({:embed, text}, _from, state) do
+ def handle_call({:embed, texts}, _from, state) do
case ensure_serving(state) do
{:ok, %{serving: serving} = next_state} ->
- %{embedding: tensor} = Nx.Serving.run(serving, text)
- {:reply, {:ok, Nx.to_flat_list(tensor)}, next_state}
+ vectors =
+ texts
+ |> Enum.chunk_every(batch_size())
+ |> Enum.flat_map(&run_chunk(serving, &1))
+
+ {:reply, {:ok, vectors}, next_state}
{:error, _reason} = error ->
{:reply, error, state}
@@ -73,6 +102,17 @@ defmodule BDS.Embeddings.Backends.Neural do
{:reply, {:error, Exception.message(exception)}, state}
end
+ defp run_chunk(serving, [single]) do
+ %{embedding: tensor} = Nx.Serving.run(serving, single)
+ [Nx.to_flat_list(tensor)]
+ end
+
+ defp run_chunk(serving, chunk) do
+ serving
+ |> Nx.Serving.run(chunk)
+ |> Enum.map(fn %{embedding: tensor} -> Nx.to_flat_list(tensor) end)
+ end
+
defp ensure_serving(%{serving: nil} = state) do
case build_serving() do
{:ok, serving} -> {:ok, %{state | serving: serving}}
@@ -92,7 +132,7 @@ defmodule BDS.Embeddings.Backends.Neural do
output_pool: :mean_pooling,
output_attribute: :hidden_state,
embedding_processor: :l2_norm,
- compile: [batch_size: 1, sequence_length: 512],
+ compile: [batch_size: batch_size(), sequence_length: sequence_length()],
defn_options: [compiler: EXLA]
)
@@ -100,5 +140,13 @@ defmodule BDS.Embeddings.Backends.Neural do
end
end
+ defp batch_size do
+ config() |> Keyword.get(:batch_size, @default_batch_size) |> max(1)
+ end
+
+ defp sequence_length do
+ config() |> Keyword.get(:sequence_length, @default_sequence_length) |> max(1)
+ end
+
defp config, do: Application.get_env(:bds, :embeddings, [])
end
diff --git a/specs/embedding.allium b/specs/embedding.allium
index 5432a00..75ede5e 100644
--- a/specs/embedding.allium
+++ b/specs/embedding.allium
@@ -87,6 +87,8 @@ config {
debounce_persist: Duration = 5.seconds
-- Index file: {userData}/projects/{projectId}/embeddings.usearch
-- Key mapping is persisted alongside the embedding records
+ batch_size: Integer = 16 -- texts per batched inference run
+ sequence_length: Integer = 256 -- max tokens per input (truncated)
}
-- ─── Gating ─────────────────────────────────────────────────
@@ -224,6 +226,18 @@ invariant RealNeuralModel {
-- This is only achievable with the trained multilingual transformer model.
}
+invariant NativeAcceleratedExecution {
+ -- Model execution MUST use the platform's native hardware acceleration
+ -- where available (GPU/Metal/Neural Engine on Apple Silicon, CUDA on
+ -- NVIDIA, etc.), and otherwise fall back to optimised native CPU execution.
+ -- Inference MUST be batched: batch_size inputs are run per compiled
+ -- inference pass and inputs are truncated to a bounded sequence_length, so
+ -- (re)indexing many posts is not serialised one document at a time.
+ -- Current implementation: Bumblebee + EXLA, which is native CPU on Apple
+ -- Silicon (XLA has no Metal backend). Apple GPU acceleration via EMLX/MLX
+ -- is tracked as a follow-up (SPECGAPS A1-14c).
+}
+
invariant ModelCaching {
-- Model files (~100 MB) downloaded from Hugging Face Hub on first use
-- Cached in app data directory, persists across sessions
diff --git a/test/bds/csm033_batch_inserts_test.exs b/test/bds/csm033_batch_inserts_test.exs
index a819dd4..ff28a22 100644
--- a/test/bds/csm033_batch_inserts_test.exs
+++ b/test/bds/csm033_batch_inserts_test.exs
@@ -34,9 +34,13 @@ defmodule BDS.CSM033BatchInsertsTest do
"expected ON CONFLICT upsert clause"
end
- test "compute_key_data is used instead of individual Repo.insert_or_update", %{source: source} do
- assert source =~ "compute_key_data(post, existing_key, next_label)",
- "expected compute_key_data helper for row computation"
+ test "build_key_rows computes rows for batched upsert instead of individual Repo.insert_or_update",
+ %{source: source} do
+ assert source =~ "build_key_rows(posts, existing_keys",
+ "expected build_key_rows helper for batched row computation"
+
+ assert source =~ "embed_many(",
+ "expected batched embedding via embed_many"
end
end
diff --git a/test/bds/embeddings_test.exs b/test/bds/embeddings_test.exs
index 90b5e4d..58635e6 100644
--- a/test/bds/embeddings_test.exs
+++ b/test/bds/embeddings_test.exs
@@ -15,6 +15,28 @@ defmodule BDS.EmbeddingsTest do
end
end
+ defmodule BatchRecordingBackend do
+ @behaviour BDS.Embeddings.Backend
+
+ @recorder :embeddings_batch_recorder
+
+ @impl true
+ def model_info do
+ %{model_id: "batch/multilingual-e5-small", dimensions: 384}
+ end
+
+ @impl true
+ def embed(text, opts) do
+ BDS.Embeddings.Backends.InApp.embed(text, opts)
+ end
+
+ @impl true
+ def embed_many(texts, opts) do
+ Agent.update(@recorder, fn sizes -> [length(texts) | sizes] end)
+ BDS.Embeddings.Backends.InApp.embed_many(texts, opts)
+ end
+ end
+
setup do
:ok = Ecto.Adapters.SQL.Sandbox.checkout(BDS.Repo)
@@ -351,6 +373,46 @@ defmodule BDS.EmbeddingsTest do
assert is_map(scores)
end
+ test "rebuilding embeds posts in batches instead of one at a time", %{project: project} do
+ assert {:ok, _metadata} =
+ BDS.Metadata.update_project_metadata(project.id, %{semantic_similarity_enabled: true})
+
+ for index <- 1..5 do
+ assert {:ok, post} =
+ BDS.Posts.create_post(%{
+ project_id: project.id,
+ title: "Batch #{index}",
+ content: "space rocket orbit mission galaxy #{index}",
+ language: "en"
+ })
+
+ assert {:ok, _post} = BDS.Posts.publish_post(post.id)
+ end
+
+ # Simulate the post-migration state where the vector cache is empty, so the
+ # rebuild has to (re)embed every post.
+ BDS.Repo.delete_all(BDS.Embeddings.Key)
+
+ {:ok, _recorder} = Agent.start_link(fn -> [] end, name: :embeddings_batch_recorder)
+
+ Application.put_env(:bds, :embeddings,
+ backend: BatchRecordingBackend,
+ model_id: "batch/multilingual-e5-small",
+ dimensions: 384,
+ batch_size: 3
+ )
+
+ assert {:ok, rebuilt} = BDS.Embeddings.reindex_all(project.id)
+ assert length(rebuilt) == 5
+
+ batch_sizes = Agent.get(:embeddings_batch_recorder, & &1)
+
+ # 5 pending posts at batch_size 3 → one batch of 3 and one of 2, never
+ # one-at-a-time.
+ assert Enum.sort(batch_sizes, :desc) == [3, 2]
+ assert Enum.max(batch_sizes) > 1
+ end
+
test "reindex_all rebuilds stored embeddings for the whole project", %{project: project} do
assert {:ok, _metadata} =
BDS.Metadata.update_project_metadata(project.id, %{semantic_similarity_enabled: true})