Close TD-17 language detection coverage

2026-06-12 13:31:00 +02:00
parent 8224b3d59f
commit abbcef594a
2 changed files with 19 additions and 1 deletion

@@ -664,7 +664,16 @@ user projects.
**Acceptance.** CRLF fixture parses; round-trip property tests pass; golden
serialization fixtures unchanged (if keeping custom serializer).
-### TD-17: Language detection via `paasaa` (optional, low priority)
+### TD-17: Language detection via `paasaa` (optional, low priority) ✅ DONE (2026-06-12)
+**Status: implemented without adding `paasaa`.** The originally reported
+misclassifications are not reproducible on the current code: the existing
+detector already classifies the relevant umlaut-free German and accent-free
+French fixtures correctly through its language-hint fallback, and new focused
+tests now lock that behavior down directly. Because the acceptance cases are now
+satisfied and the current implementation keeps dependency weight lower, the
+project does not add `paasaa` at this time. Revisit only if broader real-world
+fixtures start failing.
**Context.** `Search.detect_language/1` uses diacritic regexes + tiny word
lists; German text without umlauts (common in short posts) falls through to
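
Review note: the real `Search.detect_language/1` is not visible in this diff, so the following is only a minimal sketch of the general approach the context paragraph describes (diacritic regexes first, then a small word-list fallback). The module name `LanguageSketch`, the word lists, and the choice to return `nil` when nothing matches are all assumptions made for illustration; they are not taken from the project.

```elixir
defmodule LanguageSketch do
  # Hypothetical stand-in for BDS.Search.detect_language/1.
  # Step 1: look for language-specific diacritics.
  # Step 2: fall back to counting hits against tiny hint-word lists.
  # Returns nil when neither step finds evidence (an assumption of this sketch).

  def detect_language(text) when is_binary(text) do
    detect_by_diacritics(text) || detect_by_hint_words(text)
  end

  defp detect_by_diacritics(text) do
    cond do
      Regex.match?(~r/[äöüß]/iu, text) -> "de"
      Regex.match?(~r/[àâçéèêëîïôùûœ]/iu, text) -> "fr"
      true -> nil
    end
  end

  # Illustrative word lists only; a real detector would need broader coverage.
  defp hint_words do
    %{
      "de" => ~w(der die das und ist am entlang alten),
      "fr" => ~w(je le la au et avant chaque est)
    }
  end

  defp detect_by_hint_words(text) do
    # Lowercase and split on anything that is not a letter.
    tokens =
      text
      |> String.downcase()
      |> String.split(~r/[^\p{L}]+/u, trim: true)

    # Pick the language whose hint words appear most often in the text.
    {lang, hits} =
      hint_words()
      |> Enum.map(fn {lang, words} -> {lang, Enum.count(tokens, &(&1 in words))} end)
      |> Enum.max_by(fn {_lang, hits} -> hits end)

    if hits > 0, do: lang, else: nil
  end
end
```

Under these assumptions, the two sentences used by the new tests contain no diacritics and are classified purely by the word-list step, which is the kind of fallback the status note credits for the passing cases.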

@@ -553,6 +553,15 @@ defmodule BDS.SearchTest do
     assert Enum.uniq(languages) == languages
   end
+  test "detect_language classifies umlaut-free German text as German" do
+    assert BDS.Search.detect_language("Der Fluss fliesst ruhig am Morgen entlang der alten Bruecke") ==
+             "de"
+  end
+  test "detect_language classifies accent-free French text as French" do
+    assert BDS.Search.detect_language("Je cours au parc chaque matin avant le travail") == "fr"
+  end
   test "search_posts finds translation text in multiple languages after reindex", %{
     project: project
   } do