diff --git a/TECHDEBTS.md b/TECHDEBTS.md
index 8e17dbf..2284f8e 100644
--- a/TECHDEBTS.md
+++ b/TECHDEBTS.md
@@ -664,7 +664,16 @@ user projects.
 **Acceptance.** CRLF fixture parses; round-trip property tests pass; golden
 serialization fixtures unchanged (if keeping custom serializer).
 
-### TD-17: Language detection via `paasaa` (optional, low priority)
+### TD-17: Language detection via `paasaa` (optional, low priority) ✅ DONE (2026-06-12)
+
+**Status: implemented without adding `paasaa`.** The originally reported
+misclassifications are not reproducible on the current code: the existing
+detector already classifies the relevant umlaut-free German and accent-free
+French fixtures correctly through its language-hint fallback, and new focused
+tests now lock that behavior down directly. Because the acceptance cases are now
+satisfied and the current implementation keeps dependency weight lower, the
+project does not add `paasaa` at this time. Revisit only if broader real-world
+fixtures start failing.
 
 **Context.** `Search.detect_language/1` uses diacritic regexes + tiny word
 lists; German text without umlauts (common in short posts) falls through to
diff --git a/test/bds/search_test.exs b/test/bds/search_test.exs
index 5261e06..87375f5 100644
--- a/test/bds/search_test.exs
+++ b/test/bds/search_test.exs
@@ -553,6 +553,15 @@ defmodule BDS.SearchTest do
     assert Enum.uniq(languages) == languages
   end
 
+  test "detect_language classifies umlaut-free German text as German" do
+    assert BDS.Search.detect_language("Der Fluss fliesst ruhig am Morgen entlang der alten Bruecke") ==
+             "de"
+  end
+
+  test "detect_language classifies accent-free French text as French" do
+    assert BDS.Search.detect_language("Je cours au parc chaque matin avant le travail") == "fr"
+  end
+
   test "search_posts finds translation text in multiple languages after reindex", %{
     project: project
   } do