Feature/post media translations (#42)

* chore: updated todo with translation ideas

* feat: first take at the implementation of translations

* fix: small addition for the translation feature

* feat: support language switching in the editor and preview

* feat: better handling of long bodies by not running them through a json envelope

* fix: unknown macros have better fallback

* feat: api for python to get translations

* fix: strip dumb prefix of content in translation

* feat: extend meta diff for translations

* feat: hook up translations to rebuild-from-disk

* feat: generation of the website prefers project language, falling back to canonical language

* fix: crashes during rendering

* feat: translation validation report

* fix: made the translation validation actually work

* chore: reorganization of menu

* fix: some topics cleanup

* chore: updated doc

* feat: translations for media

* feat: more aligned in UI/UX

* feat: edit translations possible

* chore: added full multi-language todo

* chore: updated todo for clarity

* feat: implementation of full multi-linguality

* fix: page creation creates pages

* fix: flags on every page

* fix: better prompt

* feat: made MCP server aware of language content

* feat: python tools for translations

* fix: better fill-in-translations

* fix: better prompt for translation. maybe.

* fix: losing posts from search due to translation process

* fix: translation validation handles in-db content and fill-in of missing translations fixed to flush

* fix: faster scanning for infilling of missing translations

* chore: updated agent instructions

* feat: calendar and tag cloud respect current language now

* fix: retries going up

* fix: got metadata-diff and rebuild into sync

* fix: extended meta-diff for timestamps

* fix: made website validation look at translated content, too

* fix: multi-lingual search

* chore: refactor Editor.tsx into two separate editors

* feat: do language detection when no explicit language given

---------

Co-authored-by: hugo <hugoms@me.com>
Georg Bauer
2026-03-09 14:43:18 +01:00
committed by GitHub
parent f1c9038803
commit b855d61524
116 changed files with 19954 additions and 2094 deletions

553
TODO.md

@@ -1,5 +1,7 @@
# bDS — Remaining Feature Work
<!-- markdownlint-disable MD024 -->
This document covers the features described in VISION.md that are not yet
implemented. Each section is a self-contained plan that can be picked up
independently.
@@ -10,121 +12,342 @@ independently.
### Goal
Posts have a language attribute. The AI importing agent detects post language
and can auto-translate posts. Posts link to their translations so the
publishing pipeline can generate multilingual output.
Posts keep their canonical metadata in the main `posts` table. Translations are
stored separately so translated variants cannot drift into full independent
posts with their own unrelated metadata. The system must expose translation
availability everywhere posts are consumed.
### Current State
- Posts have no `language` field.
- No translation relationship tracking.
- No language detection during import.
- No AI translation tools.
- The `excerpt` field already exists and can serve as the summary field
mentioned in the vision.
- `analyzeMediaImage()` in `OpenCodeManager` already demonstrates the pattern
for single-shot AI analysis with language parameters.
- Project-level `mainLanguage` exists in `MetaEngine`.
- No translation storage model yet.
- No translation relationship tracking yet.
- No translation-aware post metadata such as `availableLanguages` yet.
- No language/missing-language filtering in post query APIs yet.
- AI post language detection already exists via `chat:detectPostLanguage`.
- AI post analysis already exists via `chat:analyzePost` and already suggests
title, excerpt, and slug.
- Project-level `mainLanguage` already exists.
### Implementation Plan
#### 1.1 Database Schema
Extend the `posts` table:
Add a dedicated translations table instead of storing translations as normal
posts.
| Column | Type | Notes |
|-----------------|------|-------------------------------------------------|
| language | text | ISO code (`en`, `de`, etc.), defaults to project `mainLanguage` |
| translationOfId | text | FK to posts.id — the original post this is a translation of |
Each translation row should contain only:
No separate junction table needed. A translated post is simply a post with
`translationOfId` pointing at its source. This keeps the model simple: each
post belongs to exactly one language and optionally references one original.
- its own ID
- `translationFor` referencing the source post ID
- `language`
- `title`
- `excerpt`
- `content` for draft translations only
- normal timestamps/status fields needed for lifecycle management
Published translations follow the same rule as published posts: body content is
not stored in the database and is read from the filesystem.
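As an illustration of the row shape listed above, here is a minimal sketch in TypeScript. The field names come from the plan; the `status` values, the timestamp names, and the `toPublishedRow` helper are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical status values; the plan only says "normal timestamps/status fields".
type TranslationStatus = "draft" | "published";

// Hypothetical shape of one post translation row.
interface PostTranslationRow {
  id: string;
  translationFor: string; // ID of the source post
  language: string; // ISO code such as "fr"
  title: string;
  excerpt: string;
  content: string | null; // body is kept in the database for drafts only
  status: TranslationStatus;
  createdAt: number;
  updatedAt: number;
}

// Returns a copy of the row as it would be stored after publishing:
// the body is dropped because published bodies are read from the filesystem.
function toPublishedRow(row: PostTranslationRow, now: number): PostTranslationRow {
  return { ...row, status: "published", content: null, updatedAt: now };
}
```

The helper returns a new object rather than mutating the draft, so a caller can still write the draft body to disk before discarding it.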
#### 1.2 YAML Frontmatter
Extend `postFileUtils.ts` to read/write:
Store translation files separately but in the same general markdown + YAML
frontmatter style as posts, with only the supported translation fields:
```yaml
language: de
translationOf: <original-post-id>
translationFor: <original-post-id>
language: fr
title: ...
excerpt: ...
```
On `readPostFile()`, parse these fields. On `writePostFile()`, include them
when present.
Draft translation files include markdown body content. Published translation
files keep content only in the file, not in the database. File naming is based
on the source post slug plus the language code, for example:
- `this-slug.md` for the source post
- `this-slug.fr.md` for the French translation
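The naming rule above can be sketched as a pair of small helpers. Both function names are hypothetical, and the parser assumes two-letter lowercase language codes and slugs without dots; neither assumption is stated in the plan.

```typescript
// Assumption: language codes are two lowercase letters and slugs contain no dots.
const POST_FILE_PATTERN = /^(.+?)(?:\.([a-z]{2}))?\.md$/;

// Builds the translation file name from the source slug and the language code.
function translationFileName(sourceSlug: string, language: string): string {
  return `${sourceSlug}.${language}.md`;
}

interface ParsedPostFile {
  slug: string;
  language: string | null; // null means "source post file"
}

// Splits a file name back into slug and optional language code.
// Returns null for names that are not markdown post files.
function parsePostFileName(fileName: string): ParsedPostFile | null {
  const match = POST_FILE_PATTERN.exec(fileName);
  if (match === null) return null;
  const slug = match[1];
  if (slug === undefined) return null;
  return { slug, language: match[2] ?? null };
}
```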
#### 1.3 PostEngine Extensions
Add methods:
Add translation-aware storage and lookup methods instead of treating
translations as regular posts.
- `getTranslations(postId)` — find all posts where `translationOfId === postId`.
- `getOriginal(postId)` — if the post has `translationOfId`, return that post.
- `createTranslation(originalPostId, targetLanguage, content)` — create a new
post linked to the original with the target language set.
- Create/read/update/publish translation records.
- Resolve translations for a post by source post ID.
- Prevent duplicate translations for the same `(translationFor, language)` pair.
- Keep source post metadata authoritative; translations only override the fields
they actually own.
- Keep `getPost(id)` and `getPostBySlug(slug)` canonical-only.
- Add explicit translation reads such as `getPostTranslation(postId, language)`
and `getPostTranslations(postId)` instead of overloading `getPost()` with an
optional language parameter.
- If callers need "best variant for language X", add a separate higher-level
resolver rather than changing the semantics of the base post APIs.
Modify `createPost()` and `updatePost()` to accept and persist the `language`
and `translationOfId` fields.
Post APIs should expose an `availableLanguages` meta field derived from the
translations table for every source post.
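A minimal in-memory sketch of the lookup rules described above: explicit translation reads, one translation per `(translationFor, language)` pair, and `availableLanguages` derived from the stored translations. The class name `PostTranslationIndex` and the `PostTranslationRef` type are invented for illustration; the real `PostEngine` would back this with the translations table.

```typescript
// Hypothetical reduced view of a translation record.
interface PostTranslationRef {
  translationFor: string; // source post ID
  language: string;
  title: string;
}

class PostTranslationIndex {
  // source post ID -> language -> translation
  private readonly byPost = new Map<string, Map<string, PostTranslationRef>>();

  // Rejects a second translation for the same (translationFor, language) pair.
  add(row: PostTranslationRef): void {
    const forPost = this.byPost.get(row.translationFor) ?? new Map<string, PostTranslationRef>();
    if (forPost.has(row.language)) {
      throw new Error(`duplicate translation: ${row.translationFor}/${row.language}`);
    }
    forPost.set(row.language, row);
    this.byPost.set(row.translationFor, forPost);
  }

  getPostTranslation(postId: string, language: string): PostTranslationRef | undefined {
    return this.byPost.get(postId)?.get(language);
  }

  getPostTranslations(postId: string): PostTranslationRef[] {
    return Array.from(this.byPost.get(postId)?.values() ?? []);
  }

  // Derived purely from stored translations; the canonical language is not added here.
  availableLanguages(postId: string): string[] {
    return this.getPostTranslations(postId).map((row) => row.language).sort();
  }
}
```

Keeping the canonical reads (`getPost`, `getPostBySlug`) out of this class mirrors the plan's point that base post APIs stay canonical-only.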
#### 1.4 AI Translation Tools in OpenCodeManager
Add three new methods following the `analyzeMediaImage()` pattern:
**`detectPostLanguage(postId)`**
- Read post content.
- Send to AI with prompt: "Detect the language of this text. Return a JSON
object with `language` (ISO 639-1 code) and `confidence` (0-1)."
- Return `{ language: string, confidence: number }`.
Add translation generation on top of the existing one-shot AI tooling.
**`translatePost(postId, targetLanguage)`**
- Read full post content + title + excerpt.
- Send to AI with prompt: "Translate this blog post to {language}. Return JSON
with `title`, `content` (markdown), and `excerpt`."
- Return translated fields without creating a post (caller decides).
**`generatePostSummary(postId)`**
- Read post content.
- Send to AI: "Write a 2-3 sentence summary of this blog post in
{post.language}. Return JSON with `excerpt`."
- Return `{ excerpt: string }`.
- Read the source post's full content plus title/excerpt.
- Return translated `title`, `excerpt`, and markdown `content`.
- Create or update a translation record/file from the returned data.
- After the post translation is persisted, cascade to linked media: for every
image linked to the source post that does not already have a translation for
`targetLanguage`, call `translateMediaMetadata(mediaId, targetLanguage)` (see
§2.5) to keep the post and its images in the same set of languages.
Register these as IPC handlers: `chat:detectPostLanguage`,
`chat:translatePost`, `chat:generatePostSummary`.
Language detection and excerpt suggestion already exist; this step is only
about translation-specific tooling.
#### 1.5 Import Pipeline Integration
In `ImportExecutionEngine`, after a post is imported and published:
Integrate with the existing import flow without redefining source posts as
translations.
1. Call `detectPostLanguage()` to set the `language` field.
2. If the detected language differs from the project's `mainLanguage`, queue
a translation task via `TaskManager`.
3. The translation task calls `translatePost()`, creates a new post via
`createTranslation()`, and publishes it.
2. Optionally queue translation generation for configured target languages.
3. Persist generated results as translation records/files linked via
`translationFor`.
This is optional and should be configurable per import definition (a checkbox
"Auto-detect language and translate" in `ImportAnalysisView`).
#### 1.6 UI — Translation Panel
#### 1.6 API Surface
Expose translation metadata consistently across all post consumers:
- Templates and Python scripts can read `post.meta.availableLanguages`.
- Internal AI tools can inspect available translation languages.
- MCP post APIs return the same `availableLanguages` data.
- Python post-query APIs support filtering by `language` and by missing
translation language, so callers can ask for posts available in French or
posts missing Spanish.
- The same language and missing-language filters must be available to internal
AI tools and MCP server queries.
#### 1.7 UI — Translation Panel
In the post editor metadata area, add a "Translations" section:
- Show current post language (dropdown to change).
- List existing translations with links (open in new tab).
- "Translate to..." button that opens a language picker, triggers AI
translation, and creates the linked post.
- If the post is itself a translation, show "Original: {title}" link.
- List existing translations by language.
- "Translate to..." creates or refreshes the separate translation record.
- Show which configured languages are still missing.
In the sidebar post list, optionally show a language badge per post.
#### 1.7 Publishing Pipeline
#### 1.8 Publishing Pipeline
In `PageRenderer` and `BlogGenerationEngine`:
- Add `hreflang` link tags to generated HTML when translations exist.
- Optionally generate a language switcher partial that templates can include.
- Sitemap should include `xhtml:link` entries for alternate language versions.
- Resolve alternate-language links from the translations table.
- Add `hreflang` metadata and language switcher data from DB-backed
translation availability.
- Include alternate language entries in sitemap generation.
---
## 2. Media Translation System
### Goal
Media files keep their canonical metadata in the main `media` table and main
sidecar. Translations are stored separately so localized metadata cannot drift
into independent media records. The binary asset remains shared; only the
language-specific metadata varies.
### Current State
- No media translation storage model yet.
- No translation relationship tracking for media yet.
- No translation-aware media metadata such as `availableLanguages` yet.
- No language/missing-language filtering in media query APIs yet.
- No explicit language tracking on canonical media metadata yet.
- AI media analysis already exists via `chat:analyzeMediaImage` and already
suggests title, alt text, and caption in a requested language.
- Main media metadata already uses DB + sidecar persistence.
### Implementation Plan
#### 2.1 Database Schema
Add a `language` column to the `media` table (optional text, ISO code such as
`'en'`, `'de'`). This records what language the canonical `title`, `alt`, and
`caption` are written in. When null, the project `mainLanguage` is assumed.
Persist the value in the canonical sidecar as well.
Add a dedicated media translations table instead of storing localized metadata
inside the canonical `media` row.
Each translation row should contain only:
- its own ID
- `translationFor` referencing the source media ID
- `language`
- `title`
- `alt`
- `caption`
- normal timestamps needed for lifecycle and sync
The binary media file remains the canonical original file. Translation rows do
not duplicate the asset itself.
#### 2.2 Sidecar Format
Keep the canonical sidecar for canonical metadata, and add language-specific
sidecars for translated metadata only.
Canonical sidecar stays with the original media file, for example:
- `image.jpg.meta`
Translated metadata sidecars use the source filename plus the language code,
for example:
- `image.jpg.fr.meta`
Translated sidecars should contain only the supported translation fields:
```yaml
translationFor: <original-media-id>
language: fr
title: ...
alt: ...
caption: ...
```
#### 2.3 MediaEngine Extensions
Add translation-aware storage and lookup methods instead of treating media
translations as separate media items.
- Create/read/update translation records and translated sidecars.
- Resolve translations for a media item by source media ID.
- Prevent duplicate translations for the same `(translationFor, language)` pair.
- Keep source media metadata authoritative; translations only override
`title`, `alt`, and `caption`.
- Keep `getMedia(id)` canonical-only.
- Add explicit translation reads such as `getMediaTranslation(mediaId,
language)` and `getMediaTranslations(mediaId)` instead of overloading
`getMedia()` with an optional language parameter.
- If callers need "best variant for language X", add a separate higher-level
resolver rather than changing the semantics of the base media APIs.
Media APIs should expose an `availableLanguages` meta field derived from the
translations table for every canonical media item.
#### 2.4 AI Translation Tools
Add media-metadata translation on top of the existing one-shot AI tooling.
**`translateMediaMetadata(mediaId, targetLanguage)`**
- Read the source media metadata plus image context needed for a faithful
translation.
- Determine source language from `media.language` (falling back to the
project `mainLanguage`).
- Return translated `title`, `alt`, and `caption`.
- Create or update a translation record/sidecar from the returned data.
**`detectMediaLanguage(mediaId)`**
- Read the canonical `title`, `alt`, and `caption` of a media item.
- Use the same lightweight title model and detection pattern as
`detectPostLanguage`.
- Return the detected ISO language code.
- Optionally persist the result to `media.language` if the caller requests it.
AI media analysis already exists; these steps are only about
language-detection and translation-specific tooling.
#### 2.5 Post-Triggered Media Translation Cascade
When a post is translated, all images linked to that post should be translated
automatically so rendered output never mixes languages.
**Trigger**: After `translatePost(postId, targetLanguage)` successfully
persists a post translation (§1.4), the system resolves all media linked to
the source post via the `postMedia` junction table.
**For each linked media item**:
1. Check whether the media already has a translation for `targetLanguage`
(via `getMediaTranslation(mediaId, targetLanguage)`).
2. If a translation already exists, skip — the image is already covered.
3. If no translation exists, call `translateMediaMetadata(mediaId,
targetLanguage)` (§2.4) to generate and persist the translated `title`,
`alt`, and `caption`.
**Design constraints**:
- The cascade is additive only — it never overwrites an existing media
translation. Users who independently translate an image via quick action
keep their version.
- Images can still be translated independently at any time through their own
quick action or the media translation panel (§2.8). The cascade merely
ensures coverage; it does not create a hard coupling.
- Failures on individual media translations should be logged but must not
block the post translation from succeeding. Report partial failures to the
UI so the user can retry individual images.
- The cascade runs after the post translation is committed, not inside the
same transaction, so a media-translation failure never rolls back post
work.
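The cascade rules above (skip existing translations, never overwrite, collect failures without aborting) can be sketched as follows. The function name, the callback signatures, and the report shape are assumptions; the callbacks are synchronous here only to keep the example short, whereas the real AI call would be asynchronous.

```typescript
interface CascadeFailure {
  mediaId: string;
  reason: string;
}

interface CascadeReport {
  translated: string[];
  skipped: string[];
  failed: CascadeFailure[];
}

// Runs after the post translation is already committed, so nothing here can roll it back.
function cascadeMediaTranslations(
  linkedMediaIds: readonly string[],
  targetLanguage: string,
  hasTranslation: (mediaId: string, language: string) => boolean,
  translate: (mediaId: string, language: string) => void,
): CascadeReport {
  const report: CascadeReport = { translated: [], skipped: [], failed: [] };
  for (const mediaId of linkedMediaIds) {
    // Additive only: an existing translation is never regenerated.
    if (hasTranslation(mediaId, targetLanguage)) {
      report.skipped.push(mediaId);
      continue;
    }
    try {
      translate(mediaId, targetLanguage);
      report.translated.push(mediaId);
    } catch (error) {
      // One failing image must not stop the remaining ones.
      const reason = error instanceof Error ? error.message : String(error);
      report.failed.push({ mediaId, reason });
    }
  }
  return report;
}
```

The `failed` list is what the UI would use to offer per-image retries.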
#### 2.6 Import And Sync Integration
Integrate with the existing media import and metadata sync flow without
creating translated duplicate media records.
1. Import the binary asset once as the canonical media item.
2. Optionally queue metadata translation generation for configured target
languages.
3. Persist generated results as translation records/sidecars linked via
`translationFor`.
4. Extend metadata diff/sync tooling so canonical and translated sidecars can
both be compared against the database safely.
#### 2.7 API Surface
Expose translation metadata consistently across all media consumers:
- Templates and Python scripts can read `media.meta.availableLanguages`.
- Internal AI tools can inspect available translation languages.
- MCP media APIs return the same `availableLanguages` data.
- Python media-query APIs support filtering by `language` and by missing
translation language, so callers can ask for media with French metadata or
media missing Spanish metadata.
- The same language and missing-language filters must be available to internal
AI tools and MCP server queries.
#### 2.8 UI — Translation Panel
In the media editor/details area, add a "Translations" section:
- Show the canonical media language as a dropdown (same UX as the post
language selector). Changing it updates `media.language`.
- Provide a "Detect Language" button that calls `detectMediaLanguage` and
updates the dropdown.
- List existing metadata translations by language.
- "Translate to..." creates or refreshes the separate translation record.
- Show which configured languages are still missing.
Media list/detail views can optionally show a language-availability badge.
#### 2.9 Rendering And Asset Use
When media metadata is consumed during rendering or editing:
- Resolve localized `title`, `alt`, and `caption` from the translations table
when a language-specific variant is requested.
- Fall back to canonical metadata when no translation exists.
- Keep URLs and binary asset references stable; only metadata changes by
language.
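A small sketch of the fallback rule above, assuming translations for one media item are available as a map keyed by language. The types and the `resolveMediaMetadata` name are illustrative, not existing project code; fallback is applied per record, not per field.

```typescript
interface MediaText {
  title: string;
  alt: string;
  caption: string;
}

// Hypothetical canonical media record; `url` stands in for the shared binary asset.
interface CanonicalMedia extends MediaText {
  id: string;
  url: string;
}

interface ResolvedMedia extends MediaText {
  url: string;
  resolvedFrom: "translation" | "canonical";
}

function resolveMediaMetadata(
  media: CanonicalMedia,
  translations: ReadonlyMap<string, MediaText>,
  language?: string,
): ResolvedMedia {
  const translated = language === undefined ? undefined : translations.get(language);
  const text: MediaText = translated ?? media;
  // The URL always comes from the canonical record; only the text varies by language.
  return {
    title: text.title,
    alt: text.alt,
    caption: text.caption,
    url: media.url,
    resolvedFrom: translated === undefined ? "canonical" : "translation",
  };
}
```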
---
## 3. Drag-and-Drop Image Insertion
@@ -235,3 +458,205 @@ The same plugin can handle `paste` events with image files:
editor state after drop.
- Test edge cases: non-image files, failed imports, multiple simultaneous
drops.
---
## 4. Multi-Language Blog Rendering (Phase 2)
### Goal
The generated blog is fully navigable in each activated language. Every
language gets its own route subtree (`/en/`, `/de/`, …), its own feeds, and
its own sitemap entries. Media assets are shared; only HTML differs. The
preview server must serve the same language-prefixed routes so the user can
verify output before uploading.
### Current State
- Post and media translation schemas, CRUD, AI translation, and validation
already exist (§1, §2).
- `PageRenderer` already accepts `preferredLanguage` and resolves translations
via `resolveRenderablePost()` and `getMediaTranslation()`.
- `BlogGenerationEngine` builds translation variants with `.lang` slug
suffixes but writes everything to a flat `html/` directory — no
language-prefixed subtrees.
- `PreviewServer` supports a `?lang=` query parameter but has no
language-prefixed routes.
- `ProjectMetadata` has `mainLanguage` but no `blogLanguages` list.
- No `doNotTranslate` flag on posts.
- No automatic translation on post create/update.
- No "Fill missing translations" batch tool.
### Implementation Plan
#### 4.0 Extract `SUPPORTED_POST_LANGUAGES` Constant
The list of supported post languages is currently hardcoded inline in AI
task files (e.g. `['en', 'de', 'fr', 'it', 'es']`). Extract it into a
shared constant in `src/main/shared/` (or similar) so that both AI tasks
and the Blog Languages UI (§4.1) reference a single source of truth.
#### 4.1 Project Preferences — Blog Languages
Add `blogLanguages?: string[]` to `ProjectMetadata`. This is the list of
languages the blog is rendered in (e.g. `['en', 'de']`). The `mainLanguage`
is always implicitly included. When `blogLanguages` is empty or absent, the
blog renders in `mainLanguage` only (current behaviour).
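The inclusion rule above could be expressed as a small helper like the following. `effectiveBlogLanguages` is a hypothetical name; putting the main language first is a choice made for this sketch.

```typescript
// Returns the languages the blog is rendered in, with the main language
// always present (and first) and duplicates removed.
function effectiveBlogLanguages(mainLanguage: string, blogLanguages?: readonly string[]): string[] {
  const result: string[] = [mainLanguage];
  for (const language of blogLanguages ?? []) {
    if (!result.includes(language)) result.push(language);
  }
  return result;
}
```

With an empty or absent list the result is just the main language, which matches the single-language behaviour described above.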
**UI**: Add a "Blog Languages" multi-select in the Project Settings panel,
populated from `SUPPORTED_POST_LANGUAGES`. The main language is shown but
cannot be removed. i18n keys: `settings.project.blogLanguagesLabel`,
`settings.project.blogLanguagesDescription`.
#### 4.2 Do-Not-Translate Flag
Add a boolean `doNotTranslate` column to the `posts` table (default false).
Persist in YAML frontmatter as `doNotTranslate: true`. Migration required.
**UI**: Checkbox in the post editor metadata area, labelled via i18n
(`editor.doNotTranslateLabel`).
**Validate Translations** must detect posts marked `doNotTranslate` that
still have translations and offer to remove them.
#### 4.3 Automatic Translation on Post Create/Update
When a canonical post is created or updated and `blogLanguages` contains
languages beyond `mainLanguage`:
1. For each active blog language missing a translation (skip if
`doNotTranslate` is set), enqueue a `TaskManager` task calling
`chat:translatePost`.
2. On success, show a toast ("Translated to French"). On failure, show an
error toast. Task progress is visible in the task panel.
3. Only canonical content changes trigger re-translation. Editing a
translation directly does **not** re-trigger anything.
4. After each post translation succeeds, cascade to linked media: for every
media item linked via `postMedia` that lacks a translation for the target
language, enqueue `chat:translateMediaMetadata`.
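Step 1 above amounts to computing, per post, which blog languages still need a translation task. A minimal sketch follows, with a hypothetical `PostLanguageState` input type and function name; the actual enqueueing through `TaskManager` is left out.

```typescript
// Hypothetical view of the post fields this decision needs.
interface PostLanguageState {
  doNotTranslate: boolean;
  availableLanguages: readonly string[]; // languages that already have a translation
}

// Returns the blog languages for which a translation task should be enqueued.
function languagesNeedingTranslation(
  post: PostLanguageState,
  mainLanguage: string,
  blogLanguages: readonly string[],
): string[] {
  if (post.doNotTranslate) return [];
  const existing = new Set(post.availableLanguages);
  const missing: string[] = [];
  for (const language of blogLanguages) {
    // The main language is the canonical content and never needs a translation.
    if (language === mainLanguage || existing.has(language) || missing.includes(language)) continue;
    missing.push(language);
  }
  return missing;
}
```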
#### 4.4 Fill Missing Translations (Blog Menu Tool)
Add a "Fill Missing Translations" menu item under the Blog menu.
1. Scan all published posts (excluding `doNotTranslate`) and all linked media
for missing translations across `blogLanguages`.
2. Create one task for post translations and a second task for media metadata
translations.
3. Report progress and partial failures via the task panel and toasts.
4. This is separate from Validate Translations — validate checks consistency,
fill adds missing content.
#### 4.5 Route Generation — Main Language Flat, Alternatives Prefixed
The main language keeps the current flat route structure. Only additional
blog languages get a language-prefixed subtree. This means single-language
blogs see zero change from today's output.
```
html/
  index.html                       ← main language (flat, same as today)
  page/2/index.html
  2025/03/08/my-post/index.html
  category/tech/index.html
  tag/rust/index.html
  rss.xml
  atom.xml
  de/                              ← additional blog language subtree
    index.html
    page/2/index.html
    2025/03/08/my-post/index.html
    category/tech/index.html
    tag/rust/index.html
    rss.xml
    atom.xml
  sitemap.xml                      ← combined, with hreflang alternates
  media/                           ← shared, not duplicated
  assets/                          ← shared, not duplicated
```
For the main language pass, generation works exactly as today — no prefix,
no routing changes. For each additional language in `blogLanguages`:
- Iterate the same route list, writing output under `/{lang}/…`.
- Resolve every post through `resolveRenderablePost(post, engine, lang)`.
If no translation exists, fall back to canonical content.
- Same for media metadata in macros: `getMediaTranslation(id, lang)` with
canonical fallback.
- All internal links within a language subtree stay prefixed (`/de/…` links
to `/de/…`). Main-language links remain unprefixed (`/2025/…`).
- Posts marked `doNotTranslate` render only in the main language output.
They are omitted from alternative language subtrees entirely.
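The prefixing rule above can be sketched as two pure helpers. `localizedRoute` and `outputFile` are hypothetical names, and the mapping of directory-style routes to `index.html` under `html/` is inferred from the tree shown earlier in this section.

```typescript
// Main-language routes stay flat; every other language gets a "/{lang}" prefix.
function localizedRoute(route: string, language: string, mainLanguage: string): string {
  return language === mainLanguage ? route : `/${language}${route}`;
}

// Maps a site route to its output file below html/.
// Directory-style routes ("/…/") are written as index.html; file routes are kept as-is.
function outputFile(route: string): string {
  return route.endsWith("/") ? `html${route}index.html` : `html${route}`;
}
```

For example, `outputFile(localizedRoute("/2025/03/08/my-post/", "de", "en"))` yields `html/de/2025/03/08/my-post/index.html`, while the same route for `en` stays at `html/2025/03/08/my-post/index.html`.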
#### 4.6 Per-Language Feeds
The main language feeds (`rss.xml`, `atom.xml`) stay at the root as today.
Each alternative language subtree gets its own `rss.xml` and `atom.xml`
under `/{lang}/`, containing only posts available in that language, with
URLs pointing into the language subtree. Feed `<language>` / `xml:lang` is
set to the subtree language.
#### 4.7 Combined Sitemap with hreflang
The root `sitemap.xml` lists all language variants of every URL. Each `<url>`
entry includes `<xhtml:link rel="alternate" hreflang="…" href="…"/>` for
every language the post is available in, plus `x-default` pointing to the
main language variant.
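As a rough illustration of the entry structure described above, the sketch below emits one `<url>` element per language variant, each carrying the full set of alternates plus `x-default`. The function names and the `baseUrl` parameter are assumptions, and the surrounding `<urlset>` element with its `xmlns:xhtml` declaration is omitted.

```typescript
// Escapes the characters that are unsafe inside XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds the <url> entries for one route that exists in the given languages.
function sitemapEntries(
  baseUrl: string,
  route: string,
  languages: readonly string[],
  mainLanguage: string,
): string[] {
  const urlFor = (language: string): string =>
    baseUrl + (language === mainLanguage ? route : `/${language}${route}`);
  const alternates = languages.map(
    (language) =>
      `  <xhtml:link rel="alternate" hreflang="${language}" href="${escapeXml(urlFor(language))}"/>`,
  );
  // x-default always points at the main-language variant.
  alternates.push(
    `  <xhtml:link rel="alternate" hreflang="x-default" href="${escapeXml(urlFor(mainLanguage))}"/>`,
  );
  return languages.map((language) =>
    ["<url>", `  <loc>${escapeXml(urlFor(language))}</loc>`, ...alternates, "</url>"].join("\n"),
  );
}
```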
#### 4.8 Language Switcher in Templates
Add a `blogLanguages` array and `currentLanguage` string to the Liquid
template context. Default templates render a language switcher bar (flag
badges) at the top linking to the same page in each available language.
The switcher links are absolute paths — unprefixed for the main language
(`/2025/03/08/my-post/`) and prefixed for alternatives
(`/de/2025/03/08/my-post/`) — so they work regardless of route depth.
#### 4.9 Preview Server — Language-Prefixed Routes
Extend `PreviewServer` to handle language-prefixed paths for alternative
languages so preview matches the generated output:
- `GET /2025/03/08/my-post` → render post in main language (unchanged).
- `GET /de/category/tech` → render category list in German.
- For paths starting with a known alternative language prefix, strip it and
pass the language as `preferredLanguage` to `renderRouteForContext()`.
- Unprefixed paths use `mainLanguage` (current behaviour, no change).
- Keep the existing `?lang=` parameter as a fallback for single-post preview
from the editor.
- Language switcher links in preview HTML work because they use the same
prefix scheme as generation.
This ensures the user sees the exact same route structure and language
switching behaviour in preview as in the generated output.
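The prefix handling described above can be sketched as a pure function that the preview server could call before rendering. `splitLanguagePrefix` and its return shape are hypothetical; only whole path segments are treated as language prefixes, so a path such as `/demo` is not mistaken for German.

```typescript
interface LocalizedRequest {
  language: string; // value to pass on as preferredLanguage
  route: string; // path with any language prefix removed
}

function splitLanguagePrefix(
  path: string,
  alternativeLanguages: readonly string[],
  mainLanguage: string,
): LocalizedRequest {
  const segments = path.split("/"); // "/de/x" -> ["", "de", "x"]
  const first = segments[1];
  if (first !== undefined && alternativeLanguages.includes(first)) {
    // Drop the language segment; "/de" and "/de/" both map to the root route.
    return { language: first, route: "/" + segments.slice(2).join("/") };
  }
  // Unprefixed paths keep using the main language.
  return { language: mainLanguage, route: path };
}
```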
#### 4.10 Preview/Generation Parity Checklist
Both preview and generation must produce identical output for:
- [ ] Main language routes remain flat/unprefixed.
- [ ] Alternative language routes use `/{lang}/…` prefix.
- [ ] Post content: translated title, excerpt, body with canonical fallback.
- [ ] Media metadata in macros (gallery, photo_album): translated
alt/title/caption with canonical fallback.
- [ ] Internal links: unprefixed for main language, prefixed for alternatives.
- [ ] Language switcher rendering with correct cross-language links.
- [ ] Per-language feed links in HTML `<head>`.
- [ ] `doNotTranslate` posts omitted from alternative language subtrees.
- [ ] Root `/` renders main language content (unchanged from today).
Shared implementation: both paths go through `SharedRouteRenderer` →
`PageRenderer`, so language handling logic added there automatically applies
to both preview and generation. The key change is making
`SharedRouteRenderer` language-prefix-aware and ensuring
`BlogGenerationEngine` iterates over `blogLanguages` when building routes.
#### 4.11 Testing
- **Unit**: Route prefix stripping, language fallback resolution, feed
language filtering, `doNotTranslate` exclusion, sitemap hreflang building.
- **Integration**: End-to-end generation with two languages produces correct
subtree structure, shared assets, per-language feeds, combined sitemap.
- **Preview parity**: Same route in preview and generation produces identical
HTML (modulo asset URLs).