bDS/TODO.md
Georg Bauer b855d61524 Feature/post media translations (#42)
* chore: updated todo with translation ideas

* feat: first take at the implementation of translations

* fix: small addition for the translation feature

* feat: support language switching in the editor and preview

* feat: better handling of long bodies by not running them through a json envelope

* fix: unknown macros have better fallback

* feat: api for python to get translations

* fix: strip dumb prefix of content in translation

* feat: extend meta diff for translations

* feat: hook up translations to rebuild-from-disk

* feat: generation of the website prefers project language, falling back to canonical language

* fix: crashes during rendering

* feat: translation validation report

* fix: made the translation validation actually work

* chore: reorganization of menu

* fix: some topics cleanup

* chore: updated doc

* feat: translations for media

* feat: more aligned in UI/UX

* feat: edit translations possible

* chore: added full multi-language todo

* chore: updated todo for clarity

* feat: implementation of full multi-linguality

* fix: page creation creates pages

* fix: flags on every page

* fix: better prompt

* feat: made MCP server aware of language content

* feat: python tools for translations

* fix: better fill-in-translations

* fix: better prompt for translation. maybe.

* fix: losing posts from search due to translation process

* fix: translation validation handles in-db content and fill-in of missing translations fixed to flush

* fix: faster scanning for infilling of missing translations

* chore: updated agent instructions

* feat: calendar and tag cloud respect current language now

* fix: retries going up

* fix: got metadata-diff and rebuild into sync

* fix: extended meta-diff for timestamps

* fix: made website validation look at translated content, too

* fix: multi-lingual search

* chore: refactor Editor.tsx into two separate editors

* feat: do language detection when no explicit language given

---------

Co-authored-by: hugo <hugoms@me.com>
2026-03-09 14:43:18 +01:00


bDS — Remaining Feature Work

This document covers the features described in VISION.md that are not yet implemented. Each section is a self-contained plan that can be picked up independently.


1. Post Translation System

Goal

Posts keep their canonical metadata in the main posts table. Translations are stored separately so translated variants cannot drift into full independent posts with their own unrelated metadata. The system must expose translation availability everywhere posts are consumed.

Current State

  • No translation storage model yet.
  • No translation relationship tracking yet.
  • No translation-aware post metadata such as availableLanguages yet.
  • No language/missing-language filtering in post query APIs yet.
  • AI post language detection already exists via chat:detectPostLanguage.
  • AI post analysis already exists via chat:analyzePost and already suggests title, excerpt, and slug.
  • Project-level mainLanguage already exists.

Implementation Plan

1.1 Database Schema

Add a dedicated translations table instead of storing translations as normal posts.

Each translation row should contain only:

  • its own ID
  • translationFor referencing the source post ID
  • language
  • title
  • excerpt
  • content for draft translations only
  • normal timestamps/status fields needed for lifecycle management

Published translations follow the same rule as published posts: body content is not stored in the database and is read from the filesystem.

1.2 YAML Frontmatter

Store translation files separately but in the same general markdown + YAML frontmatter style as posts, with only the supported translation fields:

translationFor: <original-post-id>
language: fr
title: ...
excerpt: ...

Draft translation files include markdown body content. Published translation files keep content only in the file, not in the database. File naming is based on the source post slug plus the language code, for example:

  • this-slug.md for the source post
  • this-slug.fr.md for the French translation
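
The naming rule above can be sketched as a pair of small helpers. The function names are hypothetical, and the parser assumes two-letter lowercase language codes and slugs without dots; neither assumption comes from the plan itself.

```typescript
// Hypothetical helpers; names are illustrative, not existing bDS APIs.
function sourceFileName(slug: string): string {
  return `${slug}.md`;
}

function translationFileName(slug: string, language: string): string {
  return `${slug}.${language}.md`;
}

// Split a file name back into slug and optional language code.
// Assumes two-letter lowercase codes and slugs that contain no dots.
function parsePostFileName(fileName: string): { slug: string; language?: string } | null {
  const match = /^([^.]+)(?:\.([a-z]{2}))?\.md$/.exec(fileName);
  if (!match) return null;
  return match[2] ? { slug: match[1], language: match[2] } : { slug: match[1] };
}
```

A rebuild-from-disk pass could use the parser to decide whether a file is a source post or a translation before reading its frontmatter.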

1.3 PostEngine Extensions

Add translation-aware storage and lookup methods instead of treating translations as regular posts.

  • Create/read/update/publish translation records.
  • Resolve translations for a post by source post ID.
  • Prevent duplicate translations for the same (translationFor, language) pair.
  • Keep source post metadata authoritative; translations only override the fields they actually own.
  • Keep getPost(id) and getPostBySlug(slug) canonical-only.
  • Add explicit translation reads such as getPostTranslation(postId, language) and getPostTranslations(postId) instead of overloading getPost() with an optional language parameter.
  • If callers need "best variant for language X", add a separate higher-level resolver rather than changing the semantics of the base post APIs.

Post APIs should expose an availableLanguages meta field derived from the translations table for every source post.
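
As a rough illustration of the methods listed above, the following in-memory sketch shows the duplicate check and the derived availableLanguages value. The class name TranslationStore and the createTranslation method are assumptions; the real implementation would sit on top of the translations table.

```typescript
// Illustrative in-memory model; the real storage would be the translations table.
interface PostTranslation {
  id: string;
  translationFor: string; // source post ID
  language: string;
  title: string;
  excerpt: string;
  content?: string; // present for draft translations only
}

class TranslationStore {
  private rows: PostTranslation[] = [];

  // Reject a second translation for the same (translationFor, language) pair.
  createTranslation(row: PostTranslation): void {
    if (this.getPostTranslation(row.translationFor, row.language)) {
      throw new Error(`Translation ${row.language} already exists for post ${row.translationFor}`);
    }
    this.rows.push(row);
  }

  getPostTranslation(postId: string, language: string): PostTranslation | undefined {
    return this.rows.filter((r) => r.translationFor === postId && r.language === language)[0];
  }

  getPostTranslations(postId: string): PostTranslation[] {
    return this.rows.filter((r) => r.translationFor === postId);
  }

  // Derived value for post.meta.availableLanguages.
  availableLanguages(postId: string): string[] {
    return this.getPostTranslations(postId).map((r) => r.language).sort();
  }
}
```

Keeping the translation reads separate from getPost() means callers that only want canonical data never pay for, or accidentally depend on, translation lookups.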

1.4 AI Translation Tools in OpenCodeManager

Add translation generation on top of the existing one-shot AI tooling.

translatePost(postId, targetLanguage)

  • Read the source post's full content plus title/excerpt.
  • Return translated title, excerpt, and markdown content.
  • Create or update a translation record/file from the returned data.
  • After the post translation is persisted, cascade to linked media: for every image linked to the source post that does not already have a translation for targetLanguage, call translateMediaMetadata(mediaId, targetLanguage) (see §2.5) to keep the post and its images in the same set of languages.

Language detection and excerpt suggestion already exist; this step is only about translation-specific tooling.

1.5 Import Pipeline Integration

Integrate with the existing import flow without redefining source posts as translations.

  1. Call detectPostLanguage() to set the language field.
  2. Optionally queue translation generation for configured target languages.
  3. Persist generated results as translation records/files linked via translationFor.

This is optional and should be configurable per import definition (a checkbox "Auto-detect language and translate" in ImportAnalysisView).

1.6 API Surface

Expose translation metadata consistently across all post consumers:

  • Templates and Python scripts can read post.meta.availableLanguages.
  • Internal AI tools can inspect available translation languages.
  • MCP post APIs return the same availableLanguages data.
  • Python post-query APIs support filtering by language and by missing translation language, so callers can ask for posts available in French or posts missing Spanish.
  • The same language and missing-language filters must be available to internal AI tools and MCP server queries.
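
The two filters could look roughly like this. The PostSummary and LanguageFilter shapes are assumptions for illustration, and the sketch only considers availableLanguages as derived from the translations table; whether the canonical post language also counts as "available" is left open here.

```typescript
interface PostSummary {
  id: string;
  meta: { availableLanguages: string[] };
}

interface LanguageFilter {
  language?: string;        // keep posts available in this language
  missingLanguage?: string; // keep posts lacking this language
}

function filterPostsByLanguage(posts: PostSummary[], filter: LanguageFilter): PostSummary[] {
  return posts.filter((post) => {
    const langs = post.meta.availableLanguages;
    if (filter.language !== undefined && langs.indexOf(filter.language) === -1) return false;
    if (filter.missingLanguage !== undefined && langs.indexOf(filter.missingLanguage) !== -1) return false;
    return true;
  });
}
```

Both filters can be combined, for example "available in French but missing Spanish", because each condition only ever removes posts.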

1.7 UI — Translation Panel

In the post editor metadata area, add a "Translations" section:

  • Show current post language (dropdown to change).
  • List existing translations by language.
  • "Translate to..." creates or refreshes the separate translation record.
  • Show which configured languages are still missing.

In the sidebar post list, optionally show a language badge per post.

1.8 Publishing Pipeline

In PageRenderer and BlogGenerationEngine:

  • Resolve alternate-language links from the translations table.
  • Add hreflang metadata and language switcher data from DB-backed translation availability.
  • Include alternate language entries in sitemap generation.

2. Media Translation System

Goal

Media files keep their canonical metadata in the main media table and main sidecar. Translations are stored separately so localized metadata cannot drift into independent media records. The binary asset remains shared; only the language-specific metadata varies.

Current State

  • No media translation storage model yet.
  • No translation relationship tracking for media yet.
  • No translation-aware media metadata such as availableLanguages yet.
  • No language/missing-language filtering in media query APIs yet.
  • No explicit language tracking on canonical media metadata yet.
  • AI media analysis already exists via chat:analyzeMediaImage and already suggests title, alt text, and caption in a requested language.
  • Main media metadata already uses DB + sidecar persistence.

Implementation Plan

2.1 Database Schema

Add a language column to the media table (optional text, ISO code such as 'en', 'de'). This records what language the canonical title, alt, and caption are written in. When null, the project mainLanguage is assumed. Persist the value in the canonical sidecar as well.

Add a dedicated media translations table instead of storing localized metadata inside the canonical media row.

Each translation row should contain only:

  • its own ID
  • translationFor referencing the source media ID
  • language
  • title
  • alt
  • caption
  • normal timestamps needed for lifecycle and sync

The binary media file remains the canonical original file. Translation rows do not duplicate the asset itself.

2.2 Sidecar Format

Keep the canonical sidecar for canonical metadata, and add language-specific sidecars for translated metadata only.

Canonical sidecar stays with the original media file, for example:

  • image.jpg.meta

Translated metadata sidecars use the source filename plus the language code, for example:

  • image.jpg.fr.meta

Translated sidecars should contain only the supported translation fields:

translationFor: <original-media-id>
language: fr
title: ...
alt: ...
caption: ...

2.3 MediaEngine Extensions

Add translation-aware storage and lookup methods instead of treating media translations as separate media items.

  • Create/read/update translation records and translated sidecars.
  • Resolve translations for a media item by source media ID.
  • Prevent duplicate translations for the same (translationFor, language) pair.
  • Keep source media metadata authoritative; translations only override title, alt, and caption.
  • Keep getMedia(id) canonical-only.
  • Add explicit translation reads such as getMediaTranslation(mediaId, language) and getMediaTranslations(mediaId) instead of overloading getMedia() with an optional language parameter.
  • If callers need "best variant for language X", add a separate higher-level resolver rather than changing the semantics of the base media APIs.

Media APIs should expose an availableLanguages meta field derived from the translations table for every canonical media item.

2.4 AI Translation Tools

Add media-metadata translation on top of the existing one-shot AI tooling.

translateMediaMetadata(mediaId, targetLanguage)

  • Read the source media metadata plus image context needed for a faithful translation.
  • Determine source language from media.language (falling back to the project mainLanguage).
  • Return translated title, alt, and caption.
  • Create or update a translation record/sidecar from the returned data.

detectMediaLanguage(mediaId)

  • Read the canonical title, alt, and caption of a media item.
  • Use the same lightweight title model and detection pattern as detectPostLanguage.
  • Return the detected ISO language code.
  • Optionally persist the result to media.language if the caller requests it.

AI media analysis already exists; these steps are only about language-detection and translation-specific tooling.

2.5 Post-Triggered Media Translation Cascade

When a post is translated, all images linked to that post should be translated automatically so rendered output never mixes languages.

Trigger: After translatePost(postId, targetLanguage) successfully persists a post translation (§1.4), the system resolves all media linked to the source post via the postMedia junction table.

For each linked media item:

  1. Check whether the media already has a translation for targetLanguage (via getMediaTranslation(mediaId, targetLanguage)).
  2. If a translation already exists, skip — the image is already covered.
  3. If no translation exists, call translateMediaMetadata(mediaId, targetLanguage) (§2.4) to generate and persist the translated title, alt, and caption.

Design constraints:

  • The cascade is additive only — it never overwrites an existing media translation. Users who independently translate an image via quick action keep their version.
  • Images can still be translated independently at any time through their own quick action or the media translation panel (§2.8). The cascade merely ensures coverage; it does not create a hard coupling.
  • Failures on individual media translations should be logged but must not block the post translation from succeeding. Report partial failures to the UI so the user can retry individual images.
  • The cascade runs after the post translation is committed, not inside the same transaction, so a media-translation failure never rolls back post work.
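
A minimal sketch of the cascade under these constraints follows. The MediaTranslationApi interface and the getLinkedMediaIds name are hypothetical stand-ins for the postMedia lookup and the engine calls; only getMediaTranslation and translateMediaMetadata are names taken from this plan.

```typescript
interface MediaTranslationApi {
  getLinkedMediaIds(postId: string): Promise<string[]>; // hypothetical postMedia lookup
  getMediaTranslation(mediaId: string, language: string): Promise<unknown>;
  translateMediaMetadata(mediaId: string, language: string): Promise<void>;
}

interface CascadeResult {
  translated: string[];
  skipped: string[];
  failed: { mediaId: string; error: string }[];
}

// Runs after the post translation is committed; never throws for a single media failure.
async function cascadeMediaTranslations(
  api: MediaTranslationApi,
  postId: string,
  targetLanguage: string
): Promise<CascadeResult> {
  const result: CascadeResult = { translated: [], skipped: [], failed: [] };
  for (const mediaId of await api.getLinkedMediaIds(postId)) {
    try {
      const existing = await api.getMediaTranslation(mediaId, targetLanguage);
      if (existing !== undefined && existing !== null) {
        result.skipped.push(mediaId); // additive only: keep the existing translation
        continue;
      }
      await api.translateMediaMetadata(mediaId, targetLanguage);
      result.translated.push(mediaId);
    } catch (err) {
      // Record the failure so the UI can offer a retry for this image.
      result.failed.push({ mediaId, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return result;
}
```

Returning the three lists instead of throwing lets the caller report partial failures without treating the post translation itself as failed.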

2.6 Import And Sync Integration

Integrate with the existing media import and metadata sync flow without creating translated duplicate media records.

  1. Import the binary asset once as the canonical media item.
  2. Optionally queue metadata translation generation for configured target languages.
  3. Persist generated results as translation records/sidecars linked via translationFor.
  4. Extend metadata diff/sync tooling so canonical and translated sidecars can both be compared against the database safely.

2.7 API Surface

Expose translation metadata consistently across all media consumers:

  • Templates and Python scripts can read media.meta.availableLanguages.
  • Internal AI tools can inspect available translation languages.
  • MCP media APIs return the same availableLanguages data.
  • Python media-query APIs support filtering by language and by missing translation language, so callers can ask for media with French metadata or media missing Spanish metadata.
  • The same language and missing-language filters must be available to internal AI tools and MCP server queries.

2.8 UI — Translation Panel

In the media editor/details area, add a "Translations" section:

  • Show the canonical media language as a dropdown (same UX as the post language selector). Changing it updates media.language.
  • Provide a "Detect Language" button that calls detectMediaLanguage and updates the dropdown.
  • List existing metadata translations by language.
  • "Translate to..." creates or refreshes the separate translation record.
  • Show which configured languages are still missing.

Media list/detail views can optionally show a language-availability badge.

2.9 Rendering And Asset Use

When media metadata is consumed during rendering or editing:

  • Resolve localized title, alt, and caption from the translations table when a language-specific variant is requested.
  • Fall back to canonical metadata when no translation exists.
  • Keep URLs and binary asset references stable; only metadata changes by language.
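
The fallback rule can be sketched as a small resolver. The type names and the url field are assumptions for illustration; the sketch falls back per record, not per field.

```typescript
interface MediaText {
  title: string;
  alt: string;
  caption: string;
}

interface CanonicalMedia extends MediaText {
  id: string;
  url: string; // stable across languages
}

interface MediaTranslation extends MediaText {
  translationFor: string; // source media ID
  language: string;
}

function resolveMediaMetadata(
  media: CanonicalMedia,
  translations: MediaTranslation[],
  language?: string
): MediaText & { url: string } {
  const match =
    language === undefined
      ? undefined
      : translations.filter((t) => t.translationFor === media.id && t.language === language)[0];
  const text: MediaText = match ?? media; // canonical fallback
  return { url: media.url, title: text.title, alt: text.alt, caption: text.caption };
}
```

Because the URL always comes from the canonical record, switching languages never changes which binary asset is referenced.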

3. Drag-and-Drop Image Insertion

Goal

Users can drag image files from the filesystem onto the editor to insert them. Dropped files are automatically imported into the media library and inserted as markdown images.

Current State

  • Images are inserted only via InsertModal (browse media library or enter URL).
  • MediaEngine.importMedia(sourcePath) handles file import, thumbnail generation, and database indexing.
  • imageResolverPlugin already converts relative media paths to bds-media:// protocol URLs for editor display.
  • LinkedMediaPanel has working drag-drop for reordering (reference pattern).
  • insertImageCommand from Milkdown inserts image nodes into the editor.

Implementation Plan

3.1 ProseMirror Drop Plugin

Create a new plugin in src/renderer/plugins/dropImagePlugin.ts following the imageResolverPlugin pattern:

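// Pseudo-structure (import paths assume Milkdown v7 packaging; import/insert wiring is described in 3.2)
import { Plugin } from '@milkdown/prose/state';
import { $prose } from '@milkdown/utils';

const IMAGE_NAME = /\.(jpg|png|gif|webp|svg|bmp)$/i;

export const dropImagePlugin = $prose(() => {
  return new Plugin({
    props: {
      handleDOMEvents: {
        drop: (view, event) => {
          // 1. Check for files in dataTransfer
          const files = Array.from(event.dataTransfer?.files ?? []);
          // 2. Filter to image types
          const images = files.filter((file) => IMAGE_NAME.test(file.name));
          // Let the editor handle drops that carry no images (see 3.6)
          if (images.length === 0) return false;
          event.preventDefault();
          // 3. Get file paths (older Electron exposes .path on File objects;
          //    Electron 32+ removed it in favour of webUtils.getPathForFile)
          // 4. For each file: import via IPC, insert into editor
          // 5. Return true to prevent default
          return true;
        },
        dragover: (view, event) => {
          // Show drop indicator if files are images
          return false;
        }
      }
    }
  });
});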

3.2 Drop Handler Flow

For each dropped file:

  1. Validate — check file extension against supported image types (jpg, png, gif, webp, svg, bmp).
  2. Import — call window.electronAPI.media.import(file.path). This returns MediaData with the media ID and file path.
  3. Insert — use insertImageCommand with { src: relativePath, alt: '' } where relativePath is the media's storage path (e.g., media/2025/01/uuid.jpg).
  4. Link — call window.electronAPI.postMedia.link(postId, mediaId) to track the relationship.
  5. Resolve — the existing imageResolverPlugin will automatically convert the relative path to a bds-media:// URL for display.
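
Steps 1 to 4 can be sketched with the Electron and editor calls injected as dependencies, which also keeps the flow testable without a running editor. The DropDeps interface, the MediaData field names, and notifyError are assumptions; they stand in for window.electronAPI.media.import, window.electronAPI.postMedia.link, insertImageCommand, and a toast.

```typescript
interface MediaData { id: string; path: string } // field names assumed
interface DroppedFile { name: string; path: string }

interface DropDeps {
  importMedia(sourcePath: string): Promise<MediaData>;       // stands in for media.import
  linkMedia(postId: string, mediaId: string): Promise<void>; // stands in for postMedia.link
  insertImage(attrs: { src: string; alt: string }): void;    // stands in for insertImageCommand
  notifyError(message: string): void;                        // e.g. a toast
}

const SUPPORTED_EXTENSIONS = ["jpg", "png", "gif", "webp", "svg", "bmp"];

function isSupportedImage(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.indexOf(fileName.slice(dot + 1).toLowerCase()) !== -1;
}

// Returns true only when import, insert and link all succeeded.
async function handleDroppedFile(deps: DropDeps, postId: string, file: DroppedFile): Promise<boolean> {
  if (!isSupportedImage(file.name)) return false;      // 1. validate
  try {
    const media = await deps.importMedia(file.path);   // 2. import
    deps.insertImage({ src: media.path, alt: "" });    // 3. insert
    await deps.linkMedia(postId, media.id);            // 4. link
    return true;
  } catch (err) {
    deps.notifyError(err instanceof Error ? err.message : String(err));
    return false;
  }
}
```

If the import step fails, nothing is inserted; step 5 needs no code here because imageResolverPlugin picks up the relative path on its own.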

3.3 Visual Feedback

  • On dragover with image files: add a CSS class to the editor container showing a drop zone indicator (border highlight or overlay).
  • On dragleave / drop: remove the indicator.
  • During import (for large files): show a small inline spinner or toast.

3.4 Integration into MilkdownEditor

In MilkdownEditor.tsx, register the new plugin alongside existing plugins:

import { dropImagePlugin } from '../../plugins/dropImagePlugin';

// In the editor setup, add to the plugin list
.use(dropImagePlugin)

Pass postId and the import callback to the plugin via the editor context or a shared ref.

3.5 Paste Support (Optional Extension)

The same plugin can handle paste events with image files:

  • Check clipboardData.files for images.
  • Same import → insert → link flow as drop.
  • This handles screenshots pasted from the clipboard.

3.6 Error Handling

  • Non-image files: ignore silently (don't prevent default, let editor handle text drops normally).
  • Import failure: show toast with error message, don't insert anything.
  • Multiple files: process sequentially, insert at cursor position for first, then append after each previous insertion.

3.7 Testing

  • Unit test the plugin's file validation logic.
  • Integration test: mock electronAPI.media.import, verify correct calls and editor state after drop.
  • Test edge cases: non-image files, failed imports, multiple simultaneous drops.

4. Multi-Language Blog Rendering (Phase 2)

Goal

The generated blog is fully navigable in each activated language. Every language gets its own route subtree (/en/, /de/, …), its own feeds, and its own sitemap entries. Media assets are shared; only HTML differs. The preview server must serve the same language-prefixed routes so the user can verify output before uploading.

Current State

  • Post and media translation schemas, CRUD, AI translation, and validation already exist (§1, §2).
  • PageRenderer already accepts preferredLanguage and resolves translations via resolveRenderablePost() and getMediaTranslation().
  • BlogGenerationEngine builds translation variants with .lang slug suffixes but writes everything to a flat html/ directory — no language-prefixed subtrees.
  • PreviewServer supports a ?lang= query parameter but has no language-prefixed routes.
  • ProjectMetadata has mainLanguage but no blogLanguages list.
  • No doNotTranslate flag on posts.
  • No automatic translation on post create/update.
  • No "Fill missing translations" batch tool.

Implementation Plan

4.0 Extract SUPPORTED_POST_LANGUAGES Constant

The list of supported post languages is currently hardcoded inline in AI task files (e.g. ['en', 'de', 'fr', 'it', 'es']). Extract it into a shared constant in src/main/shared/ (or similar) so that both AI tasks and the Blog Languages UI (§4.1) reference a single source of truth.

4.1 Project Preferences — Blog Languages

Add blogLanguages?: string[] to ProjectMetadata. This is the list of languages the blog is rendered in (e.g. ['en', 'de']). The mainLanguage is always implicitly included. When blogLanguages is empty or absent, the blog renders in mainLanguage only (current behaviour).
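
The "implicitly included" rule can be captured in one helper that every consumer calls instead of reading blogLanguages directly. The helper name is hypothetical, and the interface shows only the two fields relevant here.

```typescript
interface ProjectMetadata {
  mainLanguage: string;
  blogLanguages?: string[];
}

// Main language first, then each additional language once.
function effectiveBlogLanguages(meta: ProjectMetadata): string[] {
  const extras = (meta.blogLanguages ?? []).filter(
    (lang, index, all) => lang !== meta.mainLanguage && all.indexOf(lang) === index
  );
  return [meta.mainLanguage, ...extras];
}
```

With blogLanguages absent or empty the result is just [mainLanguage], which matches the current single-language behaviour.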

UI: Add a "Blog Languages" multi-select in the Project Settings panel, populated from SUPPORTED_POST_LANGUAGES. The main language is shown but cannot be removed. i18n keys: settings.project.blogLanguagesLabel, settings.project.blogLanguagesDescription.

4.2 Do-Not-Translate Flag

Add a boolean doNotTranslate column to the posts table (default false). Persist in YAML frontmatter as doNotTranslate: true. Migration required.

UI: Checkbox in the post editor metadata area, labelled via i18n (editor.doNotTranslateLabel).

Validate Translations must detect posts marked doNotTranslate that still have translations and offer to remove them.

4.3 Automatic Translation on Post Create/Update

When a canonical post is created or updated and blogLanguages contains languages beyond mainLanguage:

  1. For each active blog language missing a translation (skip if doNotTranslate is set), enqueue a TaskManager task calling chat:translatePost.
  2. On success, show a toast ("Translated to French"). On failure, show an error toast. Task progress is visible in the task panel.
  3. Only canonical content changes trigger re-translation. Editing a translation directly does not re-trigger anything.
  4. After each post translation succeeds, cascade to linked media: for every media item linked via postMedia that lacks a translation for the target language, enqueue chat:translateMediaMetadata.

4.4 Fill Missing Translations (Blog Menu Tool)

Add a "Fill Missing Translations" menu item under the Blog menu.

  1. Scan all published posts (excluding doNotTranslate) and all linked media for missing translations across blogLanguages.
  2. Create one task for post translations and a second task for media metadata translations.
  3. Report progress and partial failures via the task panel and toasts.
  4. This is separate from Validate Translations — validate checks consistency, fill adds missing content.
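
The scan in step 1 might be sketched as follows for posts; media would follow the same pattern. The PostInfo shape is an assumption, and the sketch treats a post's own canonical language as already covered, which the plan does not spell out.

```typescript
interface PostInfo {
  id: string;
  status: "draft" | "published";
  language: string;             // canonical language of the post
  doNotTranslate: boolean;
  availableLanguages: string[]; // from the translations table
}

interface MissingTranslation {
  postId: string;
  language: string;
}

function findMissingPostTranslations(posts: PostInfo[], blogLanguages: string[]): MissingTranslation[] {
  const missing: MissingTranslation[] = [];
  for (const post of posts) {
    if (post.status !== "published" || post.doNotTranslate) continue;
    for (const language of blogLanguages) {
      const covered = language === post.language || post.availableLanguages.indexOf(language) !== -1;
      if (!covered) missing.push({ postId: post.id, language });
    }
  }
  return missing;
}
```

The resulting list is the work queue for the post-translation task; an empty list means the tool has nothing to do.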

4.5 Route Generation — Main Language Flat, Alternatives Prefixed

The main language keeps the current flat route structure. Only additional blog languages get a language-prefixed subtree. This means single-language blogs see zero change from today's output.

html/
  index.html                 ← main language (flat, same as today)
  page/2/index.html
  2025/03/08/my-post/index.html
  category/tech/index.html
  tag/rust/index.html
  rss.xml
  atom.xml
  de/                        ← additional blog language subtree
    index.html
    page/2/index.html
    2025/03/08/my-post/index.html
    category/tech/index.html
    tag/rust/index.html
    rss.xml
    atom.xml
  sitemap.xml                ← combined, with hreflang alternates
  media/                     ← shared, not duplicated
  assets/                    ← shared, not duplicated

For the main language pass, generation works exactly as today — no prefix, no routing changes. For each additional language in blogLanguages:

  • Iterate the same route list, writing output under /{lang}/….
  • Resolve every post through resolveRenderablePost(post, engine, lang). If no translation exists, fall back to canonical content.
  • Same for media metadata in macros: getMediaTranslation(id, lang) with canonical fallback.
  • All internal links within a language subtree stay prefixed (/de/… links to /de/…). Main-language links remain unprefixed (/2025/…).
  • Posts marked doNotTranslate render only in the main language output. They are omitted from alternative language subtrees entirely.
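
The prefix rule for output files and internal links can be sketched with two helpers. Both function names are hypothetical, the route argument is assumed to be a directory-style route such as "/2025/03/08/my-post/", and leaving /media/ and /assets/ links unprefixed is an inference from those directories being shared.

```typescript
// Map a directory-style route to its output file under html/.
function outputFile(route: string, lang: string, mainLanguage: string): string {
  const trimmed = route.replace(/^\/+|\/+$/g, "");
  const parts = ["html"];
  if (lang !== mainLanguage) parts.push(lang); // only alternatives get a subtree
  if (trimmed !== "") parts.push(trimmed);
  parts.push("index.html");
  return parts.join("/");
}

// Prefix an absolute internal link for an alternative language subtree.
function localizeLink(href: string, lang: string, mainLanguage: string): string {
  if (lang === mainLanguage) return href;
  // Shared directories are not duplicated per language (assumption based on the tree above).
  if (href.indexOf("/media/") === 0 || href.indexOf("/assets/") === 0) return href;
  return `/${lang}${href}`;
}
```

For a single-language blog both helpers are no-ops relative to today's paths, which is the "zero change" property described above.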

4.6 Per-Language Feeds

The main language feeds (rss.xml, atom.xml) stay at the root as today. Each alternative language subtree gets its own rss.xml and atom.xml under /{lang}/, containing only posts available in that language, with URLs pointing into the language subtree. Feed <language> / xml:lang is set to the subtree language.

4.7 Combined Sitemap with hreflang

The root sitemap.xml lists all language variants of every URL. Each <url> entry includes <xhtml:link rel="alternate" hreflang="…" href="…"/> for every language the post is available in, plus x-default pointing to the main language variant.
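
As an illustration, the entries for one post could be built like this. The function names are hypothetical, baseUrl is assumed to have no trailing slash, and the sketch emits one <url> element per language variant, each carrying the full set of alternates.

```typescript
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function variantUrl(baseUrl: string, path: string, lang: string, mainLanguage: string): string {
  const prefix = lang === mainLanguage ? "" : `/${lang}`;
  return escapeXml(`${baseUrl}${prefix}${path}`);
}

// languages: every language the post is available in, including the main language.
function sitemapEntries(baseUrl: string, path: string, languages: string[], mainLanguage: string): string[] {
  const alternates = languages.map(
    (lang) => `    <xhtml:link rel="alternate" hreflang="${lang}" href="${variantUrl(baseUrl, path, lang, mainLanguage)}"/>`
  );
  alternates.push(
    `    <xhtml:link rel="alternate" hreflang="x-default" href="${variantUrl(baseUrl, path, mainLanguage, mainLanguage)}"/>`
  );
  return languages.map((lang) =>
    ["  <url>", `    <loc>${variantUrl(baseUrl, path, lang, mainLanguage)}</loc>`, ...alternates, "  </url>"].join("\n")
  );
}
```

The enclosing <urlset> element would also need the xhtml namespace declaration for the xhtml:link elements to be valid.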

4.8 Language Switcher in Templates

Add a blogLanguages array and currentLanguage string to the Liquid template context. Default templates render a language switcher bar (flag badges) at the top linking to the same page in each available language.

The switcher links are absolute paths — unprefixed for the main language (/2025/03/08/my-post/) and prefixed for alternatives (/de/2025/03/08/my-post/) — so they work regardless of route depth.

4.9 Preview Server — Language-Prefixed Routes

Extend PreviewServer to handle language-prefixed paths for alternative languages so preview matches the generated output:

  • GET /2025/03/08/my-post → render post in main language (unchanged).
  • GET /de/category/tech → render category list in German.
  • For paths starting with a known alternative language prefix, strip it and pass the language as preferredLanguage to renderRouteForContext().
  • Unprefixed paths use mainLanguage (current behaviour, no change).
  • Keep the existing ?lang= parameter as a fallback for single-post preview from the editor.
  • Language switcher links in preview HTML work because they use the same prefix scheme as generation.

This ensures the user sees the exact same route structure and language switching behaviour in preview as in the generated output.
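
The prefix handling might be sketched as a pure function that runs before renderRouteForContext(). The function name and return shape are assumptions; the returned language would be passed on as preferredLanguage.

```typescript
interface ResolvedRoute {
  language: string;
  path: string;
}

// alternatives: blog languages other than the main language, e.g. ["de", "fr"].
function resolveLanguagePrefix(pathname: string, mainLanguage: string, alternatives: string[]): ResolvedRoute {
  for (const lang of alternatives) {
    if (lang === mainLanguage) continue; // main language is never prefixed
    const prefix = `/${lang}`;
    if (pathname === prefix) return { language: lang, path: "/" };
    if (pathname.indexOf(prefix + "/") === 0) {
      return { language: lang, path: pathname.slice(prefix.length) };
    }
  }
  return { language: mainLanguage, path: pathname };
}
```

Matching on a whole path segment keeps a route such as /design/ from being mistaken for the German subtree.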

4.10 Preview/Generation Parity Checklist

Preview and generation must behave identically on each of the following points:

  • Main language routes remain flat/unprefixed.
  • Alternative language routes use /{lang}/… prefix.
  • Post content: translated title, excerpt, body with canonical fallback.
  • Media metadata in macros (gallery, photo_album): translated alt/title/caption with canonical fallback.
  • Internal links: unprefixed for main language, prefixed for alternatives.
  • Language switcher rendering with correct cross-language links.
  • Per-language feed links in HTML <head>.
  • doNotTranslate posts omitted from alternative language subtrees.
  • Root / renders main language content (unchanged from today).

Shared implementation: both paths go through SharedRouteRenderer → PageRenderer, so language handling logic added there automatically applies to both preview and generation. The key change is making SharedRouteRenderer language-prefix-aware and ensuring BlogGenerationEngine iterates over blogLanguages when building routes.

4.11 Testing

  • Unit: Route prefix stripping, language fallback resolution, feed language filtering, doNotTranslate exclusion, sitemap hreflang building.
  • Integration: End-to-end generation with two languages produces correct subtree structure, shared assets, per-language feeds, combined sitemap.
  • Preview parity: Same route in preview and generation produces identical HTML (modulo asset URLs).