From 5938aa9642e86922d2046ee4d85930dac9521652 Mon Sep 17 00:00:00 2001
From: hugo <hugoms@me.com>
Date: Sun, 1 Mar 2026 09:58:33 +0100
Subject: [PATCH] chore: added local AI plan

---
 LOCAL_AI_PLAN.md | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)
 create mode 100644 LOCAL_AI_PLAN.md

diff --git a/LOCAL_AI_PLAN.md b/LOCAL_AI_PLAN.md
new file mode 100644
index 0000000..2dd49a5
--- /dev/null
+++ b/LOCAL_AI_PLAN.md
@@ -0,0 +1,98 @@
+Scope: this plan still needs a viability check. The local model is meant to cover more than image metadata: it should also be used for image generation, import categorization, and probably as an option for the AI chat integrated into the app. The hope is that Qwen2.5 can help with these as well.
+
+Status: out of scope for now; to be revisited after the Mistral implementation is done.
+
+Image Metadata Generation Plan (M4 Optimized)
+
+This document outlines the implementation of a local AI pipeline to automatically generate titles, captions, and alt-texts for an image management application.
+
+1. Core Architecture
+
+
+
+
+
+Hardware: MacBook Air M4 (16GB Unified Memory).
+
+
+
+Inference Engine: Ollama (provides a local OpenAI-compatible REST API).
+
+
+
+Primary Model: Qwen2.5-VL-7B (Quantized to 4-bit/q4_K_M).
+
+
+
+
+
+Why: Best balance between spatial awareness (for alt-text) and memory footprint.
+
+
+
+Secondary Model: Gemma 3 4B (for high-speed batch processing).
+
+2. Setup Instructions
+
+
+
+
+
+Install Ollama: Download from ollama.com.
+
+
+
+Pull the Model: Open Terminal and run:
+
+bash
+
+ollama pull qwen2.5vl:7b
+
+
+
+
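+Verify Performance: Confirm that the model fits within the 16GB of unified memory by watching Memory Pressure and GPU usage in Activity Monitor while it runs. The ollama ps command lists the loaded models and how much memory each one occupies.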
+
+3. Integration Logic (Python Example)
+
+To integrate this into your app, use the following logic to receive structured JSON data:
+
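+python
+
+import ollama
+
+
+def generate_metadata(image_path):
+    """Return the model's reply: a JSON string with title, caption, and alt_text."""
+    response = ollama.chat(
+        model='qwen2.5vl:7b',  # must match the tag that was pulled with "ollama pull"
+        messages=[{
+            'role': 'user',
+            'content': (
+                'Analyze this image. Return ONLY a JSON object with: '
+                '"title", "caption", and "alt_text" (in German).'
+            ),
+            'images': [image_path],  # path to a local image file
+        }],
+        format='json',  # constrains the model to output valid JSON
+    )
+    # The content is a JSON string; parse it with json.loads() before use.
+    return response['message']['content']
+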
+
+4. Prompt Engineering Strategy
+
+To keep results consistent across images, state per-field rules like these in the system prompt:
+
+
+
+
+
+Title: Max 5 words, catchy.
+
+
+
+Caption: One descriptive sentence for UI display.
+
+
+
+Alt-Text: Detailed description focused on accessibility (textures, positions, colors).
+
+5. Performance Considerations on M4
+
+
+
+
+
+Thermal Management: Since the Air is fanless, batch process images in chunks (e.g., 20 at a time) to avoid thermal throttling.
+
+
+
+Memory Pressure: Close memory-heavy apps (like Chrome or Docker) while the 7B model is running, so that the GPU can use as much of the 16GB unified memory as possible.
+
+
+