From 2a58699398d7c7207d5ee5001eff63d3b19e5995 Mon Sep 17 00:00:00 2001
From: hugo
Date: Sun, 1 Mar 2026 12:35:43 +0100
Subject: [PATCH] chore: plan update

---
 LOCAL_AI_PLAN.md | 92 +++++++------------------------------------------------
 1 file changed, 13 insertions(+), 79 deletions(-)

diff --git a/LOCAL_AI_PLAN.md b/LOCAL_AI_PLAN.md
index 2dd49a5..3aeaf45 100644
--- a/LOCAL_AI_PLAN.md
+++ b/LOCAL_AI_PLAN.md
@@ -1,98 +1,32 @@
-can you check this plan for viability and create a LOCAL_AI_PLAN.md into the project with a good starting plan for this topic? I would want to use that local model for the image generation, the import categorization and probably also as an option for the ai chat integrated into the app. so a bit more than just the image metadata, but my hope is that Qwen2.5 could help there, too.
+# Local LLM integration for offline use

-for now out of scope, will be looked at after mistral implementation is done.
-
-Image Metadata Generation Plan (M4 Optimized)
-
-This document outlines the implementation of a local AI pipeline to automatically generate titles, captions, and alt-texts for an image management application.
+I want to implement support for Ollama as another engine to run models, so that I can be fully offline for travel blogging.

 1. Core Architecture
-
-
-
-

 Hardware: MacBook Air M4 (16GB Unified Memory).
-
-

 Inference Engine: Ollama (provides a local OpenAI-compatible REST API).
-
-

 Primary Model: Qwen2.5-VL-7B (Quantized to 4-bit/q4_K_M).
-
-
-
-
-Why: Best balance between spatial awareness (for alt-text) and memory footprint.
-
-
+Why: Best balance between spatial awareness (for alt-text) and memory footprint. Important: it MUST support vision capabilities to create titles, captions and alt-texts for images.

 Secondary Model: Gemma 3 4B (for high-speed batch processing).

-2. Setup Instructions
+Decision: which model is better for my usecases?
+2. Main Considerations
+- fully offline capability
+- useable for image titling/captioning/alt-texting
+- useable for excerpts, summaries, tab-titling
+- useable for AI chat assistant
+3. Integration
+Ollama is using OpenAI protocols, so should be easy to integrate as a third AI provider.

-Install Ollama: Download from ollama.com.
-
-
-
-Pull the Model: Open Terminal and run:
-
-bash
-
-ollama pull qwen2.5-vl
-
-Verwende Code mit Vorsicht.
-
-
-
-Verify Performance: Ensure the model stays within the 16GB RAM limit by monitoring Activity Monitor (GPU/Memory pressure).
-
-3. Integration Logic (Python Example)
-
-To integrate this into your app, use the following logic to receive structured JSON data:
-
-python
-
-import ollama def generate_metadata(image_path): response = ollama.chat( model='qwen2.5-vl', messages=[{ 'role': 'user', 'content': 'Analyze this image. Return ONLY a JSON object with: "title", "caption", and "alt_text" (in German).', 'images': [image_path] }], format='json' # Forces the model to output valid JSON ) return response['message']['content']
-
-Verwende Code mit Vorsicht.
-
-4. Prompt Engineering Strategy
-
-To ensure consistent results for image management, use a system prompt like this:
-
-
-
-
-
-Title: Max 5 words, catchy.
-
-
-
-Caption: One descriptive sentence for UI display.
-
-
-
-Alt-Text: Detailed description focused on accessibility (textures, positions, colors).
-
-5. Performance Considerations on M4
-
-
-
-
-
-Thermal Management: Since the Air is fanless, batch process images in chunks (e.g., 20 at a time) to avoid thermal throttling.
-
-
-
-Memory Pressure: Close memory-heavy apps (like Chrome or Docker) when running the 7B model to ensure the GPU has maximum unified memory access.
-
-
+Important: models for different defaults in Preferences must be able to be configured to span multiple providers if multiple ones are set up.
+Ollama integration - if activated - must do a check if ollama is serving the model, and if not give a message to the user, so they can fire up ollama, since it won't always be running.
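
Review note: the availability check asked for in the last added line could look roughly like the sketch below. It is not part of the patch and makes several assumptions. It assumes Ollama listens on its default address `http://localhost:11434`, and it reads "serving the model" as "the server answers and the model is installed", since Ollama loads models on demand. The function names `fetch_tags`, `model_is_listed` and `check_ollama` are hypothetical, and the returned messages are placeholders for whatever the app shows to the user. The only Ollama endpoint used is `GET /api/tags`, which lists the locally installed models.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama's default address; adjust if the server is configured differently.
OLLAMA_URL = "http://localhost:11434"


def fetch_tags(base_url, timeout=2.0):
    """Return the parsed JSON body of GET /api/tags (locally installed models)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_is_listed(tags, model):
    """True if `model` matches a listed name exactly, or matches it when the tag is ignored."""
    names = [m.get("name", "") for m in tags.get("models", [])]
    return any(n == model or n.split(":")[0] == model for n in names)


def check_ollama(model, base_url=OLLAMA_URL, fetch=fetch_tags):
    """Return (ok, message); the message is meant for the user when ok is False.

    `fetch` is injectable so the check can be exercised without a running server.
    """
    try:
        tags = fetch(base_url)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Server not started, connection refused, timeout, or an unparsable reply.
        return False, f"Ollama is not reachable at {base_url} ({exc}). Start it with 'ollama serve'."
    if not model_is_listed(tags, model):
        return False, f"Ollama is running, but model '{model}' is not installed. Try 'ollama pull {model}'."
    return True, f"Ollama is serving '{model}'."
```

A caller would run `ok, message = check_ollama("<configured model name>")` when the Ollama provider is activated and surface `message` whenever `ok` is false. If the app needs to know whether the model is already loaded into memory rather than merely installed, `GET /api/ps` would be the endpoint to query instead.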