chore: plan update

This commit is contained in:
2026-03-01 12:35:43 +01:00
parent f21dc74113
commit 2a58699398

View File

@@ -1,98 +1,32 @@
can you check this plan for viability and create a LOCAL_AI_PLAN.md into the project with a good starting plan for this topic? I would want to use that local model for the image generation, the import categorization and probably also as an option for the ai chat integrated into the app. so a bit more than just the image metadata, but my hope is that Qwen2.5 could help there, too.
# Local LLM integration for offline use
for now out of scope, will be looked at after mistral implementation is done.
Image Metadata Generation Plan (M4 Optimized)
This document outlines the implementation of a local AI pipeline to automatically generate titles, captions, and alt-texts for an image management application.
I want to implement support for Ollama as another engine to run models, so that I can be fully offline for travel blogging.
1. Core Architecture
Hardware: MacBook Air M4 (16GB Unified Memory).
Inference Engine: Ollama (provides a local OpenAI-compatible REST API).
Primary Model: Qwen2.5-VL-7B (Quantized to 4-bit/q4_K_M).
Why: Best balance between spatial awareness (for alt-text) and memory footprint.
Why: Best balance between spatial awareness (for alt-text) and memory footprint. Important: it MUST support vision capabilities to create titles, captions and alt-texts for images.
Secondary Model: Gemma 3 4B (for high-speed batch processing).
2. Setup Instructions
Decision: which model is better for my usecases?
2. Main Considerations
- fully offline capability
- useable for image titling/captioning/alt-texting
- useable for excerpts, summaries, tab-titling
- useable for AI chat assistant
3. Integration
Ollama is using OpenAI protocols, so should be easy to integrate as a third AI provider.
Install Ollama: Download from ollama.com.
Pull the Model: Open Terminal and run:
bash
ollama pull qwen2.5-vl
Verwende Code mit Vorsicht.
Verify Performance: Ensure the model stays within the 16GB RAM limit by monitoring Activity Monitor (GPU/Memory pressure).
3. Integration Logic (Python Example)
To integrate this into your app, use the following logic to receive structured JSON data:
python
import ollama def generate_metadata(image_path): response = ollama.chat( model='qwen2.5-vl', messages=[{ 'role': 'user', 'content': 'Analyze this image. Return ONLY a JSON object with: "title", "caption", and "alt_text" (in German).', 'images': [image_path] }], format='json' # Forces the model to output valid JSON ) return response['message']['content']
Verwende Code mit Vorsicht.
4. Prompt Engineering Strategy
To ensure consistent results for image management, use a system prompt like this:
Title: Max 5 words, catchy.
Caption: One descriptive sentence for UI display.
Alt-Text: Detailed description focused on accessibility (textures, positions, colors).
5. Performance Considerations on M4
Thermal Management: Since the Air is fanless, batch process images in chunks (e.g., 20 at a time) to avoid thermal throttling.
Memory Pressure: Close memory-heavy apps (like Chrome or Docker) when running the 7B model to ensure the GPU has maximum unified memory access.
Important: models for different defaults in Preferences must be able to be configured to span multiple providers if multiple ones are set up.
Ollama integration - if activated - must do a check if ollama is serving the model, and if not give a message to the user, so they can fire up ollama, since it won't always be running.