bDS/LOCAL_AI_PLAN.md

can you check this plan for viability and create a LOCAL_AI_PLAN.md into the project with a good starting plan for this topic? I would want to use that local model for the image generation, the import categorization and probably also as an option for the ai chat integrated into the app. so a bit more than just the image metadata, but my hope is that Qwen2.5 could help there, too.

for now out of scope, will be looked at after mistral implementation is done.

Image Metadata Generation Plan (M4 Optimized)

This document outlines the implementation of a local AI pipeline to automatically generate titles, captions, and alt-texts for an image management application.

1. Core Architecture


Hardware: MacBook Air M4 (16GB Unified Memory).


Inference Engine: Ollama (provides a local OpenAI-compatible REST API).


Primary Model: Qwen2.5-VL-7B (Quantized to 4-bit/q4_K_M).


Why: Best balance between spatial awareness (for alt-text) and memory footprint.


Secondary Model: Gemma 3 4B (for high-speed batch processing).

2. Setup Instructions


Install Ollama: Download from ollama.com.


Pull the Model: Open Terminal and run:

bash

ollama pull qwen2.5-vl

Verwende Code mit Vorsicht.


Verify Performance: Ensure the model stays within the 16GB RAM limit by monitoring Activity Monitor (GPU/Memory pressure).

3. Integration Logic (Python Example)

To integrate this into your app, use the following logic to receive structured JSON data:

python

import ollama  def generate_metadata(image_path):     response = ollama.chat(         model='qwen2.5-vl',         messages=[{             'role': 'user',             'content': 'Analyze this image. Return ONLY a JSON object with: "title", "caption", and "alt_text" (in German).',             'images': [image_path]         }],         format='json' # Forces the model to output valid JSON     )     return response['message']['content']

Verwende Code mit Vorsicht.

4. Prompt Engineering Strategy

To ensure consistent results for image management, use a system prompt like this:


Title: Max 5 words, catchy.


Caption: One descriptive sentence for UI display.


Alt-Text: Detailed description focused on accessibility (textures, positions, colors).

5. Performance Considerations on M4


Thermal Management: Since the Air is fanless, batch process images in chunks (e.g., 20 at a time) to avoid thermal throttling.


Memory Pressure: Close memory-heavy apps (like Chrome or Docker) when running the 7B model to ensure the GPU has maximum unified memory access.