chore: plan update
This commit is contained in:
@@ -1,98 +1,32 @@
|
|||||||
can you check this plan for viability and create a LOCAL_AI_PLAN.md into the project with a good starting plan for this topic? I would want to use that local model for the image generation, the import categorization and probably also as an option for the ai chat integrated into the app. so a bit more than just the image metadata, but my hope is that Qwen2.5 could help there, too.
|
# Local LLM integration for offline use
|
||||||
|
|
||||||
for now out of scope, will be looked at after mistral implementation is done.
|
I want to implement support for Ollama as another engine to run models, so that I can be fully offline for travel blogging.
|
||||||
|
|
||||||
Image Metadata Generation Plan (M4 Optimized)
|
|
||||||
|
|
||||||
This document outlines the implementation of a local AI pipeline to automatically generate titles, captions, and alt-texts for an image management application.
|
|
||||||
|
|
||||||
1. Core Architecture
|
1. Core Architecture
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Hardware: MacBook Air M4 (16GB Unified Memory).
|
Hardware: MacBook Air M4 (16GB Unified Memory).
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Inference Engine: Ollama (provides a local OpenAI-compatible REST API).
|
Inference Engine: Ollama (provides a local OpenAI-compatible REST API).
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Primary Model: Qwen2.5-VL-7B (Quantized to 4-bit/q4_K_M).
|
Primary Model: Qwen2.5-VL-7B (Quantized to 4-bit/q4_K_M).
|
||||||
|
|
||||||
|
Why: Best balance between spatial awareness (for alt-text) and memory footprint. Important: it MUST support vision capabilities to create titles, captions and alt-texts for images.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Why: Best balance between spatial awareness (for alt-text) and memory footprint.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Secondary Model: Gemma 3 4B (for high-speed batch processing).
|
Secondary Model: Gemma 3 4B (for high-speed batch processing).
|
||||||
|
|
||||||
2. Setup Instructions
|
Decision: which model is better for my usecases?
|
||||||
|
|
||||||
|
2. Main Considerations
|
||||||
|
|
||||||
|
- fully offline capability
|
||||||
|
- useable for image titling/captioning/alt-texting
|
||||||
|
- useable for excerpts, summaries, tab-titling
|
||||||
|
- useable for AI chat assistant
|
||||||
|
|
||||||
|
3. Integration
|
||||||
|
|
||||||
|
Ollama is using OpenAI protocols, so should be easy to integrate as a third AI provider.
|
||||||
|
|
||||||
Install Ollama: Download from ollama.com.
|
Important: models for different defaults in Preferences must be able to be configured to span multiple providers if multiple ones are set up.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Pull the Model: Open Terminal and run:
|
|
||||||
|
|
||||||
bash
|
|
||||||
|
|
||||||
ollama pull qwen2.5-vl
|
|
||||||
|
|
||||||
Verwende Code mit Vorsicht.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Verify Performance: Ensure the model stays within the 16GB RAM limit by monitoring Activity Monitor (GPU/Memory pressure).
|
|
||||||
|
|
||||||
3. Integration Logic (Python Example)
|
|
||||||
|
|
||||||
To integrate this into your app, use the following logic to receive structured JSON data:
|
|
||||||
|
|
||||||
python
|
|
||||||
|
|
||||||
import ollama def generate_metadata(image_path): response = ollama.chat( model='qwen2.5-vl', messages=[{ 'role': 'user', 'content': 'Analyze this image. Return ONLY a JSON object with: "title", "caption", and "alt_text" (in German).', 'images': [image_path] }], format='json' # Forces the model to output valid JSON ) return response['message']['content']
|
|
||||||
|
|
||||||
Verwende Code mit Vorsicht.
|
|
||||||
|
|
||||||
4. Prompt Engineering Strategy
|
|
||||||
|
|
||||||
To ensure consistent results for image management, use a system prompt like this:
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Title: Max 5 words, catchy.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Caption: One descriptive sentence for UI display.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Alt-Text: Detailed description focused on accessibility (textures, positions, colors).
|
|
||||||
|
|
||||||
5. Performance Considerations on M4
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Thermal Management: Since the Air is fanless, batch process images in chunks (e.g., 20 at a time) to avoid thermal throttling.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Memory Pressure: Close memory-heavy apps (like Chrome or Docker) when running the 7B model to ensure the GPU has maximum unified memory access.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Ollama integration - if activated - must do a check if ollama is serving the model, and if not give a message to the user, so they can fire up ollama, since it won't always be running.
|
||||||
|
|||||||
Reference in New Issue
Block a user