# Local LLM integration for offline use

I want to add Ollama as another engine for running models, so that I can work fully offline while travel blogging.

## 1. Core Architecture

- Hardware: MacBook Air M4 (16 GB unified memory).
- Inference engine: Ollama (provides a local, OpenAI-compatible REST API).
- Primary model: Qwen2.5-VL-7B (quantized to 4-bit, q4_K_M).
  - Why: best balance between spatial awareness (for alt text) and memory footprint.
  - Important: the model MUST support vision so it can create titles, captions, and alt text for images.
- Secondary model: Gemma 3 4B (for high-speed batch processing).
- Open decision: which model is better for my use cases?

## 2. Main Considerations

- Fully offline capability
- Usable for image titles, captions, and alt text
- Usable for excerpts, summaries, and tab titles
- Usable for the AI chat assistant

## 3. Integration

Ollama exposes an OpenAI-compatible API, so it should be easy to integrate with the AI SDK. When the Ollama integration is activated, it must check whether Ollama is running and serving the configured model. If it is not, the app should show the user a message so they can start Ollama, since it won't always be running.
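
The availability check described above could look roughly like the following. This is a minimal sketch, not a finished implementation. It assumes Ollama listens on its default address `http://localhost:11434` and that querying `GET /api/tags` (the list of locally installed models) is good enough, since Ollama loads an installed model on first request. The names `checkOllamaModel`, `hasModel`, `OllamaStatus`, and the model tag `qwen2.5vl:7b` are illustrative assumptions, not part of the existing code base.

```typescript
// Sketch only: all names here are hypothetical helpers for illustration.

// Minimal shape of the parts of a fetch response this sketch uses.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Subset of the JSON returned by Ollama's GET /api/tags.
interface TagsResponse {
  models?: { name: string }[];
}

type OllamaStatus =
  | { ready: true }
  | { ready: false; reason: "unreachable" | "model_missing"; message: string };

// True if `wanted` is among the installed models.
// A name without a tag is treated as "<name>:latest".
function hasModel(tags: TagsResponse, wanted: string): boolean {
  const target = wanted.includes(":") ? wanted : `${wanted}:latest`;
  return (tags.models ?? []).some((m) => m.name === target);
}

// Asks Ollama which models are installed and returns a user-facing status.
// `fetchImpl` is injectable so the check can be tested without a server.
async function checkOllamaModel(
  model: string,
  baseUrl = "http://localhost:11434", // assumed default Ollama address
  fetchImpl: FetchLike = fetch,
  timeoutMs = 2000,
): Promise<OllamaStatus> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${baseUrl}/api/tags`, { signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const tags = (await res.json()) as TagsResponse;
    if (hasModel(tags, model)) return { ready: true };
    return {
      ready: false,
      reason: "model_missing",
      message: `Ollama is running, but "${model}" is not installed. Try: ollama pull ${model}`,
    };
  } catch {
    // Connection refused, timeout, or a non-OK response all end up here.
    return {
      ready: false,
      reason: "unreachable",
      message: `Could not get a valid response from Ollama at ${baseUrl}. Start Ollama and try again.`,
    };
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would run `await checkOllamaModel("qwen2.5vl:7b")` when the Ollama engine is selected and show `status.message` whenever `status.ready` is `false`. Separating "unreachable" from "model missing" lets the UI tell the user whether to start Ollama or to pull the model first.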