Local LLM integration for offline use

I want to implement support for Ollama as another engine to run models, so that I can be fully offline for travel blogging.

  1. Core Architecture

Hardware: MacBook Air M4 (16GB Unified Memory).

Inference Engine: Ollama (provides a local OpenAI-compatible REST API).

Primary Model: Qwen2.5-VL-7B (Quantized to 4-bit/q4_K_M).

Why: Best balance between spatial awareness (for alt text) and memory footprint. Important: the model MUST support vision so it can create titles, captions, and alt text for images.

Secondary Model: Gemma 3 4B (for high-speed batch processing).
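To make the memory footprint argument concrete, here is a back-of-the-envelope sketch. The function names are made up for illustration. The 4.85 bits per weight for q4_K_M and the 8 GB reserved for macOS and other apps are assumptions, not measured values. The estimate covers weights only and ignores the vision encoder, KV cache, and runtime overhead, so real usage will be higher.

```typescript
// Rough weight-size estimate for a quantized model (weights only).
// Assumption: q4_K_M averages about 4.85 bits per weight; the real figure
// varies by model and is not taken from this plan.
function estimateWeightGB(paramCount: number, bitsPerWeight: number): number {
  const bytes = (paramCount * bitsPerWeight) / 8;
  return bytes / 1e9; // decimal gigabytes
}

// Hypothetical budget check: does the model fit after reserving memory
// for the OS and other apps? The reserved amount is a guess.
function fitsInBudget(weightGB: number, totalGB: number, reservedGB: number): boolean {
  return weightGB <= totalGB - reservedGB;
}

const qwenGB = estimateWeightGB(7e9, 4.85);  // about 4.2 GB
const gemmaGB = estimateWeightGB(4e9, 4.85); // about 2.4 GB

console.log(`Qwen2.5-VL-7B weights: ~${qwenGB.toFixed(1)} GB, fits: ${fitsInBudget(qwenGB, 16, 8)}`);
console.log(`Gemma 3 4B weights: ~${gemmaGB.toFixed(1)} GB, fits: ${fitsInBudget(gemmaGB, 16, 8)}`);
```

Under these assumptions both models fit on the 16GB machine individually, with the smaller one leaving more headroom for long contexts and other running apps.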

Decision: Which model is better for my use cases?

  2. Main Considerations
  • fully offline capability
  • usable for image titling, captioning, and alt text
  • usable for excerpts, summaries, and tab titles
  • usable for the AI chat assistant

  3. Integration

Ollama exposes an OpenAI-compatible API, so it should be easy to integrate with the AI SDK.
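As a minimal sketch of what the OpenAI-compatible route could look like for the alt-text use case, the following uses plain `fetch` instead of the AI SDK, since the SDK wiring is not decided yet. Assumptions: Ollama's default port 11434 with its `/v1/chat/completions` route, the model tag `qwen2.5vl:7b`, and the prompt wording. All function and type names are illustrative only. An OpenAI-compatible provider in the AI SDK would presumably be pointed at the same base URL.

```typescript
// Assumption: Ollama listens on its default port and serves OpenAI-style routes under /v1.
const OLLAMA_OPENAI_BASE = "http://localhost:11434/v1";

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string | ContentPart[] }[];
}

// Builds an OpenAI-style chat request carrying one image as a base64 data URL.
function buildAltTextRequest(model: string, imageBase64: string, mimeType = "image/jpeg"): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: "You write concise, descriptive alt text for travel blog photos." },
      {
        role: "user",
        content: [
          { type: "text", text: "Write one sentence of alt text for this image." },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

// Sends the request to the local server and returns the generated text.
async function requestAltText(model: string, imageBase64: string): Promise<string> {
  const res = await fetch(`${OLLAMA_OPENAI_BASE}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildAltTextRequest(model, imageBase64)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

A call might look like `await requestAltText("qwen2.5vl:7b", base64Jpeg)`, where `base64Jpeg` is the image file encoded as base64. Keeping the request builder separate from the network call makes it possible to reuse the same payload shape for titles and captions by swapping the prompt.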

When the Ollama integration is enabled, it must check whether Ollama is running and has the configured model available. If not, it should show the user a message so they can start Ollama, since it won't always be running.
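One way this check could work, as a sketch: ask Ollama's native `/api/tags` endpoint for the list of locally available models and distinguish "server not reachable" from "model not pulled". This assumes the default port 11434 and that a pulled model is enough, since Ollama loads models on demand. The names `checkOllamaModel`, `OllamaStatus`, and `FetchLike`, and the message texts, are placeholders. The fetch function is injectable so the check can be tested without a running server.

```typescript
// Minimal shape of the fetch function the check needs (lets tests pass a fake).
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

type OllamaStatus =
  | { state: "ready" }
  | { state: "server-down"; message: string }
  | { state: "model-missing"; message: string };

async function checkOllamaModel(
  model: string,
  baseUrl = "http://localhost:11434", // assumption: Ollama's default address
  fetchImpl: FetchLike = fetch,
): Promise<OllamaStatus> {
  const serverDown: OllamaStatus = {
    state: "server-down",
    message: "Ollama is not running. Start the Ollama app or run `ollama serve`, then try again.",
  };

  let body: unknown;
  try {
    const res = await fetchImpl(`${baseUrl}/api/tags`);
    if (!res.ok) return serverDown;
    body = await res.json();
  } catch {
    // Connection refused, DNS failure, invalid JSON, and similar errors land here.
    return serverDown;
  }

  // /api/tags responds with { models: [{ name: "tag", ... }, ...] }.
  const models = (body as { models?: { name: string }[] }).models ?? [];
  const wanted = model.includes(":") ? [model] : [model, `${model}:latest`];
  if (models.some((m) => wanted.includes(m.name))) return { state: "ready" };

  return {
    state: "model-missing",
    message: `Ollama is running, but "${model}" is not installed. Run \`ollama pull ${model}\` first.`,
  };
}
```

The app could call `checkOllamaModel("qwen2.5vl:7b")` when the integration is switched on and before each batch job, and show `message` to the user whenever the state is not `"ready"`.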