feat: added gemma 4 (not supported yet in mlx-swift-lm, though)

2026-04-28 21:52:32 +02:00
parent d5b9ae15cc
commit 4ad46ec1ea
4 changed files with 208 additions and 31 deletions
--- a/README.md
+++ b/README.md
@@ -141,7 +141,7 @@ MLXServer/
 │   ├── ToolCallParser.swift        — Parses tool calls from model output
 │   └── ToolPromptBuilder.swift     — Model-specific tool prompt formatting
 └── Utilities/
-    ├── LocalModelResolver.swift    — Offline-first HuggingFace cache resolution (sandbox + system)
+    ├── LocalModelResolver.swift    — Offline-first HuggingFace cache resolution
     ├── ChatExporter.swift          — Export conversations to Markdown or RTF
     ├── FocusedValues.swift         — FocusedValue keys for menu bar integration
     └── Preferences.swift           — UserDefaults wrapper, including scene persistence
@@ -153,7 +153,7 @@ build.sh        — One-command build script (xcodegen + xcodebuild)
 ## Key Design Decisions

 - Uses `mlx-swift-lm` for inference — `VLMModelFactory` for vision models and `LLMModelFactory` for text-only models
-- **Offline-first**: `LocalModelResolver` checks both the sandboxed app container and `~/.cache/huggingface/hub/` for locally-cached models before downloading
+- **Offline-first**: `LocalModelResolver` checks `~/.cache/huggingface/hub/` for locally-cached models before downloading
 - **No duplicate storage**: custom `HubApi` with blob cache disabled — models are stored once in the snapshot cache
 - **KV cache reuse** across API requests — reuses `ChatSession` when conversation history prefix matches
 - **Thinking mode**: `enable_thinking` passed via Jinja template context; `<think>` tags parsed in real-time during streaming