feat: first tries at save dialog, so far failing
33
CLAUDE.md
@@ -1,6 +1,6 @@
 # MLX Server

-Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision and tool use.
+Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision, tool use, and thinking mode.

 ## Quick Start

@@ -14,18 +14,24 @@ open "build/Debug/MLX Server.app"

 ## Project Structure

-- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config
-- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts
+- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config, menu commands
+- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts, focused values
 - `MLXServer/Models/ModelConfig.swift` — Model definitions (alias, repoId, contextLength), resolution
-- `MLXServer/Models/ChatMessage.swift` — Chat message data model
-- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, offline-first resolution
+- `MLXServer/Models/ChatMessage.swift` — Chat message data model, `<think>` tag parsing
+- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, download tracking, idle unload
 - `MLXServer/ViewModels/ChatViewModel.swift` — Chat state, ChatSession management, API server lifecycle
 - `MLXServer/Server/APIServer.swift` — NWListener HTTP server, SSE streaming, KV cache reuse, vision, tool call handling
 - `MLXServer/Server/APIModels.swift` — OpenAI-compatible Codable structs
 - `MLXServer/Server/ToolCallParser.swift` — Parses tool calls from model output (Gemma tool_code, Qwen XML tags)
 - `MLXServer/Server/ToolPromptBuilder.swift` — Model-specific tool prompt formatting
-- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to ~/.cache/huggingface/hub/ snapshots
-- `MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper
+- `MLXServer/Views/DownloadModalView.swift` — Modal overlay for model download progress
+- `MLXServer/Views/ChatMessagesView.swift` — Message bubbles with markdown rendering and collapsible thinking blocks
+- `MLXServer/Views/ChatInputView.swift` — Text input, image attach (file picker, drag & drop, Finder copy-paste)
+- `MLXServer/Commands/SaveChatCommands.swift` — File > Export Chat menu command
+- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to local snapshots (sandbox + system cache + flat layouts)
+- `MLXServer/Utilities/ChatExporter.swift` — Export conversations to Markdown or RTF (Pages-compatible)
+- `MLXServer/Utilities/FocusedValues.swift` — FocusedValue keys for menu bar integration
+- `MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper (model, thinking mode, API, idle timeout)
 - `project.yml` — xcodegen project spec
 - `build.sh` — Build script (xcodegen + xcodebuild)

@@ -35,6 +41,9 @@ open "build/Debug/MLX Server.app"
 |-------|---------------|-------|
 | `gemma` | `mlx-community/gemma-3-4b-it-4bit` | Vision + tool use via `tool_code` blocks (128k context) |
 | `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Vision + tool use via `<tool_call>` tags (256k context) |
+| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | Thinking mode, tool use (256k context) |

+Any model in MLX format on HuggingFace can be added — no restriction on uploader or architecture.
+
 ## Critical Performance Rule

@@ -47,9 +56,15 @@ open "build/Debug/MLX Server.app"

 ## Key Design Decisions

-- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend — supports both text and vision in a single model load
+- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend — loads any MLX-format model from HuggingFace
 - Model-specific prompt formatting: Gemma uses `tool_code` blocks; Qwen uses `<tool_call>` XML tags
-- Offline-first: if the model is already cached locally (~/.cache/huggingface/hub/), `LocalModelResolver` resolves the local snapshot path directly — no network requests
+- **Offline-first**: `LocalModelResolver` checks the sandboxed app container, system `~/.cache/huggingface/hub/`, and flat download layouts — no network requests if model is cached
+- **No duplicate storage**: custom `HubApi(cache: nil)` with explicit `downloadBase` — models stored once in the snapshot cache, not duplicated across blob cache and snapshots
+- **Thinking mode**: `enable_thinking` passed to Jinja template context via `additionalContext`; `<think>...</think>` tags parsed in real-time during streaming and shown in collapsible UI blocks. Toggleable in Settings.
+- **Download progress**: separate `isDownloading` state from `isLoading`; modal overlay shows file count, percentage, speed
+- **Idle unload**: timer resets on both user input and model generation completion (not just request start)
+- **Chat export**: Markdown (user messages as blockquotes) and RTF (Pages-compatible with formatted markdown)
+- **Finder paste**: local event monitor intercepts Cmd+V to check pasteboard for image file URLs before TextField handles it
 - HTTP server built on `Network.framework` (`NWListener`) — no third-party server dependencies
 - KV cache reuse across API requests — reuses `ChatSession` when conversation history prefix matches
 - GPU cache limit set to 20 MB; cache cleared on model unload
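
The "Chat export" entry above says that Markdown export renders user messages as blockquotes. A minimal sketch of that idea follows. `Role`, `Message`, and `exportMarkdown` are hypothetical names chosen for illustration; the actual `ChatExporter.swift` and `ChatMessage.swift` are not part of this diff and may be structured differently.

```swift
import Foundation

// Hypothetical stand-ins: the real ChatMessage model is not shown in this diff.
enum Role { case user, assistant }

struct Message {
    let role: Role
    let text: String
}

/// Renders a conversation as Markdown, quoting user messages as blockquotes.
/// This is an assumed shape for the export, not the project's implementation.
func exportMarkdown(_ messages: [Message]) -> String {
    messages.map { message -> String in
        switch message.role {
        case .user:
            // Prefix every line so multi-line input stays inside the quote.
            return message.text
                .components(separatedBy: "\n")
                .map { $0.isEmpty ? ">" : "> " + $0 }
                .joined(separator: "\n")
        case .assistant:
            // Assistant output is assumed to already be Markdown; pass it through.
            return message.text
        }
    }
    .joined(separator: "\n\n")  // A blank line separates consecutive messages.
}
```

Quoting line by line, rather than prefixing only the first line, keeps a multi-line user message inside a single blockquote when the file is rendered.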