feat: first attempts at save dialog, still failing

2026-03-18 11:40:43 +01:00
parent af8b8c9532
commit 82a77fdb0a
11 changed files with 445 additions and 128 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,6 +1,6 @@
 # MLX Server

-Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision and tool use.
+Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision, tool use, and thinking mode.

 ## Quick Start

@@ -14,18 +14,24 @@ open "build/Debug/MLX Server.app"

 ## Project Structure

-- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config
-- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts
+- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config, menu commands
+- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts, focused values
 - `MLXServer/Models/ModelConfig.swift` — Model definitions (alias, repoId, contextLength), resolution
-- `MLXServer/Models/ChatMessage.swift` — Chat message data model
-- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, offline-first resolution
+- `MLXServer/Models/ChatMessage.swift` — Chat message data model, `<think>` tag parsing
+- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, download tracking, idle unload
 - `MLXServer/ViewModels/ChatViewModel.swift` — Chat state, ChatSession management, API server lifecycle
 - `MLXServer/Server/APIServer.swift` — NWListener HTTP server, SSE streaming, KV cache reuse, vision, tool call handling
 - `MLXServer/Server/APIModels.swift` — OpenAI-compatible Codable structs
 - `MLXServer/Server/ToolCallParser.swift` — Parses tool calls from model output (Gemma tool_code, Qwen XML tags)
 - `MLXServer/Server/ToolPromptBuilder.swift` — Model-specific tool prompt formatting
-- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to ~/.cache/huggingface/hub/ snapshots
-- `MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper
+- `MLXServer/Views/DownloadModalView.swift` — Modal overlay for model download progress
+- `MLXServer/Views/ChatMessagesView.swift` — Message bubbles with markdown rendering and collapsible thinking blocks
+- `MLXServer/Views/ChatInputView.swift` — Text input, image attach (file picker, drag & drop, Finder copy-paste)
+- `MLXServer/Commands/SaveChatCommands.swift` — File > Export Chat menu command
+- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to local model directories (`Caches/models/{org}/{name}/` in the app container)
+- `MLXServer/Utilities/ChatExporter.swift` — Export conversations to Markdown or RTF (Pages-compatible)
+- `MLXServer/Utilities/FocusedValues.swift` — FocusedValue keys for menu bar integration
+- `MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper (model, thinking mode, API, idle timeout)
 - `project.yml` — xcodegen project spec
 - `build.sh` — Build script (xcodegen + xcodebuild)

@@ -35,6 +41,9 @@ open "build/Debug/MLX Server.app"
 |-------|---------------|-------|
 | `gemma` | `mlx-community/gemma-3-4b-it-4bit` | Vision + tool use via `tool_code` blocks (128k context) |
 | `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Vision + tool use via `<tool_call>` tags (256k context) |
+| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | Thinking mode, tool use (256k context) |
+
+Any model in MLX format on HuggingFace can be added, regardless of uploader, as long as `mlx-swift-lm` supports its architecture.

 ## Critical Performance Rule

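The table above lists short aliases, and the line after it says any MLX-format repo can be added. One way such a lookup can work is an alias table with a pass-through for full `org/name` repo IDs. This is a hypothetical sketch (`ModelSpec` and `resolveModel` are illustrative names, not the `ModelConfig` API), and context lengths are left out.

```swift
/// Hypothetical model entry mirroring the alias and repo-ID columns of the table.
struct ModelSpec: Equatable {
    let alias: String
    let repoId: String

    /// Built-in aliases, copied from the table.
    static let builtIn: [ModelSpec] = [
        ModelSpec(alias: "gemma", repoId: "mlx-community/gemma-3-4b-it-4bit"),
        ModelSpec(alias: "qwen", repoId: "mlx-community/Qwen3-VL-4B-Instruct-4bit"),
        ModelSpec(alias: "qwen3.5-9b", repoId: "mlx-community/Qwen3.5-9B-4bit"),
    ]
}

/// Looks up a built-in alias first; any other well-formed "org/name" string is
/// passed through unchanged as a repo ID, with the name part reused as its alias.
func resolveModel(_ name: String) -> ModelSpec? {
    if let known = ModelSpec.builtIn.first(where: { $0.alias == name }) {
        return known
    }
    // Require exactly two non-empty components: "org/name".
    let parts = name.split(separator: "/", omittingEmptySubsequences: false)
    guard parts.count == 2, parts.allSatisfy({ !$0.isEmpty }) else { return nil }
    return ModelSpec(alias: String(parts[1]), repoId: name)
}
```

Keeping the pass-through strict (exactly one `/`, no empty parts) means a typo in an alias fails fast instead of triggering a download attempt for a nonexistent repo.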
@@ -47,9 +56,15 @@ open "build/Debug/MLX Server.app"

 ## Key Design Decisions

-- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend — supports both text and vision in a single model load
+- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend; loads MLX-format models from HuggingFace whose architecture `mlx-swift-lm` supports
 - Model-specific prompt formatting: Gemma uses `tool_code` blocks; Qwen uses `<tool_call>` XML tags
-- Offline-first: if the model is already cached locally (~/.cache/huggingface/hub/), `LocalModelResolver` resolves the local snapshot path directly — no network requests
+- **Offline-first**: `LocalModelResolver` checks `Caches/models/{org}/{name}/` in the app's sandbox container; if the model is already there, no network requests are made
+- **No duplicate storage**: custom `HubApi(cache: nil)` with explicit `downloadBase`, so each model is stored once under `Caches/models/{org}/{name}/` rather than duplicated across a blob cache and snapshots
+- **Thinking mode**: `enable_thinking` is passed to the Jinja template context via `additionalContext`; `<think>...</think>` tags are parsed in real time during streaming and shown in collapsible UI blocks. Toggleable in Settings.
+- **Download progress**: `isDownloading` is tracked separately from `isLoading`; a modal overlay shows file count, percentage, and speed
+- **Idle unload**: timer resets on both user input and model generation completion (not just request start)
+- **Chat export**: Markdown (user messages as blockquotes) and RTF (Pages-compatible with formatted markdown)
+- **Finder paste**: a local event monitor intercepts Cmd+V and checks the pasteboard for image file URLs before the TextField handles it
 - HTTP server built on `Network.framework` (`NWListener`) — no third-party server dependencies
 - KV cache reuse across API requests — reuses `ChatSession` when conversation history prefix matches
 - GPU cache limit set to 20 MB; cache cleared on model unload
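The **Thinking mode** bullet says `<think>...</think>` tags are parsed in real time during streaming. The awkward case with streamed output is a tag that arrives split across chunks (`<thi` followed by `nk>`). Below is a minimal sketch of one way to handle that, assuming plain-text chunks and non-nested `<think>` blocks; `ThinkStreamSplitter` is a hypothetical name, and this is not the parser in `ChatMessage.swift`.

```swift
import Foundation

/// Hypothetical incremental splitter for streamed model output.
/// Text inside <think>...</think> goes to `thinking`, everything else to `answer`,
/// even when a tag is split across chunk boundaries.
struct ThinkStreamSplitter {
    private(set) var thinking = ""
    private(set) var answer = ""
    private var inThink = false
    private var pending = ""  // trailing text that might be the start of a tag

    private static let openTag = "<think>"
    private static let closeTag = "</think>"

    /// Feed the next streamed chunk.
    mutating func feed(_ chunk: String) {
        pending += chunk
        while !pending.isEmpty {
            let tag = inThink ? Self.closeTag : Self.openTag
            if let range = pending.range(of: tag) {
                // Text before the tag belongs to the current section; then switch sections.
                emit(String(pending[..<range.lowerBound]))
                pending = String(pending[range.upperBound...])
                inThink.toggle()
            } else {
                // Hold back only the longest suffix that could still begin `tag`.
                let keep = Self.partialTagLength(in: pending, tag: tag)
                emit(String(pending.dropLast(keep)))
                pending = String(pending.suffix(keep))
                return
            }
        }
    }

    /// Call once the stream has ended to flush any held-back characters.
    mutating func finish() {
        emit(pending)
        pending = ""
    }

    private mutating func emit(_ text: String) {
        if inThink { thinking += text } else { answer += text }
    }

    /// Length of the longest suffix of `text` that is a proper prefix of `tag`.
    private static func partialTagLength(in text: String, tag: String) -> Int {
        let maxLength = min(text.count, tag.count - 1)
        for length in stride(from: maxLength, to: 0, by: -1) where tag.hasPrefix(String(text.suffix(length))) {
            return length
        }
        return 0
    }
}
```

Holding back only a possible partial tag keeps the added display latency to a few characters, while a half-received `<thi` never flashes up in the answer bubble.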
--- a/MLXServer.xcodeproj/project.pbxproj
+++ b/MLXServer.xcodeproj/project.pbxproj
@@ -10,9 +10,12 @@
 		0168AEE16009097901363E16 /* ModelManager.swift in Sources */ = {isa = PBXBuildFile; fileRef = 922CBDC9206737BD04AF2874 /* ModelManager.swift */; };
 		165E8AB6ADAE1D59B1A86420 /* Preferences.swift in Sources */ = {isa = PBXBuildFile; fileRef = 145B888FBDD4F931512C5473 /* Preferences.swift */; };
 		189362AAE2CDE5D4B3428334 /* ToolCallParser.swift in Sources */ = {isa = PBXBuildFile; fileRef = E73B165A1822729C907791AE /* ToolCallParser.swift */; };
+		29879D696584B96CC56560DF /* ChatExporter.swift in Sources */ = {isa = PBXBuildFile; fileRef = D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */; };
 		2CAAF7129F7CC45200FA9F6B /* ModelPickerView.swift in Sources */ = {isa = PBXBuildFile; fileRef = C3C3A76C02AF70A9D8F868FC /* ModelPickerView.swift */; };
 		2D08769282BD71C170DB0943 /* InferenceStats.swift in Sources */ = {isa = PBXBuildFile; fileRef = E35452B166893B25E765FF70 /* InferenceStats.swift */; };
+		4158FA884D981D73288FB74C /* SaveChatCommands.swift in Sources */ = {isa = PBXBuildFile; fileRef = 2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */; };
 		4CB13DC1AC7A500DDBB443EC /* ChatInputView.swift in Sources */ = {isa = PBXBuildFile; fileRef = E5E6AD02CDF23BDAB64700A7 /* ChatInputView.swift */; };
+		4DC033E45880B2948B47DEB1 /* FocusedValues.swift in Sources */ = {isa = PBXBuildFile; fileRef = EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */; };
 		50B6861FF8610B3ED4FFAD9D /* MLXServerApp.swift in Sources */ = {isa = PBXBuildFile; fileRef = C67742651DB486871CEF1612 /* MLXServerApp.swift */; };
 		50DD129CCF2843482DEC3B96 /* APIServer.swift in Sources */ = {isa = PBXBuildFile; fileRef = 3D08828E16B17EF02C14243E /* APIServer.swift */; };
 		5946258F1DE88CE904584E0B /* ContentView.swift in Sources */ = {isa = PBXBuildFile; fileRef = 944C699FBB76C734C9DF2F2E /* ContentView.swift */; };
@@ -38,6 +41,7 @@
 		145B888FBDD4F931512C5473 /* Preferences.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = Preferences.swift; sourceTree = "<group>"; };
 		16AE82A64D1D07AE3CD8D33A /* ToolPromptBuilder.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ToolPromptBuilder.swift; sourceTree = "<group>"; };
 		2DC8C86D397B1FCA08E07CBD /* DownloadModalView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = DownloadModalView.swift; sourceTree = "<group>"; };
+		2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = SaveChatCommands.swift; sourceTree = "<group>"; };
 		38DFC212AF4359A45FBE22BA /* ModelConfig.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ModelConfig.swift; sourceTree = "<group>"; };
 		3AF462805202797F61422AEE /* MLXServer.entitlements */ = {isa = PBXFileReference; lastKnownFileType = text.plist.entitlements; path = MLXServer.entitlements; sourceTree = "<group>"; };
 		3D08828E16B17EF02C14243E /* APIServer.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = APIServer.swift; sourceTree = "<group>"; };
@@ -53,10 +57,12 @@
 		C3C3A76C02AF70A9D8F868FC /* ModelPickerView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ModelPickerView.swift; sourceTree = "<group>"; };
 		C67742651DB486871CEF1612 /* MLXServerApp.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = MLXServerApp.swift; sourceTree = "<group>"; };
 		D733A0D1D4AC25DDDA6C8684 /* LocalModelResolver.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = LocalModelResolver.swift; sourceTree = "<group>"; };
+		D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatExporter.swift; sourceTree = "<group>"; };
 		DB1A5E8B1C9F2BC4D262C53A /* ChatMessagesView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatMessagesView.swift; sourceTree = "<group>"; };
 		E35452B166893B25E765FF70 /* InferenceStats.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = InferenceStats.swift; sourceTree = "<group>"; };
 		E5E6AD02CDF23BDAB64700A7 /* ChatInputView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatInputView.swift; sourceTree = "<group>"; };
 		E73B165A1822729C907791AE /* ToolCallParser.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ToolCallParser.swift; sourceTree = "<group>"; };
+		EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = FocusedValues.swift; sourceTree = "<group>"; };
 		F1A52E2C9964ADA9D841A89B /* APIModels.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = APIModels.swift; sourceTree = "<group>"; };
 /* End PBXFileReference section */

@@ -78,6 +84,8 @@
 		05B1BAE308E64D2FB2E73823 /* Utilities */ = {
 			isa = PBXGroup;
 			children = (
+				D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */,
+				EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */,
 				D733A0D1D4AC25DDDA6C8684 /* LocalModelResolver.swift */,
 				145B888FBDD4F931512C5473 /* Preferences.swift */,
 			);
@@ -99,6 +107,7 @@
 				944C699FBB76C734C9DF2F2E /* ContentView.swift */,
 				3AF462805202797F61422AEE /* MLXServer.entitlements */,
 				C67742651DB486871CEF1612 /* MLXServerApp.swift */,
+				B459409ED6FD8797FDD81E94 /* Commands */,
 				BD0E350482D91238B4B59721 /* Models */,
 				E13C1AAA0C49D0ED85EFD94D /* Server */,
 				05B1BAE308E64D2FB2E73823 /* Utilities */,
@@ -122,6 +131,14 @@
 			path = Views;
 			sourceTree = "<group>";
 		};
+		B459409ED6FD8797FDD81E94 /* Commands */ = {
+			isa = PBXGroup;
+			children = (
+				2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */,
+			);
+			path = Commands;
+			sourceTree = "<group>";
+		};
 		BD0E350482D91238B4B59721 /* Models */ = {
 			isa = PBXGroup;
 			children = (
@@ -238,12 +255,14 @@
 			files = (
 				D96DDE66F76FDDA642629E17 /* APIModels.swift in Sources */,
 				50DD129CCF2843482DEC3B96 /* APIServer.swift in Sources */,
+				29879D696584B96CC56560DF /* ChatExporter.swift in Sources */,
 				4CB13DC1AC7A500DDBB443EC /* ChatInputView.swift in Sources */,
 				FAF7D4714AC6D02674920208 /* ChatMessage.swift in Sources */,
 				5C1E8FE1C521914CEF98D3AA /* ChatMessagesView.swift in Sources */,
 				B5AA6E3B4BE21676226B342B /* ChatViewModel.swift in Sources */,
 				5946258F1DE88CE904584E0B /* ContentView.swift in Sources */,
 				C07A377244DCD67F4FE709FE /* DownloadModalView.swift in Sources */,
+				4DC033E45880B2948B47DEB1 /* FocusedValues.swift in Sources */,
 				2D08769282BD71C170DB0943 /* InferenceStats.swift in Sources */,
 				6828CCA8B78AB40906F87CAB /* LocalModelResolver.swift in Sources */,
 				50B6861FF8610B3ED4FFAD9D /* MLXServerApp.swift in Sources */,
@@ -252,6 +271,7 @@
 				2CAAF7129F7CC45200FA9F6B /* ModelPickerView.swift in Sources */,
 				B1D9BC407DB7DB1489230C20 /* MonitorView.swift in Sources */,
 				165E8AB6ADAE1D59B1A86420 /* Preferences.swift in Sources */,
+				4158FA884D981D73288FB74C /* SaveChatCommands.swift in Sources */,
 				D666A311788375E8A061C832 /* SettingsView.swift in Sources */,
 				621B7E4382199AC1378F5F9C /* StatusBarView.swift in Sources */,
 				189362AAE2CDE5D4B3428334 /* ToolCallParser.swift in Sources */,
@@ -399,7 +419,7 @@
 				);
 				MACOSX_DEPLOYMENT_TARGET = 15.0;
 				MARKETING_VERSION = 1.0.0;
-				PRODUCT_BUNDLE_IDENTIFIER = com.mlxserver.app;
+				PRODUCT_BUNDLE_IDENTIFIER = de.rfc1437.mlxserver;
 				PRODUCT_NAME = "MLX Server";
 				SDKROOT = macosx;
 				SWIFT_VERSION = 6.0;
@@ -424,7 +444,7 @@
 				);
 				MACOSX_DEPLOYMENT_TARGET = 15.0;
 				MARKETING_VERSION = 1.0.0;
-				PRODUCT_BUNDLE_IDENTIFIER = com.mlxserver.app;
+				PRODUCT_BUNDLE_IDENTIFIER = de.rfc1437.mlxserver;
 				PRODUCT_NAME = "MLX Server";
 				SDKROOT = macosx;
 				SWIFT_VERSION = 6.0;
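The `PRODUCT_BUNDLE_IDENTIFIER` change above also moves the app's sandbox container, which lives at `~/Library/Containers/<bundle-id>/Data/`. Models that earlier builds downloaded into the `com.mlxserver.app` container are therefore likely invisible to the new build and would be fetched again. A small sketch of the path difference; `sandboxCachesDirectory` is a hypothetical helper and `/Users/me` is a placeholder home directory.

```swift
import Foundation

/// Hypothetical helper: the Caches directory a sandboxed macOS app sees for a given
/// bundle ID, following the layout ~/Library/Containers/<bundle-id>/Data/Library/Caches.
func sandboxCachesDirectory(bundleID: String, home: URL) -> URL {
    home.appendingPathComponent("Library/Containers", isDirectory: true)
        .appendingPathComponent(bundleID, isDirectory: true)
        .appendingPathComponent("Data/Library/Caches", isDirectory: true)
}

let home = URL(fileURLWithPath: "/Users/me", isDirectory: true)  // placeholder home
let oldModels = sandboxCachesDirectory(bundleID: "com.mlxserver.app", home: home)
    .appendingPathComponent("models", isDirectory: true)
let newModels = sandboxCachesDirectory(bundleID: "de.rfc1437.mlxserver", home: home)
    .appendingPathComponent("models", isDirectory: true)

// Two different containers: the new build cannot see models cached under the old ID.
print(oldModels.path)
print(newModels.path)
```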
--- a/MLXServer/Commands/SaveChatCommands.swift
+++ b/MLXServer/Commands/SaveChatCommands.swift
@@ -0,0 +1,16 @@
+import SwiftUI
+
+/// Adds "Export Chat…" to the File menu.
+struct SaveChatCommands: Commands {
+    @FocusedBinding(\.exportTrigger) var isExporting
+
+    var body: some Commands {
+        CommandGroup(after: .saveItem) {
+            Button("Export Chat…") {
+                isExporting = true
+            }
+            .keyboardShortcut("e", modifiers: [.command, .shift])
+            .disabled(isExporting == nil)
+        }
+    }
+}
--- a/MLXServer/ContentView.swift
+++ b/MLXServer/ContentView.swift
@@ -1,10 +1,12 @@
 import SwiftUI
+import UniformTypeIdentifiers

 struct ContentView: View {
    @Environment(ModelManager.self) private var modelManager
    @State private var chatVM: ChatViewModel?
    @State private var showLoadError = false
    @State private var showMonitor = false
+    @State private var isExporting = false

    var body: some View {
        mainContent
@@ -52,6 +54,21 @@ struct ContentView: View {
            .background {
                modelSwitchShortcuts
            }
+            // Expose export trigger to menu bar command
+            .focusedSceneValue(\.exportTrigger, $isExporting)
+            .fileExporter(
+                isPresented: $isExporting,
+                document: ChatExportDocument(
+                    messages: chatVM?.conversation.messages ?? [],
+                    modelName: modelManager.currentModel?.displayName
+                ),
+                contentTypes: ChatExportDocument.writableContentTypes,
+                defaultFilename: "chat"
+            ) { result in
+                if case .failure(let error) = result {
+                    print("[Export] Failed: \(error.localizedDescription)")
+                }
+            }
    }

    @ViewBuilder
--- a/MLXServer/MLXServerApp.swift
+++ b/MLXServer/MLXServerApp.swift
@@ -23,6 +23,9 @@ struct MLXServerApp: App {
        }
        .windowStyle(.titleBar)
        .defaultSize(width: 800, height: 700)
+        .commands {
+            SaveChatCommands()
+        }

        #if os(macOS)
        Settings {
--- a/MLXServer/Utilities/ChatExporter.swift
+++ b/MLXServer/Utilities/ChatExporter.swift
@@ -0,0 +1,290 @@
+import AppKit
+import Foundation
+import SwiftUI
+import UniformTypeIdentifiers
+
+/// A FileDocument that exports a chat conversation as Markdown or RTF.
+struct ChatExportDocument: FileDocument {
+    static var readableContentTypes: [UTType] { [.plainText] }
+    static var writableContentTypes: [UTType] {
+        [UTType(filenameExtension: "md") ?? .plainText, .rtf]
+    }
+
+    let messages: [ChatMessage]
+    let modelName: String?
+
+    init(messages: [ChatMessage], modelName: String?) {
+        self.messages = messages
+        self.modelName = modelName
+    }
+
+    init(configuration: ReadConfiguration) throws {
+        self.messages = []
+        self.modelName = nil
+    }
+
+    func fileWrapper(configuration: WriteConfiguration) throws -> FileWrapper {
+        let contentType = configuration.contentType
+
+        if contentType == .rtf, let data = ChatExporter.exportRTF(messages: messages, modelName: modelName) {
+            return FileWrapper(regularFileWithContents: data)
+        } else {
+            let md = ChatExporter.exportMarkdown(messages: messages, modelName: modelName)
+            return FileWrapper(regularFileWithContents: Data(md.utf8))
+        }
+    }
+}
+
+/// Exports a chat conversation to Markdown or RTF (Pages-compatible) format.
+enum ChatExporter {
+
+    // MARK: - Markdown export
+
+    static func exportMarkdown(messages: [ChatMessage], modelName: String?) -> String {
+        var lines: [String] = []
+
+        // Header
+        lines.append("# Chat Session")
+        if let modelName {
+            lines.append("**Model:** \(modelName)")
+        }
+        let formatter = DateFormatter()
+        formatter.dateStyle = .long
+        formatter.timeStyle = .short
+        if let first = messages.first {
+            lines.append("**Date:** \(formatter.string(from: first.timestamp))")
+        }
+        lines.append("")
+        lines.append("---")
+        lines.append("")
+
+        for message in messages {
+            guard message.role != .system else { continue }
+
+            if message.role == .user {
+                // User messages as blockquotes
+                lines.append("**You:**")
+                lines.append("")
+                for line in message.content.components(separatedBy: "\n") {
+                    lines.append("> \(line)")
+                }
+            } else {
+                // Assistant messages: carry over original markdown
+                lines.append("**Assistant:**")
+                lines.append("")
+                lines.append(message.content)
+            }
+
+            lines.append("")
+            lines.append("---")
+            lines.append("")
+        }
+
+        return lines.joined(separator: "\n")
+    }
+
+    // MARK: - RTF export
+
+    static func exportRTF(messages: [ChatMessage], modelName: String?) -> Data? {
+        let doc = NSMutableAttributedString()
+
+        let bodyFont = NSFont.systemFont(ofSize: 13)
+        let bodyBoldFont = NSFont.boldSystemFont(ofSize: 13)
+        let titleFont = NSFont.boldSystemFont(ofSize: 20)
+        let metaFont = NSFont.systemFont(ofSize: 11)
+        let codeFont = NSFont.monospacedSystemFont(ofSize: 12, weight: .regular)
+
+        let bodyParagraph = NSMutableParagraphStyle()
+        bodyParagraph.paragraphSpacing = 8
+        bodyParagraph.lineSpacing = 2
+
+        let userParagraph = NSMutableParagraphStyle()
+        userParagraph.paragraphSpacing = 8
+        userParagraph.lineSpacing = 2
+        userParagraph.headIndent = 20
+        userParagraph.firstLineHeadIndent = 20
+
+        // Title
+        doc.append(NSAttributedString(
+            string: "Chat Session\n",
+            attributes: [.font: titleFont, .paragraphStyle: bodyParagraph]
+        ))
+
+        // Metadata
+        let formatter = DateFormatter()
+        formatter.dateStyle = .long
+        formatter.timeStyle = .short
+        var metaText = ""
+        if let modelName { metaText += "Model: \(modelName)  " }
+        if let first = messages.first {
+            metaText += "Date: \(formatter.string(from: first.timestamp))"
+        }
+        if !metaText.isEmpty {
+            doc.append(NSAttributedString(
+                string: metaText + "\n\n",
+                attributes: [.font: metaFont, .foregroundColor: NSColor.secondaryLabelColor]
+            ))
+        }
+
+        for message in messages {
+            guard message.role != .system else { continue }
+
+            if message.role == .user {
+                doc.append(NSAttributedString(
+                    string: "You\n",
+                    attributes: [
+                        .font: bodyBoldFont,
+                        .foregroundColor: NSColor.systemBlue,
+                    ]
+                ))
+                doc.append(NSAttributedString(
+                    string: message.content + "\n\n",
+                    attributes: [
+                        .font: bodyFont,
+                        .paragraphStyle: userParagraph,
+                        .foregroundColor: NSColor.labelColor,
+                    ]
+                ))
+            } else {
+                doc.append(NSAttributedString(
+                    string: "Assistant\n",
+                    attributes: [
+                        .font: bodyBoldFont,
+                        .foregroundColor: NSColor.labelColor,
+                    ]
+                ))
+                let rendered = renderMarkdown(message.content, bodyFont: bodyFont, codeFont: codeFont, paragraph: bodyParagraph)
+                doc.append(rendered)
+                doc.append(NSAttributedString(string: "\n\n"))
+            }
+
+            doc.append(NSAttributedString(
+                string: "\n",
+                attributes: [
+                    .strikethroughStyle: NSUnderlineStyle.single.rawValue,
+                    .strikethroughColor: NSColor.separatorColor,
+                    .font: NSFont.systemFont(ofSize: 4),
+                ]
+            ))
+        }
+
+        return doc.rtf(from: NSRange(location: 0, length: doc.length), documentAttributes: [
+            .documentType: NSAttributedString.DocumentType.rtf,
+        ])
+    }
+
+    // MARK: - Markdown → NSAttributedString (basic)
+
+    private static func renderMarkdown(
+        _ text: String,
+        bodyFont: NSFont,
+        codeFont: NSFont,
+        paragraph: NSParagraphStyle
+    ) -> NSAttributedString {
+        let result = NSMutableAttributedString()
+        let lines = text.components(separatedBy: "\n")
+        var inCodeBlock = false
+        var codeBlockLines: [String] = []
+
+        for line in lines {
+            if line.hasPrefix("```") {
+                if inCodeBlock {
+                    let code = codeBlockLines.joined(separator: "\n")
+                    let codePara = NSMutableParagraphStyle()
+                    codePara.paragraphSpacing = 4
+                    codePara.headIndent = 12
+                    codePara.firstLineHeadIndent = 12
+                    result.append(NSAttributedString(
+                        string: code + "\n",
+                        attributes: [
+                            .font: codeFont,
+                            .foregroundColor: NSColor.secondaryLabelColor,
+                            .backgroundColor: NSColor.quaternaryLabelColor,
+                            .paragraphStyle: codePara,
+                        ]
+                    ))
+                    codeBlockLines = []
+                    inCodeBlock = false
+                } else {
+                    inCodeBlock = true
+                }
+                continue
+            }
+
+            if inCodeBlock {
+                codeBlockLines.append(line)
+                continue
+            }
+
+            if line.hasPrefix("### ") {
+                result.append(NSAttributedString(
+                    string: String(line.dropFirst(4)) + "\n",
+                    attributes: [.font: NSFont.boldSystemFont(ofSize: 14), .paragraphStyle: paragraph]
+                ))
+            } else if line.hasPrefix("## ") {
+                result.append(NSAttributedString(
+                    string: String(line.dropFirst(3)) + "\n",
+                    attributes: [.font: NSFont.boldSystemFont(ofSize: 15), .paragraphStyle: paragraph]
+                ))
+            } else if line.hasPrefix("# ") {
+                result.append(NSAttributedString(
+                    string: String(line.dropFirst(2)) + "\n",
+                    attributes: [.font: NSFont.boldSystemFont(ofSize: 17), .paragraphStyle: paragraph]
+                ))
+            } else {
+                let styled = applyInlineFormatting(line, bodyFont: bodyFont, codeFont: codeFont)
+                result.append(styled)
+                result.append(NSAttributedString(string: "\n", attributes: [.font: bodyFont]))
+            }
+        }
+
+        return result
+    }
+
+    private static func applyInlineFormatting(
+        _ text: String,
+        bodyFont: NSFont,
+        codeFont: NSFont
+    ) -> NSAttributedString {
+        let result = NSMutableAttributedString()
+        var remaining = text[text.startIndex...]
+
+        while !remaining.isEmpty {
+            if remaining.hasPrefix("`"), let end = remaining.dropFirst().firstIndex(of: "`") {
+                let code = String(remaining[remaining.index(after: remaining.startIndex)..<end])
+                result.append(NSAttributedString(
+                    string: code,
+                    attributes: [
+                        .font: codeFont,
+                        .foregroundColor: NSColor.secondaryLabelColor,
+                        .backgroundColor: NSColor.quaternaryLabelColor,
+                    ]
+                ))
+                remaining = remaining[remaining.index(after: end)...]
+            } else if remaining.hasPrefix("**"), let end = remaining.dropFirst(2).range(of: "**") {
+                let bold = String(remaining[remaining.index(remaining.startIndex, offsetBy: 2)..<end.lowerBound])
+                result.append(NSAttributedString(
+                    string: bold,
+                    attributes: [.font: NSFont.boldSystemFont(ofSize: bodyFont.pointSize)]
+                ))
+                remaining = remaining[end.upperBound...]
+            } else if remaining.hasPrefix("*"), let end = remaining.dropFirst().firstIndex(of: "*") {
+                let italic = String(remaining[remaining.index(after: remaining.startIndex)..<end])
+                result.append(NSAttributedString(
+                    string: italic,
+                    attributes: [.font: NSFontManager.shared.convert(bodyFont, toHaveTrait: .italicFontMask)]
+                ))
+                remaining = remaining[remaining.index(after: end)...]
+            } else {
+                let ch = remaining[remaining.startIndex]
+                result.append(NSAttributedString(
+                    string: String(ch),
+                    attributes: [.font: bodyFont]
+                ))
+                remaining = remaining[remaining.index(after: remaining.startIndex)...]
+            }
+        }
+
+        return result
+    }
+}
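`renderMarkdown` only emits `codeBlockLines` when it sees a closing fence, so if an assistant reply ends inside a fenced block (for example a truncated generation), those lines are silently left out of the RTF export; the Markdown export copies the raw text and is unaffected. A minimal sketch of the same `hasPrefix` fence rule with an end-of-input flush, in plain Swift so the logic can be checked without AppKit; `MarkdownSegment` and `segmentFences` are hypothetical names, not part of `ChatExporter`.

```swift
import Foundation

/// Hypothetical intermediate representation: single prose lines and whole code blocks.
enum MarkdownSegment: Equatable {
    case text(String)
    case code(String)
}

/// Splits Markdown into prose lines and fenced code blocks using the same
/// `hasPrefix("```")` rule as `renderMarkdown`, but flushes an unterminated
/// fence at the end instead of discarding it.
func segmentFences(_ markdown: String) -> [MarkdownSegment] {
    var segments: [MarkdownSegment] = []
    var codeLines: [String] = []
    var inCode = false

    for line in markdown.components(separatedBy: "\n") {
        if line.hasPrefix("```") {
            if inCode {
                // Closing fence: emit the buffered block.
                segments.append(.code(codeLines.joined(separator: "\n")))
                codeLines = []
            }
            inCode.toggle()
            continue
        }
        if inCode {
            codeLines.append(line)
        } else {
            segments.append(.text(line))
        }
    }

    // A fence left open (e.g. a truncated generation) still yields its contents.
    if inCode {
        segments.append(.code(codeLines.joined(separator: "\n")))
    }
    return segments
}
```

In `renderMarkdown` itself, the equivalent change would be a check after the `for` loop that appends the buffered lines with the code-block attributes when `inCodeBlock` is still true.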
--- a/MLXServer/Utilities/FocusedValues.swift
+++ b/MLXServer/Utilities/FocusedValues.swift
@@ -0,0 +1,13 @@
+import SwiftUI
+
+/// Focused value key for triggering chat export from the menu bar.
+struct FocusedExportTriggerKey: FocusedValueKey {
+    typealias Value = Binding<Bool>
+}
+
+extension FocusedValues {
+    var exportTrigger: Binding<Bool>? {
+        get { self[FocusedExportTriggerKey.self] }
+        set { self[FocusedExportTriggerKey.self] = newValue }
+    }
+}
--- a/MLXServer/Utilities/LocalModelResolver.swift
+++ b/MLXServer/Utilities/LocalModelResolver.swift
@@ -1,75 +1,28 @@
 import Foundation

-/// Resolves HuggingFace model repos to local snapshot directories,
-/// matching the cache layout used by Python's `huggingface_hub`.
+/// Resolves HuggingFace model repos to local directories.
 ///
-/// Checks two locations:
-///   1. App sandbox container: ~/Library/Containers/com.mlxserver.app/.../huggingface/hub/
-///   2. System-wide cache: ~/.cache/huggingface/hub/ (shared with Python tools)
-///
-/// Cache structure:
-///   .../huggingface/hub/models--{org}--{name}/snapshots/{hash}/
+/// HubApi(downloadBase: .cachesDirectory, cache: nil) downloads models to:
+///   ~/Library/Containers/de.rfc1437.mlxserver/Data/Library/Caches/models/{org}/{name}/
 enum LocalModelResolver {

-    /// All HuggingFace cache directories to search, in priority order.
-    /// The sandboxed container path is checked first (where the app downloads to),
-    /// then the system-wide Python cache (for models downloaded via huggingface-cli).
-    private static let cacheBases: [URL] = {
-        var bases: [URL] = []
-
-        // 1. Sandboxed app container cache (where swift-transformers Hub downloads to)
-        let containerCache = FileManager.default.homeDirectoryForCurrentUser
-            .appendingPathComponent("Library/Caches/huggingface/hub", isDirectory: true)
-        bases.append(containerCache)
-
-        // 2. System-wide ~/.cache/huggingface/hub/ (Python huggingface_hub)
-        //    When sandboxed, homeDirectory points to the container, so construct the real path.
-        let realHome = URL(fileURLWithPath: NSHomeDirectory())
-        let systemCache = realHome
-            .appendingPathComponent(".cache/huggingface/hub", isDirectory: true)
-        // Avoid duplicate if they resolve to the same path
-        if systemCache.path != containerCache.path {
-            bases.append(systemCache)
-        }
-
-        // 3. Also try the unsandboxed home directory path
-        let globalHome = FileManager.default.homeDirectoryForCurrentUser
-            .appendingPathComponent(".cache/huggingface/hub", isDirectory: true)
-        if globalHome.path != containerCache.path && globalHome.path != systemCache.path {
-            bases.append(globalHome)
-        }
-
-        return bases
+    /// Base directory where HubApi stores downloaded models.
+    private static let modelsBase: URL? = {
+        FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first?
+            .appendingPathComponent("models", isDirectory: true)
    }()

    /// Resolve a HuggingFace repo ID (e.g. "mlx-community/gemma-3-4b-it-4bit")
-    /// to its local snapshot directory, if it exists.
+    /// to its local directory, if it exists.
    ///
    /// Returns `nil` if the model hasn't been downloaded yet.
    static func resolve(repoId: String) -> URL? {
-        let dirName = "models--" + repoId.replacingOccurrences(of: "/", with: "--")
-
-        for cacheBase in cacheBases {
-            let snapshotsDir = cacheBase
-                .appendingPathComponent(dirName, isDirectory: true)
-                .appendingPathComponent("snapshots", isDirectory: true)
-
-            guard let contents = try? FileManager.default.contentsOfDirectory(
-                at: snapshotsDir,
-                includingPropertiesForKeys: [.isDirectoryKey],
-                options: [.skipsHiddenFiles]
-            ) else {
-                continue
-            }
-
-            if let snapshot = contents
-                .filter({ (try? $0.resourceValues(forKeys: [.isDirectoryKey]).isDirectory) == true })
-                .sorted(by: { $0.lastPathComponent < $1.lastPathComponent })
-                .last {
-                return snapshot
-            }
+        guard let base = modelsBase else { return nil }
+        let modelDir = base.appendingPathComponent(repoId, isDirectory: true)
+        var isDir: ObjCBool = false
+        if FileManager.default.fileExists(atPath: modelDir.path, isDirectory: &isDir), isDir.boolValue {
+            return modelDir
        }
-
        return nil
    }

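The new `resolve(repoId:)` returns any existing directory under `Caches/models/{org}/{name}/`, so a download that was interrupted partway would still count as cached and skip the network. A sketch of a stricter check, assuming the presence of `config.json` is used as the marker of a usable MLX model directory (an assumption, not something the app checks today; a stricter version could also look for weight files). `resolveModelDirectory` and its injectable `base` parameter are hypothetical and exist here so the logic can be exercised against a temporary directory.

```swift
import Foundation

/// Hypothetical stricter variant of `LocalModelResolver.resolve(repoId:)`.
/// `base` is injectable (the app uses its Caches/models directory), and a
/// directory only counts as cached once `requiredFile` exists inside it.
/// Using `config.json` as that marker is an assumption.
func resolveModelDirectory(
    repoId: String,
    under base: URL,
    requiredFile: String = "config.json"
) -> URL? {
    // Same layout as the resolver's doc comment: {base}/{org}/{name}/
    let modelDir = base.appendingPathComponent(repoId, isDirectory: true)
    var isDir: ObjCBool = false
    guard FileManager.default.fileExists(atPath: modelDir.path, isDirectory: &isDir),
          isDir.boolValue else {
        return nil
    }
    let marker = modelDir.appendingPathComponent(requiredFile)
    return FileManager.default.fileExists(atPath: marker.path) ? modelDir : nil
}
```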
@@ -79,39 +32,18 @@ enum LocalModelResolver {
    }

    /// Delete the local cache for a model so it will be re-downloaded next time.
-    /// Removes from all cache locations.
-    /// Returns true if something was deleted.
    @discardableResult
    static func deleteLocal(repoId: String) -> Bool {
-        let dirName = "models--" + repoId.replacingOccurrences(of: "/", with: "--")
-        var deleted = false
-
-        for cacheBase in cacheBases {
-            let modelDir = cacheBase.appendingPathComponent(dirName, isDirectory: true)
-            guard FileManager.default.fileExists(atPath: modelDir.path) else { continue }
-            do {
-                try FileManager.default.removeItem(at: modelDir)
-                print("[LocalModelResolver] Deleted \(modelDir.path)")
-                deleted = true
-            } catch {
-                print("[LocalModelResolver] Failed to delete \(modelDir.path): \(error)")
-            }
+        guard let base = modelsBase else { return false }
+        let modelDir = base.appendingPathComponent(repoId, isDirectory: true)
+        guard FileManager.default.fileExists(atPath: modelDir.path) else { return false }
+        do {
+            try FileManager.default.removeItem(at: modelDir)
+            print("[LocalModelResolver] Deleted \(modelDir.path)")
+            return true
+        } catch {
+            print("[LocalModelResolver] Failed to delete \(modelDir.path): \(error)")
+            return false
        }
-
-        // Also clean up the per-model cache in the container (used by swift-transformers)
-        let containerModelsDir = FileManager.default.homeDirectoryForCurrentUser
-            .appendingPathComponent("Library/Caches/models", isDirectory: true)
-            .appendingPathComponent(repoId, isDirectory: true)
-        if FileManager.default.fileExists(atPath: containerModelsDir.path) {
-            do {
-                try FileManager.default.removeItem(at: containerModelsDir)
-                print("[LocalModelResolver] Deleted \(containerModelsDir.path)")
-                deleted = true
-            } catch {
-                print("[LocalModelResolver] Failed to delete \(containerModelsDir.path): \(error)")
-            }
-        }
-
-        return deleted
    }
 }
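The patched `resolve` treats any existing `<Caches>/models/<repoId>` directory as a usable model, so a directory left behind by an interrupted download could be reported as local. The sketch below is a minimal, testable version of the new lookup and delete logic. It assumes the same `<base>/<org>/<name>` layout as the patch and takes the base directory as a parameter. The `requiredFile` marker check (`config.json`) is a hypothetical addition, not part of the patch.

```swift
import Foundation

/// Sketch of LocalModelResolver.resolve with an injectable base directory.
/// Layout taken from the patch: <base>/<org>/<name>, e.g.
/// <Caches>/models/mlx-community/gemma-3-4b-it-4bit
func resolveModelDir(repoId: String, base: URL, requiredFile: String? = "config.json") -> URL? {
    let modelDir = base.appendingPathComponent(repoId, isDirectory: true)
    var isDir: ObjCBool = false
    guard FileManager.default.fileExists(atPath: modelDir.path, isDirectory: &isDir),
          isDir.boolValue else { return nil }
    // Hypothetical extra check (not in the patch): require a marker file so a
    // partially downloaded directory is not reported as a local model.
    if let file = requiredFile,
       !FileManager.default.fileExists(atPath: modelDir.appendingPathComponent(file).path) {
        return nil
    }
    return modelDir
}

/// Sketch of LocalModelResolver.deleteLocal: removes <base>/<repoId> if present.
/// Returns true only if the directory existed and was removed.
@discardableResult
func deleteModelDir(repoId: String, base: URL) -> Bool {
    let modelDir = base.appendingPathComponent(repoId, isDirectory: true)
    guard FileManager.default.fileExists(atPath: modelDir.path) else { return false }
    do {
        try FileManager.default.removeItem(at: modelDir)
        return true
    } catch {
        return false
    }
}
```

Passing the base in keeps the functions independent of the real Caches directory, so they can be checked against a temporary folder. Note that deleting leaves the now-empty `<org>` directory in place.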
--- a/MLXServer/ViewModels/ModelManager.swift
+++ b/MLXServer/ViewModels/ModelManager.swift
@@ -12,7 +12,12 @@ final class ModelManager {
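-    /// HubApi with blob cache disabled to avoid storing every model twice.
-    /// swift-huggingface defaults to caching in both huggingface/hub/ (snapshots)
-    /// AND models/ (content-addressed blobs). We only need the snapshots.
+    /// HubApi with the blob cache disabled to avoid storing every model twice.
+    /// swift-huggingface defaults to storing files both in its huggingface/hub/
+    /// cache and in <downloadBase>/models/<repoId>. LocalModelResolver only reads models/.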
-    private static let hub = HubApi(cache: nil)
+    /// Must use the same downloadBase as defaultHubApi (.cachesDirectory) so
+    /// LocalModelResolver can find downloaded models.
+    private static let hub: HubApi = {
+        let cachesDir = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first
+        return HubApi(downloadBase: cachesDir, cache: nil)
+    }()
    var currentModel: ModelConfig?
    var modelContainer: ModelContainer?
    var isLoading = false
@@ -52,7 +57,6 @@ final class ModelManager {
        }

        do {
-            let container: ModelContainer
            let progressHandler: @Sendable (Progress) -> Void = { progress in
                Task { @MainActor in
                    self.downloadProgress = progress.fractionCompleted
@@ -73,7 +77,7 @@ final class ModelManager {
                configuration = config.modelConfiguration
            }

-            container = try await VLMModelFactory.shared.loadContainer(
+            let container = try await VLMModelFactory.shared.loadContainer(
                hub: Self.hub,
                configuration: configuration,
                progressHandler: progressHandler
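The new comment in `ModelManager` ties two values to the same directory: the `downloadBase` passed to `HubApi` and `LocalModelResolver.modelsBase`. A minimal sketch, assuming a hypothetical `ModelStorage` enum that is not part of the patch, derives both from one definition so they cannot drift apart. The `models/<repoId>` layout is the one `LocalModelResolver` already assumes.

```swift
import Foundation

/// Hypothetical single source of truth for where models live on disk.
enum ModelStorage {
    /// Value to pass as `downloadBase:` when constructing the HubApi.
    static let downloadBase: URL? =
        FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first

    /// Directory LocalModelResolver searches: <downloadBase>/models
    static var modelsBase: URL? {
        downloadBase?.appendingPathComponent("models", isDirectory: true)
    }

    /// Expected local directory for a repo ID, e.g. "mlx-community/Qwen3.5-9B-4bit".
    static func localDirectory(for repoId: String) -> URL? {
        modelsBase?.appendingPathComponent(repoId, isDirectory: true)
    }
}
```

With this in place, `ModelManager` would build its hub as `HubApi(downloadBase: ModelStorage.downloadBase, cache: nil)` and `LocalModelResolver` would read `ModelStorage.modelsBase`, so changing the location later touches one line instead of two files.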
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # MLX Server

-Native macOS app for running local LLMs on Apple Silicon via [MLX](https://github.com/ml-explore/mlx). Built with SwiftUI, it provides both a **chat UI** and an embedded **OpenAI-compatible API server**. Supports vision and tool use with automatic model swapping.
+Native macOS app for running local LLMs on Apple Silicon via [MLX](https://github.com/ml-explore/mlx). Built with SwiftUI, it provides both a **chat UI** and an embedded **OpenAI-compatible API server**. Supports vision, tool use, and thinking mode.

 ## Supported Models

@@ -8,6 +8,9 @@ Native macOS app for running local LLMs on Apple Silicon via [MLX](https://githu
 |-------|-------|---------|-------------|
 | `gemma` | `mlx-community/gemma-3-4b-it-4bit` | 128k | Vision, tool use (`tool_code` blocks) |
 | `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | 256k | Vision, tool use (`<tool_call>` tags) |
+| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | 256k | Thinking mode, tool use |
+
+Other MLX-format models on HuggingFace can be added from any uploader, provided `mlx-swift-lm` supports their architecture.

 ## Quick Start

@@ -20,12 +23,16 @@ open "build/Debug/MLX Server.app"

 ## App Features

-- **Chat interface** with markdown rendering, image attachments (file picker, drag & drop, clipboard paste)
-- **Model picker** in toolbar with local/download status indicators
+- **Chat interface** with markdown rendering, image attachments (file picker, drag & drop, clipboard paste, Finder copy-paste)
+- **Model picker** in toolbar with local/download status indicators and re-download button
+- **Download progress modal** — shows file progress, percentage, and speed when downloading a new model
+- **Thinking mode** — models such as Qwen3.5 can reason before responding; the reasoning appears in a collapsible box. Toggle it on or off in Settings.
 - **Streaming responses** with live token display
+- **Export chat** — File > Export Chat (Cmd+Shift+S) saves conversations as Markdown or RTF (Pages-compatible)
 - **Status bar** showing model name, context window, tokens/sec, token counts, GPU memory, API server status
-- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3` (switch models)
-- **Settings** (`Cmd+,`): system prompt, API port, API auto-start
+- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3/4` (switch models), `Cmd+Shift+S` (export)
+- **Settings** (`Cmd+,`): default model, thinking mode toggle, system prompt, API port, API auto-start, idle unload timeout
+- **Idle auto-unload** — the model is unloaded after a configurable idle period and reloaded on the next request; the idle timer resets on both user input and model output

 ## API Server

@@ -74,23 +81,29 @@ MLXServer/
 ├── ContentView.swift               — Main layout, toolbar, keyboard shortcuts
 ├── Models/
 │   ├── ModelConfig.swift           — Model definitions, alias/repoId resolution
-│   └── ChatMessage.swift           — Chat message data model
+│   └── ChatMessage.swift           — Chat message data model, thinking tag parser
 ├── ViewModels/
-│   ├── ModelManager.swift          — Model loading/switching via VLMModelFactory
+│   ├── ModelManager.swift          — Model loading/switching, download tracking, idle unload
 │   └── ChatViewModel.swift         — Chat state, ChatSession, API server lifecycle
 ├── Views/
-│   ├── ModelPickerView.swift       — Toolbar model selector
-│   ├── ChatMessagesView.swift      — Scrollable message list with markdown
-│   ├── ChatInputView.swift         — Text input + image attach
+│   ├── ModelPickerView.swift       — Toolbar model selector with re-download
+│   ├── ChatMessagesView.swift      — Scrollable message list with markdown + thinking blocks
+│   ├── ChatInputView.swift         — Text input + image attach (paste, drag, picker)
+│   ├── DownloadModalView.swift     — Model download progress overlay
 │   ├── StatusBarView.swift         — Model info, tok/s, GPU memory, API status
-│   └── SettingsView.swift          — System prompt + API settings
+│   ├── MonitorView.swift           — Inference statistics monitor
+│   └── SettingsView.swift          — System prompt, thinking mode, API, idle settings
+├── Commands/
+│   └── SaveChatCommands.swift      — File menu export command
 ├── Server/
 │   ├── APIServer.swift             — NWListener HTTP server, SSE streaming, KV cache reuse
 │   ├── APIModels.swift             — OpenAI-compatible Codable structs
 │   ├── ToolCallParser.swift        — Parses tool calls from model output
 │   └── ToolPromptBuilder.swift     — Model-specific tool prompt formatting
 └── Utilities/
-    ├── LocalModelResolver.swift    — Offline-first HuggingFace cache resolution
+    ├── LocalModelResolver.swift    — Offline-first local model resolution (app Caches directory)
+    ├── ChatExporter.swift          — Export conversations to Markdown or RTF
+    ├── FocusedValues.swift         — FocusedValue keys for menu bar integration
    └── Preferences.swift           — UserDefaults wrapper

 project.yml     — xcodegen project spec (dependencies, settings, deployment target)
@@ -99,17 +112,11 @@ build.sh        — One-command build script (xcodegen + xcodebuild)

 ## Key Design Decisions

-- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) for inference — supports both text and vision in a single model load
-- **Offline-first**: `LocalModelResolver` checks `~/.cache/huggingface/hub/` for locally-cached snapshots before downloading
+- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) for inference; loads MLX-format models from HuggingFace whose architecture `mlx-swift-lm` supports
+- **Offline-first**: `LocalModelResolver` looks for an already-downloaded model in `<Caches>/models/<repoId>` (the same `downloadBase` the app's `HubApi` uses; inside the app container when sandboxed) before downloading
+- **No duplicate storage**: custom `HubApi` with the blob cache disabled, so each model is stored once, under `<Caches>/models/`
 - **KV cache reuse** across API requests — reuses `ChatSession` when conversation history prefix matches
+- **Thinking mode**: `enable_thinking` passed via Jinja template context; `<think>` tags parsed in real-time during streaming
 - HTTP server built on `Network.framework` (`NWListener`) — no third-party server dependencies
 - Model-specific prompt formatting: Gemma uses `tool_code` blocks, Qwen uses `<tool_call>` XML tags
 - GPU cache limit set to 20 MB; cache cleared on model unload
-
-## Design Notes
-
-- Uses `mlx_vlm` (not `mlx_lm`) as the backend — supports both text and vision in a single model load
-- Offline-first: if the model is cached locally (`~/.cache/huggingface/hub/`), no network requests are made
-- Thread lock on generation — MLX models aren't safe for concurrent generation
-- KV prefix caching for multi-turn conversations
-- Context window read from each model's config (Gemma 3 4B: 128k, Qwen3-VL 4B: 256k) with automatic summarization fallback
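The README states that `<think>` tags are parsed in real time during streaming. The hard part is that a tag can be split across chunks (`"</th"` + `"ink>"`). The sketch below is a minimal incremental splitter, not the app's `ChatMessage` parser; the type and method names are hypothetical. It holds back any trailing text that could still grow into the awaited tag and emits everything else immediately. It assumes the model emits both tags itself. Some chat templates prefill the opening `<think>` in the prompt, which is what the `startInThinking` option is for.

```swift
import Foundation

/// One piece of streamed model output, classified by whether it sits inside <think>...</think>.
enum StreamSegment: Equatable {
    case thinking(String)
    case answer(String)
}

/// Incremental <think> tag splitter (sketch). Tags may arrive split across chunks,
/// so any trailing text that could still become the awaited tag is held back.
struct ThinkTagStreamParser {
    private let openTag = "<think>"
    private let closeTag = "</think>"
    private var buffer = ""
    private var inThink: Bool

    /// Set `startInThinking` when the prompt template already emitted the opening tag.
    init(startInThinking: Bool = false) {
        inThink = startInThinking
    }

    /// Feed one streamed chunk; returns the segments that are now safe to display.
    mutating func feed(_ chunk: String) -> [StreamSegment] {
        buffer += chunk
        var out: [StreamSegment] = []
        while true {
            let tag = inThink ? closeTag : openTag
            if let range = buffer.range(of: tag) {
                // Flush text before the tag, drop the tag itself, switch mode.
                emit(String(buffer[..<range.lowerBound]), into: &out)
                buffer = String(buffer[range.upperBound...])
                inThink.toggle()
            } else {
                // Hold back the longest suffix that is a prefix of the awaited tag.
                let keep = heldBackCount(in: buffer, awaiting: tag)
                let cut = buffer.index(buffer.endIndex, offsetBy: -keep)
                emit(String(buffer[..<cut]), into: &out)
                buffer = String(buffer[cut...])
                return out
            }
        }
    }

    /// Call once the stream ends; flushes any held-back text as-is.
    mutating func finish() -> [StreamSegment] {
        var out: [StreamSegment] = []
        emit(buffer, into: &out)
        buffer = ""
        return out
    }

    private func emit(_ text: String, into out: inout [StreamSegment]) {
        guard !text.isEmpty else { return }
        out.append(inThink ? .thinking(text) : .answer(text))
    }

    private func heldBackCount(in text: String, awaiting tag: String) -> Int {
        var n = min(text.count, tag.count - 1)
        while n > 0 {
            if tag.hasPrefix(String(text.suffix(n))) { return n }
            n -= 1
        }
        return 0
    }
}
```

In a view model, `.thinking` text would be appended to the collapsible thinking box and `.answer` text to the message body as each chunk arrives. At most `tag.count - 1` characters are ever delayed, so the live token display stays responsive.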
--- a/project.yml
+++ b/project.yml
@@ -22,7 +22,7 @@ targets:
      - MLXServer
    settings:
      base:
-        PRODUCT_BUNDLE_IDENTIFIER: com.mlxserver.app
+        PRODUCT_BUNDLE_IDENTIFIER: de.rfc1437.mlxserver
        PRODUCT_NAME: MLX Server
        MARKETING_VERSION: "1.0.0"
        CURRENT_PROJECT_VERSION: "1"