diff --git a/CLAUDE.md b/CLAUDE.md index 3f9265a..145cd6e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,6 +1,6 @@ # MLX Server -Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision and tool use. +Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision, tool use, and thinking mode. ## Quick Start @@ -14,18 +14,24 @@ open "build/Debug/MLX Server.app" ## Project Structure -- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config -- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts +- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config, menu commands +- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts, focused values - `MLXServer/Models/ModelConfig.swift` — Model definitions (alias, repoId, contextLength), resolution -- `MLXServer/Models/ChatMessage.swift` — Chat message data model -- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, offline-first resolution +- `MLXServer/Models/ChatMessage.swift` — Chat message data model, `<think>` tag parsing +- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, download tracking, idle unload - `MLXServer/ViewModels/ChatViewModel.swift` — Chat state, ChatSession management, API server lifecycle - `MLXServer/Server/APIServer.swift` — NWListener HTTP server, SSE streaming, KV cache reuse, vision, tool call handling - `MLXServer/Server/APIModels.swift` — OpenAI-compatible Codable structs - `MLXServer/Server/ToolCallParser.swift` — Parses tool calls from model output (Gemma tool_code, Qwen XML tags) - `MLXServer/Server/ToolPromptBuilder.swift` — Model-specific tool prompt formatting -- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to ~/.cache/huggingface/hub/ snapshots -- 
`MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper +- `MLXServer/Views/DownloadModalView.swift` — Modal overlay for model download progress +- `MLXServer/Views/ChatMessagesView.swift` — Message bubbles with markdown rendering and collapsible thinking blocks +- `MLXServer/Views/ChatInputView.swift` — Text input, image attach (file picker, drag & drop, Finder copy-paste) +- `MLXServer/Commands/SaveChatCommands.swift` — File > Export Chat menu command +- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to local snapshots (sandbox + system cache + flat layouts) +- `MLXServer/Utilities/ChatExporter.swift` — Export conversations to Markdown or RTF (Pages-compatible) +- `MLXServer/Utilities/FocusedValues.swift` — FocusedValue keys for menu bar integration +- `MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper (model, thinking mode, API, idle timeout) - `project.yml` — xcodegen project spec - `build.sh` — Build script (xcodegen + xcodebuild) @@ -35,6 +41,9 @@ open "build/Debug/MLX Server.app" |-------|---------------|-------| | `gemma` | `mlx-community/gemma-3-4b-it-4bit` | Vision + tool use via `tool_code` blocks (128k context) | | `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Vision + tool use via `<tool_call>` tags (256k context) | +| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | Thinking mode, tool use (256k context) | + +Any model in MLX format on HuggingFace can be added — no restriction on uploader or architecture. 
## Critical Performance Rule @@ -47,9 +56,15 @@ open "build/Debug/MLX Server.app" ## Key Design Decisions -- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend — supports both text and vision in a single model load +- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend — loads any MLX-format model from HuggingFace - Model-specific prompt formatting: Gemma uses `tool_code` blocks; Qwen uses `<tool_call>` XML tags -- Offline-first: if the model is already cached locally (~/.cache/huggingface/hub/), `LocalModelResolver` resolves the local snapshot path directly — no network requests +- **Offline-first**: `LocalModelResolver` checks the sandboxed app container, system `~/.cache/huggingface/hub/`, and flat download layouts — no network requests if model is cached +- **No duplicate storage**: custom `HubApi(cache: nil)` with explicit `downloadBase` — models stored once in the snapshot cache, not duplicated across blob cache and snapshots +- **Thinking mode**: `enable_thinking` passed to Jinja template context via `additionalContext`; `<think>...</think>` tags parsed in real-time during streaming and shown in collapsible UI blocks. Toggleable in Settings. 
+- **Download progress**: separate `isDownloading` state from `isLoading`; modal overlay shows file count, percentage, speed +- **Idle unload**: timer resets on both user input and model generation completion (not just request start) +- **Chat export**: Markdown (user messages as blockquotes) and RTF (Pages-compatible with formatted markdown) +- **Finder paste**: local event monitor intercepts Cmd+V to check pasteboard for image file URLs before TextField handles it - HTTP server built on `Network.framework` (`NWListener`) — no third-party server dependencies - KV cache reuse across API requests — reuses `ChatSession` when conversation history prefix matches - GPU cache limit set to 20 MB; cache cleared on model unload diff --git a/MLXServer.xcodeproj/project.pbxproj b/MLXServer.xcodeproj/project.pbxproj index 1d673c1..d780094 100644 --- a/MLXServer.xcodeproj/project.pbxproj +++ b/MLXServer.xcodeproj/project.pbxproj @@ -10,9 +10,12 @@ 0168AEE16009097901363E16 /* ModelManager.swift in Sources */ = {isa = PBXBuildFile; fileRef = 922CBDC9206737BD04AF2874 /* ModelManager.swift */; }; 165E8AB6ADAE1D59B1A86420 /* Preferences.swift in Sources */ = {isa = PBXBuildFile; fileRef = 145B888FBDD4F931512C5473 /* Preferences.swift */; }; 189362AAE2CDE5D4B3428334 /* ToolCallParser.swift in Sources */ = {isa = PBXBuildFile; fileRef = E73B165A1822729C907791AE /* ToolCallParser.swift */; }; + 29879D696584B96CC56560DF /* ChatExporter.swift in Sources */ = {isa = PBXBuildFile; fileRef = D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */; }; 2CAAF7129F7CC45200FA9F6B /* ModelPickerView.swift in Sources */ = {isa = PBXBuildFile; fileRef = C3C3A76C02AF70A9D8F868FC /* ModelPickerView.swift */; }; 2D08769282BD71C170DB0943 /* InferenceStats.swift in Sources */ = {isa = PBXBuildFile; fileRef = E35452B166893B25E765FF70 /* InferenceStats.swift */; }; + 4158FA884D981D73288FB74C /* SaveChatCommands.swift in Sources */ = {isa = PBXBuildFile; fileRef = 2E2FCA55CEBEBCED78D9479A /* 
SaveChatCommands.swift */; }; 4CB13DC1AC7A500DDBB443EC /* ChatInputView.swift in Sources */ = {isa = PBXBuildFile; fileRef = E5E6AD02CDF23BDAB64700A7 /* ChatInputView.swift */; }; + 4DC033E45880B2948B47DEB1 /* FocusedValues.swift in Sources */ = {isa = PBXBuildFile; fileRef = EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */; }; 50B6861FF8610B3ED4FFAD9D /* MLXServerApp.swift in Sources */ = {isa = PBXBuildFile; fileRef = C67742651DB486871CEF1612 /* MLXServerApp.swift */; }; 50DD129CCF2843482DEC3B96 /* APIServer.swift in Sources */ = {isa = PBXBuildFile; fileRef = 3D08828E16B17EF02C14243E /* APIServer.swift */; }; 5946258F1DE88CE904584E0B /* ContentView.swift in Sources */ = {isa = PBXBuildFile; fileRef = 944C699FBB76C734C9DF2F2E /* ContentView.swift */; }; @@ -38,6 +41,7 @@ 145B888FBDD4F931512C5473 /* Preferences.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = Preferences.swift; sourceTree = ""; }; 16AE82A64D1D07AE3CD8D33A /* ToolPromptBuilder.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ToolPromptBuilder.swift; sourceTree = ""; }; 2DC8C86D397B1FCA08E07CBD /* DownloadModalView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = DownloadModalView.swift; sourceTree = ""; }; + 2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = SaveChatCommands.swift; sourceTree = ""; }; 38DFC212AF4359A45FBE22BA /* ModelConfig.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ModelConfig.swift; sourceTree = ""; }; 3AF462805202797F61422AEE /* MLXServer.entitlements */ = {isa = PBXFileReference; lastKnownFileType = text.plist.entitlements; path = MLXServer.entitlements; sourceTree = ""; }; 3D08828E16B17EF02C14243E /* APIServer.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = APIServer.swift; sourceTree = ""; }; @@ -53,10 +57,12 @@ C3C3A76C02AF70A9D8F868FC 
/* ModelPickerView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ModelPickerView.swift; sourceTree = ""; }; C67742651DB486871CEF1612 /* MLXServerApp.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = MLXServerApp.swift; sourceTree = ""; }; D733A0D1D4AC25DDDA6C8684 /* LocalModelResolver.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = LocalModelResolver.swift; sourceTree = ""; }; + D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatExporter.swift; sourceTree = ""; }; DB1A5E8B1C9F2BC4D262C53A /* ChatMessagesView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatMessagesView.swift; sourceTree = ""; }; E35452B166893B25E765FF70 /* InferenceStats.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = InferenceStats.swift; sourceTree = ""; }; E5E6AD02CDF23BDAB64700A7 /* ChatInputView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatInputView.swift; sourceTree = ""; }; E73B165A1822729C907791AE /* ToolCallParser.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ToolCallParser.swift; sourceTree = ""; }; + EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = FocusedValues.swift; sourceTree = ""; }; F1A52E2C9964ADA9D841A89B /* APIModels.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = APIModels.swift; sourceTree = ""; }; /* End PBXFileReference section */ @@ -78,6 +84,8 @@ 05B1BAE308E64D2FB2E73823 /* Utilities */ = { isa = PBXGroup; children = ( + D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */, + EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */, D733A0D1D4AC25DDDA6C8684 /* LocalModelResolver.swift */, 145B888FBDD4F931512C5473 /* Preferences.swift */, ); @@ -99,6 +107,7 @@ 
944C699FBB76C734C9DF2F2E /* ContentView.swift */, 3AF462805202797F61422AEE /* MLXServer.entitlements */, C67742651DB486871CEF1612 /* MLXServerApp.swift */, + B459409ED6FD8797FDD81E94 /* Commands */, BD0E350482D91238B4B59721 /* Models */, E13C1AAA0C49D0ED85EFD94D /* Server */, 05B1BAE308E64D2FB2E73823 /* Utilities */, @@ -122,6 +131,14 @@ path = Views; sourceTree = ""; }; + B459409ED6FD8797FDD81E94 /* Commands */ = { + isa = PBXGroup; + children = ( + 2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */, + ); + path = Commands; + sourceTree = ""; + }; BD0E350482D91238B4B59721 /* Models */ = { isa = PBXGroup; children = ( @@ -238,12 +255,14 @@ files = ( D96DDE66F76FDDA642629E17 /* APIModels.swift in Sources */, 50DD129CCF2843482DEC3B96 /* APIServer.swift in Sources */, + 29879D696584B96CC56560DF /* ChatExporter.swift in Sources */, 4CB13DC1AC7A500DDBB443EC /* ChatInputView.swift in Sources */, FAF7D4714AC6D02674920208 /* ChatMessage.swift in Sources */, 5C1E8FE1C521914CEF98D3AA /* ChatMessagesView.swift in Sources */, B5AA6E3B4BE21676226B342B /* ChatViewModel.swift in Sources */, 5946258F1DE88CE904584E0B /* ContentView.swift in Sources */, C07A377244DCD67F4FE709FE /* DownloadModalView.swift in Sources */, + 4DC033E45880B2948B47DEB1 /* FocusedValues.swift in Sources */, 2D08769282BD71C170DB0943 /* InferenceStats.swift in Sources */, 6828CCA8B78AB40906F87CAB /* LocalModelResolver.swift in Sources */, 50B6861FF8610B3ED4FFAD9D /* MLXServerApp.swift in Sources */, @@ -252,6 +271,7 @@ 2CAAF7129F7CC45200FA9F6B /* ModelPickerView.swift in Sources */, B1D9BC407DB7DB1489230C20 /* MonitorView.swift in Sources */, 165E8AB6ADAE1D59B1A86420 /* Preferences.swift in Sources */, + 4158FA884D981D73288FB74C /* SaveChatCommands.swift in Sources */, D666A311788375E8A061C832 /* SettingsView.swift in Sources */, 621B7E4382199AC1378F5F9C /* StatusBarView.swift in Sources */, 189362AAE2CDE5D4B3428334 /* ToolCallParser.swift in Sources */, @@ -399,7 +419,7 @@ ); MACOSX_DEPLOYMENT_TARGET = 
15.0; MARKETING_VERSION = 1.0.0; - PRODUCT_BUNDLE_IDENTIFIER = com.mlxserver.app; + PRODUCT_BUNDLE_IDENTIFIER = de.rfc1437.mlxserver; PRODUCT_NAME = "MLX Server"; SDKROOT = macosx; SWIFT_VERSION = 6.0; @@ -424,7 +444,7 @@ ); MACOSX_DEPLOYMENT_TARGET = 15.0; MARKETING_VERSION = 1.0.0; - PRODUCT_BUNDLE_IDENTIFIER = com.mlxserver.app; + PRODUCT_BUNDLE_IDENTIFIER = de.rfc1437.mlxserver; PRODUCT_NAME = "MLX Server"; SDKROOT = macosx; SWIFT_VERSION = 6.0; diff --git a/MLXServer/Commands/SaveChatCommands.swift b/MLXServer/Commands/SaveChatCommands.swift new file mode 100644 index 0000000..532559b --- /dev/null +++ b/MLXServer/Commands/SaveChatCommands.swift @@ -0,0 +1,16 @@ +import SwiftUI + +/// Adds "Export Chat…" to the File menu. +struct SaveChatCommands: Commands { + @FocusedBinding(\.exportTrigger) var isExporting + + var body: some Commands { + CommandGroup(after: .saveItem) { + Button("Export Chat…") { + isExporting = true + } + .keyboardShortcut("e", modifiers: [.command, .shift]) + .disabled(isExporting == nil) + } + } +} diff --git a/MLXServer/ContentView.swift b/MLXServer/ContentView.swift index e9d54e0..2537aba 100644 --- a/MLXServer/ContentView.swift +++ b/MLXServer/ContentView.swift @@ -1,10 +1,12 @@ import SwiftUI +import UniformTypeIdentifiers struct ContentView: View { @Environment(ModelManager.self) private var modelManager @State private var chatVM: ChatViewModel? @State private var showLoadError = false @State private var showMonitor = false + @State private var isExporting = false var body: some View { mainContent @@ -52,6 +54,21 @@ struct ContentView: View { .background { modelSwitchShortcuts } + // Expose export trigger to menu bar command + .focusedSceneValue(\.exportTrigger, $isExporting) + .fileExporter( + isPresented: $isExporting, + document: ChatExportDocument( + messages: chatVM?.conversation.messages ?? 
[], + modelName: modelManager.currentModel?.displayName + ), + contentTypes: ChatExportDocument.writableContentTypes, + defaultFilename: "chat" + ) { result in + if case .failure(let error) = result { + print("[Export] Failed: \(error.localizedDescription)") + } + } } @ViewBuilder diff --git a/MLXServer/MLXServerApp.swift b/MLXServer/MLXServerApp.swift index dd0bad3..d542317 100644 --- a/MLXServer/MLXServerApp.swift +++ b/MLXServer/MLXServerApp.swift @@ -23,6 +23,9 @@ struct MLXServerApp: App { } .windowStyle(.titleBar) .defaultSize(width: 800, height: 700) + .commands { + SaveChatCommands() + } #if os(macOS) Settings { diff --git a/MLXServer/Utilities/ChatExporter.swift b/MLXServer/Utilities/ChatExporter.swift new file mode 100644 index 0000000..46238b1 --- /dev/null +++ b/MLXServer/Utilities/ChatExporter.swift @@ -0,0 +1,290 @@ +import AppKit +import Foundation +import SwiftUI +import UniformTypeIdentifiers + +/// A FileDocument that exports a chat conversation as Markdown or RTF. +struct ChatExportDocument: FileDocument { + static var readableContentTypes: [UTType] { [.plainText] } + static var writableContentTypes: [UTType] { + [UTType(filenameExtension: "md") ?? .plainText, .rtf] + } + + let messages: [ChatMessage] + let modelName: String? + + init(messages: [ChatMessage], modelName: String?) 
{ + self.messages = messages + self.modelName = modelName + } + + init(configuration: ReadConfiguration) throws { + self.messages = [] + self.modelName = nil + } + + func fileWrapper(configuration: WriteConfiguration) throws -> FileWrapper { + let contentType = configuration.contentType + + if contentType == .rtf, let data = ChatExporter.exportRTF(messages: messages, modelName: modelName) { + return FileWrapper(regularFileWithContents: data) + } else { + let md = ChatExporter.exportMarkdown(messages: messages, modelName: modelName) + return FileWrapper(regularFileWithContents: Data(md.utf8)) + } + } +} + +/// Exports a chat conversation to Markdown or RTF (Pages-compatible) format. +enum ChatExporter { + + // MARK: - Markdown export + + static func exportMarkdown(messages: [ChatMessage], modelName: String?) -> String { + var lines: [String] = [] + + // Header + lines.append("# Chat Session") + if let modelName { + lines.append("**Model:** \(modelName)") + } + let formatter = DateFormatter() + formatter.dateStyle = .long + formatter.timeStyle = .short + if let first = messages.first { + lines.append("**Date:** \(formatter.string(from: first.timestamp))") + } + lines.append("") + lines.append("---") + lines.append("") + + for message in messages { + guard message.role != .system else { continue } + + if message.role == .user { + // User messages as blockquotes + lines.append("**You:**") + lines.append("") + for line in message.content.components(separatedBy: "\n") { + lines.append("> \(line)") + } + } else { + // Assistant messages: carry over original markdown + lines.append("**Assistant:**") + lines.append("") + lines.append(message.content) + } + + lines.append("") + lines.append("---") + lines.append("") + } + + return lines.joined(separator: "\n") + } + + // MARK: - RTF export + + static func exportRTF(messages: [ChatMessage], modelName: String?) -> Data? 
{ + let doc = NSMutableAttributedString() + + let bodyFont = NSFont.systemFont(ofSize: 13) + let bodyBoldFont = NSFont.boldSystemFont(ofSize: 13) + let titleFont = NSFont.boldSystemFont(ofSize: 20) + let metaFont = NSFont.systemFont(ofSize: 11) + let codeFont = NSFont.monospacedSystemFont(ofSize: 12, weight: .regular) + + let bodyParagraph = NSMutableParagraphStyle() + bodyParagraph.paragraphSpacing = 8 + bodyParagraph.lineSpacing = 2 + + let userParagraph = NSMutableParagraphStyle() + userParagraph.paragraphSpacing = 8 + userParagraph.lineSpacing = 2 + userParagraph.headIndent = 20 + userParagraph.firstLineHeadIndent = 20 + + // Title + doc.append(NSAttributedString( + string: "Chat Session\n", + attributes: [.font: titleFont, .paragraphStyle: bodyParagraph] + )) + + // Metadata + let formatter = DateFormatter() + formatter.dateStyle = .long + formatter.timeStyle = .short + var metaText = "" + if let modelName { metaText += "Model: \(modelName) " } + if let first = messages.first { + metaText += "Date: \(formatter.string(from: first.timestamp))" + } + if !metaText.isEmpty { + doc.append(NSAttributedString( + string: metaText + "\n\n", + attributes: [.font: metaFont, .foregroundColor: NSColor.secondaryLabelColor] + )) + } + + for message in messages { + guard message.role != .system else { continue } + + if message.role == .user { + doc.append(NSAttributedString( + string: "You\n", + attributes: [ + .font: bodyBoldFont, + .foregroundColor: NSColor.systemBlue, + ] + )) + doc.append(NSAttributedString( + string: message.content + "\n\n", + attributes: [ + .font: bodyFont, + .paragraphStyle: userParagraph, + .foregroundColor: NSColor.labelColor, + ] + )) + } else { + doc.append(NSAttributedString( + string: "Assistant\n", + attributes: [ + .font: bodyBoldFont, + .foregroundColor: NSColor.labelColor, + ] + )) + let rendered = renderMarkdown(message.content, bodyFont: bodyFont, codeFont: codeFont, paragraph: bodyParagraph) + doc.append(rendered) + 
doc.append(NSAttributedString(string: "\n\n")) + } + + doc.append(NSAttributedString( + string: "\n", + attributes: [ + .strikethroughStyle: NSUnderlineStyle.single.rawValue, + .strikethroughColor: NSColor.separatorColor, + .font: NSFont.systemFont(ofSize: 4), + ] + )) + } + + return doc.rtf(from: NSRange(location: 0, length: doc.length), documentAttributes: [ + .documentType: NSAttributedString.DocumentType.rtf, + ]) + } + + // MARK: - Markdown → NSAttributedString (basic) + + private static func renderMarkdown( + _ text: String, + bodyFont: NSFont, + codeFont: NSFont, + paragraph: NSParagraphStyle + ) -> NSAttributedString { + let result = NSMutableAttributedString() + let lines = text.components(separatedBy: "\n") + var inCodeBlock = false + var codeBlockLines: [String] = [] + + for line in lines { + if line.hasPrefix("```") { + if inCodeBlock { + let code = codeBlockLines.joined(separator: "\n") + let codePara = NSMutableParagraphStyle() + codePara.paragraphSpacing = 4 + codePara.headIndent = 12 + codePara.firstLineHeadIndent = 12 + result.append(NSAttributedString( + string: code + "\n", + attributes: [ + .font: codeFont, + .foregroundColor: NSColor.secondaryLabelColor, + .backgroundColor: NSColor.quaternaryLabelColor, + .paragraphStyle: codePara, + ] + )) + codeBlockLines = [] + inCodeBlock = false + } else { + inCodeBlock = true + } + continue + } + + if inCodeBlock { + codeBlockLines.append(line) + continue + } + + if line.hasPrefix("### ") { + result.append(NSAttributedString( + string: String(line.dropFirst(4)) + "\n", + attributes: [.font: NSFont.boldSystemFont(ofSize: 14), .paragraphStyle: paragraph] + )) + } else if line.hasPrefix("## ") { + result.append(NSAttributedString( + string: String(line.dropFirst(3)) + "\n", + attributes: [.font: NSFont.boldSystemFont(ofSize: 15), .paragraphStyle: paragraph] + )) + } else if line.hasPrefix("# ") { + result.append(NSAttributedString( + string: String(line.dropFirst(2)) + "\n", + attributes: [.font: 
NSFont.boldSystemFont(ofSize: 17), .paragraphStyle: paragraph] + )) + } else { + let styled = applyInlineFormatting(line, bodyFont: bodyFont, codeFont: codeFont) + result.append(styled) + result.append(NSAttributedString(string: "\n", attributes: [.font: bodyFont])) + } + } + + return result + } + + private static func applyInlineFormatting( + _ text: String, + bodyFont: NSFont, + codeFont: NSFont + ) -> NSAttributedString { + let result = NSMutableAttributedString() + var remaining = text[text.startIndex...] + + while !remaining.isEmpty { + if remaining.hasPrefix("`"), let end = remaining.dropFirst().firstIndex(of: "`") { + let code = String(remaining[remaining.index(after: remaining.startIndex).. +} + +extension FocusedValues { + var exportTrigger: Binding? { + get { self[FocusedExportTriggerKey.self] } + set { self[FocusedExportTriggerKey.self] = newValue } + } +} diff --git a/MLXServer/Utilities/LocalModelResolver.swift b/MLXServer/Utilities/LocalModelResolver.swift index c55eb41..2ce904b 100644 --- a/MLXServer/Utilities/LocalModelResolver.swift +++ b/MLXServer/Utilities/LocalModelResolver.swift @@ -1,75 +1,28 @@ import Foundation -/// Resolves HuggingFace model repos to local snapshot directories, -/// matching the cache layout used by Python's `huggingface_hub`. +/// Resolves HuggingFace model repos to local directories. /// -/// Checks two locations: -/// 1. App sandbox container: ~/Library/Containers/com.mlxserver.app/.../huggingface/hub/ -/// 2. System-wide cache: ~/.cache/huggingface/hub/ (shared with Python tools) -/// -/// Cache structure: -/// .../huggingface/hub/models--{org}--{name}/snapshots/{hash}/ +/// HubApi(downloadBase: .cachesDirectory, cache: nil) downloads models to: +/// ~/Library/Containers/de.rfc1437.mlxserver/Data/Library/Caches/models/{org}/{name}/ enum LocalModelResolver { - /// All HuggingFace cache directories to search, in priority order. 
- /// The sandboxed container path is checked first (where the app downloads to), - /// then the system-wide Python cache (for models downloaded via huggingface-cli). - private static let cacheBases: [URL] = { - var bases: [URL] = [] - - // 1. Sandboxed app container cache (where swift-transformers Hub downloads to) - let containerCache = FileManager.default.homeDirectoryForCurrentUser - .appendingPathComponent("Library/Caches/huggingface/hub", isDirectory: true) - bases.append(containerCache) - - // 2. System-wide ~/.cache/huggingface/hub/ (Python huggingface_hub) - // When sandboxed, homeDirectory points to the container, so construct the real path. - let realHome = URL(fileURLWithPath: NSHomeDirectory()) - let systemCache = realHome - .appendingPathComponent(".cache/huggingface/hub", isDirectory: true) - // Avoid duplicate if they resolve to the same path - if systemCache.path != containerCache.path { - bases.append(systemCache) - } - - // 3. Also try the unsandboxed home directory path - let globalHome = FileManager.default.homeDirectoryForCurrentUser - .appendingPathComponent(".cache/huggingface/hub", isDirectory: true) - if globalHome.path != containerCache.path && globalHome.path != systemCache.path { - bases.append(globalHome) - } - - return bases + /// Base directory where HubApi stores downloaded models. + private static let modelsBase: URL? = { + FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first? + .appendingPathComponent("models", isDirectory: true) }() /// Resolve a HuggingFace repo ID (e.g. "mlx-community/gemma-3-4b-it-4bit") - /// to its local snapshot directory, if it exists. + /// to its local directory, if it exists. /// /// Returns `nil` if the model hasn't been downloaded yet. static func resolve(repoId: String) -> URL? 
{ - let dirName = "models--" + repoId.replacingOccurrences(of: "/", with: "--") - - for cacheBase in cacheBases { - let snapshotsDir = cacheBase - .appendingPathComponent(dirName, isDirectory: true) - .appendingPathComponent("snapshots", isDirectory: true) - - guard let contents = try? FileManager.default.contentsOfDirectory( - at: snapshotsDir, - includingPropertiesForKeys: [.isDirectoryKey], - options: [.skipsHiddenFiles] - ) else { - continue - } - - if let snapshot = contents - .filter({ (try? $0.resourceValues(forKeys: [.isDirectoryKey]).isDirectory) == true }) - .sorted(by: { $0.lastPathComponent < $1.lastPathComponent }) - .last { - return snapshot - } + guard let base = modelsBase else { return nil } + let modelDir = base.appendingPathComponent(repoId, isDirectory: true) + var isDir: ObjCBool = false + if FileManager.default.fileExists(atPath: modelDir.path, isDirectory: &isDir), isDir.boolValue { + return modelDir } - return nil } @@ -79,39 +32,18 @@ enum LocalModelResolver { } /// Delete the local cache for a model so it will be re-downloaded next time. - /// Removes from all cache locations. - /// Returns true if something was deleted. 
@discardableResult static func deleteLocal(repoId: String) -> Bool { - let dirName = "models--" + repoId.replacingOccurrences(of: "/", with: "--") - var deleted = false - - for cacheBase in cacheBases { - let modelDir = cacheBase.appendingPathComponent(dirName, isDirectory: true) - guard FileManager.default.fileExists(atPath: modelDir.path) else { continue } - do { - try FileManager.default.removeItem(at: modelDir) - print("[LocalModelResolver] Deleted \(modelDir.path)") - deleted = true - } catch { - print("[LocalModelResolver] Failed to delete \(modelDir.path): \(error)") - } + guard let base = modelsBase else { return false } + let modelDir = base.appendingPathComponent(repoId, isDirectory: true) + guard FileManager.default.fileExists(atPath: modelDir.path) else { return false } + do { + try FileManager.default.removeItem(at: modelDir) + print("[LocalModelResolver] Deleted \(modelDir.path)") + return true + } catch { + print("[LocalModelResolver] Failed to delete \(modelDir.path): \(error)") + return false } - - // Also clean up the per-model cache in the container (used by swift-transformers) - let containerModelsDir = FileManager.default.homeDirectoryForCurrentUser - .appendingPathComponent("Library/Caches/models", isDirectory: true) - .appendingPathComponent(repoId, isDirectory: true) - if FileManager.default.fileExists(atPath: containerModelsDir.path) { - do { - try FileManager.default.removeItem(at: containerModelsDir) - print("[LocalModelResolver] Deleted \(containerModelsDir.path)") - deleted = true - } catch { - print("[LocalModelResolver] Failed to delete \(containerModelsDir.path): \(error)") - } - } - - return deleted } } diff --git a/MLXServer/ViewModels/ModelManager.swift b/MLXServer/ViewModels/ModelManager.swift index ec83d74..aa22335 100644 --- a/MLXServer/ViewModels/ModelManager.swift +++ b/MLXServer/ViewModels/ModelManager.swift @@ -12,7 +12,12 @@ final class ModelManager { /// HubApi with blob cache disabled to avoid storing every model twice. 
/// swift-huggingface defaults to caching in both huggingface/hub/ (snapshots) /// AND models/ (content-addressed blobs). We only need the snapshots. - private static let hub = HubApi(cache: nil) + /// Must use the same downloadBase as defaultHubApi (.cachesDirectory) so + /// LocalModelResolver can find downloaded models. + private static let hub: HubApi = { + let cachesDir = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first + return HubApi(downloadBase: cachesDir, cache: nil) + }() var currentModel: ModelConfig? var modelContainer: ModelContainer? var isLoading = false @@ -52,7 +57,6 @@ final class ModelManager { } do { - let container: ModelContainer let progressHandler: @Sendable (Progress) -> Void = { progress in Task { @MainActor in self.downloadProgress = progress.fractionCompleted @@ -73,7 +77,7 @@ final class ModelManager { configuration = config.modelConfiguration } - container = try await VLMModelFactory.shared.loadContainer( + let container = try await VLMModelFactory.shared.loadContainer( hub: Self.hub, configuration: configuration, progressHandler: progressHandler diff --git a/README.md b/README.md index 4bcbb2d..1075e12 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # MLX Server -Native macOS app for running local LLMs on Apple Silicon via [MLX](https://github.com/ml-explore/mlx). Built with SwiftUI, it provides both a **chat UI** and an embedded **OpenAI-compatible API server**. Supports vision and tool use with automatic model swapping. +Native macOS app for running local LLMs on Apple Silicon via [MLX](https://github.com/ml-explore/mlx). Built with SwiftUI, it provides both a **chat UI** and an embedded **OpenAI-compatible API server**. Supports vision, tool use, and thinking mode. 
## Supported Models @@ -8,6 +8,9 @@ Native macOS app for running local LLMs on Apple Silicon via [MLX](https://githu |-------|-------|---------|-------------| | `gemma` | `mlx-community/gemma-3-4b-it-4bit` | 128k | Vision, tool use (`tool_code` blocks) | | `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | 256k | Vision, tool use (`<tool_call>` tags) | +| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | 256k | Thinking mode, tool use | + +Any model in MLX format on HuggingFace can be added — there is no restriction on uploader or architecture. ## Quick Start @@ -20,12 +23,16 @@ open "build/Debug/MLX Server.app" ## App Features -- **Chat interface** with markdown rendering, image attachments (file picker, drag & drop, clipboard paste) -- **Model picker** in toolbar with local/download status indicators +- **Chat interface** with markdown rendering, image attachments (file picker, drag & drop, clipboard paste, Finder copy-paste) +- **Model picker** in toolbar with local/download status indicators and re-download button +- **Download progress modal** — shows file progress, percentage, and speed when downloading a new model +- **Thinking mode** — models like Qwen3.5 can reason internally before responding; thinking content appears in a collapsible box. Toggle on/off in Settings. 
- **Streaming responses** with live token display +- **Export chat** — File > Export Chat (Cmd+Shift+E) saves conversations as Markdown or RTF (Pages-compatible) - **Status bar** showing model name, context window, tokens/sec, token counts, GPU memory, API server status -- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3` (switch models) -- **Settings** (`Cmd+,`): system prompt, API port, API auto-start +- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3/4` (switch models), `Cmd+Shift+E` (export) +- **Settings** (`Cmd+,`): default model, thinking mode toggle, system prompt, API port, API auto-start, idle unload timeout +- **Idle auto-unload** — model is unloaded after configurable idle time (resets on both user input and model output), reloaded on next request ## API Server @@ -74,23 +81,29 @@ MLXServer/ ├── ContentView.swift — Main layout, toolbar, keyboard shortcuts ├── Models/ │ ├── ModelConfig.swift — Model definitions, alias/repoId resolution -│ └── ChatMessage.swift — Chat message data model +│ └── ChatMessage.swift — Chat message data model, thinking tag parser ├── ViewModels/ -│ ├── ModelManager.swift — Model loading/switching via VLMModelFactory +│ ├── ModelManager.swift — Model loading/switching, download tracking, idle unload │ └── ChatViewModel.swift — Chat state, ChatSession, API server lifecycle ├── Views/ -│ ├── ModelPickerView.swift — Toolbar model selector -│ ├── ChatMessagesView.swift — Scrollable message list with markdown -│ ├── ChatInputView.swift — Text input + image attach +│ ├── ModelPickerView.swift — Toolbar model selector with re-download +│ ├── ChatMessagesView.swift — Scrollable message list with markdown + thinking blocks +│ ├── ChatInputView.swift — Text input + image attach (paste, drag, picker) +│ ├── DownloadModalView.swift — Model download progress overlay │ ├── StatusBarView.swift — Model info, tok/s, GPU memory, API status -│ └── 
SettingsView.swift — System prompt + API settings +│ ├── MonitorView.swift — Inference statistics monitor +│ └── SettingsView.swift — System prompt, thinking mode, API, idle settings +├── Commands/ +│ └── SaveChatCommands.swift — File menu export command ├── Server/ │ ├── APIServer.swift — NWListener HTTP server, SSE streaming, KV cache reuse │ ├── APIModels.swift — OpenAI-compatible Codable structs │ ├── ToolCallParser.swift — Parses tool calls from model output │ └── ToolPromptBuilder.swift — Model-specific tool prompt formatting └── Utilities/ - ├── LocalModelResolver.swift — Offline-first HuggingFace cache resolution + ├── LocalModelResolver.swift — Offline-first HuggingFace cache resolution (sandbox + system) + ├── ChatExporter.swift — Export conversations to Markdown or RTF + ├── FocusedValues.swift — FocusedValue keys for menu bar integration └── Preferences.swift — UserDefaults wrapper project.yml — xcodegen project spec (dependencies, settings, deployment target) @@ -99,17 +112,11 @@ build.sh — One-command build script (xcodegen + xcodebuild) ## Key Design Decisions -- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) for inference — supports both text and vision in a single model load -- **Offline-first**: `LocalModelResolver` checks `~/.cache/huggingface/hub/` for locally-cached snapshots before downloading +- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) for inference — loads any MLX-format model from HuggingFace +- **Offline-first**: `LocalModelResolver` checks both the sandboxed app container and `~/.cache/huggingface/hub/` for locally-cached models before downloading +- **No duplicate storage**: custom `HubApi` with blob cache disabled — models are stored once in the snapshot cache - **KV cache reuse** across API requests — reuses `ChatSession` when conversation history prefix matches +- **Thinking mode**: `enable_thinking` passed via Jinja template context; `` tags parsed in real-time during streaming - HTTP server built on `Network.framework` 
(`NWListener`) — no third-party server dependencies - Model-specific prompt formatting: Gemma uses `tool_code` blocks, Qwen uses `` XML tags - GPU cache limit set to 20 MB; cache cleared on model unload - -## Design Notes - -- Uses `mlx_vlm` (not `mlx_lm`) as the backend — supports both text and vision in a single model load -- Offline-first: if the model is cached locally (`~/.cache/huggingface/hub/`), no network requests are made -- Thread lock on generation — MLX models aren't safe for concurrent generation -- KV prefix caching for multi-turn conversations -- Context window read from each model's config (Gemma 3 4B: 128k, Qwen3-VL 4B: 256k) with automatic summarization fallback diff --git a/project.yml b/project.yml index 43b11d0..c49ab3d 100644 --- a/project.yml +++ b/project.yml @@ -22,7 +22,7 @@ targets: - MLXServer settings: base: - PRODUCT_BUNDLE_IDENTIFIER: com.mlxserver.app + PRODUCT_BUNDLE_IDENTIFIER: de.rfc1437.mlxserver PRODUCT_NAME: MLX Server MARKETING_VERSION: "1.0.0" CURRENT_PROJECT_VERSION: "1"
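
Note on the README's "KV cache reuse" bullet: the prefix check it describes could look roughly like the sketch below. This is only an illustration under stated assumptions. The `Message` type and the `reusablePrefixCount` function are hypothetical names introduced here, and the diff does not show how `APIServer.swift` or `ChatSession` actually compare histories.

```swift
// Hypothetical message type; the app's real types are not shown in this diff.
struct Message: Equatable {
    let role: String
    let content: String
}

/// Returns how many leading messages of `incoming` are already covered by the
/// cached conversation, or nil when the cached history is not a prefix of it
/// (in which case a fresh session would be needed).
func reusablePrefixCount(cached: [Message], incoming: [Message]) -> Int? {
    guard !cached.isEmpty, cached.count <= incoming.count else { return nil }
    return incoming.starts(with: cached) ? cached.count : nil
}

let cached = [
    Message(role: "user", content: "Hi"),
    Message(role: "assistant", content: "Hello!"),
]
let followUp = cached + [Message(role: "user", content: "What is MLX?")]
let unrelated = [Message(role: "user", content: "New topic")]

if let n = reusablePrefixCount(cached: cached, incoming: followUp) {
    // Only the messages after the shared prefix need to be processed.
    print("reuse session, new messages: \(followUp.count - n)")
}
if reusablePrefixCount(cached: cached, incoming: unrelated) == nil {
    print("prefix mismatch, start a fresh session")
}
```

Comparing whole messages keeps the check simple. A real implementation might instead compare tokenized prompts, since the KV cache is keyed on tokens rather than on message structs.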