feat: first attempts at a save dialog, not working yet

2026-03-18 11:40:43 +01:00
parent af8b8c9532
commit 82a77fdb0a
11 changed files with 445 additions and 128 deletions

@@ -1,6 +1,6 @@
# MLX Server
Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision and tool use.
Native macOS SwiftUI app for local LLMs on Apple Silicon via MLX. Provides a chat UI and an embedded OpenAI-compatible API server. Supports vision, tool use, and thinking mode.
## Quick Start
@@ -14,18 +14,24 @@ open "build/Debug/MLX Server.app"
## Project Structure
- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config
- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts
- `MLXServer/MLXServerApp.swift` — App entry point, GPU cache config, menu commands
- `MLXServer/ContentView.swift` — Main layout, toolbar, keyboard shortcuts, focused values
- `MLXServer/Models/ModelConfig.swift` — Model definitions (alias, repoId, contextLength), resolution
- `MLXServer/Models/ChatMessage.swift` — Chat message data model
- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, offline-first resolution
- `MLXServer/Models/ChatMessage.swift` — Chat message data model, `<think>` tag parsing
- `MLXServer/ViewModels/ModelManager.swift` — Model loading/switching via VLMModelFactory, download tracking, idle unload
- `MLXServer/ViewModels/ChatViewModel.swift` — Chat state, ChatSession management, API server lifecycle
- `MLXServer/Server/APIServer.swift` — NWListener HTTP server, SSE streaming, KV cache reuse, vision, tool call handling
- `MLXServer/Server/APIModels.swift` — OpenAI-compatible Codable structs
- `MLXServer/Server/ToolCallParser.swift` — Parses tool calls from model output (Gemma tool_code, Qwen XML tags)
- `MLXServer/Server/ToolPromptBuilder.swift` — Model-specific tool prompt formatting
- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to ~/.cache/huggingface/hub/ snapshots
- `MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper
- `MLXServer/Views/DownloadModalView.swift` — Modal overlay for model download progress
- `MLXServer/Views/ChatMessagesView.swift` — Message bubbles with markdown rendering and collapsible thinking blocks
- `MLXServer/Views/ChatInputView.swift` — Text input, image attach (file picker, drag & drop, Finder copy-paste)
- `MLXServer/Commands/SaveChatCommands.swift` — File > Export Chat menu command
- `MLXServer/Utilities/LocalModelResolver.swift` — Resolves HF repo IDs to local snapshots (sandbox + system cache + flat layouts)
- `MLXServer/Utilities/ChatExporter.swift` — Export conversations to Markdown or RTF (Pages-compatible)
- `MLXServer/Utilities/FocusedValues.swift` — FocusedValue keys for menu bar integration
- `MLXServer/Utilities/Preferences.swift` — UserDefaults wrapper (model, thinking mode, API, idle timeout)
- `project.yml` — xcodegen project spec
- `build.sh` — Build script (xcodegen + xcodebuild)
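`ToolCallParser.swift` is described as extracting tool calls from Qwen's `<tool_call>` tags. The sketch below is hypothetical and covers only the Qwen format. It assumes each tag wraps a single JSON object with `name` and `arguments` keys. `ParsedToolCall` and `parseQwenToolCalls` are illustrative names, not the project's API.

```swift
import Foundation

/// Hypothetical result type for one parsed tool call.
struct ParsedToolCall: Equatable {
    let name: String
    let argumentsJSON: String  // arguments re-serialized as compact JSON with sorted keys
}

/// Extracts every `<tool_call>{...}</tool_call>` block from model output.
/// Blocks whose body is not a JSON object with a string `name` are skipped.
func parseQwenToolCalls(_ output: String) -> [ParsedToolCall] {
    var calls: [ParsedToolCall] = []
    var searchStart = output.startIndex
    while let open = output.range(of: "<tool_call>", range: searchStart..<output.endIndex),
          let close = output.range(of: "</tool_call>", range: open.upperBound..<output.endIndex) {
        // Body between the tags, without the surrounding newlines Qwen emits.
        let body = output[open.upperBound..<close.lowerBound]
            .trimmingCharacters(in: .whitespacesAndNewlines)
        searchStart = close.upperBound
        guard let data = body.data(using: .utf8),
              let object = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let name = object["name"] as? String else { continue }
        // A parsed [String: Any] is always a valid JSON object, so this cannot trap.
        let arguments = object["arguments"] as? [String: Any] ?? [:]
        let argsData = (try? JSONSerialization.data(withJSONObject: arguments, options: [.sortedKeys]))
            ?? Data("{}".utf8)
        calls.append(ParsedToolCall(name: name, argumentsJSON: String(decoding: argsData, as: UTF8.self)))
    }
    return calls
}
```

Gemma's `tool_code` blocks would need a separate branch, because they contain a function-call expression rather than JSON.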
@@ -35,6 +41,9 @@ open "build/Debug/MLX Server.app"
|-------|---------------|-------|
| `gemma` | `mlx-community/gemma-3-4b-it-4bit` | Vision + tool use via `tool_code` blocks (128k context) |
| `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Vision + tool use via `<tool_call>` tags (256k context) |
| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | Thinking mode, tool use (256k context) |
Any model in MLX format on HuggingFace can be added — no restriction on uploader or architecture.
## Critical Performance Rule
@@ -47,9 +56,15 @@ open "build/Debug/MLX Server.app"
## Key Design Decisions
- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend — supports both text and vision in a single model load
- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) as the inference backend — loads any MLX-format model from HuggingFace
- Model-specific prompt formatting: Gemma uses `tool_code` blocks; Qwen uses `<tool_call>` XML tags
- Offline-first: if the model is already cached locally (~/.cache/huggingface/hub/), `LocalModelResolver` resolves the local snapshot path directly — no network requests
- **Offline-first**: `LocalModelResolver` checks the sandboxed app container, system `~/.cache/huggingface/hub/`, and flat download layouts — no network requests if model is cached
- **No duplicate storage**: custom `HubApi(cache: nil)` with explicit `downloadBase` — models stored once in the snapshot cache, not duplicated across blob cache and snapshots
- **Thinking mode**: `enable_thinking` passed to Jinja template context via `additionalContext`; `<think>...</think>` tags parsed in real-time during streaming and shown in collapsible UI blocks. Toggleable in Settings.
- **Download progress**: separate `isDownloading` state from `isLoading`; modal overlay shows file count, percentage, speed
- **Idle unload**: timer resets on both user input and model generation completion (not just request start)
- **Chat export**: Markdown (user messages as blockquotes) and RTF (Pages-compatible with formatted markdown)
- **Finder paste**: local event monitor intercepts Cmd+V to check pasteboard for image file URLs before TextField handles it
- HTTP server built on `Network.framework` (`NWListener`) — no third-party server dependencies
- KV cache reuse across API requests — reuses `ChatSession` when conversation history prefix matches
- GPU cache limit set to 20 MB; cache cleared on model unload
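The thinking-mode note says `<think>...</think>` tags are parsed in real time during streaming. A tag can arrive split across two chunks (for example `<thi` followed by `nk>`), so an incremental parser has to hold back any possible partial tag. The sketch below assumes chunks arrive as plain `String`s. `ThinkStreamParser` is a hypothetical name, not the type in `ChatMessage.swift`.

```swift
import Foundation

/// Incrementally splits streamed text into "thinking" and "answer" parts.
/// A tag split across chunk boundaries is buffered until it can be decided.
struct ThinkStreamParser {
    private(set) var thinking = ""
    private(set) var answer = ""
    private var inThink = false
    private var pending = ""  // text that might still be the start of a tag

    mutating func feed(_ chunk: String) {
        pending += chunk
        while true {
            let tag = inThink ? "</think>" : "<think>"
            if let range = pending.range(of: tag) {
                // Emit everything before the tag, then switch mode.
                emit(String(pending[..<range.lowerBound]))
                pending = String(pending[range.upperBound...])
                inThink.toggle()
                continue
            }
            // No complete tag: keep only a suffix that could still grow into one.
            let keep = longestSuffixThatIsPrefix(of: tag)
            emit(String(pending.dropLast(keep)))
            pending = String(pending.suffix(keep))
            return
        }
    }

    /// Flushes any buffered text once the stream ends.
    mutating func finish() {
        emit(pending)
        pending = ""
    }

    private mutating func emit(_ text: String) {
        if inThink { thinking += text } else { answer += text }
    }

    private func longestSuffixThatIsPrefix(of tag: String) -> Int {
        for length in stride(from: min(tag.count - 1, pending.count), through: 1, by: -1) {
            if tag.hasPrefix(String(pending.suffix(length))) { return length }
        }
        return 0
    }
}
```

The parser holds back at most `tag.count - 1` characters. As a result, the text shown in the UI lags the stream by only a few characters.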

@@ -10,9 +10,12 @@
0168AEE16009097901363E16 /* ModelManager.swift in Sources */ = {isa = PBXBuildFile; fileRef = 922CBDC9206737BD04AF2874 /* ModelManager.swift */; };
165E8AB6ADAE1D59B1A86420 /* Preferences.swift in Sources */ = {isa = PBXBuildFile; fileRef = 145B888FBDD4F931512C5473 /* Preferences.swift */; };
189362AAE2CDE5D4B3428334 /* ToolCallParser.swift in Sources */ = {isa = PBXBuildFile; fileRef = E73B165A1822729C907791AE /* ToolCallParser.swift */; };
29879D696584B96CC56560DF /* ChatExporter.swift in Sources */ = {isa = PBXBuildFile; fileRef = D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */; };
2CAAF7129F7CC45200FA9F6B /* ModelPickerView.swift in Sources */ = {isa = PBXBuildFile; fileRef = C3C3A76C02AF70A9D8F868FC /* ModelPickerView.swift */; };
2D08769282BD71C170DB0943 /* InferenceStats.swift in Sources */ = {isa = PBXBuildFile; fileRef = E35452B166893B25E765FF70 /* InferenceStats.swift */; };
4158FA884D981D73288FB74C /* SaveChatCommands.swift in Sources */ = {isa = PBXBuildFile; fileRef = 2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */; };
4CB13DC1AC7A500DDBB443EC /* ChatInputView.swift in Sources */ = {isa = PBXBuildFile; fileRef = E5E6AD02CDF23BDAB64700A7 /* ChatInputView.swift */; };
4DC033E45880B2948B47DEB1 /* FocusedValues.swift in Sources */ = {isa = PBXBuildFile; fileRef = EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */; };
50B6861FF8610B3ED4FFAD9D /* MLXServerApp.swift in Sources */ = {isa = PBXBuildFile; fileRef = C67742651DB486871CEF1612 /* MLXServerApp.swift */; };
50DD129CCF2843482DEC3B96 /* APIServer.swift in Sources */ = {isa = PBXBuildFile; fileRef = 3D08828E16B17EF02C14243E /* APIServer.swift */; };
5946258F1DE88CE904584E0B /* ContentView.swift in Sources */ = {isa = PBXBuildFile; fileRef = 944C699FBB76C734C9DF2F2E /* ContentView.swift */; };
@@ -38,6 +41,7 @@
145B888FBDD4F931512C5473 /* Preferences.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = Preferences.swift; sourceTree = "<group>"; };
16AE82A64D1D07AE3CD8D33A /* ToolPromptBuilder.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ToolPromptBuilder.swift; sourceTree = "<group>"; };
2DC8C86D397B1FCA08E07CBD /* DownloadModalView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = DownloadModalView.swift; sourceTree = "<group>"; };
2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = SaveChatCommands.swift; sourceTree = "<group>"; };
38DFC212AF4359A45FBE22BA /* ModelConfig.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ModelConfig.swift; sourceTree = "<group>"; };
3AF462805202797F61422AEE /* MLXServer.entitlements */ = {isa = PBXFileReference; lastKnownFileType = text.plist.entitlements; path = MLXServer.entitlements; sourceTree = "<group>"; };
3D08828E16B17EF02C14243E /* APIServer.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = APIServer.swift; sourceTree = "<group>"; };
@@ -53,10 +57,12 @@
C3C3A76C02AF70A9D8F868FC /* ModelPickerView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ModelPickerView.swift; sourceTree = "<group>"; };
C67742651DB486871CEF1612 /* MLXServerApp.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = MLXServerApp.swift; sourceTree = "<group>"; };
D733A0D1D4AC25DDDA6C8684 /* LocalModelResolver.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = LocalModelResolver.swift; sourceTree = "<group>"; };
D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatExporter.swift; sourceTree = "<group>"; };
DB1A5E8B1C9F2BC4D262C53A /* ChatMessagesView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatMessagesView.swift; sourceTree = "<group>"; };
E35452B166893B25E765FF70 /* InferenceStats.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = InferenceStats.swift; sourceTree = "<group>"; };
E5E6AD02CDF23BDAB64700A7 /* ChatInputView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ChatInputView.swift; sourceTree = "<group>"; };
E73B165A1822729C907791AE /* ToolCallParser.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ToolCallParser.swift; sourceTree = "<group>"; };
EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = FocusedValues.swift; sourceTree = "<group>"; };
F1A52E2C9964ADA9D841A89B /* APIModels.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = APIModels.swift; sourceTree = "<group>"; };
/* End PBXFileReference section */
@@ -78,6 +84,8 @@
05B1BAE308E64D2FB2E73823 /* Utilities */ = {
isa = PBXGroup;
children = (
D7C9BAD674E29688ACE53B0B /* ChatExporter.swift */,
EF518FEBF3A38E830E3CE1A5 /* FocusedValues.swift */,
D733A0D1D4AC25DDDA6C8684 /* LocalModelResolver.swift */,
145B888FBDD4F931512C5473 /* Preferences.swift */,
);
@@ -99,6 +107,7 @@
944C699FBB76C734C9DF2F2E /* ContentView.swift */,
3AF462805202797F61422AEE /* MLXServer.entitlements */,
C67742651DB486871CEF1612 /* MLXServerApp.swift */,
B459409ED6FD8797FDD81E94 /* Commands */,
BD0E350482D91238B4B59721 /* Models */,
E13C1AAA0C49D0ED85EFD94D /* Server */,
05B1BAE308E64D2FB2E73823 /* Utilities */,
@@ -122,6 +131,14 @@
path = Views;
sourceTree = "<group>";
};
B459409ED6FD8797FDD81E94 /* Commands */ = {
isa = PBXGroup;
children = (
2E2FCA55CEBEBCED78D9479A /* SaveChatCommands.swift */,
);
path = Commands;
sourceTree = "<group>";
};
BD0E350482D91238B4B59721 /* Models */ = {
isa = PBXGroup;
children = (
@@ -238,12 +255,14 @@
files = (
D96DDE66F76FDDA642629E17 /* APIModels.swift in Sources */,
50DD129CCF2843482DEC3B96 /* APIServer.swift in Sources */,
29879D696584B96CC56560DF /* ChatExporter.swift in Sources */,
4CB13DC1AC7A500DDBB443EC /* ChatInputView.swift in Sources */,
FAF7D4714AC6D02674920208 /* ChatMessage.swift in Sources */,
5C1E8FE1C521914CEF98D3AA /* ChatMessagesView.swift in Sources */,
B5AA6E3B4BE21676226B342B /* ChatViewModel.swift in Sources */,
5946258F1DE88CE904584E0B /* ContentView.swift in Sources */,
C07A377244DCD67F4FE709FE /* DownloadModalView.swift in Sources */,
4DC033E45880B2948B47DEB1 /* FocusedValues.swift in Sources */,
2D08769282BD71C170DB0943 /* InferenceStats.swift in Sources */,
6828CCA8B78AB40906F87CAB /* LocalModelResolver.swift in Sources */,
50B6861FF8610B3ED4FFAD9D /* MLXServerApp.swift in Sources */,
@@ -252,6 +271,7 @@
2CAAF7129F7CC45200FA9F6B /* ModelPickerView.swift in Sources */,
B1D9BC407DB7DB1489230C20 /* MonitorView.swift in Sources */,
165E8AB6ADAE1D59B1A86420 /* Preferences.swift in Sources */,
4158FA884D981D73288FB74C /* SaveChatCommands.swift in Sources */,
D666A311788375E8A061C832 /* SettingsView.swift in Sources */,
621B7E4382199AC1378F5F9C /* StatusBarView.swift in Sources */,
189362AAE2CDE5D4B3428334 /* ToolCallParser.swift in Sources */,
@@ -399,7 +419,7 @@
);
MACOSX_DEPLOYMENT_TARGET = 15.0;
MARKETING_VERSION = 1.0.0;
PRODUCT_BUNDLE_IDENTIFIER = com.mlxserver.app;
PRODUCT_BUNDLE_IDENTIFIER = de.rfc1437.mlxserver;
PRODUCT_NAME = "MLX Server";
SDKROOT = macosx;
SWIFT_VERSION = 6.0;
@@ -424,7 +444,7 @@
);
MACOSX_DEPLOYMENT_TARGET = 15.0;
MARKETING_VERSION = 1.0.0;
PRODUCT_BUNDLE_IDENTIFIER = com.mlxserver.app;
PRODUCT_BUNDLE_IDENTIFIER = de.rfc1437.mlxserver;
PRODUCT_NAME = "MLX Server";
SDKROOT = macosx;
SWIFT_VERSION = 6.0;

@@ -0,0 +1,16 @@
import SwiftUI
/// Adds "Export Chat" to the File menu.
struct SaveChatCommands: Commands {
@FocusedBinding(\.exportTrigger) var isExporting
var body: some Commands {
CommandGroup(after: .saveItem) {
Button("Export Chat…") {
isExporting = true
}
.keyboardShortcut("e", modifiers: [.command, .shift])
.disabled(isExporting == nil)
}
}
}

@@ -1,10 +1,12 @@
import SwiftUI
import UniformTypeIdentifiers
struct ContentView: View {
@Environment(ModelManager.self) private var modelManager
@State private var chatVM: ChatViewModel?
@State private var showLoadError = false
@State private var showMonitor = false
@State private var isExporting = false
var body: some View {
mainContent
@@ -52,6 +54,21 @@ struct ContentView: View {
.background {
modelSwitchShortcuts
}
// Expose export trigger to menu bar command
.focusedSceneValue(\.exportTrigger, $isExporting)
.fileExporter(
isPresented: $isExporting,
document: ChatExportDocument(
messages: chatVM?.conversation.messages ?? [],
modelName: modelManager.currentModel?.displayName
),
contentTypes: ChatExportDocument.writableContentTypes,
defaultFilename: "chat"
) { result in
if case .failure(let error) = result {
print("[Export] Failed: \(error.localizedDescription)")
}
}
}
@ViewBuilder

@@ -23,6 +23,9 @@ struct MLXServerApp: App {
}
.windowStyle(.titleBar)
.defaultSize(width: 800, height: 700)
.commands {
SaveChatCommands()
}
#if os(macOS)
Settings {

@@ -0,0 +1,290 @@
import AppKit
import Foundation
import SwiftUI
import UniformTypeIdentifiers
/// A FileDocument that exports a chat conversation as Markdown or RTF.
struct ChatExportDocument: FileDocument {
static var readableContentTypes: [UTType] { [.plainText] }
static var writableContentTypes: [UTType] {
[UTType(filenameExtension: "md") ?? .plainText, .rtf]
}
let messages: [ChatMessage]
let modelName: String?
init(messages: [ChatMessage], modelName: String?) {
self.messages = messages
self.modelName = modelName
}
init(configuration: ReadConfiguration) throws {
self.messages = []
self.modelName = nil
}
func fileWrapper(configuration: WriteConfiguration) throws -> FileWrapper {
let contentType = configuration.contentType
if contentType == .rtf, let data = ChatExporter.exportRTF(messages: messages, modelName: modelName) {
return FileWrapper(regularFileWithContents: data)
} else {
let md = ChatExporter.exportMarkdown(messages: messages, modelName: modelName)
return FileWrapper(regularFileWithContents: Data(md.utf8))
}
}
}
/// Exports a chat conversation to Markdown or RTF (Pages-compatible) format.
enum ChatExporter {
// MARK: - Markdown export
static func exportMarkdown(messages: [ChatMessage], modelName: String?) -> String {
var lines: [String] = []
// Header
lines.append("# Chat Session")
if let modelName {
lines.append("**Model:** \(modelName)")
}
let formatter = DateFormatter()
formatter.dateStyle = .long
formatter.timeStyle = .short
if let first = messages.first {
lines.append("**Date:** \(formatter.string(from: first.timestamp))")
}
lines.append("")
lines.append("---")
lines.append("")
for message in messages {
guard message.role != .system else { continue }
if message.role == .user {
// User messages as blockquotes
lines.append("**You:**")
lines.append("")
for line in message.content.components(separatedBy: "\n") {
lines.append("> \(line)")
}
} else {
// Assistant messages: carry over original markdown
lines.append("**Assistant:**")
lines.append("")
lines.append(message.content)
}
lines.append("")
lines.append("---")
lines.append("")
}
return lines.joined(separator: "\n")
}
// MARK: - RTF export
static func exportRTF(messages: [ChatMessage], modelName: String?) -> Data? {
let doc = NSMutableAttributedString()
let bodyFont = NSFont.systemFont(ofSize: 13)
let bodyBoldFont = NSFont.boldSystemFont(ofSize: 13)
let titleFont = NSFont.boldSystemFont(ofSize: 20)
let metaFont = NSFont.systemFont(ofSize: 11)
let codeFont = NSFont.monospacedSystemFont(ofSize: 12, weight: .regular)
let bodyParagraph = NSMutableParagraphStyle()
bodyParagraph.paragraphSpacing = 8
bodyParagraph.lineSpacing = 2
let userParagraph = NSMutableParagraphStyle()
userParagraph.paragraphSpacing = 8
userParagraph.lineSpacing = 2
userParagraph.headIndent = 20
userParagraph.firstLineHeadIndent = 20
// Title
doc.append(NSAttributedString(
string: "Chat Session\n",
attributes: [.font: titleFont, .paragraphStyle: bodyParagraph]
))
// Metadata
let formatter = DateFormatter()
formatter.dateStyle = .long
formatter.timeStyle = .short
var metaText = ""
if let modelName { metaText += "Model: \(modelName) " }
if let first = messages.first {
metaText += "Date: \(formatter.string(from: first.timestamp))"
}
if !metaText.isEmpty {
doc.append(NSAttributedString(
string: metaText + "\n\n",
attributes: [.font: metaFont, .foregroundColor: NSColor.secondaryLabelColor]
))
}
for message in messages {
guard message.role != .system else { continue }
if message.role == .user {
doc.append(NSAttributedString(
string: "You\n",
attributes: [
.font: bodyBoldFont,
.foregroundColor: NSColor.systemBlue,
]
))
doc.append(NSAttributedString(
string: message.content + "\n\n",
attributes: [
.font: bodyFont,
.paragraphStyle: userParagraph,
.foregroundColor: NSColor.labelColor,
]
))
} else {
doc.append(NSAttributedString(
string: "Assistant\n",
attributes: [
.font: bodyBoldFont,
.foregroundColor: NSColor.labelColor,
]
))
let rendered = renderMarkdown(message.content, bodyFont: bodyFont, codeFont: codeFont, paragraph: bodyParagraph)
doc.append(rendered)
doc.append(NSAttributedString(string: "\n\n"))
}
doc.append(NSAttributedString(
string: "\n",
attributes: [
.strikethroughStyle: NSUnderlineStyle.single.rawValue,
.strikethroughColor: NSColor.separatorColor,
.font: NSFont.systemFont(ofSize: 4),
]
))
}
return doc.rtf(from: NSRange(location: 0, length: doc.length), documentAttributes: [
.documentType: NSAttributedString.DocumentType.rtf,
])
}
// MARK: - Markdown NSAttributedString (basic)
private static func renderMarkdown(
_ text: String,
bodyFont: NSFont,
codeFont: NSFont,
paragraph: NSParagraphStyle
) -> NSAttributedString {
let result = NSMutableAttributedString()
let lines = text.components(separatedBy: "\n")
var inCodeBlock = false
var codeBlockLines: [String] = []
for line in lines {
if line.hasPrefix("```") {
if inCodeBlock {
let code = codeBlockLines.joined(separator: "\n")
let codePara = NSMutableParagraphStyle()
codePara.paragraphSpacing = 4
codePara.headIndent = 12
codePara.firstLineHeadIndent = 12
result.append(NSAttributedString(
string: code + "\n",
attributes: [
.font: codeFont,
.foregroundColor: NSColor.secondaryLabelColor,
.backgroundColor: NSColor.quaternaryLabelColor,
.paragraphStyle: codePara,
]
))
codeBlockLines = []
inCodeBlock = false
} else {
inCodeBlock = true
}
continue
}
if inCodeBlock {
codeBlockLines.append(line)
continue
}
if line.hasPrefix("### ") {
result.append(NSAttributedString(
string: String(line.dropFirst(4)) + "\n",
attributes: [.font: NSFont.boldSystemFont(ofSize: 14), .paragraphStyle: paragraph]
))
} else if line.hasPrefix("## ") {
result.append(NSAttributedString(
string: String(line.dropFirst(3)) + "\n",
attributes: [.font: NSFont.boldSystemFont(ofSize: 15), .paragraphStyle: paragraph]
))
} else if line.hasPrefix("# ") {
result.append(NSAttributedString(
string: String(line.dropFirst(2)) + "\n",
attributes: [.font: NSFont.boldSystemFont(ofSize: 17), .paragraphStyle: paragraph]
))
} else {
let styled = applyInlineFormatting(line, bodyFont: bodyFont, codeFont: codeFont)
result.append(styled)
result.append(NSAttributedString(string: "\n", attributes: [.font: bodyFont]))
}
}
return result
}
private static func applyInlineFormatting(
_ text: String,
bodyFont: NSFont,
codeFont: NSFont
) -> NSAttributedString {
let result = NSMutableAttributedString()
var remaining = text[text.startIndex...]
while !remaining.isEmpty {
if remaining.hasPrefix("`"), let end = remaining.dropFirst().firstIndex(of: "`") {
let code = String(remaining[remaining.index(after: remaining.startIndex)..<end])
result.append(NSAttributedString(
string: code,
attributes: [
.font: codeFont,
.foregroundColor: NSColor.secondaryLabelColor,
.backgroundColor: NSColor.quaternaryLabelColor,
]
))
remaining = remaining[remaining.index(after: end)...]
} else if remaining.hasPrefix("**"), let end = remaining.dropFirst(2).range(of: "**") {
let bold = String(remaining[remaining.index(remaining.startIndex, offsetBy: 2)..<end.lowerBound])
result.append(NSAttributedString(
string: bold,
attributes: [.font: NSFont.boldSystemFont(ofSize: bodyFont.pointSize)]
))
remaining = remaining[end.upperBound...]
} else if remaining.hasPrefix("*"), let end = remaining.dropFirst().firstIndex(of: "*") {
let italic = String(remaining[remaining.index(after: remaining.startIndex)..<end])
result.append(NSAttributedString(
string: italic,
attributes: [.font: NSFontManager.shared.convert(bodyFont, toHaveTrait: .italicFontMask)]
))
remaining = remaining[remaining.index(after: end)...]
} else {
let ch = remaining[remaining.startIndex]
result.append(NSAttributedString(
string: String(ch),
attributes: [.font: bodyFont]
))
remaining = remaining[remaining.index(after: remaining.startIndex)...]
}
}
return result
}
}

@@ -0,0 +1,13 @@
import SwiftUI
/// Focused value key for triggering chat export from the menu bar.
struct FocusedExportTriggerKey: FocusedValueKey {
typealias Value = Binding<Bool>
}
extension FocusedValues {
var exportTrigger: Binding<Bool>? {
get { self[FocusedExportTriggerKey.self] }
set { self[FocusedExportTriggerKey.self] = newValue }
}
}

@@ -1,75 +1,28 @@
import Foundation
/// Resolves HuggingFace model repos to local snapshot directories,
/// matching the cache layout used by Python's `huggingface_hub`.
/// Resolves HuggingFace model repos to local directories.
///
/// Checks two locations:
/// 1. App sandbox container: ~/Library/Containers/com.mlxserver.app/.../huggingface/hub/
/// 2. System-wide cache: ~/.cache/huggingface/hub/ (shared with Python tools)
///
/// Cache structure:
/// .../huggingface/hub/models--{org}--{name}/snapshots/{hash}/
/// HubApi(downloadBase: .cachesDirectory, cache: nil) downloads models to:
/// ~/Library/Containers/de.rfc1437.mlxserver/Data/Library/Caches/models/{org}/{name}/
enum LocalModelResolver {
/// All HuggingFace cache directories to search, in priority order.
/// The sandboxed container path is checked first (where the app downloads to),
/// then the system-wide Python cache (for models downloaded via huggingface-cli).
private static let cacheBases: [URL] = {
var bases: [URL] = []
// 1. Sandboxed app container cache (where swift-transformers Hub downloads to)
let containerCache = FileManager.default.homeDirectoryForCurrentUser
.appendingPathComponent("Library/Caches/huggingface/hub", isDirectory: true)
bases.append(containerCache)
// 2. System-wide ~/.cache/huggingface/hub/ (Python huggingface_hub)
// When sandboxed, homeDirectory points to the container, so construct the real path.
let realHome = URL(fileURLWithPath: NSHomeDirectory())
let systemCache = realHome
.appendingPathComponent(".cache/huggingface/hub", isDirectory: true)
// Avoid duplicate if they resolve to the same path
if systemCache.path != containerCache.path {
bases.append(systemCache)
}
// 3. Also try the unsandboxed home directory path
let globalHome = FileManager.default.homeDirectoryForCurrentUser
.appendingPathComponent(".cache/huggingface/hub", isDirectory: true)
if globalHome.path != containerCache.path && globalHome.path != systemCache.path {
bases.append(globalHome)
}
return bases
/// Base directory where HubApi stores downloaded models.
private static let modelsBase: URL? = {
FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first?
.appendingPathComponent("models", isDirectory: true)
}()
/// Resolve a HuggingFace repo ID (e.g. "mlx-community/gemma-3-4b-it-4bit")
/// to its local snapshot directory, if it exists.
/// to its local directory, if it exists.
///
/// Returns `nil` if the model hasn't been downloaded yet.
static func resolve(repoId: String) -> URL? {
let dirName = "models--" + repoId.replacingOccurrences(of: "/", with: "--")
for cacheBase in cacheBases {
let snapshotsDir = cacheBase
.appendingPathComponent(dirName, isDirectory: true)
.appendingPathComponent("snapshots", isDirectory: true)
guard let contents = try? FileManager.default.contentsOfDirectory(
at: snapshotsDir,
includingPropertiesForKeys: [.isDirectoryKey],
options: [.skipsHiddenFiles]
) else {
continue
}
if let snapshot = contents
.filter({ (try? $0.resourceValues(forKeys: [.isDirectoryKey]).isDirectory) == true })
.sorted(by: { $0.lastPathComponent < $1.lastPathComponent })
.last {
return snapshot
}
guard let base = modelsBase else { return nil }
let modelDir = base.appendingPathComponent(repoId, isDirectory: true)
var isDir: ObjCBool = false
if FileManager.default.fileExists(atPath: modelDir.path, isDirectory: &isDir), isDir.boolValue {
return modelDir
}
return nil
}
@@ -79,39 +32,18 @@ enum LocalModelResolver {
}
/// Delete the local cache for a model so it will be re-downloaded next time.
/// Removes from all cache locations.
/// Returns true if something was deleted.
@discardableResult
static func deleteLocal(repoId: String) -> Bool {
let dirName = "models--" + repoId.replacingOccurrences(of: "/", with: "--")
var deleted = false
for cacheBase in cacheBases {
let modelDir = cacheBase.appendingPathComponent(dirName, isDirectory: true)
guard FileManager.default.fileExists(atPath: modelDir.path) else { continue }
do {
try FileManager.default.removeItem(at: modelDir)
print("[LocalModelResolver] Deleted \(modelDir.path)")
deleted = true
} catch {
print("[LocalModelResolver] Failed to delete \(modelDir.path): \(error)")
}
guard let base = modelsBase else { return false }
let modelDir = base.appendingPathComponent(repoId, isDirectory: true)
guard FileManager.default.fileExists(atPath: modelDir.path) else { return false }
do {
try FileManager.default.removeItem(at: modelDir)
print("[LocalModelResolver] Deleted \(modelDir.path)")
return true
} catch {
print("[LocalModelResolver] Failed to delete \(modelDir.path): \(error)")
return false
}
// Also clean up the per-model cache in the container (used by swift-transformers)
let containerModelsDir = FileManager.default.homeDirectoryForCurrentUser
.appendingPathComponent("Library/Caches/models", isDirectory: true)
.appendingPathComponent(repoId, isDirectory: true)
if FileManager.default.fileExists(atPath: containerModelsDir.path) {
do {
try FileManager.default.removeItem(at: containerModelsDir)
print("[LocalModelResolver] Deleted \(containerModelsDir.path)")
deleted = true
} catch {
print("[LocalModelResolver] Failed to delete \(containerModelsDir.path): \(error)")
}
}
return deleted
}
}

@@ -12,7 +12,12 @@ final class ModelManager {
/// HubApi with blob cache disabled to avoid storing every model twice.
/// swift-huggingface defaults to caching in both huggingface/hub/ (snapshots)
/// AND models/ (content-addressed blobs). We only need the snapshots.
private static let hub = HubApi(cache: nil)
/// Must use the same downloadBase as defaultHubApi (.cachesDirectory) so
/// LocalModelResolver can find downloaded models.
private static let hub: HubApi = {
let cachesDir = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first
return HubApi(downloadBase: cachesDir, cache: nil)
}()
var currentModel: ModelConfig?
var modelContainer: ModelContainer?
var isLoading = false
@@ -52,7 +57,6 @@ final class ModelManager {
}
do {
let container: ModelContainer
let progressHandler: @Sendable (Progress) -> Void = { progress in
Task { @MainActor in
self.downloadProgress = progress.fractionCompleted
@@ -73,7 +77,7 @@ final class ModelManager {
configuration = config.modelConfiguration
}
container = try await VLMModelFactory.shared.loadContainer(
let container = try await VLMModelFactory.shared.loadContainer(
hub: Self.hub,
configuration: configuration,
progressHandler: progressHandler

@@ -1,6 +1,6 @@
# MLX Server
Native macOS app for running local LLMs on Apple Silicon via [MLX](https://github.com/ml-explore/mlx). Built with SwiftUI, it provides both a **chat UI** and an embedded **OpenAI-compatible API server**. Supports vision and tool use with automatic model swapping.
Native macOS app for running local LLMs on Apple Silicon via [MLX](https://github.com/ml-explore/mlx). Built with SwiftUI, it provides both a **chat UI** and an embedded **OpenAI-compatible API server**. Supports vision, tool use, and thinking mode.
## Supported Models
@@ -8,6 +8,9 @@ Native macOS app for running local LLMs on Apple Silicon via [MLX](https://githu
|-------|-------|---------|-------------|
| `gemma` | `mlx-community/gemma-3-4b-it-4bit` | 128k | Vision, tool use (`tool_code` blocks) |
| `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | 256k | Vision, tool use (`<tool_call>` tags) |
| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | 256k | Thinking mode, tool use |
Any model in MLX format on HuggingFace can be added — there is no restriction on uploader or architecture.
## Quick Start
@@ -20,12 +23,16 @@ open "build/Debug/MLX Server.app"
## App Features
- **Chat interface** with markdown rendering, image attachments (file picker, drag & drop, clipboard paste)
- **Model picker** in toolbar with local/download status indicators
- **Chat interface** with markdown rendering, image attachments (file picker, drag & drop, clipboard paste, Finder copy-paste)
- **Model picker** in toolbar with local/download status indicators and re-download button
- **Download progress modal** — shows file progress, percentage, and speed when downloading a new model
- **Thinking mode** — models like Qwen3.5 can reason internally before responding; thinking content appears in a collapsible box. Toggle on/off in Settings.
- **Streaming responses** with live token display
- **Export chat** — File > Export Chat (Cmd+Shift+E) saves conversations as Markdown or RTF (Pages-compatible)
- **Status bar** showing model name, context window, tokens/sec, token counts, GPU memory, API server status
- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3` (switch models)
- **Settings** (`Cmd+,`): system prompt, API port, API auto-start
- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3/4` (switch models), `Cmd+Shift+E` (export)
- **Settings** (`Cmd+,`): default model, thinking mode toggle, system prompt, API port, API auto-start, idle unload timeout
- **Idle auto-unload** — model is unloaded after configurable idle time (resets on both user input and model output), reloaded on next request
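The idle auto-unload rule resets on both user input and model output. In practice this means tracking the latest activity timestamp and comparing it against the configured timeout. The sketch below assumes a hypothetical `IdleTracker` type that uses plain `TimeInterval` timestamps, so it can be tested without a real clock. The project's `ModelManager` may drive this with a `Timer` or `Task` instead.

```swift
import Foundation

/// Tracks the most recent activity and decides when an idle unload is due.
struct IdleTracker {
    let timeout: TimeInterval
    private(set) var lastActivity: TimeInterval

    init(timeout: TimeInterval, now: TimeInterval) {
        self.timeout = timeout
        self.lastActivity = now
    }

    /// Call on user input AND when generation finishes, so time spent
    /// generating a long answer is not counted as idle time.
    mutating func recordActivity(at now: TimeInterval) {
        lastActivity = max(lastActivity, now)
    }

    /// True once `timeout` seconds have passed since the last activity.
    func shouldUnload(at now: TimeInterval) -> Bool {
        now - lastActivity >= timeout
    }
}
```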
## API Server
@@ -74,23 +81,29 @@ MLXServer/
├── ContentView.swift — Main layout, toolbar, keyboard shortcuts
├── Models/
│ ├── ModelConfig.swift — Model definitions, alias/repoId resolution
│ └── ChatMessage.swift — Chat message data model
│ └── ChatMessage.swift — Chat message data model, thinking tag parser
├── ViewModels/
│ ├── ModelManager.swift — Model loading/switching via VLMModelFactory
│ ├── ModelManager.swift — Model loading/switching, download tracking, idle unload
│ └── ChatViewModel.swift — Chat state, ChatSession, API server lifecycle
├── Views/
│ ├── ModelPickerView.swift — Toolbar model selector
│ ├── ChatMessagesView.swift — Scrollable message list with markdown
│ ├── ChatInputView.swift — Text input + image attach
│ ├── ModelPickerView.swift — Toolbar model selector with re-download
│ ├── ChatMessagesView.swift — Scrollable message list with markdown + thinking blocks
│ ├── ChatInputView.swift — Text input + image attach (paste, drag, picker)
│ ├── DownloadModalView.swift — Model download progress overlay
│ ├── StatusBarView.swift — Model info, tok/s, GPU memory, API status
│ └── SettingsView.swift — System prompt + API settings
│ ├── MonitorView.swift — Inference statistics monitor
│ └── SettingsView.swift — System prompt, thinking mode, API, idle settings
├── Commands/
│ └── SaveChatCommands.swift — File menu export command
├── Server/
│ ├── APIServer.swift — NWListener HTTP server, SSE streaming, KV cache reuse
│ ├── APIModels.swift — OpenAI-compatible Codable structs
│ ├── ToolCallParser.swift — Parses tool calls from model output
│ └── ToolPromptBuilder.swift — Model-specific tool prompt formatting
└── Utilities/
├── LocalModelResolver.swift — Offline-first HuggingFace cache resolution
├── LocalModelResolver.swift — Offline-first HuggingFace cache resolution (sandbox + system)
├── ChatExporter.swift — Export conversations to Markdown or RTF
├── FocusedValues.swift — FocusedValue keys for menu bar integration
└── Preferences.swift — UserDefaults wrapper
project.yml — xcodegen project spec (dependencies, settings, deployment target)
@@ -99,17 +112,11 @@ build.sh — One-command build script (xcodegen + xcodebuild)
## Key Design Decisions
- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) for inference — supports both text and vision in a single model load
- **Offline-first**: `LocalModelResolver` checks `~/.cache/huggingface/hub/` for locally-cached snapshots before downloading
- Uses `mlx-swift-lm` (`MLXVLM` / `VLMModelFactory`) for inference — loads any MLX-format model from HuggingFace
- **Offline-first**: `LocalModelResolver` checks both the sandboxed app container and `~/.cache/huggingface/hub/` for locally-cached models before downloading
- **No duplicate storage**: custom `HubApi` with blob cache disabled — models are stored once in the snapshot cache
- **KV cache reuse** across API requests — reuses `ChatSession` when conversation history prefix matches
- **Thinking mode**: `enable_thinking` passed via Jinja template context; `<think>` tags parsed in real-time during streaming
- HTTP server built on `Network.framework` (`NWListener`) — no third-party server dependencies
- Model-specific prompt formatting: Gemma uses `tool_code` blocks, Qwen uses `<tool_call>` XML tags
- GPU cache limit set to 20 MB; cache cleared on model unload
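The KV cache reuse rule reuses the `ChatSession` when the conversation history prefix matches. This can be expressed as a prefix check on the incoming OpenAI-style message list. The sketch below assumes a hypothetical, simplified `APIMessage` type with only `role` and `content`. The real check in `APIServer.swift` may compare more fields, such as images or tool calls.

```swift
/// Hypothetical, simplified OpenAI-style message.
struct APIMessage: Equatable {
    let role: String
    let content: String
}

/// Returns the messages that still need to be fed to a cached session, or
/// `nil` if the cached history is not a strict prefix of the request
/// (in which case a fresh session is needed).
func newMessagesIfPrefixMatches(cached: [APIMessage], incoming: [APIMessage]) -> [APIMessage]? {
    guard incoming.count > cached.count,
          Array(incoming.prefix(cached.count)) == cached else { return nil }
    return Array(incoming.dropFirst(cached.count))
}
```

Requiring a strict prefix means the check fails whenever a client edits or drops an earlier turn. In that case, reusing the cache would be incorrect.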
## Design Notes
- Uses `mlx_vlm` (not `mlx_lm`) as the backend — supports both text and vision in a single model load
- Offline-first: if the model is cached locally (`~/.cache/huggingface/hub/`), no network requests are made
- Thread lock on generation — MLX models aren't safe for concurrent generation
- KV prefix caching for multi-turn conversations
- Context window read from each model's config (Gemma 3 4B: 128k, Qwen3-VL 4B: 256k) with automatic summarization fallback
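The offline-first notes describe looking for a model in more than one place before downloading. The sketch below searches several base directories in priority order. `resolveLocalModel` is a hypothetical function. It combines two layouts:

- the flat `{base}/{org}/{name}` layout that the new `LocalModelResolver.resolve` checks;
- the `models--{org}--{name}/snapshots/{hash}` layout that the removed code searched.

Like the removed code, it picks the lexicographically last snapshot name. That is not necessarily the newest snapshot.

```swift
import Foundation

/// Looks for a downloaded model under each base directory, in order.
/// Supports the flat layout `{base}/{org}/{name}` and the huggingface_hub
/// layout `{base}/models--{org}--{name}/snapshots/{hash}`.
func resolveLocalModel(repoId: String, bases: [URL], fileManager: FileManager = .default) -> URL? {
    for base in bases {
        // 1. Flat layout (as produced by HubApi with an explicit downloadBase).
        let flat = base.appendingPathComponent(repoId, isDirectory: true)
        var isDir: ObjCBool = false
        if fileManager.fileExists(atPath: flat.path, isDirectory: &isDir), isDir.boolValue {
            return flat
        }
        // 2. huggingface_hub snapshot layout (shared with Python tools).
        let hubDir = "models--" + repoId.replacingOccurrences(of: "/", with: "--")
        let snapshots = base.appendingPathComponent(hubDir, isDirectory: true)
            .appendingPathComponent("snapshots", isDirectory: true)
        if let entries = try? fileManager.contentsOfDirectory(atPath: snapshots.path),
           let latest = entries.filter({ !$0.hasPrefix(".") }).sorted().last {
            return snapshots.appendingPathComponent(latest, isDirectory: true)
        }
    }
    return nil  // not cached anywhere: the caller falls back to downloading
}
```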

@@ -22,7 +22,7 @@ targets:
- MLXServer
settings:
base:
PRODUCT_BUNDLE_IDENTIFIER: com.mlxserver.app
PRODUCT_BUNDLE_IDENTIFIER: de.rfc1437.mlxserver
PRODUCT_NAME: MLX Server
MARKETING_VERSION: "1.0.0"
CURRENT_PROJECT_VERSION: "1"