chore: added more qwen 3.5 models

2026-03-20 16:47:10 +01:00
parent 455eba7caf
commit 7d25955042
4 changed files with 20 additions and 9 deletions

@@ -52,8 +52,9 @@ open "build/Debug/MLX Server.app"
 | Alias | HuggingFace ID | Notes |
 |-------|---------------|-------|
 | `gemma` | `mlx-community/gemma-3-4b-it-4bit` | Vision + tool use via `tool_code` blocks (128k context) |
-| `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Vision + tool use via `<tool_call>` tags (256k context) |
-| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | Thinking mode, tool use (256k context) |
+| `qwen` | `mlx-community/Qwen3.5-4B-MLX-4bit` | Vision + thinking mode + tool use via `<tool_call>` tags (256k context) |
+| `qwen3.5-0.8b` | `mlx-community/Qwen3.5-0.8B-4bit` | Vision + thinking mode + tool use via `<tool_call>` tags (256k context) |
+| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | Vision + thinking mode + tool use via `<tool_call>` tags (256k context) |
 Any model in MLX format on HuggingFace can be added — no restriction on uploader or architecture.

@@ -29,8 +29,17 @@ struct ModelConfig: Identifiable, Hashable {
         ),
         ModelConfig(
             id: "qwen",
-            repoId: "mlx-community/Qwen3-VL-4B-Instruct-4bit",
-            displayName: "Qwen3 VL 4B",
+            repoId: "mlx-community/Qwen3.5-4B-MLX-4bit",
+            displayName: "Qwen3.5 4B",
+            contextLength: 256_000,
+            loaderKind: .vlm,
+            supportsImages: true,
+            supportsTools: true
+        ),
+        ModelConfig(
+            id: "qwen3.5-0.8b",
+            repoId: "mlx-community/Qwen3.5-0.8B-4bit",
+            displayName: "Qwen3.5 0.8B",
             contextLength: 256_000,
             loaderKind: .vlm,
             supportsImages: true,
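
The hunk above turns the `qwen` alias into Qwen3.5 4B and registers a second entry, `qwen3.5-0.8b`. The following is a minimal sketch of the registry shape the diff implies. It assumes a `LoaderKind` enum with `.vlm` and `.llm` cases (only `.vlm` appears in the diff) and includes only the initializer arguments visible in the hunk. The real `ModelConfig` may carry more fields. `ModelRegistry` and `config(for:)` are hypothetical names, and the 0.8B entry's `supportsTools` value comes from the README table because the hunk is cut off before that argument.

```swift
// Sketch only: fields mirror the ModelConfig arguments visible in the diff.
// LoaderKind.llm, ModelRegistry and config(for:) are assumptions for illustration.
enum LoaderKind: Hashable {
    case llm  // text-only loader (README: LLMModelFactory)
    case vlm  // vision-language loader (README: VLMModelFactory)
}

struct ModelConfig: Identifiable, Hashable {
    let id: String           // short alias, e.g. "qwen"
    let repoId: String       // HuggingFace repository ID
    let displayName: String
    let contextLength: Int   // context window in tokens
    let loaderKind: LoaderKind
    let supportsImages: Bool
    let supportsTools: Bool
}

enum ModelRegistry {
    // The two Qwen3.5 entries touched by this hunk.
    static let qwenEntries: [ModelConfig] = [
        ModelConfig(id: "qwen", repoId: "mlx-community/Qwen3.5-4B-MLX-4bit",
                    displayName: "Qwen3.5 4B", contextLength: 256_000,
                    loaderKind: .vlm, supportsImages: true, supportsTools: true),
        ModelConfig(id: "qwen3.5-0.8b", repoId: "mlx-community/Qwen3.5-0.8B-4bit",
                    displayName: "Qwen3.5 0.8B", contextLength: 256_000,
                    loaderKind: .vlm, supportsImages: true, supportsTools: true),
    ]

    // Hypothetical lookup by alias; returns nil for aliases that are not registered.
    static func config(for alias: String) -> ModelConfig? {
        qwenEntries.first { $0.id == alias }
    }
}
```

For example, `ModelRegistry.config(for: "qwen3.5-0.8b")?.repoId` yields `"mlx-community/Qwen3.5-0.8B-4bit"`. Keeping the alias as the `Identifiable` ID lets the same string serve as both the lookup key and the value exposed to API clients.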

@@ -114,7 +114,7 @@ final class PromptBuilderTests: XCTestCase {
             n: nil
         )
-        let prepared = PromptBuilder.build(from: request, modelId: "mlx-community/Qwen3-VL-4B-Instruct-4bit", thinkingEnabled: true)
+        let prepared = PromptBuilder.build(from: request, modelId: "mlx-community/Qwen3.5-4B-MLX-4bit", thinkingEnabled: true)
         XCTAssertEqual(prepared.chatMessages.count, 1)
         XCTAssertTrue(prepared.chatMessages[0].content.contains("Let me check."))

@@ -7,8 +7,9 @@ Native macOS app for running local LLMs on Apple Silicon via [MLX](https://githu
 | Alias | Model | Context | Loader | Capabilities |
 |-------|-------|---------|--------|-------------|
 | `gemma` | `mlx-community/gemma-3-4b-it-4bit` | 128k | `VLMModelFactory` | Vision, tool use (`tool_code` blocks) |
-| `qwen` | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | 256k | `VLMModelFactory` | Vision, tool use (`<tool_call>` tags) |
-| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | 256k | `LLMModelFactory` | Vision, thinking mode, tool use |
+| `qwen` | `mlx-community/Qwen3.5-4B-MLX-4bit` | 256k | `VLMModelFactory` | Vision, thinking mode, tool use (`<tool_call>` tags) |
+| `qwen3.5-0.8b` | `mlx-community/Qwen3.5-0.8B-4bit` | 256k | `VLMModelFactory` | Vision, thinking mode, tool use (`<tool_call>` tags) |
+| `qwen3.5-9b` | `mlx-community/Qwen3.5-9B-4bit` | 256k | `VLMModelFactory` | Vision, thinking mode, tool use (`<tool_call>` tags) |
 | `stheno` | `synk/L3-8B-Stheno-v3.2-MLX` | 8k | `LLMModelFactory` | Text-only, llama-based |
 Any model in MLX format on HuggingFace can be added — there is no restriction on uploader or architecture.
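
With `qwen3.5-9b` moved to `VLMModelFactory`, `stheno` becomes the only text-only model in this table. The README states that text-only models reject image inputs. Below is a minimal sketch of such a capability check, assuming a plain alias-to-vision lookup transcribed from the table after this commit. `visionSupport`, `validateImageInputs(alias:imageCount:)` and `RequestValidationError` are hypothetical, because the app's actual check is not part of this commit.

```swift
// Hypothetical error type; the app's real error is not shown in this diff.
enum RequestValidationError: Error, Equatable {
    case unknownModel(String)
    case imagesNotSupported(alias: String)
}

// Vision support per alias, transcribed from the README table after this commit.
let visionSupport: [String: Bool] = [
    "gemma": true,
    "qwen": true,
    "qwen3.5-0.8b": true,
    "qwen3.5-9b": true,   // now loaded via VLMModelFactory
    "stheno": false,      // text-only, loaded via LLMModelFactory
]

// Throws when a request carries images but the selected model is text-only.
func validateImageInputs(alias: String, imageCount: Int) throws {
    guard let supportsImages = visionSupport[alias] else {
        throw RequestValidationError.unknownModel(alias)
    }
    if imageCount > 0 && !supportsImages {
        throw RequestValidationError.imagesNotSupported(alias: alias)
    }
}
```

Under this sketch, `try validateImageInputs(alias: "qwen3.5-9b", imageCount: 1)` passes. The same call with `"stheno"` throws `.imagesNotSupported`, but `"stheno"` still accepts text-only requests.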
@@ -33,7 +34,7 @@ open "build/Debug/MLX Server.app"
 - **Native chat documents** — save chats as `.mlxchat` package documents, reopen them from File > Open Chat or by double-clicking them in Finder, and continue the conversation with restored model context, thinking blocks, and images
 - **Export chat** — File > Export Chat (Cmd+Shift+E) saves conversations as Markdown or RTF (Pages-compatible)
 - **Status bar** showing model name, context window, tokens/sec, token counts, GPU memory, API server status
-- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+O` (open chat document), `Cmd+S` (save chat document), `Cmd+Shift+S` (save chat document as), `Cmd+Shift+E` (export), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3/4` (switch models)
+- **Keyboard shortcuts**: `Cmd+N` (new chat), `Cmd+O` (open chat document), `Cmd+S` (save chat document), `Cmd+Shift+S` (save chat document as), `Cmd+Shift+E` (export), `Cmd+Return` (send), `Escape` (stop), `Cmd+1/2/3/4/5` (switch models)
 - **Scene management** — create and edit reusable roleplay/task presets from the New Chat flow or Settings
 - **Settings** (`Cmd+,`): default model, thinking mode toggle, base system prompt, scene management, API port, API auto-start, idle unload timeout
 - **Idle auto-unload** — model is unloaded after configurable idle time (resets on both user input and model output), reloaded on next request
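
The change from `Cmd+1/2/3/4` to `Cmd+1/2/3/4/5` matches the growth of the model table from four to five entries. Below is a minimal sketch of one way to map a shortcut digit to a model. It assumes the shortcuts follow the README table order (`gemma`, `qwen`, `qwen3.5-0.8b`, `qwen3.5-9b`, `stheno`), which this commit does not confirm. Both `shortcutOrder` and `alias(forShortcutDigit:)` are hypothetical.

```swift
// Assumed shortcut order; mirrors the README table, not necessarily the app's binding.
let shortcutOrder = ["gemma", "qwen", "qwen3.5-0.8b", "qwen3.5-9b", "stheno"]

// Maps Cmd+<digit> (1-based) to a model alias; returns nil when no model is bound to that digit.
func alias(forShortcutDigit digit: Int) -> String? {
    let index = digit - 1
    return shortcutOrder.indices.contains(index) ? shortcutOrder[index] : nil
}
```

Deriving the binding from a single ordered list means a new model extends the shortcut range without a separate code change. The README line still has to be updated by hand, as this commit does.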
@@ -75,7 +76,7 @@ Pass images as base64 data URIs in the `image_url` content part:
 }
 ```
-Text-only models such as `qwen3.5-9b` and `stheno` reject image inputs.
+Text-only models such as `stheno` reject image inputs.
 ### Tool Use