initial commit

2026-03-17 09:14:27 +01:00
commit df81afe8d7
10 changed files with 1389 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,38 @@
+# MLX Server
+
+OpenAI-compatible API server for Gemma 3 4B (vision + tool use) on Apple Silicon via MLX.
+
+## Quick Start
+
+```bash
+# Activate virtual environment
+source .venv/bin/activate
+
+# Run the server (downloads model on first run)
+./run.sh
+
+# Or directly:
+python -m mlx_server.main --model mlx-community/gemma-3-4b-it-4bit --port 1234
+```
+
+## Project Structure
+
+- `mlx_server/main.py` — FastAPI server, endpoints, CLI entrypoint
+- `mlx_server/engine.py` — Model loading, prompt building, generation (mlx_vlm)
+- `mlx_server/models.py` — Pydantic models for OpenAI API request/response types
+
+## Key Design Decisions
+
+- Uses `mlx_vlm` (not `mlx_lm`) as the inference backend — this supports both text and vision in a single model load
+- Gemma 3 has no system role — system messages are converted to user/assistant pairs
+- Tool use is prompt-engineered: tools are injected into the system prompt with `<tool_call>` XML tags, and parsed from model output
+- Thread lock on generation (single-request-at-a-time) — MLX models aren't safe for concurrent generation
+- 128k context window supported via the model's native capabilities
+
+## Dependencies
+
+Managed via `uv` and `pyproject.toml`. Virtual environment in `.venv/`.
+
+```bash
+uv pip install -e "."
+```