Mac + Local LLM Setup

For Mac users who want the full experience: local LLM for smart clip scoring, Apple Silicon hardware acceleration, and native install without Docker.

Who this is for

You have a Mac with Apple Silicon (M1/M2/M3/M4). You want LLM-powered content analysis running entirely on your machine, no cloud APIs. You're comfortable with the terminal.

Architecture

┌───────────────────────────────────────────────────────┐
│ Mac (Apple Silicon)                                   │
│                                                       │
│  ┌────────────────┐      ┌──────────────────────────┐ │
│  │ mlx-vlm        │      │ Immich Memories          │ │
│  │ (Qwen2.5-VL)   │ ←──  │ (native Python)          │ │
│  │ port 8081      │      │ VideoToolbox encoding    │ │
│  │                │      │ Vision face detection    │ │
│  └────────────────┘      └──────────────────────────┘ │
│                                       │               │
│                              ┌────────┴─────────┐     │
│                              │ Immich server    │     │
│                              │ (local or remote)│     │
│                              └──────────────────┘     │
└───────────────────────────────────────────────────────┘

Install

# Install Immich Memories
uv tool install immich-memories

# Start the UI
immich-memories ui

Open http://localhost:8080.
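
If the UI doesn't start, check that uv's tool shims are on your PATH (uv installs executables into ~/.local/bin by default):

# List uv-managed tools and their entry points
uv tool list

# Add uv's tool directory to your shell profile if it isn't on PATH
uv tool update-shell

# Upgrade in place later
uv tool upgrade immich-memories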

Set up mlx-vlm for LLM analysis

mlx-vlm runs vision-language models natively on Apple Silicon via MLX. Qwen2.5-VL is the recommended model: fast, accurate, handles video frames well.

# Install mlx-vlm
pip install mlx-vlm

# Start the server (downloads the model on first run, ~4 GB)
mlx_vlm.server --model mlx-community/Qwen2.5-VL-7B-Instruct-8bit --port 8081

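Before wiring it into Immich Memories, it's worth a quick smoke test. Assuming the server exposes the standard OpenAI-compatible routes (which is what the provider: openai-compatible setting below relies on):

# Confirm the API responds and the model is listed
curl http://localhost:8081/v1/models

# Minimal text-only chat completion to verify inference works
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mlx-community/Qwen2.5-VL-7B-Instruct-8bit",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 20
      }'
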
Then configure Immich Memories to use it. Add to ~/.immich-memories/config.yaml:

advanced:
  llm:
    provider: openai-compatible
    base_url: http://localhost:8081/v1
    model: mlx-community/Qwen2.5-VL-7B-Instruct-8bit
  content_analysis:
    enabled: true

Or set via environment variables:

export IMMICH_MEMORIES_LLM__BASE_URL=http://localhost:8081/v1
export IMMICH_MEMORIES_LLM__MODEL=mlx-community/Qwen2.5-VL-7B-Instruct-8bit
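
These exports only last for the current shell; to persist them, append them to your shell profile (macOS defaults to zsh):

# Persist the LLM settings across shell sessions
cat >> ~/.zshrc <<'EOF'
export IMMICH_MEMORIES_LLM__BASE_URL=http://localhost:8081/v1
export IMMICH_MEMORIES_LLM__MODEL=mlx-community/Qwen2.5-VL-7B-Instruct-8bit
EOF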

What works

  • LLM content analysis: Qwen2.5-VL analyzes video frames and scores clips based on content (birthday cakes, sunsets, kids playing). Adds a content score weighted at 35% in the overall clip ranking.
  • VideoToolbox encoding: hardware-accelerated H.264/H.265 encoding via Apple's VideoToolbox. 5-10x faster than CPU encoding (see the ffmpeg example after this list).
  • Vision framework face detection: uses macOS native Vision framework for face detection. More accurate than the CPU fallback, no additional model downloads needed.
  • Taichi GPU title renderer: particle effects, animated globes, and gradient backgrounds rendered on Apple GPU.
  • All memory types and features: everything works natively on Mac.
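
For a feel of what the VideoToolbox path buys you, here is a standalone ffmpeg invocation using the same hardware encoders (h264_videotoolbox / hevc_videotoolbox). This is an illustration, not the exact command Immich Memories runs internally:

# Hardware HEVC encode; -tag:v hvc1 keeps the file QuickTime-compatible.
# VideoToolbox encoders are bitrate-driven, so set -b:v explicitly.
ffmpeg -i input.mov -c:v hevc_videotoolbox -b:v 10M -tag:v hvc1 -c:a copy output.mp4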

What doesn't work locally

  • AI music generation: MusicGen needs a GPU server (NVIDIA or a hosted API). ACE-Step can run locally on Mac but requires ~8 GB RAM and takes 5-10 minutes per track. If you have a remote MusicGen/ACE-Step server, configure it in the musicgen or ace_step config sections.
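
Pointing at a remote server is analogous to the llm block above. The key names in this sketch are guesses rather than the documented schema, so check the config reference for the real section layout:

advanced:
  musicgen:
    # hypothetical remote endpoint; verify the actual key names
    base_url: http://gpu-box:8000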

Performance expectations

On an M2 Pro (12-core, 32 GB):

Clips   Resolution   LLM analysis   Total time
15      1080p        ~3 min         ~5 min
30      1080p        ~5 min         ~8 min
30      4K           ~5 min         ~14 min
50      1080p        ~8 min         ~12 min

LLM analysis is the slowest phase. The 7B model analyzes 2 frames per clip at ~3 seconds per frame. After the first run, analysis results are cached: subsequent runs for the same clips skip analysis entirely.

Memory usage: ~2 GB for Immich Memories, ~5 GB for mlx-vlm with the 8-bit model. Plan on a machine with at least 16 GB of total RAM.
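
If runs feel sluggish, check whether you're dipping into swap. vm_stat is built into macOS and takes an optional sampling interval:

# Sample VM statistics every 5 seconds during a run;
# a steadily climbing "Pageouts" count means the system is swapping
vm_stat 5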

Tips

  • Start mlx-vlm before Immich Memories. If the LLM server isn't running, content analysis silently falls back to metadata-only scoring. You'll still get results, just without the LLM content understanding.
  • The 8-bit quant is the sweet spot. The 4-bit version is faster but less accurate. The full 16-bit version needs 16 GB+ of unified memory just for the model.
  • Ollama works too. If you prefer Ollama: ollama pull qwen2.5vl, then set provider: ollama and base_url: http://localhost:11434 in config (see the sketch after this list).
  • Use --analysis-depth thorough on the CLI to force LLM analysis even for clips that already have metadata scores. The default fast mode only uses the LLM to fill gaps.
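
For the Ollama route mentioned above, the corresponding config.yaml section would look like this, reusing the same keys as the mlx-vlm example (the model tag follows Ollama's library naming):

advanced:
  llm:
    provider: ollama
    base_url: http://localhost:11434
    model: qwen2.5vl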