Apple Silicon

Apple Silicon Macs (M1, M2, M3, M4, M5) are probably the best platform for this tool. Everything accelerates: video encoding, face detection, and even local LLM inference. Unified memory means no copying data between CPU and GPU.

What you get

VideoToolbox encoding: 5-10x faster than software libx264. Uses the dedicated media engine on the chip.
Vision Framework face detection: runs on the Neural Engine, ~10x faster than OpenCV CPU. More accurate too, especially with small or partially occluded faces.
Unified memory: no CPU/GPU transfer overhead. Frames stay in the same memory pool whether the CPU, GPU, or Neural Engine is working on them.
mlx-vlm for local LLMs: run vision models locally with Metal acceleration for LLM content analysis. No API costs, no data leaving your machine.

Installation

uv sync --extra mac

The mac extra installs the Apple-specific dependencies (pyobjc bindings for Vision Framework, etc.).

Configuration

hardware:
  enabled: true
  backend: "apple"   # or "auto" (auto detects it)

auto works perfectly on macOS. It'll find VideoToolbox and Vision Framework without any manual config.

Supported chips

All Apple Silicon chips are supported:

M1, M1 Pro, M1 Max, M1 Ultra
M2, M2 Pro, M2 Max, M2 Ultra
M3, M3 Pro, M3 Max, M3 Ultra
M4, M4 Pro, M4 Max, M4 Ultra
M5 and newer

The media engine and Neural Engine get faster with each generation, but even a base M1 runs the full pipeline comfortably.

What you get​

Installation​

Configuration​

Supported chips​

What you get

Installation

Configuration

Supported chips