Where you actually are.

Before reclaiming, see what you have. devpulse status shows unified memory used/free, current GPU allocation, swap, battery, and zombies in one shot.

$ devpulse status
memory  41.3 / 64.0 GB  (65%)  [healthy]
swap    4.1 GB
gpu     14.2 / 48.0 GB
battery 67%  [battery]  3h 47m to empty

⚠  6 zombie procs using 812 MB — run: devpulse zombies --kill
⚠  3 idle dev servers using 1.4 GB — projects: api-gateway, marketing-site

# JSON form for scripts:
$ devpulse status --json | jq '.gpu.allocatedMB, .memory.usedGB'
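
For scripts that outgrow a jq one-liner, the same JSON can be read from Python. This is a minimal sketch, assuming only the two field paths shown in the jq filter above (`gpu.allocatedMB` and `memory.usedGB`). The helper names and the sample payload are illustrative, not real DevPulse output.

```python
import json
import subprocess


def parse_status(raw: str) -> dict:
    """Pull the two fields used in the jq example out of the status JSON."""
    data = json.loads(raw)
    return {
        "gpu_allocated_mb": data["gpu"]["allocatedMB"],
        "memory_used_gb": data["memory"]["usedGB"],
    }


def read_status() -> dict:
    """Run `devpulse status --json` and parse its stdout."""
    proc = subprocess.run(
        ["devpulse", "status", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_status(proc.stdout)


# Hypothetical payload for illustration; real output may carry more keys.
SAMPLE = '{"gpu": {"allocatedMB": 14200}, "memory": {"usedGB": 41.3}}'
print(parse_status(SAMPLE))
```

Keeping the parsing separate from the subprocess call makes it possible to test the script without DevPulse installed.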

Six steps from cheapest to deepest.

1 · Unload idle Ollama models

By default, Ollama keeps a model loaded for 5 minutes after its last use. Often 4–8 GB of reclaimable memory is sitting in a model you queried earlier and forgot about. devpulse ai shows you which.
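
Outside DevPulse, the same information is available from Ollama's own HTTP API. This is a minimal sketch, assuming a local server on the default port 11434. It reads the `/api/ps` endpoint, which lists models currently in memory, and unloads one by sending a request with `keep_alive` set to 0. The helper names are illustrative, and this does not describe how devpulse ai works internally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def loaded_models(ps_payload: dict) -> list[tuple[str, float]]:
    """Return (model name, VRAM in GB) for each model in an /api/ps response."""
    return [
        (m["name"], round(m.get("size_vram", 0) / 1e9, 1))
        for m in ps_payload.get("models", [])
    ]


def fetch_loaded_models() -> list[tuple[str, float]]:
    """Ask the local Ollama server which models are currently in memory."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as resp:
        return loaded_models(json.load(resp))


def unload(model: str) -> None:
    """Unload a model now: an empty request with keep_alive set to 0."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()
```

From the command line, `ollama ps` and `ollama stop <model>` cover the same two operations.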

2 · Kill zombies and stale dev servers

TypeScript LSPs, ESLint daemons, file watchers, and dev servers from projects you closed yesterday. Often hundreds of MB, occasionally GBs. devpulse zombies --kill.

3 · Quit memory-heavy apps

Chrome (15–25 GB on most dev Macs) and Docker (3–6 GB idle) are the heaviest. If you don't need them right now, quit them. Memory Saver in Chrome partially helps; closing tabs helps more.

4 · Close idle Electron apps

Slack, Discord, Notion, Cursor — each carries a Chromium runtime worth 1–4 GB. Cumulatively a real win.

5 · Run --auto-clean

devpulse ai --before-load <MB> --auto-clean does steps 1–2 automatically, then re-evaluates. Won't touch your foreground apps. Returns exit code 0 when the model fits.

6 · Raise the ceiling (last resort)

sudo sysctl iogpu.wired_limit_mb=57000 raises Apple Silicon's GPU cap. Risky above ~85% of total RAM; resets on reboot. Reclaiming existing usage is almost always safer.

Or just do this.

All the safe parts of the protocol, executed and reported, in one call. Pass the size of the model you're about to load and DevPulse tells you whether it fits — and what to do if not.

# Llama 3.3 70B (Q4_K_M ≈ 42 GB)
$ devpulse ai --before-load 42000 --auto-clean
before: Won't fit — 8.2 GB short
  - unloaded idle ollama model: qwen2.5:7b (4.2 GB)
  - killed 6 zombie procs (812 MB reclaimed)
after:  Fits comfortably — 4.4 GB headroom

# Use the exit code in scripts
$ devpulse ai --before-load 42000 --auto-clean && ollama run llama3.3:70b
$ # exit 0 = safe to load; 1 = won't fit; 2 = fits after unload; 3 = tight
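
The same exit codes can drive a Python script when `&&` is too coarse. This is a minimal sketch: the code-to-meaning table is copied from the comment above, while `preflight`, `should_load`, and the injectable `run` parameter are hypothetical helpers added here for illustration. The load-only-on-0 policy is an assumption that mirrors the `&&` example.

```python
import subprocess

# Meanings copied from the exit-code comment above.
EXIT_MEANINGS = {
    0: "safe to load",
    1: "won't fit",
    2: "fits after unload",
    3: "tight",
}


def preflight(model_mb: int, run=subprocess.run) -> tuple[int, str]:
    """Run the pre-flight check and return (exit code, meaning).

    `run` is injectable so the function can be exercised without DevPulse.
    """
    proc = run(["devpulse", "ai", "--before-load", str(model_mb), "--auto-clean"])
    code = proc.returncode
    return code, EXIT_MEANINGS.get(code, "unknown exit code")


def should_load(code: int) -> bool:
    """Conservative policy (an assumption): load only on exit 0, like `&&`."""
    return code == 0
```

A caller might log the meaning for codes 2 and 3 and let a human decide, rather than treating every non-zero code as the same failure.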

This works for every local-AI runtime.

Apple Silicon's unified memory ceiling is a property of the OS, not the inference framework. Whatever you're launching, the headroom you reclaim with DevPulse is available to it.

Ollama

The most common case. devpulse ai integrates directly: lists loaded models, surfaces idle ones, can unload via --auto-clean.

llama.cpp

No daemon: the model loads on demand and unloads on exit. Pre-flight before ./main -m model.gguf (llama-cli in newer builds) still applies; the same DevPulse check returns the same headroom.

LM Studio

Holds models more deliberately than Ollama. Use LM Studio's “Eject” to unload, then run --before-load with the new model's size.

MLX

Apple's own ML framework. Same unified-memory rules; same DevPulse pre-flight. MLX models tend to be slightly smaller in RAM than the same quant in GGUF.

vLLM

Less common on Mac because it is CUDA-first, but some BYOK setups, such as those built on Factory's Droid, use it. Same memory accounting applies.

Raw transformers

If you're loading a HuggingFace model directly via PyTorch with MPS, the OS-level ceiling still rules. Pre-flight first.
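
When a runtime does not report a model's size up front, the number passed to --before-load can be estimated from parameter count and bits per weight. This is a rough sketch, not DevPulse's own calculation: `estimate_weights_mb` is a hypothetical helper, it covers the weights only (KV cache and activations need extra room), and the ~4.8 bits per weight used for Q4_K_M is an approximate figure.

```python
def estimate_weights_mb(params_billion: float, bits_per_weight: float) -> int:
    """Approximate size of the weights alone, in decimal MB (1 MB = 10**6 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(total_bytes / 1e6)


# fp16 weights use 16 bits each; ~4.8 bits/weight is a rough figure for Q4_K_M.
print(estimate_weights_mb(8, 16))    # 8B model loaded in fp16
print(estimate_weights_mb(70, 4.8))  # 70B model at roughly Q4_K_M density
```

The second call lands on the 42000 MB figure used for Llama 3.3 70B earlier; leave extra headroom for context length.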

FAQ.

How much unified memory can the GPU actually use?

Default ceiling is ~75% of total RAM (Metal's recommendedMaxWorkingSetSize). On 64 GB → ~48 GB usable. Raise via sudo sysctl iogpu.wired_limit_mb=<MB>; going past ~85% destabilizes the system.
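
The arithmetic behind those percentages is short enough to check before touching sysctl. This is a minimal sketch under the figures quoted above (a ~75% default and an ~85% guideline); the function names are illustrative and the 1 GB = 1024 MB conversion is an assumption.

```python
def default_gpu_ceiling_gb(total_ram_gb: float, fraction: float = 0.75) -> float:
    """Approximate default GPU ceiling as a fraction of total RAM."""
    return total_ram_gb * fraction


def limit_fraction(limit_mb: int, total_ram_gb: float) -> float:
    """Share of total RAM that a wired limit represents (1 GB = 1024 MB)."""
    return limit_mb / (total_ram_gb * 1024)


def is_risky(limit_mb: int, total_ram_gb: float, threshold: float = 0.85) -> bool:
    """True when the proposed limit exceeds the ~85% guideline."""
    return limit_fraction(limit_mb, total_ram_gb) > threshold


print(default_gpu_ceiling_gb(64))              # default ceiling on a 64 GB machine
print(round(limit_fraction(57000, 64), 2))     # share of RAM for a 57000 MB limit
print(is_risky(57000, 64))
```

By this arithmetic, a 57000 MB limit on a 64 GB machine is about 87% of RAM, just past the ~85% guideline, so treat a value that high as the upper end rather than a starting point.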

Is sudo purge useful?

Marginally. Flushes file caches and inactive pages — not wired/GPU memory. Buys hundreds of MB at most. Killing Chrome reclaims 100x more.

Can I do this without quitting apps?

Partially — zombies and idle models are reclaimable without touching foreground apps. But the heaviest source (Chrome) needs Memory Saver or fewer tabs.

Safe way in a script?

devpulse ai --before-load <MB> --auto-clean. Only reversible cleanup; stable exit codes (0/1/2/3); won't touch foreground apps or system settings.

Stop guessing whether the model will fit.

DevPulse runs the math, reclaims the safe parts, and tells you in one command.

Download for macOS

macOS 14+ · Apple Silicon & Intel · Free during launch