One command. Pre-flight + auto-clean.

DevPulse's CLI knows what's safe to reclaim — idle Ollama models, orphaned LSPs, stale dev servers — and unloads them before retrying the load.

# install once
brew install --cask devpulse        # (coming soon — for now: download the DMG)

# diagnose + fix in one shot
$ devpulse ai --before-load 42000 --auto-clean
before: Won't fit — 8.2 GB short
  - unloaded idle ollama model: qwen2.5:7b (4.2 GB)
  - killed 6 zombie procs (812 MB reclaimed)
after:  Fits comfortably — 4.4 GB headroom

$ ollama run llama3.3:70b           # now succeeds

Exit code semantics for use in scripts: 0 fits · 1 won't fit · 2 fits after unload · 3 tight. Branch in shell, no parsing required.
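
A minimal sketch of branching on those exit codes in a script. The codes and the devpulse ai --before-load invocation come from the text above; the fit_status helper and its labels are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a DevPulse exit code to a label,
# following the exit-code list above (0 fits, 1 won't fit,
# 2 fits after unload, 3 tight).
fit_status() {
  case "$1" in
    0) echo "fits" ;;
    1) echo "wont-fit" ;;
    2) echo "fits-after-unload" ;;
    3) echo "tight" ;;
    *) echo "unknown" ;;
  esac
}

# Only call the CLI when it is installed, so the sketch runs anywhere.
if command -v devpulse >/dev/null 2>&1; then
  devpulse ai --before-load 42000
  code=$?   # capture immediately, before any other command overwrites it
  case "$(fit_status "$code")" in
    fits|tight)        echo "OK to load the model" ;;
    fits-after-unload) echo "Fits after unloading idle models" ;;
    *)                 echo "Will not fit; free memory first" ;;
  esac
fi
```

Capturing $? on the very next line matters: any command in between would replace the exit code.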

What's eating your unified memory.

Across hundreds of developer Macs, the offenders are stubbornly consistent. Run devpulse processes -n 8 and you'll almost certainly see this list.

Chrome (15–25 GB)

Each tab is a separate process. 50 tabs = 50 renderer procs. Chrome Shame Score →

Docker (3–6 GB idle)

Docker Desktop's VM hoards memory whether containers are running or not. Quit it before loading a 70B if you don't need containers right now.

Stale Ollama models

Ollama keeps recently-used models in memory for 5 minutes by default. If you tested qwen:7b earlier, it's still resident. --auto-clean unloads them.

Orphaned LSPs & watchers

TypeScript language servers, ESLint daemons, and file watchers from projects you closed days ago. Run devpulse zombies --kill to clear them.

Electron apps (1–4 GB ea)

Slack, Discord, Notion, Cursor — each ships its own Chromium runtime. Quit the ones you're not actively using.

Idle dev servers

Next.js, Vite, Webpack, nodemon — all happy to sit at 500 MB each forever. DevPulse flags these by project so you know which to kill.
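
To check the stale-Ollama-models case by hand, ollama ps lists the models currently resident and ollama stop <model> unloads one (both ship with recent Ollama releases). The sketch below only prints the stop commands rather than running them. The model_names helper is a hypothetical name, and it assumes the first column of ollama ps output is the model name, preceded by one header row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ollama ps` output on stdin and print
# the model names (first column), skipping the header row.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Print, but do not run, a stop command for each resident model.
if command -v ollama >/dev/null 2>&1; then
  ollama ps | model_names | while read -r name; do
    echo "ollama stop $name"
  done
fi
```

Setting the OLLAMA_KEEP_ALIVE environment variable for the Ollama server changes how long models stay resident after their last use.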

Babysit your model so it doesn't crash mid-task.

If you're running an agent loop or processing a queue, devpulse babysit watches free memory + battery + swap and auto-cleans when pressure builds. Built for the 11-hour-flight-with-a-70B workflow.

$ devpulse babysit --target-free-mb 8192 --json > babysit.log &
$ ollama run llama3.3:70b < my-queue.txt

# tail the log to see auto-cleans triggered by memory pressure:
$ tail -f babysit.log
{"event":"tick","tickNum":47,"availableForAIMB":7200,"pressure":"free<8192MB",...}
{"event":"cleanup","reasons":"free<8192MB","reclaimedMB":5400,...}
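
A rough sketch for summarizing that log after a long run: how many cleanups fired and how much memory they reclaimed in total. The cleanup_summary helper is a hypothetical name, and the pattern assumes each cleanup line carries "event":"cleanup" before "reclaimedMB":<integer>, as in the sample above. If jq is installed, a real JSON parser is the sturdier choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read babysit log lines on stdin and print
# "<cleanup count> <total reclaimed MB>".
# Assumes "event":"cleanup" appears before "reclaimedMB":<int> on each
# cleanup line, matching the sample log format.
cleanup_summary() {
  sed -n 's/.*"event":"cleanup".*"reclaimedMB":\([0-9][0-9]*\).*/\1/p' |
    awk '{ n++; total += $1 } END { print n + 0, total + 0 }'
}

# Summarize the log from the example above, if it exists.
if [ -f babysit.log ]; then
  cleanup_summary < babysit.log
fi
```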

More on the local-AI CLI →

Ollama OOM on Mac — FAQ.

Why OOM on a Mac with 64 GB RAM?

Unified memory is shared between the CPU and GPU, and GPU allocation is capped at roughly 75% of the total. On 64 GB that leaves ~48 GB usable. Chrome alone routinely claims 20+ GB. The OOM comes from your stack, not the model.
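
The arithmetic behind that figure, as a rough sketch. The 75% cap is the approximate figure quoted above, not an exact constant, and gpu_cap_mb is a hypothetical helper name. On macOS, sysctl -n hw.memsize reports installed memory in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate GPU-allocatable memory in MB,
# using the ~75% cap quoted above (an approximation).
gpu_cap_mb() {
  echo $(( $1 * 75 / 100 ))
}

gpu_cap_mb 65536   # 64 GB machine: prints 49152 (48 GB)

# On macOS, apply the same estimate to this machine.
if [ "$(uname)" = "Darwin" ]; then
  total_mb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
  echo "estimated cap here: $(gpu_cap_mb "$total_mb") MB"
fi
```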

How do I raise the GPU memory ceiling?

Run sudo sysctl iogpu.wired_limit_mb=<MB> on Apple Silicon. The setting resets on reboot. For 64 GB Macs, ~57000 is reasonably safe; higher values risk instability. Freeing existing usage is usually the safer fix.
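
One cautious way to apply this is to sanity-check the value before touching the setting. The sketch below prints the command instead of running it. wired_limit_cmd is a hypothetical helper, and its rule (reject anything at or above total memory) is an assumed sanity check, not an Apple or DevPulse guideline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the sysctl command for a given
# limit in MB, refusing values at or above total memory in MB.
# The validation rule is an assumption made for this sketch.
wired_limit_cmd() {
  local limit_mb=$1 total_mb=$2
  if [ "$limit_mb" -ge "$total_mb" ]; then
    echo "refusing: limit must be below total memory" >&2
    return 1
  fi
  echo "sudo sysctl iogpu.wired_limit_mb=${limit_mb}"
}

# 64 GB Mac with the ~57000 MB value suggested above.
wired_limit_cmd 57000 65536
```

Running sysctl iogpu.wired_limit_mb without a value prints the current setting, which is worth noting down before changing it.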

How much RAM for Llama 3.3 70B?

~42 GB at Q4_K_M. Full 70B compatibility table →

Does killing Chrome actually help?

Often dramatically. 50+ tabs = 15–25 GB on most dev Macs. Closing Chrome (or just most tabs) is usually the biggest single win.

Stop letting your stack OOM your local AI.

DevPulse is free, native, and uses less RAM than this webpage.

Download for macOS

macOS 14+ · Apple Silicon & Intel · Free during launch