Why does Ollama say out of memory on a Mac with 64 GB RAM?

Apple Silicon Macs use unified memory shared between CPU and GPU, with a hard ceiling on how much the GPU process can allocate (about 75% of total RAM by default). On 64 GB you get ~48 GB usable for models. Chrome, Docker, your IDE, and stale dev servers all eat into that ceiling before Ollama gets a slice. The OOM is almost always your stack, not the model.
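The ceiling arithmetic can be checked on your own machine. This is a minimal sketch that assumes the roughly 75% figure quoted above (an approximation, not an exact system constant); `gpu_ceiling_mb` is a hypothetical helper name.

```shell
# Estimate the default GPU-allocatable ceiling in MB from total RAM in bytes.
# Assumes the ~75% default share described above; the real value may differ.
gpu_ceiling_mb() {
  echo $(( $1 / 1024 / 1024 * 75 / 100 ))
}

# On macOS, hw.memsize reports installed RAM in bytes.
if [ "$(uname)" = "Darwin" ]; then
  gpu_ceiling_mb "$(sysctl -n hw.memsize)"
fi
```

For a 64 GB machine (68719476736 bytes) this gives 49152 MB, which is the ~48 GB mentioned above.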

How do I increase the Metal recommendedMaxWorkingSetSize on macOS?

On Apple Silicon, override it with `sudo sysctl iogpu.wired_limit_mb=<MB>`, where `<MB>` is the new limit in megabytes. For 64 GB Macs, raising it to ~57000 (57 GB) is reasonably safe; going higher risks system instability. The setting resets on reboot. This is a workaround: freeing existing usage is usually safer than raising the ceiling.
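A small guard can keep you from typing a limit that starves the OS. The sketch below only prints the command rather than running it; `wired_limit_cmd` is a hypothetical helper, and the 8192 MB default reserve is an assumption chosen to roughly match the ~57000 guidance for 64 GB.

```shell
# Print the sysctl command for a requested GPU wired limit (in MB), refusing
# values that would leave less than reserve_mb for the OS and other apps.
# The 8192 MB default reserve is an assumption, not an Apple-documented number.
wired_limit_cmd() {
  total_mb=$1; limit_mb=$2; reserve_mb=${3:-8192}
  if [ "$limit_mb" -gt $(( total_mb - reserve_mb )) ]; then
    echo "refusing: $limit_mb MB leaves under $reserve_mb MB for the system" >&2
    return 1
  fi
  echo "sudo sysctl iogpu.wired_limit_mb=$limit_mb"
}

wired_limit_cmd 65536 57000   # prints: sudo sysctl iogpu.wired_limit_mb=57000
```

Running `sysctl iogpu.wired_limit_mb` without a value shows the current setting before you change it.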

How much RAM do I need to run Llama 3.3 70B locally?

Around 42 GB at Q4_K_M quantization. On a 64 GB Mac that fits if you keep the rest of your stack lean. With Chrome at 22 GB and Docker at 4 GB you'll OOM. DevPulse's `devpulse ai --before-load 42000 --auto-clean` reclaims idle resources first.
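The numbers above work out as follows. This sketch uses the simplified model from this page, where everything else running counts against the ~48 GB ceiling, and round decimal megabytes; `headroom_mb` is a hypothetical helper name.

```shell
# Headroom in MB after loading a model: ceiling minus memory already in use
# minus the model itself. A negative result means the load will not fit.
headroom_mb() {
  ceiling_mb=$1; used_mb=$2; model_mb=$3
  echo $(( ceiling_mb - used_mb - model_mb ))
}

# 48 GB ceiling, Chrome 22 GB + Docker 4 GB in use, 42 GB model
headroom_mb 48000 26000 42000   # prints -20000
# Same model with a lean 2 GB stack
headroom_mb 48000 2000 42000    # prints 4000
```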

Does killing Chrome help with Ollama OOM?

Yes, often dramatically. Chrome with 50+ tabs routinely consumes 15–25 GB on developer Macs. Closing it (or just the tabs you're not using) is the single biggest quick win for local AI memory pressure.

Ollama Out of Memory on Mac — fix the OOM, not the model

The fix · 30 seconds

One command. Pre-flight + auto-clean.

DevPulse's CLI knows what's safe to reclaim — idle Ollama models, orphaned LSPs, stale dev servers — and unloads them before retrying the load.

# install once
brew install --cask devpulse        # (coming soon — for now: download the DMG)

# diagnose + fix in one shot
$ devpulse ai --before-load 42000 --auto-clean
before: Won't fit — 8.2 GB short
  - unloaded idle ollama model: qwen2.5:7b (4.2 GB)
  - killed 6 zombie procs (812 MB reclaimed)
after:  Fits comfortably — 4.4 GB headroom

$ ollama run llama3.3:70b           # now succeeds

Exit code semantics for use in scripts: 0 fits · 1 won't fit · 2 fits after unload · 3 tight. Branch in shell, no parsing required.
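Branching on those exit codes might look like the sketch below. The code meanings are the ones listed above; `fit_action` and its action labels are hypothetical, and treating "tight" as loadable is a judgment call you may want to reverse.

```shell
# Map the exit codes listed above to an action label.
fit_action() {
  case "$1" in
    0) echo "load" ;;            # fits
    2) echo "load" ;;            # fits after unload
    3) echo "load-with-care" ;;  # tight
    1) echo "abort" ;;           # won't fit
    *) echo "unknown" ;;
  esac
}

# Only runs when the devpulse CLI is installed.
if command -v devpulse >/dev/null 2>&1; then
  devpulse ai --before-load 42000 --auto-clean
  status=$?
  case "$(fit_action "$status")" in
    load|load-with-care) ollama run llama3.3:70b ;;
    *) echo "not loading: free more memory first (exit $status)" >&2 ;;
  esac
fi
```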

The actual culprits · in order

What's eating your unified memory.

Across hundreds of developer Macs, the offenders are stubbornly consistent. Run devpulse processes -n 8 and you'll almost certainly see this list.
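For a rough cross-check with stock tools, resident memory can be summed per command from `ps`. This is a sketch, not what DevPulse does internally: `top_rss` is a hypothetical helper, and RSS double-counts shared pages, so treat the totals as an upper bound.

```shell
# Sum resident memory (RSS, which ps reports in KB) per command name and
# print the top N entries in MB. Reads "rss command" lines on stdin.
top_rss() {
  awk '{ rss = $1; $1 = ""; sub(/^ +/, ""); sum[$0] += rss }
       END { for (name in sum) printf "%d\t%s\n", sum[name] / 1024, name }' |
    sort -rn | head -n "${1:-8}"
}

# On macOS, comm is the executable path, so Chrome helpers group by helper type.
if [ "$(uname)" = "Darwin" ]; then
  ps -axo rss=,comm= | top_rss 8
fi
```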

Chrome (15–25 GB)

Chrome runs most tabs in their own renderer process, so 50 tabs means roughly 50 renderer procs. Chrome Shame Score →

Docker (3–6 GB idle)

Docker Desktop's VM hoards memory whether containers are running or not. Quit it before loading a 70B if you don't need containers right now.

Stale Ollama models

Ollama keeps recently-used models in memory for 5 minutes by default. If you tested qwen:7b earlier, it's still resident. --auto-clean unloads them.

Orphaned LSPs & watchers

TypeScript language servers, ESLint daemons, and file watchers from projects you closed days ago. Clear them with devpulse zombies --kill.

Electron apps (1–4 GB ea)

Slack, Discord, Notion, Cursor — each ships its own Chromium runtime. Quit the ones you're not actively using.

Idle dev servers

Next.js, Vite, Webpack, nodemon — all happy to sit at 500 MB each forever. DevPulse flags these by project so you know which to kill.
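Stale Ollama models can also be unloaded by hand with Ollama's own CLI. The sketch below assumes the first column of `ollama ps` output is the model name and that your Ollama version includes the `ollama stop` subcommand; `loaded_models` is a hypothetical helper.

```shell
# Extract model names from `ollama ps` output: skip the header row and
# print the first column (NAME) of each remaining non-empty line.
loaded_models() {
  awk 'NR > 1 && NF { print $1 }'
}

# Unload every model that is currently resident. Only runs if ollama exists.
if command -v ollama >/dev/null 2>&1; then
  ollama ps | loaded_models | while read -r model; do
    ollama stop "$model"
  done
fi
```

Setting `OLLAMA_KEEP_ALIVE` for the Ollama server shortens the default retention if you would rather not unload manually.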

For long-running workloads

Babysit your model so it doesn't crash mid-task.

If you're running an agent loop or processing a queue, devpulse babysit watches free memory + battery + swap and auto-cleans when pressure builds. Built for the 11-hour-flight-with-a-70B workflow.

$ devpulse babysit --target-free-mb 8192 --json > babysit.log &
$ ollama run llama3.3:70b < my-queue.txt

# tail the log to see auto-cleans triggered by memory pressure:
$ tail -f babysit.log
{"event":"tick","tickNum":47,"availableForAIMB":7200,"pressure":"free<8192MB",...}
{"event":"cleanup","reasons":"free<8192MB","reclaimedMB":5400,...}
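To see how much the babysitter has reclaimed so far, the cleanup events can be totalled. This sketch relies on the field names shown in the sample log above and uses grep and sed instead of a JSON parser, so it will break if the log layout differs; `total_reclaimed_mb` is a hypothetical helper.

```shell
# Sum the reclaimedMB field across "cleanup" events read from stdin.
# Prints 0 when there are no cleanup events.
total_reclaimed_mb() {
  grep '"event":"cleanup"' |
    sed -n 's/.*"reclaimedMB":\([0-9][0-9]*\).*/\1/p' |
    awk '{ sum += $1 } END { print sum + 0 }'
}

if [ -f babysit.log ]; then
  total_reclaimed_mb < babysit.log
fi
```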

Ollama OOM on Mac — FAQ.

Why OOM on a Mac with 64 GB RAM?

Unified memory is shared CPU/GPU and capped (~75% of total) for GPU allocation. On 64 GB you get ~48 GB usable. Chrome alone routinely claims 20+ GB. The OOM is your stack, not the model.

How do I raise the GPU memory ceiling?

sudo sysctl iogpu.wired_limit_mb=<MB> on Apple Silicon. Resets on reboot. For 64 GB Macs, ~57000 is reasonably safe; more risks instability. Freeing existing usage is usually the safer fix.

How much RAM for Llama 3.3 70B?

~42 GB at Q4_K_M. Full 70B compatibility table →

Does killing Chrome actually help?

Often dramatically. 50+ tabs = 15–25 GB on most dev Macs. Closing Chrome (or just most tabs) is usually the biggest single win.

Ollama said “out of memory.”
Your Mac says it has plenty.

Stop letting your stack OOM your local AI.
