Local AI on Apple Silicon
DevPulse is the pre-flight check for local inference on Apple Silicon. Reclaim VRAM before loading a 70B model, branch on exit codes from a one-liner, and stream memory pressure to long-running agents, so Ollama, llama.cpp, LM Studio, and MLX get the headroom they need.
The same binary also lives in your menu bar. See the local-AI playbook, or jump straight to the CLI below.
Not local-maximalist. Open models still trail the frontier by roughly 8 months once you adjust for tokens, evals, and distillation. That gap is the case for hybrid local + cloud.
Everything in your menu bar
Groups by project, not PID. Chrome's 59 helpers become one line. Node processes get attributed to the project that spawned them.
Runs every 5 minutes. Kills zombies, flags idle servers, warns about Chrome leaks. You don't think about it.
Chrome gets a full breakdown: tabs, extensions, and MB per tab. Docker shows the VM's memory reservation versus actual container usage.
Monitors swap pressure over time. Warns before your Mac starts thrashing to disk.
Always visible. Click to expand. No separate window. Uses <30 MB itself.
Memory trends, top offenders, and optimization history, delivered to your inbox or Notification Center.
No analytics. No network calls. No account required. Your data stays on your Mac.
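The swap-pressure warning rests on a signal macOS exposes directly: the vm.swapusage sysctl reports total, used, and free swap. This page doesn't describe how DevPulse samples it, so what follows is a minimal sketch of one way to watch the same signal yourself. The function names, the sliding window, and the growth threshold are hypothetical choices, not DevPulse's.

```python
import re
import subprocess
from collections import deque

# Unit suffixes seen in `sysctl vm.swapusage` output, converted to megabytes.
_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
_FIELD = re.compile(r"(total|used|free)\s*=\s*([\d.]+)([KMG])")


def parse_swapusage(text: str) -> dict[str, float]:
    """Parse e.g. 'total = 2048.00M  used = 1158.75M  free = 889.25M' into MB."""
    return {name: float(value) * _TO_MB[unit] for name, value, unit in _FIELD.findall(text)}


def read_swap_mb() -> dict[str, float]:
    """Sample swap usage on macOS (`sysctl -n` prints the value without the key)."""
    out = subprocess.run(
        ["sysctl", "-n", "vm.swapusage"], capture_output=True, text=True, check=True
    ).stdout
    return parse_swapusage(out)


class SwapTrend:
    """HYPOTHETICAL policy: warn when swap 'used' grows by more than growth_mb
    between the oldest and newest sample in a sliding window."""

    def __init__(self, window: int = 6, growth_mb: float = 1024.0) -> None:
        self.samples: deque[float] = deque(maxlen=window)
        self.growth_mb = growth_mb

    def add(self, used_mb: float) -> bool:
        self.samples.append(used_mb)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        return self.samples[-1] - self.samples[0] > self.growth_mb
```

Call read_swap_mb() on a timer and feed its "used" value to SwapTrend.add(). Watching growth rather than a fixed level is the point of "over time": a steadily climbing swap total is an earlier hint of thrashing than any single reading.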
NEW · v1.2.0
Now scriptable
DevPulse ships a devpulse CLI in the same binary. It exposes the same intelligence the menu bar shows as JSON and exit codes, so Ollama, llama.cpp, Claude Code, Cursor, and your own scripts can ask the obvious question before loading a 70B model: will this fit?
devpulse ai --before-load returns exit code 0, 1, 2, or 3: fits, won't fit, unload-first, or tight. Branch on it in the shell, with no parsing.
devpulse watch --json emits one snapshot per tick. Pipe it into a long-running agent loop and react to VRAM pressure in real time.
Same process as the menu bar app. No extra permissions, no network calls, no telemetry. Your model load decisions never leave your Mac.
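The watch mode pairs naturally with an agent loop. The page says devpulse watch --json emits one snapshot per tick but doesn't document the schema, so this minimal sketch assumes one JSON object per line and a hypothetical "pressure" field on a 0 to 1 scale. Treat PRESSURE_KEY, its scale, and both thresholds as placeholders to check against real output.

```python
import json
import subprocess
from typing import Callable, Iterable, Iterator

# HYPOTHETICAL field name and 0.0-1.0 scale: check real `devpulse watch --json`
# output and adjust before relying on this.
PRESSURE_KEY = "pressure"


def decisions(lines: Iterable[str], high: float = 0.8, low: float = 0.6) -> Iterator[str]:
    """Map one JSON snapshot per line to 'pause' / 'resume' events, with hysteresis."""
    paused = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pressure = json.loads(line).get(PRESSURE_KEY)
        if pressure is None:
            continue  # unknown schema: skip the snapshot rather than guess
        if not paused and pressure >= high:
            paused = True
            yield "pause"
        elif paused and pressure < low:
            paused = False
            yield "resume"


def watch(on_event: Callable[[str], None]) -> None:
    """Stream snapshots from the CLI and hand each decision to your agent."""
    with subprocess.Popen(
        ["devpulse", "watch", "--json"], stdout=subprocess.PIPE, text=True
    ) as proc:
        assert proc.stdout is not None
        for event in decisions(proc.stdout):
            on_event(event)
```

Pausing at 0.8 but resuming only below 0.6 keeps the agent from flapping between states when pressure hovers near a single cut-off.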
NEW · v1.7.0
Hybrid routing, made visible
Wrappers can swap Claude Code's backend to DeepSeek, OpenRouter, or Fireworks by setting ANTHROPIC_BASE_URL. That's cheaper, but it's easy to forget which backend is live. DevPulse reads your shell config and surfaces the active backend right next to your local-capacity verdict.
Anthropic, DeepSeek, OpenRouter, Fireworks, or a custom URL — detected from ~/.claude/settings.json and your shell rc files. No daemon, no shell hooks.
We tell you where the override lives (~/.zshrc, ~/.zshenv, or Claude's own settings), so you can open it in your editor with one click instead of hunting for it.
DevPulse reads only the base URL — never the API key. Routing posture stays a local signal: nothing leaves your Mac, nothing gets logged, nothing gets uploaded.
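DevPulse's detection logic isn't published, but the idea is easy to sketch: pull ANTHROPIC_BASE_URL out of each config file, ignore every other key, and label the host. In this minimal sketch the host suffixes, the assumption that Claude's settings file sets the variable under an "env" object, and all function names are hypothetical.

```python
import json
import re
from urllib.parse import urlparse

# Host suffix -> backend label. ASSUMED endpoints; extend for your own proxies.
KNOWN_BACKENDS = {
    "anthropic.com": "Anthropic",
    "deepseek.com": "DeepSeek",
    "openrouter.ai": "OpenRouter",
    "fireworks.ai": "Fireworks",
}

# Matches `export ANTHROPIC_BASE_URL=...` or `ANTHROPIC_BASE_URL=...`; commented lines don't match.
_RC_LINE = re.compile(
    r"""^[ \t]*(?:export[ \t]+)?ANTHROPIC_BASE_URL=["']?([^"'\s#]+)""", re.MULTILINE
)


def classify(url: str) -> str:
    host = urlparse(url).hostname or ""
    for suffix, label in KNOWN_BACKENDS.items():
        if host == suffix or host.endswith("." + suffix):
            return label
    return "custom"


def base_urls(path: str, text: str) -> list[str]:
    """Extract only ANTHROPIC_BASE_URL values; every other key, API keys included, is ignored."""
    if path.endswith(".json"):
        # ASSUMPTION: Claude's settings file sets the variable under its "env" object.
        url = (json.loads(text).get("env") or {}).get("ANTHROPIC_BASE_URL")
        return [url] if url else []
    return _RC_LINE.findall(text)


def find_overrides(files: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (file, url, backend) for every override, in the order files are given."""
    return [
        (path, url, classify(url))
        for path, text in files.items()
        for url in base_urls(path, text)
    ]
```

The sketch reports every override it finds and deliberately doesn't decide which one wins, since the effective value depends on the order your shell sources its rc files.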
Local AI Models
Based on your actual RAM usage and recoverable waste, DevPulse tells you which AI models your Mac can handle.
| Model | Quant | RAM needed | Status |
|---|---|---|---|
| Llama 3.1 8B | Q8_0 | 9.5 GB | Runs great |
| Qwen 3.5 9B | Q4_K_M | 9.5 GB | Runs great |
| DeepSeek R1 32B | Q4_K_M | 20 GB | Runs OK |
| Llama 3.3 70B | Q4_K_M | 42 GB | After cleanup |
| DeepSeek R1 671B | Q4_K_M | 350 GB | Too heavy |
See all 20+ models. Results update as your memory usage changes.
VRAM estimates sourced from CanIRun.ai — model data from llama.cpp, Ollama, and LM Studio.
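The status column combines three numbers: the model's RAM need, what's free now, and what cleanup could reclaim. This page doesn't publish DevPulse's cut-offs, so in this minimal sketch the Memory type, the 4 GB headroom, and the "great" share of total memory are all hypothetical values picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    total_gb: float        # unified memory installed
    available_gb: float    # free right now
    recoverable_gb: float  # what cleanup could reclaim (idle servers, leaky tabs, ...)


def status(model_gb: float, mem: Memory, headroom_gb: float = 4.0, great_share: float = 0.25) -> str:
    """Classify a model's RAM need. Thresholds are HYPOTHETICAL, not DevPulse's."""
    need = model_gb + headroom_gb  # safety margin on top of the model's listed RAM
    if need > mem.total_gb:
        return "Too heavy"
    if need <= mem.available_gb:
        return "Runs great" if model_gb <= great_share * mem.total_gb else "Runs OK"
    if need <= mem.available_gb + mem.recoverable_gb:
        return "After cleanup"
    return "Too heavy"  # fits in total RAM, but cleanup alone can't free enough
```

On a hypothetical 64 GB Mac with 30 GB free and 20 GB recoverable, Memory(64, 30, 20) gives the same statuses as the table above for its five RAM figures.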
Install DevPulse, wire devpulse ai --before-load into your model loader, and stop guessing whether the next 70B fits.
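Branching in the shell is the pitch, but the same check slots into a Python model loader just as easily. In this minimal sketch the exit-code meanings (0 fits, 1 won't fit, 2 unload-first, 3 tight) come from the CLI notes above. The action names, the allow_tight policy, and extra_args are assumptions: this page doesn't say how you name the model you're about to load.

```python
import subprocess
from typing import Sequence

# Exit codes as documented for `devpulse ai --before-load`.
VERDICTS = {0: "fits", 1: "won't fit", 2: "unload-first", 3: "tight"}


def describe(code: int) -> str:
    return VERDICTS.get(code, f"unknown ({code})")


def action(code: int, allow_tight: bool = False) -> str:
    """Turn an exit code into a loader decision; unknown codes fail closed."""
    if code == 0 or (code == 3 and allow_tight):
        return "load"
    if code == 2:
        return "unload-first"  # free another model, then re-run the check
    return "abort"  # 1 (won't fit), 3 without allow_tight, or anything unexpected


def preflight(extra_args: Sequence[str] = ()) -> int:
    """Run the check. extra_args is HYPOTHETICAL: pass whatever identifies your model."""
    try:
        return subprocess.run(["devpulse", "ai", "--before-load", *extra_args]).returncode
    except FileNotFoundError:
        return 127  # devpulse not on PATH; action() treats this as "abort"
```

Treating "tight" as abort by default and failing closed on unknown codes are conservative choices. A loader that would rather risk swap than wait can pass allow_tight=True.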