27B
Parameters
128K
Context
16.5 GB
RAM (Q4_K_M)

RAM by quantization

Lower quantization = less RAM but lower quality. Q4_K_M is the recommended sweet spot for most users.

FormatBitsRAMQualityVerdict
Q3_K_M313.5 GBModerateRuns OK
Q4_K_MREC416.5 GBGoodRuns OK
Q5_K_M519.5 GBGoodRuns OK
Q6_K622.5 GBExcellentAfter cleanup
Q8_0829.0 GBExcellentAfter cleanup
F161655.8 GBLosslessNeeds high RAM

Which Mac can run Gemma 3 27B?

Based on the recommended Q4_K_M quantization. You need RAM for both the model and your running apps — DevPulse calculates this for you. No CUDA installation. No driver hell. Just Apple Silicon doing what Jensen charges $30K for.

8 GB
Can’t run
16 GB
Can’t run
24 GB
Close apps first
~8 GB for apps
32 GB
Runs well
~16 GB for apps
36 GB
Runs well
~20 GB for apps
48 GB
Runs great
~32 GB for apps
64 GB
Runs great
~48 GB for apps
96 GB
Runs great
~80 GB for apps
128 GB
Runs great
~112 GB for apps
192 GB
Runs great
~176 GB for apps

Tips for running Gemma 3 27B

1 Q4_K_M at 16.5 GB is ideal for 32 GB Macs — leaves room for your dev stack

2 On 24 GB Macs, use Q3_K_M and close Docker/Chrome before loading

3 Best open model for vision tasks at this size — analyze UIs, docs, charts

4 Use DevPulse to monitor memory pressure while the model is loaded

How fast will Gemma 3 27B run on each chip?

Apple Silicon inference is bandwidth-bound — every generated token streams the model's active weights through unified memory once. Estimates are for single-batch generation at Q4_K_M (16.5 GB) at ~70% of peak bandwidth (typical llama.cpp / Ollama efficiency). Speculative decoding can lift these another 30-60%.

ChipBandwidthSmallest RAM that fitstok/s (est.)
M168 GB/swon't fit
M2100 GB/s24 GB~4 tok/s
M3100 GB/s24 GB~4 tok/s
M4120 GB/s24 GB~5 tok/s
M2 Pro200 GB/s32 GB~8 tok/s
M3 Pro150 GB/s36 GB~6 tok/s
M4 Pro273 GB/s24 GB~12 tok/s
M2 Max400 GB/s32 GB~17 tok/s
M3 Max400 GB/s36 GB~17 tok/s
M4 Max546 GB/s36 GB~23 tok/s
M2 Ultra800 GB/s64 GB~34 tok/s
M3 Ultra819 GB/s96 GB~35 tok/s

“Smallest RAM that fits” assumes ~40% headroom for context, OS, and your dev stack. Reclaim VRAM before loading →

Local-AI guides for Gemma 3 27B.

Knowing the model fits is half the problem. The other half is keeping your Mac's unified memory free enough to actually load it, and keeping the load alive across a long session.

Related Pages

Run Gemma 3 27B locally. No GPU required.

While cloud GPU prices keep climbing, your Mac can run Gemma 3 27B for free. DevPulse tells you if it fits alongside your dev tools — before you download 16.5 GB of model weights.

Download for macOS

macOS 14+ · Apple Silicon & Intel · Free during launch