Recommended models

Three tiers based on your available RAM and quality needs.

Tips for RAG & knowledge bases

1. Embed documents with a small embedding model (e.g., nomic-embed-text); this is separate from the generation model.

2. Chunk documents into 500–1000 token pieces for retrieval, then pass the top-k chunks to the model.

3. Use DevPulse to monitor RAM during indexing; embedding large document sets can spike memory temporarily.

4. Start with a 9B model and only move up if answer quality isn't sufficient.
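The chunking and retrieval steps above can be sketched in plain Python. This is a minimal illustration, not a full pipeline: token counts are approximated by whitespace words, and the `embed` function is a bag-of-words stand-in for a real embedding model such as nomic-embed-text.

```python
import math
from collections import Counter

def chunk_text(text, max_tokens=500):
    # Approximate tokens with whitespace-split words; the model's
    # actual tokenizer will count somewhat differently.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def embed(text):
    # Stand-in embedding: bag-of-words counts. In a real setup you
    # would call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, chunks, k=3):
    # Rank chunks by similarity to the query and keep the best k;
    # these are what you pass to the generation model as context.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Toy corpus: 1800 words -> four chunks of at most 500 words each.
docs = "DevPulse monitors RAM usage on macOS. " * 300
chunks = chunk_text(docs, max_tokens=500)
hits = top_k("How does DevPulse track memory?", chunks, k=2)
```

In a real knowledge base, precompute and store the chunk embeddings during indexing (the memory-heavy step tip 3 refers to), so only the query needs embedding at question time.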
