Full Title: Google Unveils 27B Gemma 3: Quantized for Consumer GPUs
Highlights
Google released Quantization-Aware Trained (QAT) versions of its Gemma 3 models, including the 27B variant. These models preserve quality while cutting memory requirements enough to run on consumer GPUs; the 27B QAT model loads in 14.1 GB of VRAM.
Memory Reductions with int4 Quantization
QAT lowers memory use for model weights across all Gemma 3 sizes.
• 27B: from 54 GB (BF16) to 14.1 GB (int4) for model weights
• 12B: from 24 GB to 6.6 GB
• 4B: from 8 GB to 2.6 GB
• 1B: from 2 GB to 0.5 GB
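The figures above follow directly from the bits used per weight: BF16 stores each parameter in 2 bytes, int4 in half a byte. A minimal back-of-the-envelope sketch (my own illustration, not Google's methodology; it counts weights only, ignoring KV cache, activations, and the small per-block scale factors that push real int4 footprints slightly above the raw estimate, e.g. 14.1 GB observed vs. 13.5 GB computed for 27B):

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Excludes KV cache, activations, framework overhead, and
    quantization metadata (per-block scales/zero-points).
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for size in [27, 12, 4, 1]:
    bf16 = weight_memory_gb(size, 16)  # 2 bytes per weight
    int4 = weight_memory_gb(size, 4)   # 0.5 bytes per weight
    print(f"{size}B: {bf16:.1f} GB (BF16) -> {int4:.1f} GB (int4)")
```

For 27B this yields 54.0 GB in BF16 and 13.5 GB in int4, matching the article's BF16 figure exactly and landing just under the reported 14.1 GB once quantization metadata is accounted for.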