Full Title: Google Unveils 27B Gemma 3: Quantized for Consumer GPUs
Highlights
Google released Quantization-Aware Trained (QAT) versions of its Gemma 3 models, including the 27B variant. These models preserve quality while cutting memory requirements enough to run on consumer GPUs; the 27B QAT model loads in 14.1 GB of VRAM.
Memory Reductions with int4 Quantization
QAT lowers memory use for model weights across all Gemma 3 sizes.
• 27B: from 54 GB (BF16) to 14.1 GB (int4) for model weights
• 12B: from 24 GB to 6.6 GB
• 4B: from 8 GB to 2.6 GB
• 1B: from 2 GB to 0.5 GB
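The figures above follow directly from the bits used per weight: BF16 stores each parameter in 2 bytes, int4 in half a byte. A minimal back-of-the-envelope sketch (my own illustration, not Google's methodology; it counts weights only, ignoring KV cache, activations, and the small per-block scale factors that push real int4 footprints slightly above the raw estimate, e.g. 14.1 GB observed vs. 13.5 GB computed for 27B):

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Excludes KV cache, activations, framework overhead, and
    quantization metadata (per-block scales/zero-points).
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for size in [27, 12, 4, 1]:
    bf16 = weight_memory_gb(size, 16)  # 2 bytes per weight
    int4 = weight_memory_gb(size, 4)   # 0.5 bytes per weight
    print(f"{size}B: {bf16:.1f} GB (BF16) -> {int4:.1f} GB (int4)")
```

For 27B this yields 54.0 GB in BF16 and 13.5 GB in int4, matching the article's BF16 figure exactly and landing just under the reported 14.1 GB once quantization metadata is accounted for.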