Highlights

  • Qwen3.5 is Alibaba’s new model family, including Qwen3.5-35B-A3B, 27B, 122B-A10B, and 397B-A17B. These multimodal hybrid-reasoning LLMs deliver the strongest performance for their sizes. They support 256K context across 201 languages, offer both thinking and non-thinking modes, and excel at agentic coding, vision, chat, and long-context tasks. The 35B and 27B models run on a Mac or other device with 21GB of RAM. See all GGUFs here.
  • For best performance, make sure your total available memory (VRAM + system RAM) exceeds the size of the quantized model file you’re downloading. If it doesn’t, llama.cpp can still run via SSD/HDD offloading, but inference will be slower.
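The rule of thumb above can be sketched as a quick check. The helper function and the memory figures below are illustrative assumptions, not part of the original guide or the llama.cpp API:

```python
def fits_in_memory(model_file_gb: float, vram_gb: float, system_ram_gb: float) -> bool:
    """Hypothetical helper: True when total available memory (VRAM + system RAM)
    exceeds the quantized GGUF file size, so llama.cpp can avoid disk offloading."""
    return vram_gb + system_ram_gb > model_file_gb

# Illustrative numbers: a ~19 GB quant on a 24 GB unified-memory Mac (no discrete VRAM).
print(fits_in_memory(19.0, vram_gb=0.0, system_ram_gb=24.0))  # fits: full-speed inference
print(fits_in_memory(30.0, vram_gb=0.0, system_ram_gb=24.0))  # doesn't fit: SSD/HDD offload, slower
```

On Apple-silicon Macs, unified memory serves as both VRAM and system RAM, so the whole figure counts once.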
  • Between 27B and 35B-A3B: use 27B if you want slightly more accurate results and it fits on your device; go for 35B-A3B if you want much faster inference.
  • presence_penalty: ranges from 0.0 to 2.0 and defaults to 0.0 (off). Increase it to reduce repetition, but higher values may slightly degrade output quality.
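To see what presence_penalty does mechanically, here is a minimal sketch of the standard formulation: a flat penalty subtracted from the logits of every token that has already appeared in the output. The function name is my own for illustration, not llama.cpp's API:

```python
def apply_presence_penalty(logits: list[float], generated_ids: list[int],
                           presence_penalty: float = 0.0) -> list[float]:
    """Subtract a flat penalty from the logit of each token that has already
    been generated, making repeats less likely regardless of how often they occurred."""
    penalized = list(logits)
    for tid in set(generated_ids):  # presence penalty is flat, so count once per token
        penalized[tid] -= presence_penalty
    return penalized

# Toy vocabulary of 4 tokens; tokens 1 and 2 were already generated.
print(apply_presence_penalty([1.0, 2.0, 3.0, 0.5], [1, 2, 2], presence_penalty=1.5))
# tokens 1 and 2 each drop by 1.5; unseen tokens are untouched
```

With presence_penalty at its default of 0.0, the logits pass through unchanged, which is why the setting is described as off by default.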