Highlights

  • Qwen2.5 Omni: See, Hear, Talk, Write, Do It All! I’m not sure how I missed this one at the time, but last month (March 27th) Qwen released their first multi-modal model that can handle audio and video in addition to text and images - and that has audio output as a core model feature.
  • As far as I can tell nobody has an easy path to getting it working on a Mac yet (the closest report I saw was this comment on Hugging Face). This release is notable because, while there’s a pretty solid collection of open weight vision LLMs now, multi-modal models that go beyond that are still very rare. Like most of Qwen’s recent models, Qwen2.5 Omni is released under an Apache 2.0 license.