7 Ways to Speed Up Inference of Your Hosted LLMs

Apr 16, 2025 · 1 min read

  • articles
  • literature-note

Metadata

  • Author: Sergei Savvov
  • Full Title: 7 Ways to Speed Up Inference of Your Hosted LLMs
  • URL: https://slgero.medium.com/speed-up-llm-inference-83653aa24c47

Highlights

  • tl;dr: techniques to speed up inference of LLMs, increasing token generation speed and reducing memory consumption: Mixed-Precision, Bfloat16, Quantization, Fine-tuning with Adapters, Pruning, Continuous Batching, and Multiple GPUs.
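
The highlight only names the techniques, so here is a minimal sketch of what the two memory-oriented ones (bfloat16 loading and 8-bit quantization) typically look like, assuming the Hugging Face transformers and bitsandbytes stack; the model id is a placeholder, not one taken from the article.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder model, not specified in the article

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Bfloat16: half the memory of float32 while keeping its dynamic range,
# so weights load at ~2 bytes per parameter with little quality loss.
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread layers across available GPUs (the "Multiple GPUs" idea)
)

# 8-bit quantization via bitsandbytes: ~1 byte per parameter, trading a
# small amount of accuracy for a large reduction in memory consumption.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Quick generation check with the bfloat16 model.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model_bf16.device)
print(tokenizer.decode(model_bf16.generate(**inputs, max_new_tokens=20)[0]))
```

The remaining techniques from the highlight need extra tooling rather than a load-time flag: adapters are usually added with a PEFT-style library, and continuous batching comes from a serving layer such as vLLM, so they are out of scope for this sketch.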
