7 Ways to Speed Up Inference of Your Hosted LLMs

Apr 16, 2025 · 1 min read

  • articles
  • literature-note

Metadata

  • Author: Sergei Savvov
  • Full Title: 7 Ways to Speed Up Inference of Your Hosted LLMs
  • URL: https://slgero.medium.com/speed-up-llm-inference-83653aa24c47

Highlights

  • tl;dr: techniques to speed up inference of LLMs, increasing token generation speed and reducing memory consumption: Mixed-Precision, Bfloat16, Quantization, Fine-tuning with Adapters, Pruning, Continuous Batching, and Multiple GPUs.
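
The highlight only names the techniques, so here is a minimal sketch of what the two memory-oriented ones (bfloat16 loading and 8-bit quantization) typically look like, assuming the Hugging Face transformers and bitsandbytes stack; the model id is a placeholder, not one taken from the article.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder model, not specified in the article

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Bfloat16: half the memory of float32 while keeping its dynamic range,
# so weights load at ~2 bytes per parameter with little quality loss.
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread layers across available GPUs (the "Multiple GPUs" idea)
)

# 8-bit quantization via bitsandbytes: ~1 byte per parameter, trading a
# small amount of accuracy for a large reduction in memory consumption.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Quick generation check with the bfloat16 model.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model_bf16.device)
print(tokenizer.decode(model_bf16.generate(**inputs, max_new_tokens=20)[0]))
```

The remaining techniques from the highlight need extra tooling rather than a load-time flag: adapters are usually added with a PEFT-style library, and continuous batching comes from a serving layer such as vLLM, so they are out of scope for this sketch.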
