LLM Optimization
Inference servers, KV cache management, continuous batching, PagedAttention. Serving LLMs fast and cheap.
Apr 13,
vLLM
High-performance LLM inference server, the de facto production reference in 2024-2026. Implements PagedAttention, continuous batching, KV caching, and runtime-swappable LoRA adapters.
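The core idea behind PagedAttention can be sketched in a few lines: instead of reserving contiguous KV-cache memory for each sequence's maximum length, the cache is split into fixed-size blocks (pages) that are allocated on demand and mapped through a per-sequence block table. A minimal pure-Python sketch, not vLLM's actual implementation (class and method names here are illustrative):

```python
# Sketch of a paged KV cache: each sequence maps to fixed-size physical
# blocks allocated on demand, so memory is reserved per block rather
# than for the full maximum sequence length up front.
BLOCK_SIZE = 4  # tokens per block (vLLM's default block size is 16)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> tokens written so far

    def append_token(self, seq_id):
        """Record one new token's KV entry; allocate a block if needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:        # current block is full (or first token)
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(5):               # 5 tokens with BLOCK_SIZE=4 -> 2 blocks
    cache.append_token("seq-0")
print(len(cache.block_tables["seq-0"]))  # 2
```

Because blocks are freed back to a shared pool the moment a request finishes, fragmentation stays low and many more concurrent sequences fit in the same GPU memory, which is what makes continuous batching effective.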