LLM Optimization
Inference servers, KV cache management, continuous batching, PagedAttention. Serving LLMs fast and cheap.
Apr 13,
vLLM
High-performance LLM inference server, the de facto production reference in 2024-2026. Implements PagedAttention, continuous batching, KV caching, and runtime-swappable LoRA adapters.
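The core idea behind PagedAttention can be sketched in a few lines: instead of reserving contiguous KV-cache memory for each sequence's maximum length, the cache is split into fixed-size blocks (pages) that are allocated on demand and mapped through a per-sequence block table. A minimal pure-Python sketch, not vLLM's actual implementation (class and method names here are illustrative):

```python
# Sketch of a paged KV cache: each sequence maps to fixed-size physical
# blocks allocated on demand, so memory is reserved per block rather
# than for the full maximum sequence length up front.
BLOCK_SIZE = 4  # tokens per block (vLLM's default block size is 16)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> tokens written so far

    def append_token(self, seq_id):
        """Record one new token's KV entry; allocate a block if needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:        # current block is full (or first token)
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(5):               # 5 tokens with BLOCK_SIZE=4 -> 2 blocks
    cache.append_token("seq-0")
print(len(cache.block_tables["seq-0"]))  # 2
```

Because blocks are freed back to a shared pool the moment a request finishes, fragmentation stays low and many more concurrent sequences fit in the same GPU memory, which is what makes continuous batching effective.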