
LLMs

Models themselves: loading, serving, quantization, inference-time optimization.


KV Cache

Cache of the attention keys and values (K, V) so that, during autoregressive decoding, we do not recompute them for tokens we have already processed: each new token computes only its own K and V, appends them to the cache, and attends over the whole cache. Essential for inference-server throughput.
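A minimal single-head sketch of the idea in NumPy (toy dimensions and random weights, no batching or multi-head logic): each decode step computes K and V only for the new token and appends them to the cache, while the query attends over everything cached so far.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8  # toy head dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []  # one entry per token already seen

def decode_step(x):
    """One autoregressive step: compute K, V only for the new
    token x, cache them, and attend q over the full cache."""
    q = x @ Wq
    k_cache.append(x @ Wk)  # never recomputed for old tokens
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)   # (seq_len, d)
    V = np.stack(v_cache)
    attn = softmax(q @ K.T / np.sqrt(d))
    return attn @ V         # (d,)

for _ in range(4):
    out = decode_step(rng.standard_normal(d))
```

Without the cache, step t would recompute K and V for all t tokens, making decoding quadratic in sequence length; with it, each step does O(1) projection work plus one attention over the cached tensors.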