04·1 notion
Context Engineering
Managing the context window: compression, memory, prompt caching, budgeting.
Apr 19,
Prompt caching
API-level exposure of the inference engine's KV cache: you pay the full input price once for the static prefix (tool definitions, system prompt, project context), then every subsequent request reads it back at 0.1x the input price. A Claude Code session shows a 92% cache hit-rate and an 81% cost reduction. It is not a toggle; it is an architectural discipline.
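The discipline is prefix stability: keep the static content byte-identical and first in the request, and let only the conversation turns vary. A minimal sketch of that layout, using Anthropic-style `cache_control` breakpoints (the model name, tool definition, and system text here are placeholders, not from the source):

```python
# Sketch: structuring a request so the static prefix is cacheable.
# Stable content (tools, system prompt, project context) comes first
# and is byte-identical across calls; only the messages vary.

STATIC_SYSTEM = "You are a coding agent. Project context: ..."  # placeholder
TOOL_DEFS = [  # placeholder tool definition
    {"name": "read_file", "description": "Read a file from the repo",
     "input_schema": {"type": "object", "properties": {}}},
]

def build_request(conversation_turns):
    """Assemble a request whose cacheable prefix never changes;
    only `conversation_turns` differs between calls."""
    system = [{
        "type": "text",
        "text": STATIC_SYSTEM,
        # cache_control on the last static block marks the end of
        # the prefix the inference engine may cache.
        "cache_control": {"type": "ephemeral"},
    }]
    return {
        "model": "claude-sonnet-4",  # placeholder model id
        "max_tokens": 1024,
        "tools": [dict(t) for t in TOOL_DEFS],
        "system": system,
        "messages": conversation_turns,
    }

req1 = build_request([{"role": "user", "content": "first question"}])
req2 = build_request([{"role": "user", "content": "second question"}])

# The prefix (tools + system) is identical across requests, so the
# second call can read it from cache at the discounted input price.
assert req1["tools"] == req2["tools"]
assert req1["system"] == req2["system"]
```

Anything that mutates the prefix between requests (reordering tools, injecting a timestamp into the system prompt) silently invalidates the cache, which is why this is an architectural concern rather than a flag.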