04·1 notion
Context Engineering
Managing the context window: compression, memory, prompt caching, budgeting.
Apr 19,
Prompt caching
API-level exposure of the inference engine's KV cache: you pay the full input price once for the static prefix (tool definitions, system prompt, project context), then every subsequent request reads it back at 0.1x the input price. A Claude Code session shows a 92% cache hit-rate and an 81% cost reduction. It is not a toggle; it is an architectural discipline.
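The discipline is prefix stability: keep the static content byte-identical and first in the request, and let only the conversation turns vary. A minimal sketch of that layout, using Anthropic-style `cache_control` breakpoints (the model name, tool definition, and system text here are placeholders, not from the source):

```python
# Sketch: structuring a request so the static prefix is cacheable.
# Stable content (tools, system prompt, project context) comes first
# and is byte-identical across calls; only the messages vary.

STATIC_SYSTEM = "You are a coding agent. Project context: ..."  # placeholder
TOOL_DEFS = [  # placeholder tool definition
    {"name": "read_file", "description": "Read a file from the repo",
     "input_schema": {"type": "object", "properties": {}}},
]

def build_request(conversation_turns):
    """Assemble a request whose cacheable prefix never changes;
    only `conversation_turns` differs between calls."""
    system = [{
        "type": "text",
        "text": STATIC_SYSTEM,
        # cache_control on the last static block marks the end of
        # the prefix the inference engine may cache.
        "cache_control": {"type": "ephemeral"},
    }]
    return {
        "model": "claude-sonnet-4",  # placeholder model id
        "max_tokens": 1024,
        "tools": [dict(t) for t in TOOL_DEFS],
        "system": system,
        "messages": conversation_turns,
    }

req1 = build_request([{"role": "user", "content": "first question"}])
req2 = build_request([{"role": "user", "content": "second question"}])

# The prefix (tools + system) is identical across requests, so the
# second call can read it from cache at the discounted input price.
assert req1["tools"] == req2["tools"]
assert req1["system"] == req2["system"]
```

Anything that mutates the prefix between requests (reordering tools, injecting a timestamp into the system prompt) silently invalidates the cache, which is why this is an architectural concern rather than a flag.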