Agent Levels, Architecture Layers, Deployment Strategies (+ glossary)
Watch or read first
- Daily Dose DS, "5 Levels of Agentic AI Systems", "4 Layers of Agentic AI", "AI Agent Deployment Strategies", and "30 Must-Know Agentic AI Terms" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
TL;DR
Three complementary frames for understanding agentic systems: 5 levels of autonomy (who controls the flow), 4 layers of the stack (LLM -> Agent -> Multi-agent -> Infrastructure), 4 deployment patterns (batch, stream, real-time, edge). Plus a 30-term glossary so you stop confusing Orchestration with Routing.
The 5 levels of agentic AI systems (Daily Dose DS)
A ladder of autonomy. Each level gives more control to the LLM.
Level 1: Basic responder
Human drives the flow completely.
LLM just produces an output given an input.
This is "I paste a question into ChatGPT and copy the answer". No agent.
Level 2: Router pattern
Human defines the paths/functions.
LLM picks which path to take.
Example: a chatbot with predefined buttons where the model decides whether to go to "FAQ", "tech support", or "sales".
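A minimal Python sketch of the router pattern. `call_llm` stands in for a real model call (here a trivial keyword stub), and the route names are illustrative; the point is that the human defines the menu and the model only picks from it:

```python
# Level 2 router sketch: the human defines the paths, the model only picks one.
# call_llm is a stand-in for a real LLM call; the keyword matching is a stub.

ROUTES = {
    "faq": lambda q: f"FAQ answer for: {q}",
    "tech_support": lambda q: f"Ticket opened for: {q}",
    "sales": lambda q: f"Sales follow-up for: {q}",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a model call that returns one route name."""
    text = prompt.lower()
    if "price" in text or "buy" in text:
        return "sales"
    if "error" in text or "crash" in text:
        return "tech_support"
    return "faq"

def route(question: str) -> str:
    choice = call_llm(question)
    if choice not in ROUTES:  # guard against an off-menu answer
        choice = "faq"
    return ROUTES[choice](question)
```

The guard clause matters in practice: models sometimes answer off-menu, so a Level 2 system always needs a default path.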
Level 3: Tool calling
Human defines a set of tools.
LLM decides when to use them and with what arguments.
This is the classic modern agent with function calling. Most 2026 production agents live here.
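The tool-calling loop can be sketched as follows. The model turns are canned JSON strings standing in for real LLM responses, and `get_weather` is a hypothetical tool; a production loop would send `history` back to the model between turns:

```python
import json

# Level 3 sketch: the human defines the tools; the model decides when to call
# them and with which arguments. Model replies are stubbed as canned JSON.

def get_weather(city: str) -> str:
    return f"18C and cloudy in {city}"  # would call a real weather API

TOOLS = {"get_weather": get_weather}

# Canned "model turns": first a tool call, then a final answer.
FAKE_MODEL_TURNS = [
    '{"tool": "get_weather", "args": {"city": "Paris"}}',
    '{"final": "It is 18C and cloudy in Paris."}',
]

def agent_loop(turns):
    history = []
    for raw in turns:
        msg = json.loads(raw)
        if "tool" in msg:  # model asked for a tool
            result = TOOLS[msg["tool"]](**msg["args"])
            history.append(result)  # fed back as context for the next turn
        else:
            return msg["final"], history
```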
Level 4: Multi-agent pattern
A manager agent coordinates multiple sub-agents.
Human lays out the hierarchy, roles, and tools.
LLM controls execution flow and delegates.
See agentic design patterns for the 7 multi-agent topologies.
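A toy sketch of the manager/sub-agent split. In a real system each worker wraps its own LLM and the plan comes from the manager model; here both are hard-coded to show the delegation shape:

```python
# Level 4 sketch: a manager agent delegates to specialized sub-agents.
# Workers are plain functions standing in for LLM-backed agents.

def researcher(task: str) -> str:
    return f"notes on {task}"

def writer(notes: str) -> str:
    return f"report based on {notes}"

SUB_AGENTS = {"research": researcher, "write": writer}

def manager(goal: str) -> str:
    plan = ["research", "write"]  # would come from the manager LLM
    result = goal
    for step in plan:
        result = SUB_AGENTS[step](result)  # each output feeds the next agent
    return result
```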
Level 5: Autonomous pattern
LLM generates and executes new code independently.
Effectively an AI developer.
Reference products in 2026: Devin (Cognition), Manus, Claude Code in agent mode, Cursor Composer, Replit Agent. Powerful, riskier to deploy.
Practical takeaway
Pick the lowest level that solves your problem. Level 5 is not always better; it is often worse (less controllable, more expensive).
The 4 layers of agentic AI (Daily Dose DS)
Architectural layering, from the ground up.
Layer 1: LLMs (foundation)
Models like GPT, Claude, Gemini, DeepSeek. Core concerns:
- Tokenization and inference parameters
- Prompt engineering
- LLM APIs
This is what every higher layer depends on.
Layer 2: AI agents (built on LLMs)
Wrap an LLM with autonomy:
- Tool usage / function calling
- Agent reasoning (ReAct, CoT)
- Task planning and decomposition
- Memory management
See what is an agent, react pattern, agent building blocks.
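One of the Layer 2 building blocks, short-term memory, can be sketched as a bounded turn buffer; `max_turns` is an illustrative knob, and a real agent would summarize (not just drop) evicted turns:

```python
from collections import deque

# Layer 2 sketch: bounded short-term memory keeping the last N turns inside
# the context window. Older turns fall off the left end of the deque.

class ShortTermMemory:
    def __init__(self, max_turns: int = 4):
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str):
        self.turns.append(f"{role}: {text}")

    def as_context(self) -> str:
        """Render the buffer as the context block sent with the next prompt."""
        return "\n".join(self.turns)
```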
Layer 3: Agentic systems (multi-agent)
Multiple agents collaborating:
- Inter-agent communication (A2A, ACP)
- Routing and scheduling
- State coordination
- Multi-agent RAG
- Agent roles and specialization
- Orchestration frameworks (CrewAI, LangGraph)
See agentic design patterns, agent protocols.
Layer 4: Agentic infrastructure
The production wrapper:
- Observability and logging (Langfuse, LangSmith, Arize, DeepEval)
- Error handling and retries
- Security and access control
- Rate limiting and cost management
- Workflow automation
- Human-in-the-loop controls
Without this layer, your agent is a prototype, not a product. See [[../11-infrastructure/README]].
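Two of these concerns, retries and cost caps, can be sketched as a wrapper around any agent call. All numbers here are illustrative, and a real Layer 4 would meter actual token spend rather than a flat per-call cost:

```python
import time

# Layer 4 sketch: exponential-backoff retries plus a hard cost cap, wrapped
# around an arbitrary agent function. Budget numbers are illustrative only.

class CostCapExceeded(RuntimeError):
    pass

def with_infra(agent_fn, *, max_retries=3, base_delay=0.1,
               budget_usd=1.0, cost_per_call_usd=0.01):
    spent = 0.0
    def wrapped(prompt):
        nonlocal spent
        for attempt in range(max_retries):
            if spent + cost_per_call_usd > budget_usd:
                raise CostCapExceeded(f"budget {budget_usd} USD exhausted")
            spent += cost_per_call_usd
            try:
                return agent_fn(prompt)
            except Exception:
                time.sleep(base_delay * 2 ** attempt)  # back off, then retry
        raise RuntimeError("agent failed after retries")
    return wrapped
```

The cost check runs before every attempt, so a retry storm cannot blow past the budget: the wrapper fails loudly instead.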
Why layers matter
When something breaks, you debug the correct layer:
- Bad answer -> probably Layer 1 (wrong model, weak prompt)
- Wrong tool call -> Layer 2 (reasoning or tool design)
- Duplicated work across agents -> Layer 3 (coordination)
- OOM, latency spikes, cost blowout -> Layer 4 (infra)
The 4 deployment strategies (Daily Dose DS)
How you run the agent in production.
1. Batch deployment
Scheduled CLI job. Runs periodically.
- Connects to DBs, APIs, tools
- Processes data in bulk
- Optimized for throughput, not latency
Best for: large volume of data that does not need immediate response. Example: nightly report generation, weekly competitive analysis.
2. Stream deployment
Part of a streaming data pipeline.
- Continuously processes data as it flows
- Handles concurrent streams
- Connects to streaming storage (Kafka, Kinesis) and backend services
Best for: continuous data processing, real-time monitoring, anomaly detection.
3. Real-time deployment
The agent sits behind an API (REST or gRPC).
- Request arrives, agent reasons, agent responds
- Load balancers scale concurrency
- Sub-second latency expectations
Best for: chatbots, assistants, interactive apps. The default for user-facing products.
4. Edge deployment
The agent runs on the device (mobile, smartwatch, laptop).
- No server round-trip
- Sensitive data stays local
- Works offline
Best for: privacy-first apps, offline functionality, low-latency needs where a network is unreliable.
Quick picker
| Optimization target | Deployment |
|---|---|
| Maximum throughput, async | Batch |
| Continuous processing | Stream |
| Instant interaction | Real-time |
| Privacy + offline | Edge |
Most 2026 products use Real-time for the main interface + Batch for nightly enrichment + Stream for monitoring. Edge is niche but growing as local models get good.
Relevance today (2026)
The levels ladder is the right framing
Daily Dose DS's 5-level ladder is a clean way to discuss scope and risk with stakeholders. Most teams overshoot to Level 4 or 5 when Level 3 would do. Pushing toward lower levels in production pays dividends in reliability.
Layer 4 is where most teams fail
Great models and clever agents. Zero observability. No cost cap. No retries. The gap between "agent works on my laptop" and "agent works for 1000 paying users" is the infrastructure layer.
Deployment strategies are converging
Hybrid deployments are now standard:
- Real-time interactive UX
- Batch for heavy enrichment that doesn't need to block
- Stream for monitoring and log analysis
- Edge for privacy tier
Workflow engines like Inngest, Temporal, and Dagster make multi-deployment agents practical.
2026 reality check
In 2024, most production agents were Level 3 real-time chatbots with OK observability. By 2026:
- Level 4 multi-agent systems are mainstream
- Edge deployment is rising with good local models (Llama 3, Gemma, Phi)
- Stream deployment for security/fraud detection is booming
- Level 5 is still risky but used in agentic IDEs and code agents
30 Must-Know Agentic AI Terms (Daily Dose DS glossary)
A reference list. Quick definitions, with cross-references to deeper notes in this KB.
| Term | Definition | More in KB |
|---|---|---|
| Agent | Autonomous AI entity that perceives, reasons, acts toward a goal | what is an agent |
| Environment | The world or system where an agent operates | - |
| Action | A task performed by an agent | react pattern |
| Observation | Data the agent receives from its environment | react pattern |
| Goal | The outcome the agent is designed to achieve | what is an agent |
| LLMs | Large Language Models powering agent reasoning | language models |
| Tools | APIs or utilities agents use to interact with the world | function calling |
| Evaluation | Assessing how well an agent performs | [[../08-evaluations/README]] |
| Orchestration | Coordinating multiple agents | agentic design patterns |
| Multi-agent system | Group of agents collaborating | agentic design patterns |
| Human-in-the-loop | Human intervention in agent decisions | agent building blocks |
| Reflection | Agent self-assessing its actions | agentic design patterns |
| Planning | Determining the sequence of steps to reach a goal | agentic design patterns |
| ReAct | Reasoning + Acting combined | react pattern |
| Feedback loop | Continuous outcome observation and adjustment | react pattern |
| Context window | Maximum info an agent can consider at once | [[../04-context-engineering/README]] |
| System prompt | Persistent instructions defining agent behavior | agent building blocks |
| Few-shot learning | Teaching new behavior via a few examples | [[../02-prompt-engineering/README]] |
| Hierarchical Agents | Multi-level structure with supervisor + sub-agents | agentic design patterns |
| Short-term memory | Context within a session | agent memory |
| Long-term memory | Context across sessions | agent memory |
| Knowledge base | Structured store of info for reasoning | vector databases |
| Context engineering | Shaping info seen by the agent | [[../04-context-engineering/README]] |
| Guardrails | Rules preventing harmful or undesired actions | [[../12-safety-guardrails/README]] |
| Tool call | API invocation by an agent | function calling |
| Guidelines | Policies aligning agent behavior | agent building blocks |
| ARQ | Structured reasoning via JSON schema | reasoning prompting techniques |
| MCP | Standardized agent-to-tool protocol | [[../06-mcp/README]] / agent protocols |
| A2A | Agent-to-Agent protocol | agent protocols |
| Router | Mechanism that directs tasks to the right agent or tool | agentic design patterns |
Critical questions
- Does every agent need to be Level 5? (No. Level 5 is riskier and more expensive. Pick the lowest level that works.)
- When do you split Layer 3 from Layer 2? (When you genuinely need multiple specialized agents. Resist the urge if one agent with more tools would do.)
- Can you deploy the same agent logic in multiple modes? (Yes, if you decouple the agent core from the invocation layer. A well-architected agent runs as real-time API, batch job, or stream consumer.)
- Which deployment is cheapest? (Batch usually. Real-time is most expensive per request because of over-provisioning.)
- Why is "Orchestration" different from "Routing"? (Orchestration coordinates multiple agents' actions over time. Routing picks one agent or tool per task.)
- Do you need Layer 4 if you have 10 users? (Yes. Observability is not optional. You will regret lacking it.)
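The "same agent logic in multiple modes" answer above can be sketched concretely: keep the agent core deployment-agnostic, and make each mode a thin wrapper. `run_agent` is a stub for the real LLM + tools core, and the wrapper names are illustrative:

```python
# Sketch: decouple the agent core from the invocation layer so the same
# logic runs as a real-time handler, a batch job, or a stream consumer.

def run_agent(query: str) -> str:
    """Deployment-agnostic core (stubbed; would call the LLM and tools)."""
    return f"answer to: {query}"

def handle_request(query: str) -> dict:
    """Real-time wrapper: what a REST/gRPC endpoint would return."""
    return {"answer": run_agent(query)}

def run_batch(queries):
    """Batch wrapper: process a bulk list, e.g. from a nightly job."""
    return [run_agent(q) for q in queries]

def on_event(event: dict, sink: list):
    """Stream wrapper: triggered per event, writes downstream."""
    sink.append(run_agent(event["payload"]))
```

Only the wrappers know about HTTP, cron, or Kafka; the core never does. That is what makes the hybrid deployments described above cheap to build.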
Production pitfalls
- Level overshoot. Starting at Level 4 multi-agent when Level 3 single-agent would work. Premature complexity.
- Layer 4 as afterthought. Observability bolted on months after launch. You already lost months of data.
- Wrong deployment mode. Running a real-time agent that does a 45-second task. Users time out. Use batch or async.
- Glossary drift. Team members use "agent", "workflow", "orchestration" inconsistently. Align on the 30 terms early.
- Edge deployment without quantization. Trying to run Llama 70B on a phone. Use small models (Phi-3, Gemma-2B) or quantized versions.
- Batch jobs without idempotency. Retry on failure doubles the work. Always design batch jobs to be safe to re-run.
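The idempotency pitfall above can be avoided by keying results on the input id and skipping work that already succeeded. A minimal sketch, assuming one output file per item (a hypothetical layout; a real job might use a DB with unique keys instead):

```python
import json
import pathlib

# Sketch of an idempotent batch job: results are keyed by input id, so a
# re-run after a crash skips items that already completed.

def process(item_id: str) -> str:
    return f"result for {item_id}"  # stands in for the expensive agent call

def run_idempotent_batch(item_ids, out_dir: pathlib.Path) -> int:
    done = 0
    for item_id in item_ids:
        target = out_dir / f"{item_id}.json"
        if target.exists():  # already processed on a previous run: skip
            continue
        target.write_text(json.dumps({"id": item_id, "out": process(item_id)}))
        done += 1
    return done  # number of items actually processed this run
```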
Mental parallels (non-AI)
- DevOps maturity model: from manual ops (Level 1) to GitOps (Level 3) to self-healing platforms (Level 5). Same ladder of automation.
- Self-driving cars (SAE levels 0-5): Level 0 (no automation) to Level 5 (fully autonomous). Agentic AI borrows the framing directly.
- Employee autonomy: intern (Level 1) -> junior (Level 2) -> senior IC with tools (Level 3) -> team lead (Level 4) -> staff engineer who writes systems (Level 5).
- Network stack: LLM = physical layer, Agent = transport, Agentic system = application, Infra = ops. Layering clarifies ownership.
Mini-lab
labs/agent-deployment/ (to create):
- Build one agent logic (simple research agent).
- Deploy it in three modes:
- Real-time: FastAPI endpoint, streaming responses
- Batch: CLI that processes 100 queries overnight, writes to SQLite
- Stream: Kafka consumer that triggers the agent on each event
- Add observability with Langfuse on all three.
- Measure cost per task, latency, throughput per mode.
- Bonus: port the real-time version to run on-device with a quantized Gemma-2B.
Stack: uv, langgraph or custom ReAct, fastapi, kafka-python, langfuse.
Further reading
Canonical
- Daily Dose DS, "5 Levels of Agentic AI Systems", "4 Layers of Agentic AI", "Deployment Strategies", "30 Must-Know Agentic AI Terms" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
Related in this KB
- what is an agent
- agent building blocks
- react pattern
- agentic design patterns
- agent memory
- agent protocols
- function calling
- agentic rag
- [[../06-mcp/README]]
- [[../08-evaluations/README]]
- [[../09-observability/README]]
- [[../11-infrastructure/README]]
- [[../12-safety-guardrails/README]]
Tools
- Orchestration frameworks: CrewAI (https://docs.crewai.com/), LangGraph (https://langchain-ai.github.io/langgraph/), LlamaIndex Agents (https://docs.llamaindex.ai/en/stable/understanding/agent/), AutoGen (https://github.com/microsoft/autogen), PydanticAI (https://ai.pydantic.dev/)
- Workflow engines: Inngest (https://www.inngest.com/), Temporal (https://temporal.io/), Dagster (https://dagster.io/), Prefect (https://www.prefect.io/)
- Observability: Langfuse (https://langfuse.com/docs), LangSmith (https://docs.smith.langchain.com/), Arize (https://github.com/Arize-ai/phoenix), Helicone (https://www.helicone.ai/), DeepEval (https://github.com/confident-ai/deepeval)
- Edge LLMs: llama.cpp (https://github.com/ggerganov/llama.cpp), MLX (https://github.com/ml-explore/mlx), ollama (https://ollama.com/), LM Studio (https://lmstudio.ai/), WebLLM (https://github.com/mlc-ai/web-llm)