Building Blocks of AI Agents
Watch or read first
- Daily Dose DS, "Building blocks of AI Agents" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- CrewAI docs (https://docs.crewai.com/) and LangGraph docs (https://langchain-ai.github.io/langgraph/) - both materialize these blocks as framework primitives.
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
TL;DR
Effective agents are built on six building blocks: role-playing, focus, tools, cooperation, guardrails, memory. Skip any one and your agent drifts, hallucinates, or loops. These map directly to framework primitives (CrewAI roles, LangGraph state, OpenAI function calls).
The historical problem
In 2022-2023, "build an agent" meant wrapping an LLM call in a while loop with a few tools. It worked in demos. It failed in production because:
- The agent had no clear role, so it answered generically
- It had access to too many tools and got confused
- It had no memory across turns
- It hallucinated or ran off on unrelated tasks
- Multiple agents shipped together produced noise, not signal
The field learned: an agent is not just an LLM + tools. It is a system with structural concerns. Daily Dose DS and CrewAI articulate these as six building blocks.
How it works: the six building blocks
1. Role-playing
Give the agent a specific role in its system prompt.
- Bad: "You are a helpful AI assistant."
- Better: "You are a senior contract lawyer specializing in SaaS enterprise deals."
Why: role assignment shapes the agent's reasoning, vocabulary, and retrieval priorities. The more specific, the sharper the output.
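A role is ultimately just a carefully assembled system prompt. A minimal, framework-agnostic sketch (the `make_system_prompt` helper is illustrative, not a CrewAI or OpenAI API):

```python
def make_system_prompt(role: str, goal: str, backstory: str) -> str:
    """Assemble a role-playing system prompt from three parts."""
    return (
        f"You are {role}.\n"
        f"Your goal: {goal}\n"
        f"Background: {backstory}"
    )

# Generic role: no expert voice, no retrieval priorities.
generic = make_system_prompt(
    "a helpful AI assistant",
    "answer questions",
    "you help with anything",
)

# Specific role: shapes reasoning, vocabulary, and what the model attends to.
specific = make_system_prompt(
    "a senior contract lawyer specializing in SaaS enterprise deals",
    "review contracts for liability risk and non-standard clauses",
    "15 years negotiating enterprise SaaS agreements; cite exact clauses",
)
```

In CrewAI the same three parts map to `agent.role`, `agent.goal`, and `agent.backstory`.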
2. Focus (narrow tasks)
Overloading an agent hurts performance. Giving it 20 tools and 15 goals leads to confusion.
Pattern: one agent, one narrow responsibility. Use multiple agents with clean interfaces instead of one do-it-all agent. Example:
- Marketing agent: tone, audience, messaging. NOT pricing, NOT market analysis.
- Separate agents handle what is outside scope.
Daily Dose DS rule: specialized agents perform better, every time.
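One way to enforce narrow scope is to route requests to whichever specialized agent owns the topic. A minimal dispatch sketch (the agent names and keyword scopes are illustrative, not a framework API):

```python
# One narrow responsibility per agent: requests go to the agent whose
# scope overlaps the topics, never to a single do-it-all agent.
AGENT_SCOPES = {
    "marketing": {"tone", "audience", "messaging"},
    "pricing": {"price", "discount", "margin"},
    "analysis": {"market", "competitor", "trend"},
}

def route(request_topics: set[str]) -> list[str]:
    """Return the specialized agents whose scope overlaps the request."""
    return [name for name, scope in AGENT_SCOPES.items() if scope & request_topics]
```

A request touching tone and price goes to two focused agents instead of overloading one.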
3. Tools
Give the agent exactly the tools it needs. Not more.
Typical tools for a research agent:
- Web search (Tavily, Exa, Brave)
- Summarization (internal LLM call)
- Citation formatter
Adding irrelevant tools (speech-to-text, code exec) confuses the LLM about when to use what.
Custom tools
Frameworks support custom Python tools. Example in CrewAI: a currency converter tool that hits an exchange rate API.
```python
from crewai.tools import BaseTool

class CurrencyConverterTool(BaseTool):
    # CrewAI tools are pydantic models: name and description need annotations.
    name: str = "currency_converter"
    description: str = "Convert an amount from one currency to another."

    def _run(self, amount: float, source: str, target: str) -> str:
        rate = fetch_rate(source, target)  # your exchange-rate API call
        return f"{amount * rate:.2f} {target}"
```
Custom tools via MCP
Instead of embedding the tool in every agent, expose it as an MCP server. Any agent (CrewAI, LangGraph, Claude Code, custom) can connect via MCP and use the tool without re-implementing.
See [[../06-mcp/README]] and agent protocols.
4. Cooperation
Multi-agent systems work best when agents collaborate and exchange feedback.
Example: AI-powered financial analysis system:
- Agent A gathers data
- Agent B assesses risk
- Agent C builds strategy
- Agent D writes the report
Each specializes. They share intermediate outputs. See agentic design patterns for the 7 multi-agent patterns.
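The four-agent financial pipeline above can be sketched as a sequential handoff, where each stage consumes the previous stage's output. The stage functions are illustrative placeholders for real LLM-backed agents:

```python
def gather_data(ticker: str) -> dict:
    # Agent A: would call market-data tools in a real system.
    return {"ticker": ticker, "price": 42.0}

def assess_risk(data: dict) -> dict:
    # Agent B: enriches the shared state with a risk assessment.
    return {**data, "risk": "low" if data["price"] < 100 else "high"}

def build_strategy(assessment: dict) -> dict:
    # Agent C: decides an action from the risk assessment.
    return {**assessment, "action": "buy" if assessment["risk"] == "low" else "hold"}

def write_report(strategy: dict) -> str:
    # Agent D: turns the accumulated state into a report.
    return f"{strategy['ticker']}: risk={strategy['risk']}, action={strategy['action']}"

def run_pipeline(ticker: str) -> str:
    # The intermediate dicts are the shared outputs the agents exchange.
    return write_report(build_strategy(assess_risk(gather_data(ticker))))
```

Frameworks give you this wiring as a primitive: a Crew in CrewAI, a graph in LangGraph.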
5. Guardrails
Unconstrained agents go off track: hallucinate, loop endlessly, call dangerous tools.
Typical guardrails:
- Tool usage limits: max N calls, rate limits per tool
- Validation checkpoints: verify output matches schema before proceeding
- Fallback: if stuck, escalate to a human or another agent
- Input filters: detect prompt injection, PII, jailbreaks
- Output filters: ensure compliance, remove PII, check format
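Two of these guardrails, a per-tool call budget and a validation checkpoint, as a plain-Python sketch (the class and the refund policy numbers are illustrative, not a guardrails-library API):

```python
class ToolBudgetExceeded(Exception):
    pass

class GuardedTool:
    """Wrap a tool function with a hard cap on how often it can be called."""
    def __init__(self, fn, max_calls: int = 3):
        self.fn, self.max_calls, self.calls = fn, max_calls, 0

    def __call__(self, *args, **kwargs):
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"{self.fn.__name__}: {self.max_calls}-call budget spent")
        self.calls += 1
        return self.fn(*args, **kwargs)

def validate_refund(output: dict) -> dict:
    """Checkpoint: refuse to proceed on malformed or out-of-policy output."""
    assert set(output) >= {"order_id", "amount"}, "missing required fields"
    assert output["amount"] <= 500, "refund above policy limit"
    return output
```

The budget stops infinite tool loops; the checkpoint stops the agent from acting on output that violates schema or policy.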
Example: a legal assistant must avoid outdated laws or false claims. A guardrail verifies citations against a trusted database.
See [[../12-safety-guardrails/README]].
6. Memory
Without memory, an agent starts fresh every turn. User said "my name is David" five seconds ago? Forgotten.
Memory types (quick list, full detail in agent memory):
- Short-term: conversation history within a session
- Long-term: facts across sessions
- Entity memory: tracked entities (users, products, orders)
- Episodic: past interactions
- Semantic: learned facts
- Procedural: learned how-to
Without memory, no personalization, no continuous learning, no context awareness.
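A minimal sketch of short-term versus long-term memory. The keep-last-N trim policy is deliberately naive (real systems summarize before evicting), and the class is illustrative, not a framework API:

```python
from collections import deque

class Memory:
    def __init__(self, short_term_limit: int = 6):
        # Short-term: sliding window of turns within the session.
        self.short_term: deque = deque(maxlen=short_term_limit)
        # Long-term: durable facts that survive across sessions.
        self.long_term: dict[str, str] = {}

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append((role, text))  # oldest turn evicted at the limit

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def context(self) -> str:
        """Assemble what the agent sees at the start of each turn."""
        facts = "; ".join(f"{k}={v}" for k, v in self.long_term.items())
        turns = "\n".join(f"{r}: {t}" for r, t in self.short_term)
        return f"Known facts: {facts}\n{turns}"
```

"My name is David" survives as a long-term fact even after the turn that said it falls out of the short-term window.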
Relevance today (2026)
CrewAI, LangGraph, OpenAI SDK all materialize these blocks
Every serious 2026 framework exposes the 6 blocks as primitives:
- Role: agent.role, agent.backstory, system_prompt
- Focus: agent.goal, task descriptions
- Tools: agent.tools=[...], function-calling schema
- Cooperation: crews, graphs, handoffs
- Guardrails: validators, output parsers, policy middleware
- Memory: built-in short-term buffer, long-term stores
MCP changed the tools block
In 2024, each framework had its own tool format. By 2026 with MCP, you expose tools ONCE as an MCP server, and any compliant agent can use them. This is a major architectural shift. See agent protocols.
Guardrails became a product category
NeMo Guardrails (NVIDIA), Lakera Guard, Prompt Armor, Guardrails AI. In 2026, building guardrails by hand is a mistake. Use a library.
Memory is the frontier
Most frameworks have short-term memory solved. Long-term memory is still immature in 2026. Libraries like Zep, Letta (ex-MemGPT), Mem0 are competing to become the standard.
ARQ adds a seventh block
Some teams now add Attentive Reasoning Queries (ARQ) as a 7th block: the reasoning schema that keeps the agent aligned with complex policies. See reasoning prompting techniques. Parlant productizes this.
Critical questions
- What happens if you skip role-playing? (Generic output, no expert voice. Easiest fix, always do it.)
- Is it better to have one agent with 20 tools or 5 agents with 4 tools each? (Usually the second, if the tools cluster by domain. But more agents = more handoffs = more latency.)
- How do you test memory works? (Turn-by-turn eval: "my name is X" in turn 1, "what is my name?" in turn 3. Plus a week-later test for long-term memory.)
- Can guardrails hurt the agent? (Yes, over-strict filters block legitimate answers. Tune carefully.)
- Which framework exposes these blocks best? (CrewAI has the cleanest mapping for role/focus. LangGraph for cooperation. OpenAI Agents SDK for tools/guardrails. Memory is immature everywhere.)
Production pitfalls
- Role too generic. "Helpful assistant" with 50 tools: drift city.
- Tool descriptions too vague. The LLM cannot decide when to use the tool. Write specific, example-rich descriptions.
- No cooperation protocol. Multi-agent system where each agent freelances. Define handoff conditions.
- No guardrails on tool output. Tool returns malicious content, LLM acts on it. Sanitize.
- No short-term memory limit. Buffer grows, context costs explode, model degrades.
- No long-term memory eviction. Store grows indefinitely. Stale facts override fresh ones.
- Over-engineering. Starting with all 6 blocks before validating the core loop. Prototype with role + tools + memory, add the rest as needed.
Alternatives / Comparisons
Framework-by-framework mapping of the 6 blocks:
| Framework | Role | Focus | Tools | Cooperation | Guardrails | Memory |
|---|---|---|---|---|---|---|
| CrewAI | agent.role | agent.goal | agent.tools | Crew | Basic | Basic |
| LangGraph | system prompt | Task node | ToolNode | Multi-agent graph | Middleware | Checkpointer |
| LlamaIndex | system prompt | query | Tools API | Multi-agent workflows | Custom | Built-in |
| OpenAI Agents SDK | instructions | - | tools | Handoffs | Guardrails API | Threads |
| PydanticAI | system_prompt | - | tools | Graph | Validators | Dependencies |
No framework is strictly better. Pick based on your stack and team familiarity.
Mental parallels (non-AI)
- Company organization chart:
- Role = job title
- Focus = narrow scope of responsibility
- Tools = software and access rights
- Cooperation = cross-functional team
- Guardrails = compliance, code review, HR policies
- Memory = CRM, knowledge base, personal notes
A company without any of these six breaks. Same for agents.
- Chef in a restaurant:
- Role = pastry chef vs saucier
- Focus = owns their station only
- Tools = pans, knives, mise-en-place
- Cooperation = passes to next station
- Guardrails = health code, allergy protocols
- Memory = today's prep notes, customer preferences
Mini-lab
labs/agent-building-blocks/ (to create):
- Build a customer support agent with CrewAI that has:
- Role: "Senior customer support specialist"
- Focus: "Handle returns and refunds only"
- Tools: order_lookup (SQL), issue_refund (API), escalate_to_human
- Cooperation: handoff to a billing agent for complex cases
- Guardrails: never promise refunds > $500, never share other customer's info
- Memory: per-user conversation history + known order history
- Test against 20 scenarios.
- Measure: accuracy, correct tool use, guardrail violation rate, memory usage.
Stack: uv, crewai, sqlite, anthropic.
Further reading
Canonical
- Daily Dose DS, "Building blocks of AI Agents" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- CrewAI docs - https://docs.crewai.com
- LangGraph docs - https://langchain-ai.github.io/langgraph/
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
Related in this KB
- what is an agent
- function calling
- agent memory
- react pattern
- agentic design patterns
- agent protocols
- reasoning prompting techniques
- [[../06-mcp/README]]
- [[../12-safety-guardrails/README]]
Tools
- NeMo Guardrails (https://github.com/NVIDIA/NeMo-Guardrails), Lakera Guard (https://www.lakera.ai/lakera-guard), Guardrails AI (https://www.guardrailsai.com/)
- Zep (https://www.getzep.com/), Letta (https://www.letta.com/), Mem0 (https://mem0.ai/) - memory
- Parlant (ARQ + guardrails): https://github.com/emcie-co/parlant