Building Blocks of AI Agents
Watch or read first
- Daily Dose DS, "Building blocks of AI Agents" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- CrewAI docs (https://docs.crewai.com/) and LangGraph docs (https://langchain-ai.github.io/langgraph/) - both materialize these blocks as framework primitives.
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
TL;DR
Effective agents are built on six building blocks: role-playing, focus, tools, cooperation, guardrails, memory. Skip any one and your agent drifts, hallucinates, or loops. These map directly to framework primitives (CrewAI roles, LangGraph state, OpenAI function calls).
The historical problem
In 2022-2023, "build an agent" meant wrapping an LLM call in a while loop with a few tools. It worked in demos. It failed in production because:
- The agent had no clear role, so it answered generically
- It had access to too many tools and got confused
- It had no memory across turns
- It hallucinated or ran off on unrelated tasks
- Multiple agents shipped together produced noise, not signal
The field learned: an agent is not just an LLM + tools. It is a system with structural concerns. Daily Dose DS and CrewAI articulate these as six building blocks.
How it works: the six building blocks
1. Role-playing
Give the agent a specific role in its system prompt.
- Bad: "You are a helpful AI assistant."
- Better: "You are a senior contract lawyer specializing in SaaS enterprise deals."
Why: role assignment shapes the agent's reasoning, vocabulary, and retrieval priorities. The more specific, the sharper the output.
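A role is ultimately just a carefully assembled system prompt. A minimal, framework-agnostic sketch (the `make_system_prompt` helper is illustrative, not a CrewAI or OpenAI API):

```python
def make_system_prompt(role: str, goal: str, backstory: str) -> str:
    """Assemble a role-playing system prompt from three parts."""
    return (
        f"You are {role}.\n"
        f"Your goal: {goal}\n"
        f"Background: {backstory}"
    )

# Generic role: no expert voice, no retrieval priorities.
generic = make_system_prompt(
    "a helpful AI assistant",
    "answer questions",
    "you help with anything",
)

# Specific role: shapes reasoning, vocabulary, and what the model attends to.
specific = make_system_prompt(
    "a senior contract lawyer specializing in SaaS enterprise deals",
    "review contracts for liability risk and non-standard clauses",
    "15 years negotiating enterprise SaaS agreements; cite exact clauses",
)
```

In CrewAI the same three parts map to `agent.role`, `agent.goal`, and `agent.backstory`.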
2. Focus (narrow tasks)
Overloading an agent hurts performance. Giving it 20 tools and 15 goals leads to confusion.
Pattern: one agent, one narrow responsibility. Use multiple agents with clean interfaces instead of one do-it-all agent. Example:
- Marketing agent: tone, audience, messaging. NOT pricing, NOT market analysis.
- Separate agents handle what is outside scope.
Daily Dose DS rule: specialized agents perform better, every time.
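One way to enforce narrow scope is to route requests to whichever specialized agent owns the topic. A minimal dispatch sketch (the agent names and keyword scopes are illustrative, not a framework API):

```python
# One narrow responsibility per agent: requests go to the agent whose
# scope overlaps the topics, never to a single do-it-all agent.
AGENT_SCOPES = {
    "marketing": {"tone", "audience", "messaging"},
    "pricing": {"price", "discount", "margin"},
    "analysis": {"market", "competitor", "trend"},
}

def route(request_topics: set[str]) -> list[str]:
    """Return the specialized agents whose scope overlaps the request."""
    return [name for name, scope in AGENT_SCOPES.items() if scope & request_topics]
```

A request touching tone and price goes to two focused agents instead of overloading one.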
3. Tools
Give the agent exactly the tools it needs. Not more.
Typical tools for a research agent:
- Web search (Tavily, Exa, Brave)
- Summarization (internal LLM call)
- Citation formatter
Adding irrelevant tools (speech-to-text, code exec) confuses the LLM about when to use what.
Custom tools
Frameworks support custom Python tools. Example in CrewAI: a currency converter tool that hits an exchange rate API.
```python
from crewai.tools import BaseTool

class CurrencyConverterTool(BaseTool):
    # CrewAI tools are pydantic models: name and description need annotations.
    name: str = "currency_converter"
    description: str = "Convert an amount from one currency to another."

    def _run(self, amount: float, source: str, target: str) -> str:
        rate = fetch_rate(source, target)  # your exchange-rate API call
        return f"{amount * rate:.2f} {target}"
```
Custom tools via MCP
Instead of embedding the tool in every agent, expose it as an MCP server. Any agent (CrewAI, LangGraph, Claude Code, custom) can connect via MCP and use the tool without re-implementing.
See [[../06-mcp/README]] and agent protocols.
4. Cooperation
Multi-agent systems work best when agents collaborate and exchange feedback.
Example: AI-powered financial analysis system:
- Agent A gathers data
- Agent B assesses risk
- Agent C builds strategy
- Agent D writes the report
Each specializes. They share intermediate outputs. See agentic design patterns for the 7 multi-agent patterns.
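The four-agent financial pipeline above can be sketched as a sequential handoff, where each stage consumes the previous stage's output. The stage functions are illustrative placeholders for real LLM-backed agents:

```python
def gather_data(ticker: str) -> dict:
    # Agent A: would call market-data tools in a real system.
    return {"ticker": ticker, "price": 42.0}

def assess_risk(data: dict) -> dict:
    # Agent B: enriches the shared state with a risk assessment.
    return {**data, "risk": "low" if data["price"] < 100 else "high"}

def build_strategy(assessment: dict) -> dict:
    # Agent C: decides an action from the risk assessment.
    return {**assessment, "action": "buy" if assessment["risk"] == "low" else "hold"}

def write_report(strategy: dict) -> str:
    # Agent D: turns the accumulated state into a report.
    return f"{strategy['ticker']}: risk={strategy['risk']}, action={strategy['action']}"

def run_pipeline(ticker: str) -> str:
    # The intermediate dicts are the shared outputs the agents exchange.
    return write_report(build_strategy(assess_risk(gather_data(ticker))))
```

Frameworks give you this wiring as a primitive: a Crew in CrewAI, a graph in LangGraph.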
5. Guardrails
Unconstrained agents go off track: hallucinate, loop endlessly, call dangerous tools.
Typical guardrails:
- Tool usage limits: max N calls, rate limits per tool
- Validation checkpoints: verify output matches schema before proceeding
- Fallback: if stuck, escalate to a human or another agent
- Input filters: detect prompt injection, PII, jailbreaks
- Output filters: ensure compliance, remove PII, check format
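Two of these guardrails, a per-tool call budget and a validation checkpoint, as a plain-Python sketch (the class and the refund policy numbers are illustrative, not a guardrails-library API):

```python
class ToolBudgetExceeded(Exception):
    pass

class GuardedTool:
    """Wrap a tool function with a hard cap on how often it can be called."""
    def __init__(self, fn, max_calls: int = 3):
        self.fn, self.max_calls, self.calls = fn, max_calls, 0

    def __call__(self, *args, **kwargs):
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"{self.fn.__name__}: {self.max_calls}-call budget spent")
        self.calls += 1
        return self.fn(*args, **kwargs)

def validate_refund(output: dict) -> dict:
    """Checkpoint: refuse to proceed on malformed or out-of-policy output."""
    assert set(output) >= {"order_id", "amount"}, "missing required fields"
    assert output["amount"] <= 500, "refund above policy limit"
    return output
```

The budget stops infinite tool loops; the checkpoint stops the agent from acting on output that violates schema or policy.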
Example: a legal assistant must avoid outdated laws or false claims. A guardrail verifies citations against a trusted database.
See [[../12-safety-guardrails/README]].
6. Memory
Without memory, an agent starts fresh every turn. User said "my name is David" five seconds ago? Forgotten.
Memory types (quick list, full detail in agent memory):
- Short-term: conversation history within a session
- Long-term: facts across sessions
- Entity memory: tracked entities (users, products, orders)
- Episodic: past interactions
- Semantic: learned facts
- Procedural: learned how-to
Without memory, no personalization, no continuous learning, no context awareness.
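A minimal sketch of short-term versus long-term memory. The keep-last-N trim policy is deliberately naive (real systems summarize before evicting), and the class is illustrative, not a framework API:

```python
from collections import deque

class Memory:
    def __init__(self, short_term_limit: int = 6):
        # Short-term: sliding window of turns within the session.
        self.short_term: deque = deque(maxlen=short_term_limit)
        # Long-term: durable facts that survive across sessions.
        self.long_term: dict[str, str] = {}

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append((role, text))  # oldest turn evicted at the limit

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def context(self) -> str:
        """Assemble what the agent sees at the start of each turn."""
        facts = "; ".join(f"{k}={v}" for k, v in self.long_term.items())
        turns = "\n".join(f"{r}: {t}" for r, t in self.short_term)
        return f"Known facts: {facts}\n{turns}"
```

"My name is David" survives as a long-term fact even after the turn that said it falls out of the short-term window.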
Relevance today (2026)
CrewAI, LangGraph, OpenAI SDK all materialize these blocks
Every serious 2026 framework exposes the 6 blocks as primitives:
- Role: agent.role, agent.backstory, system_prompt
- Focus: agent.goal, task descriptions
- Tools: agent.tools=[...], function-calling schema
- Cooperation: crews, graphs, handoffs
- Guardrails: validators, output parsers, policy middleware
- Memory: built-in short-term buffer, long-term stores
MCP changed the tools block
In 2024, each framework had its own tool format. By 2026 with MCP, you expose tools ONCE as an MCP server, and any compliant agent can use them. This is a major architectural shift. See agent protocols.
Guardrails became a product category
NeMo Guardrails (NVIDIA), Lakera Guard, Prompt Armor, Guardrails AI. In 2026, building guardrails by hand is a mistake. Use a library.
Memory is the frontier
Most frameworks have short-term memory solved. Long-term memory is still immature in 2026. Libraries like Zep, Letta (ex-MemGPT), Mem0 are competing to become the standard.
ARQ adds a seventh block
Some teams now add Attentive Reasoning Queries (ARQ) as a 7th block: the reasoning schema that keeps the agent aligned with complex policies. See reasoning prompting techniques. Parlant productizes this.
Critical questions
- What happens if you skip role-playing? (Generic output, no expert voice. Easiest fix, always do it.)
- Is it better to have one agent with 20 tools or 5 agents with 4 tools each? (Usually the second, if the tools cluster by domain. But more agents = more handoffs = more latency.)
- How do you test memory works? (Turn-by-turn eval: "my name is X" in turn 1, "what is my name?" in turn 3. Plus a week-later test for long-term memory.)
- Can guardrails hurt the agent? (Yes, over-strict filters block legitimate answers. Tune carefully.)
- Which framework exposes these blocks best? (CrewAI has the cleanest mapping for role/focus. LangGraph for cooperation. OpenAI Agents SDK for tools/guardrails. Memory is immature everywhere.)
Production pitfalls
- Role too generic. "Helpful assistant" with 50 tools: drift city.
- Tool descriptions too vague. The LLM cannot decide when to use the tool. Write specific, example-rich descriptions.
- No cooperation protocol. Multi-agent system where each agent freelances. Define handoff conditions.
- No guardrails on tool output. Tool returns malicious content, LLM acts on it. Sanitize.
- No short-term memory limit. Buffer grows, context costs explode, model degrades.
- No long-term memory eviction. Store grows indefinitely. Stale facts override fresh ones.
- Over-engineering. Starting with all 6 blocks before validating the core loop. Prototype with role + tools + memory, add the rest as needed.
Alternatives / Comparisons
Framework-by-framework mapping of the 6 blocks:
| Framework | Role | Focus | Tools | Cooperation | Guardrails | Memory |
|---|---|---|---|---|---|---|
| CrewAI | agent.role | agent.goal | agent.tools | Crew | Basic | Basic |
| LangGraph | system prompt | Task node | ToolNode | Multi-agent graph | Middleware | Checkpointer |
| LlamaIndex | system prompt | query | Tools API | Multi-agent workflows | Custom | Built-in |
| OpenAI Agents SDK | instructions | - | tools | Handoffs | Guardrails API | Threads |
| PydanticAI | system_prompt | - | tools | Graph | Validators | Dependencies |
No framework is strictly better. Pick based on your stack and team familiarity.
Mental parallels (non-AI)
- Company organization chart:
- Role = job title
- Focus = narrow scope of responsibility
- Tools = software and access rights
- Cooperation = cross-functional team
- Guardrails = compliance, code review, HR policies
- Memory = CRM, knowledge base, personal notes
A company without any of these six breaks. Same for agents.
- Chef in a restaurant:
- Role = pastry chef vs saucier
- Focus = owns their station only
- Tools = pans, knives, mise-en-place
- Cooperation = passes to next station
- Guardrails = health code, allergy protocols
- Memory = today's prep notes, customer preferences
Mini-lab
labs/agent-building-blocks/ (to create):
- Build a customer support agent with CrewAI that has:
- Role: "Senior customer support specialist"
- Focus: "Handle returns and refunds only"
- Tools: order_lookup (SQL), issue_refund (API), escalate_to_human
- Cooperation: handoff to a billing agent for complex cases
- Guardrails: never promise refunds > $500, never share other customer's info
- Memory: per-user conversation history + known order history
- Test against 20 scenarios.
- Measure: accuracy, correct tool use, guardrail violation rate, memory usage.
Stack: uv, crewai, sqlite, anthropic.
Further reading
Canonical
- Daily Dose DS, "Building blocks of AI Agents" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- CrewAI docs - https://docs.crewai.com
- LangGraph docs - https://langchain-ai.github.io/langgraph/
- Anthropic, "Building effective agents" (2024): https://www.anthropic.com/research/building-effective-agents
Related in this KB
- what is an agent
- function calling
- agent memory
- react pattern
- agentic design patterns
- agent protocols
- reasoning prompting techniques
- [[../06-mcp/README]]
- [[../12-safety-guardrails/README]]
Tools
- NeMo Guardrails (https://github.com/NVIDIA/NeMo-Guardrails), Lakera Guard (https://www.lakera.ai/lakera-guard), Guardrails AI (https://www.guardrailsai.com/)
- Zep (https://www.getzep.com/), Letta (https://www.letta.com/), Mem0 (https://mem0.ai/) - memory
- Parlant (ARQ + guardrails): https://github.com/emcie-co/parlant