Glossary
Key terms and concepts from the Agent Harness Engineering course, in order of appearance.
Agent Loop (The Agent Loop)
The core while-true loop that sends messages to the LLM, checks if it wants to use tools, executes them, and feeds results back. The heartbeat of every agent.
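A minimal sketch of this loop, with `call_llm` and `run_tool` as hypothetical stand-ins for a real API client and tool executor:

```python
# Minimal agent-loop sketch. call_llm and run_tool are stubs standing in
# for a real model client and tool dispatcher.
def call_llm(messages):
    # Pretend the model asks for one tool call, then finishes.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "echo", "args": {"text": "hi"}}
    return {"tool": None, "content": "done"}

def run_tool(name, args):
    return args["text"]  # trivial "echo" tool

def agent_loop(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:                               # the core while-true loop
        reply = call_llm(messages)
        if reply["tool"] is None:             # model is finished
            return reply["content"]
        result = run_tool(reply["tool"], reply["args"])
        messages.append({"role": "tool", "content": result})  # feed result back

print(agent_loop("say hi"))  # prints "done"
```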
Harness (The Agent Loop)
The code that wraps the LLM — tool execution, context management, safety checks. The model decides; the harness executes.
Tool Schema (Tool Use)
A JSON object describing a tool's name, description, and input parameters. The LLM reads schemas to decide which tool to call and with what arguments.
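A hypothetical schema for a `read_file` tool, in the JSON Schema style most LLM tool-use APIs expect (field names here are illustrative):

```python
# Illustrative tool schema: name, description, and a JSON Schema for inputs.
read_file_schema = {
    "name": "read_file",
    "description": "Read a text file and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path relative to the workspace root",
            },
        },
        "required": ["path"],
    },
}
```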
Dispatch Map (Tool Use)
A dictionary mapping tool names to handler functions: {"bash": run_bash, "read_file": run_read}. One lookup replaces a chain of if/elif.
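The lookup itself, with stub handlers and an unknown-tool fallback (handler names follow the example above; error text is illustrative):

```python
def run_bash(args):                    # stub handlers for illustration
    return f"ran: {args['cmd']}"

def run_read(args):
    return f"read: {args['path']}"

DISPATCH = {"bash": run_bash, "read_file": run_read}

def dispatch(name, args):
    handler = DISPATCH.get(name)
    if handler is None:
        return f"Unknown tool: {name}"  # surface the error back to the model
    return handler(args)
```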
Path Sandboxing (Tool Use)
Resolving file paths against an allowed root directory to prevent the agent from reading or writing outside its workspace.
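A minimal sketch, assuming a hypothetical `/workspace` root and Python 3.9+ for `Path.is_relative_to`:

```python
from pathlib import Path

ROOT = Path("/workspace")  # hypothetical allowed root

def resolve_safe(user_path: str) -> Path:
    # Resolve against the root, then verify the result stays inside it.
    candidate = (ROOT / user_path).resolve()
    if not candidate.is_relative_to(ROOT.resolve()):
        raise PermissionError(f"path escapes workspace: {user_path}")
    return candidate
```

Resolving before checking is the point: it collapses `..` segments, so `../etc/passwd` is rejected rather than slipping past a naive prefix check.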
TodoManager (TodoWrite)
A planning tool that tracks tasks with statuses (pending, in_progress, completed). Enforces one-at-a-time focus and injects nag reminders.
Subagent (Subagents)
A disposable child agent spawned with a fresh context for a single subtask. Returns a summary and dies — no context pollution.
Skill Loading (Skills)
Two-layer pattern: cheap descriptions in the system prompt, expensive full content loaded on-demand. Keeps the context lean.
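A sketch of the two layers, with a hypothetical skill registry and on-disk paths:

```python
# Two-layer skill loading sketch. Skill names and paths are illustrative.
SKILLS = {
    "pdf_tools": {
        "description": "Extract text and tables from PDFs.",
        "path": "skills/pdf_tools.md",
    },
}

def system_prompt_skills():
    # Cheap layer: one line per skill, always in the system prompt.
    return "\n".join(f"- {name}: {s['description']}" for name, s in SKILLS.items())

def load_skill(name):
    # Expensive layer: full content read from disk only when invoked.
    with open(SKILLS[name]["path"]) as f:
        return f.read()
```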
Context Compaction (Context Compact)
Progressive compression of conversation history. Micro: trim old tool results. Mid: summarize at 50k tokens. Hard: LLM-written summary at 80k tokens.
Micro Compaction (Context Compact)
Replacing tool results older than 3 turns with one-line placeholders like "[Previous: used read_file]". Silent, runs every turn.
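A sketch of the replacement pass, assuming a simple message-dict shape where tool results carry the tool name; here "older than 3 turns" is approximated as anything outside the last 3 messages:

```python
KEEP_RECENT = 3  # tool results inside this window stay verbatim

def micro_compact(messages):
    # Collapse old tool results into one-line placeholders; keep recent ones.
    cutoff = len(messages) - KEEP_RECENT
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "tool" and i < cutoff:
            out.append({"role": "tool", "content": f"[Previous: used {m['tool']}]"})
        else:
            out.append(m)
    return out
```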
Hard Compaction (Context Compact)
Asking the LLM to write a dense summary of the entire conversation, then replacing all history with that summary. Last resort.
Task Graph (Tasks)
A directed acyclic graph (DAG) of tasks with dependencies (blockedBy). Persisted as JSON files on disk, surviving crashes and context compression.
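A sketch of one task record and a readiness check, with illustrative field names apart from `blockedBy`:

```python
import json
from pathlib import Path

# Hypothetical task record, persisted as one JSON file per task.
task = {"id": "t-42", "title": "Write unit tests",
        "status": "pending", "blockedBy": ["t-41"]}

def is_ready(task, done_ids):
    # A task is ready when it is pending and every blocker has completed.
    return task["status"] == "pending" and all(b in done_ids for b in task["blockedBy"])

def persist(task, dir_path):
    # Writing each task to disk lets the graph survive crashes and compaction.
    Path(dir_path, f"{task['id']}.json").write_text(json.dumps(task))
```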
Background Task (Background Tasks)
A subprocess running in a thread while the agent continues thinking. Results are injected via a shared queue before each LLM call.
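A minimal sketch of the thread-plus-queue shape, with hypothetical function names:

```python
import queue
import subprocess
import threading

results = queue.Queue()  # shared queue, drained before each LLM call

def run_in_background(task_id, cmd):
    # Launch the subprocess in a thread so the agent keeps thinking.
    def worker():
        out = subprocess.run(cmd, capture_output=True, text=True)
        results.put((task_id, out.stdout.strip()))
    threading.Thread(target=worker, daemon=True).start()

def drain_results():
    # Pull any finished results so they can be injected into context.
    done = []
    while not results.empty():
        done.append(results.get())
    return done
```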
Agent Mailbox (Agent Teams)
An append-only JSONL file (e.g., .team/inbox/alice.jsonl) for inter-agent communication. Write by appending; read by draining (read + truncate).
Drain Pattern (Agent Teams)
Read all messages from a mailbox, then truncate the file. Messages are consumed once and not replayed on the next poll.
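A sketch of draining a JSONL mailbox (read, then truncate in place):

```python
import json
import os

def drain(mailbox_path):
    # Read every JSONL message, then truncate so nothing replays next poll.
    if not os.path.exists(mailbox_path):
        return []
    with open(mailbox_path, "r+") as f:
        messages = [json.loads(line) for line in f if line.strip()]
        f.truncate(0)  # consume: the file is now empty
    return messages
```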
Team Roster (Agent Teams)
A JSON config listing all teammates with names, roles, and lifecycle statuses (IDLE, WORKING, SHUTDOWN).
Request-Response Protocol (Team Protocols)
A structured mailbox message format with unique IDs, enabling agents to match responses to requests.
Autonomous Claiming (Autonomous Agents)
Agents scan the task board independently and claim ready tasks without being told. Self-organizing teamwork.
Worktree (Worktree + Task Isolation)
A separate git working directory linked to the same repository. Each worktree is on its own branch, providing filesystem isolation for parallel agent work.
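The git commands involved, with illustrative paths and branch names:

```shell
# Create an isolated worktree on a new branch for one agent.
git worktree add ../agent-alice -b agent/alice

# ... the agent works in ../agent-alice on its own branch ...

# Clean up once the task is done and merged.
git worktree remove ../agent-alice
```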
Eval Case (Agent Evals)
A structured test scenario for an agent: a prompt, resource limits, and a checker function that scores the outcome.
Scoring Rubric (Agent Evals)
Criteria that define agent success: pass/fail checks, partial credit, efficiency metrics. Turns subjective quality into numbers.
Guardrail (Guardrails & Safety)
A permission layer that checks every tool call before execution. Can auto-approve, require human approval, or deny outright.
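A sketch of the three-way decision, with hypothetical tool categories:

```python
# Illustrative guardrail: classify each tool call before it executes.
SAFE = {"read_file", "todo_write"}       # auto-approve
DANGEROUS = {"bash"}                     # pause for human approval

def check(tool):
    if tool in SAFE:
        return "allow"
    if tool in DANGEROUS:
        return "ask_human"   # human-in-the-loop confirmation
    return "deny"            # unknown tools are denied outright
```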
Human-in-the-Loop (Guardrails & Safety)
Pausing the agent before dangerous actions to ask the user for confirmation. The harness intercepts the tool call and waits for approval.
Cost Cap (Guardrails & Safety)
A hard ceiling on total token spending. Once exceeded, all tool calls are blocked regardless of other permissions.
Agent Trace (Observability & Debugging)
A JSONL recording of every event in an agent session: LLM calls, tool executions, errors, compressions. The flight recorder for debugging.
Replay Debugging (Observability & Debugging)
Re-running an agent using recorded LLM responses instead of live API calls. Deterministic reproduction of bugs, zero token cost.
Model Routing (Shipping to Production)
Using different models for different tasks: cheap models (Haiku) for simple decisions, powerful models (Opus/Sonnet) for complex reasoning.
Exponential Backoff (Shipping to Production)
Retry strategy where wait time doubles between attempts (1s, 2s, 4s, 8s). Prevents hammering a failing API.
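A minimal retry wrapper implementing this schedule:

```python
import time

def with_backoff(fn, max_attempts=5, base=1.0):
    # Wait base*2^attempt seconds between failures: 1s, 2s, 4s, 8s...
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise               # out of attempts: let the error propagate
            time.sleep(base * (2 ** attempt))
```

In production this is usually paired with jitter (a small random offset) so many clients don't retry in lockstep.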