3. TodoWrite
"An agent without a plan drifts"
New to this?
What is a TodoManager?
A simple class that keeps a list of tasks with statuses (pending, in_progress, done). It forces the model to write down its plan before executing, so it doesn't lose track of what to do next.
Why only one task in_progress at a time?
It forces sequential focus. The model must finish or update the current task before moving on. This prevents the agent from starting 5 things and finishing none.
What's a nag reminder?
A small text injection (<reminder>) added to the conversation if the model hasn't updated its todo list in 3+ rounds. It nudges the model to stay on track without the user having to intervene.
How is this different from a regular checklist?
A regular checklist is static text the model might ignore. TodoWrite is a tool the model actively calls to update statuses. The harness tracks rounds and injects reminders, making it a feedback loop, not just a note.
The Problem
On multi-step tasks, the model loses track. It repeats work, skips steps, or wanders off. Long conversations make this worse β the system prompt fades as tool results fill the context. A 10-step refactoring might complete steps 1-3, then the model starts improvising because it forgot steps 4-10.
The Solution
+--------+ +-------+ +---------+
| User | ---> | LLM | ---> | Tools |
| prompt | | | | + todo |
+--------+ +---+---+ +----+----+
^ |
| tool_result |
+----------------+
|
+-----------+-----------+
| TodoManager state |
| [ ] task A |
| [>] task B <- doing |
| [x] task C |
+-----------------------+
|
if rounds_since_todo >= 3:
inject <reminder> into tool_result
How It Works
- TodoManager stores items with statuses. Only one item can be
in_progressat a time.
class TodoManager:
def update(self, items: list) -> str:
validated, in_progress_count = [], 0
for item in items:
status = item.get("status", "pending")
if status == "in_progress":
in_progress_count += 1
validated.append({"id": item["id"], "text": item["text"],
"status": status})
if in_progress_count > 1:
raise ValueError("Only one task can be in_progress")
self.items = validated
return self.render()
- The
todotool goes into the dispatch map like any other tool.
TOOL_HANDLERS = {
# ...base tools...
"todo": lambda **kw: TODO.update(kw["items"]),
}
- A nag reminder injects a nudge if the model goes 3+ rounds without calling
todo.
if rounds_since_todo >= 3 and messages:
last = messages[-1]
if last["role"] == "user" and isinstance(last.get("content"), list):
last["content"].insert(0, {
"type": "text",
"text": "<reminder>Update your todos.</reminder>",
})
The βone in_progress at a timeβ constraint forces sequential focus. The nag reminder creates accountability.
What Changed From Tool Use
| Component | Before (Tool Use) | After (TodoWrite) |
|---|---|---|
| Tools | 4 | 5 (+todo) |
| Planning | None | TodoManager with statuses |
| Nag injection | None | <reminder> after 3 rounds |
| Agent loop | Simple dispatch | + rounds_since_todo counter |
Key Takeaway
Planning is not optional for multi-step work. The TodoWrite pattern gives the model a structured way to track its own progress, with the harness enforcing accountability through nag reminders. The loop barely changes β one new tool, one counter, one injection point.
Interactive Code Walkthrough
1class TodoManager:2 def update(self, items: list) -> str:3 validated, in_progress_count = [], 04 for item in items:5 status = item.get("status", "pending")6 if status == "in_progress":7 in_progress_count += 18 validated.append({"id": item["id"], "text": item["text"],9 "status": status})10 if in_progress_count > 1:11 raise ValueError("Only one task can be in_progress")12 self.items = validated13 return self.render()14 15if rounds_since_todo >= 3 and messages:16 last = messages[-1]17 if last["role"] == "user" and isinstance(last.get("content"), list):18 last["content"].insert(0, {19 "type": "text",20 "text": "<reminder>Update your todos.</reminder>",21 })22 TodoManager is a simple class. The update() method is the only write operation β the model calls it with the full list of items every time it wants to change anything.Why does the TodoManager enforce only one task as in_progress at a time? What would go wrong if the agent could start multiple tasks simultaneously?
Hint
Think about context switching, partial completions, and how the nag reminder works.
Create a multi-step task and watch the agent make a todo list. Then add a priority field to tasks (high/medium/low) and modify the nag reminder to show the highest-priority pending task.
Hint
Sort pending tasks by priority before picking which one to nag about.
Add task time tracking: record when each task moves to in_progress and completed. Generate a summary at the end showing time spent per task and total session time.
Hint
Store timestamps in the task dict and compute durations in a summary() method.