3. TodoWrite

"An agent without a plan drifts"

20 min read
πŸ’‘New to this?

What is a TodoManager?

A simple class that keeps a list of tasks with statuses (pending, in_progress, done). It forces the model to write down its plan before executing, so it doesn't lose track of what to do next.

Why only one task in_progress at a time?

It forces sequential focus. The model must finish or update the current task before moving on. This prevents the agent from starting 5 things and finishing none.

What's a nag reminder?

A small text injection (<reminder>) added to the conversation if the model hasn't updated its todo list in 3+ rounds. It nudges the model to stay on track without the user having to intervene.

How is this different from a regular checklist?

A regular checklist is static text the model might ignore. TodoWrite is a tool the model actively calls to update statuses. The harness tracks rounds and injects reminders, making it a feedback loop, not just a note.

The Problem

On multi-step tasks, the model loses track. It repeats work, skips steps, or wanders off. Long conversations make this worse β€” the system prompt fades as tool results fill the context. A 10-step refactoring might complete steps 1-3, then the model starts improvising because it forgot steps 4-10.

The Solution

+--------+      +-------+      +---------+
|  User  | ---> |  LLM  | ---> | Tools   |
| prompt |      |       |      | + todo  |
+--------+      +---+---+      +----+----+
                    ^                |
                    |   tool_result  |
                    +----------------+
                          |
              +-----------+-----------+
              | TodoManager state     |
              | [ ] task A            |
              | [>] task B  <- doing  |
              | [x] task C            |
              +-----------------------+
                          |
              if rounds_since_todo >= 3:
                inject <reminder> into tool_result

How It Works

  1. TodoManager stores items with statuses. Only one item can be in_progress at a time.
class TodoManager:
    def update(self, items: list) -> str:
        validated, in_progress_count = [], 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item["id"], "text": item["text"],
                              "status": status})
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress")
        self.items = validated
        return self.render()
  1. The todo tool goes into the dispatch map like any other tool.
TOOL_HANDLERS = {
    # ...base tools...
    "todo": lambda **kw: TODO.update(kw["items"]),
}
  1. A nag reminder injects a nudge if the model goes 3+ rounds without calling todo.
if rounds_since_todo >= 3 and messages:
    last = messages[-1]
    if last["role"] == "user" and isinstance(last.get("content"), list):
        last["content"].insert(0, {
            "type": "text",
            "text": "<reminder>Update your todos.</reminder>",
        })

The β€œone in_progress at a time” constraint forces sequential focus. The nag reminder creates accountability.

What Changed From Tool Use

ComponentBefore (Tool Use)After (TodoWrite)
Tools45 (+todo)
PlanningNoneTodoManager with statuses
Nag injectionNone<reminder> after 3 rounds
Agent loopSimple dispatch+ rounds_since_todo counter

Key Takeaway

Planning is not optional for multi-step work. The TodoWrite pattern gives the model a structured way to track its own progress, with the harness enforcing accountability through nag reminders. The loop barely changes β€” one new tool, one counter, one injection point.

Interactive Code Walkthrough

The TodoManager Class
1class TodoManager:
2 def update(self, items: list) -> str:
3 validated, in_progress_count = [], 0
4 for item in items:
5 status = item.get("status", "pending")
6 if status == "in_progress":
7 in_progress_count += 1
8 validated.append({"id": item["id"], "text": item["text"],
9 "status": status})
10 if in_progress_count > 1:
11 raise ValueError("Only one task can be in_progress")
12 self.items = validated
13 return self.render()
14 
15if rounds_since_todo >= 3 and messages:
16 last = messages[-1]
17 if last["role"] == "user" and isinstance(last.get("content"), list):
18 last["content"].insert(0, {
19 "type": "text",
20 "text": "<reminder>Update your todos.</reminder>",
21 })
22 
TodoManager is a simple class. The update() method is the only write operation β€” the model calls it with the full list of items every time it wants to change anything.
Step 1 of 5
πŸ§ͺ Try it yourself
πŸ”₯ Warm-up ~5 min

Why does the TodoManager enforce only one task as in_progress at a time? What would go wrong if the agent could start multiple tasks simultaneously?

Hint

Think about context switching, partial completions, and how the nag reminder works.

πŸ”¨ Build ~20 min

Create a multi-step task and watch the agent make a todo list. Then add a priority field to tasks (high/medium/low) and modify the nag reminder to show the highest-priority pending task.

Hint

Sort pending tasks by priority before picking which one to nag about.

πŸš€ Stretch ~45 min

Add task time tracking: record when each task moves to in_progress and completed. Generate a summary at the end showing time spent per task and total session time.

Hint

Store timestamps in the task dict and compute durations in a summary() method.

Found a mistake? Report it β†’