The Problem

On multi-step tasks, the model loses track. It repeats work, skips steps, or wanders off. Long conversations make this worse — the system prompt fades as tool results fill the context. A 10-step refactoring might complete steps 1-3, then the model starts improvising because it forgot steps 4-10.

The Solution

+--------+      +-------+      +---------+
|  User  | ---> |  LLM  | ---> | Tools   |
| prompt |      |       |      | + todo  |
+--------+      +---+---+      +----+----+
                    ^                |
                    |   tool_result  |
                    +----------------+
                          |
              +-----------+-----------+
              | TodoManager state     |
              | [ ] task A            |
              | [>] task B  <- doing  |
              | [x] task C            |
              +-----------------------+
                          |
              if rounds_since_todo >= 3:
                inject <reminder> into tool_result

How It Works

TodoManager stores items with statuses. Only one item can be in_progress at a time.

class TodoManager:
    def update(self, items: list) -> str:
        validated, in_progress_count = [], 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item["id"], "text": item["text"],
                              "status": status})
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress")
        self.items = validated
        return self.render()

The todo tool goes into the dispatch map like any other tool.

TOOL_HANDLERS = {
    # ...base tools...
    "todo": lambda **kw: TODO.update(kw["items"]),
}

A nag reminder injects a nudge if the model goes 3+ rounds without calling todo.

if rounds_since_todo >= 3 and messages:
    last = messages[-1]
    if last["role"] == "user" and isinstance(last.get("content"), list):
        last["content"].insert(0, {
            "type": "text",
            "text": "<reminder>Update your todos.</reminder>",
        })

The “one in_progress at a time” constraint forces sequential focus. The nag reminder creates accountability.

What Changed From Tool Use

Component	Before (Tool Use)	After (TodoWrite)
Tools	4	5 (+todo)
Planning	None	TodoManager with statuses
Nag injection	None	`<reminder>` after 3 rounds
Agent loop	Simple dispatch	+ rounds_since_todo counter

Key Takeaway

Planning is not optional for multi-step work. The TodoWrite pattern gives the model a structured way to track its own progress, with the harness enforcing accountability through nag reminders. The loop barely changes — one new tool, one counter, one injection point.

Interactive Code Walkthrough

The TodoManager Class

1class TodoManager:
2    def update(self, items: list) -> str:
3        validated, in_progress_count = [], 0
4        for item in items:
5            status = item.get("status", "pending")
6            if status == "in_progress":
7                in_progress_count += 1
8            validated.append({"id": item["id"], "text": item["text"],
9                              "status": status})
10        if in_progress_count > 1:
11            raise ValueError("Only one task can be in_progress")
12        self.items = validated
13        return self.render()
14 
15if rounds_since_todo >= 3 and messages:
16    last = messages[-1]
17    if last["role"] == "user" and isinstance(last.get("content"), list):
18        last["content"].insert(0, {
19            "type": "text",
20            "text": "<reminder>Update your todos.</reminder>",
21        })
22

TodoManager is a simple class. The update() method is the only write operation — the model calls it with the full list of items every time it wants to change anything.

Step 1 of 5

3. TodoWrite

What is a TodoManager?

Why only one task in_progress at a time?

What's a nag reminder?

How is this different from a regular checklist?