The Problem

You want the agent to follow domain-specific workflows: git conventions, testing patterns, code review checklists. Putting everything in the system prompt wastes tokens on unused skills. 10 skills at 2000 tokens each = 20,000 tokens, most of which are irrelevant to any given task.

The Solution

System prompt (Layer 1 -- always present):
+--------------------------------------+
| You are a coding agent.              |
| Skills available:                    |
|   - git: Git workflow helpers        |  ~100 tokens/skill
|   - test: Testing best practices     |
+--------------------------------------+

When model calls load_skill("git"):
+--------------------------------------+
| tool_result (Layer 2 -- on demand):  |
| <skill name="git">                   |
|   Full git workflow instructions...  |  ~2000 tokens
|   Step 1: ...                        |
| </skill>                             |
+--------------------------------------+

Layer 1: skill names in system prompt (cheap). Layer 2: full body via tool_result (on demand).

How It Works

Each skill is a directory containing a SKILL.md with YAML frontmatter.

skills/
  pdf/
    SKILL.md       # ---\n name: pdf\n description: Process PDF files\n ---\n ...
  code-review/
    SKILL.md       # ---\n name: code-review\n description: Review code\n ---\n ...

SkillLoader scans for SKILL.md files, uses the directory name as the skill identifier.

class SkillLoader:
    def __init__(self, skills_dir: Path):
        self.skills = {}
        for f in sorted(skills_dir.rglob("SKILL.md")):
            text = f.read_text()
            meta, body = self._parse_frontmatter(text)
            name = meta.get("name", f.parent.name)
            self.skills[name] = {"meta": meta, "body": body}

    def get_descriptions(self) -> str:
        lines = []
        for name, skill in self.skills.items():
            desc = skill["meta"].get("description", "")
            lines.append(f"  - {name}: {desc}")
        return "\n".join(lines)

    def get_content(self, name: str) -> str:
        skill = self.skills.get(name)
        if not skill:
            return f"Error: Unknown skill '{name}'."
        return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"

Layer 1 goes into the system prompt. Layer 2 is just another tool handler.

SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}"""

TOOL_HANDLERS = {
    # ...base tools...
    "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}

The model learns what skills exist (cheap) and loads them when relevant (expensive).

What Changed From Subagents

Component	Before (Subagents)	After (Skills)
Tools	5 (base + task)	5 (base + load_skill)
System prompt	Static string	+ skill descriptions
Knowledge	None	skills/*/SKILL.md files
Injection	None	Two-layer (system + result)

Key Takeaway

On-demand knowledge loading is a token optimization pattern. Instead of front-loading all instructions, you expose a menu (cheap) and load full content (expensive) only when the model decides it’s relevant. This pattern scales to hundreds of skills without bloating every conversation.

Interactive Code Walkthrough

The Skill Loading Mechanism

1class SkillLoader:
2    def __init__(self, skills_dir: Path):
3        self.skills = {}
4        for f in sorted(skills_dir.rglob("SKILL.md")):
5            text = f.read_text()
6            meta, body = self._parse_frontmatter(text)
7            name = meta.get("name", f.parent.name)
8            self.skills[name] = {"meta": meta, "body": body}
9 
10    def get_descriptions(self) -> str:
11        lines = []
12        for name, skill in self.skills.items():
13            desc = skill["meta"].get("description", "")
14            lines.append(f"  - {name}: {desc}")
15        return "\n".join(lines)
16 
17    def get_content(self, name: str) -> str:
18        skill = self.skills.get(name)
19        if not skill:
20            return f"Error: Unknown skill '{name}'."
21        return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
22 
23TOOL_HANDLERS = {
24    "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
25}
26

__init__ scans the skills directory recursively for any SKILL.md file. It parses YAML frontmatter to get metadata (name, description) and stores the body separately. The directory name is used as fallback if 'name' is missing from frontmatter.

Step 1 of 4

5. Skills

What is a skill in this context?

What's the two-layer approach?

Why not put everything in the system prompt?

How does the model know which skill to load?