5. Skills

"Load knowledge when you need it, not upfront"

15 min read
πŸ’‘New to this?

What is a skill in this context?

A markdown file (SKILL.md) containing domain-specific instructions -- like a git workflow guide or a code review checklist. The agent loads it on demand rather than having all knowledge stuffed into the system prompt.

What's the two-layer approach?

Layer 1: short skill descriptions in the system prompt (cheap, always visible). Layer 2: full skill content loaded via tool_result when the model requests it (expensive, on-demand). This saves tokens.

Why not put everything in the system prompt?

10 skills at 2000 tokens each = 20,000 tokens of instructions, most irrelevant to any given task. The two-layer approach costs ~100 tokens per skill in the system prompt. Full content loads only when needed.

How does the model know which skill to load?

The system prompt lists available skill names with short descriptions. The model reads those and decides which skill is relevant to the current task, then calls load_skill('name') to get the full instructions.

The Problem

You want the agent to follow domain-specific workflows: git conventions, testing patterns, code review checklists. Putting everything in the system prompt wastes tokens on unused skills. 10 skills at 2000 tokens each = 20,000 tokens, most of which are irrelevant to any given task.

The Solution

System prompt (Layer 1 -- always present):
+--------------------------------------+
| You are a coding agent.              |
| Skills available:                    |
|   - git: Git workflow helpers        |  ~100 tokens/skill
|   - test: Testing best practices     |
+--------------------------------------+

When model calls load_skill("git"):
+--------------------------------------+
| tool_result (Layer 2 -- on demand):  |
| <skill name="git">                   |
|   Full git workflow instructions...  |  ~2000 tokens
|   Step 1: ...                        |
| </skill>                             |
+--------------------------------------+

Layer 1: skill names in system prompt (cheap). Layer 2: full body via tool_result (on demand).

How It Works

  1. Each skill is a directory containing a SKILL.md with YAML frontmatter.
skills/
  pdf/
    SKILL.md       # ---\n name: pdf\n description: Process PDF files\n ---\n ...
  code-review/
    SKILL.md       # ---\n name: code-review\n description: Review code\n ---\n ...
  1. SkillLoader scans for SKILL.md files, uses the directory name as the skill identifier.
class SkillLoader:
    def __init__(self, skills_dir: Path):
        self.skills = {}
        for f in sorted(skills_dir.rglob("SKILL.md")):
            text = f.read_text()
            meta, body = self._parse_frontmatter(text)
            name = meta.get("name", f.parent.name)
            self.skills[name] = {"meta": meta, "body": body}

    def get_descriptions(self) -> str:
        lines = []
        for name, skill in self.skills.items():
            desc = skill["meta"].get("description", "")
            lines.append(f"  - {name}: {desc}")
        return "\n".join(lines)

    def get_content(self, name: str) -> str:
        skill = self.skills.get(name)
        if not skill:
            return f"Error: Unknown skill '{name}'."
        return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
  1. Layer 1 goes into the system prompt. Layer 2 is just another tool handler.
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}"""

TOOL_HANDLERS = {
    # ...base tools...
    "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}

The model learns what skills exist (cheap) and loads them when relevant (expensive).

What Changed From Subagents

ComponentBefore (Subagents)After (Skills)
Tools5 (base + task)5 (base + load_skill)
System promptStatic string+ skill descriptions
KnowledgeNoneskills/*/SKILL.md files
InjectionNoneTwo-layer (system + result)

Key Takeaway

On-demand knowledge loading is a token optimization pattern. Instead of front-loading all instructions, you expose a menu (cheap) and load full content (expensive) only when the model decides it’s relevant. This pattern scales to hundreds of skills without bloating every conversation.

Interactive Code Walkthrough

The Skill Loading Mechanism
1class SkillLoader:
2 def __init__(self, skills_dir: Path):
3 self.skills = {}
4 for f in sorted(skills_dir.rglob("SKILL.md")):
5 text = f.read_text()
6 meta, body = self._parse_frontmatter(text)
7 name = meta.get("name", f.parent.name)
8 self.skills[name] = {"meta": meta, "body": body}
9 
10 def get_descriptions(self) -> str:
11 lines = []
12 for name, skill in self.skills.items():
13 desc = skill["meta"].get("description", "")
14 lines.append(f" - {name}: {desc}")
15 return "\n".join(lines)
16 
17 def get_content(self, name: str) -> str:
18 skill = self.skills.get(name)
19 if not skill:
20 return f"Error: Unknown skill '{name}'."
21 return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
22 
23TOOL_HANDLERS = {
24 "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
25}
26 
__init__ scans the skills directory recursively for any SKILL.md file. It parses YAML frontmatter to get metadata (name, description) and stores the body separately. The directory name is used as fallback if 'name' is missing from frontmatter.
Step 1 of 4
πŸ§ͺ Try it yourself
πŸ”₯ Warm-up ~5 min

Why does the skill system use two layers (cheap description + expensive full content) instead of loading everything upfront? Calculate the token cost difference for 20 skills at 500 tokens each.

Hint

20 skills Γ— 500 tokens = 10,000 tokens added to every API call, even when you only need 1 skill.

πŸ”¨ Build ~20 min

Write your own SKILL.md file for a skill like 'git-workflow' or 'docker-deploy'. Include a description header and full content. Test loading it with the SkillLoader.

Hint

Follow the format: first line is the description, rest is the full content loaded on demand.

πŸš€ Stretch ~45 min

Add skill versioning: each SKILL.md can have a version: 1.2 header. The loader caches loaded skills and only reloads when the version changes. Track cache hit/miss rates.

Hint

Use a dict as cache with skill name as key and (version, content) as value.

Found a mistake? Report it β†’