Cleaning Up 173 Claude Code Skills: From 9 Good to 182 Excellent
After a year of accumulating Claude Code skills, only 9 of 173 reliably activated on natural language. Here's the four-part pattern I lifted from anthropics/launch-your-agent, how I applied it to every skill in one session, and the audit tooling that keeps the catalog clean.
On this page
- The starting state
- The pattern
- Applying it to 173 skills in one session
- Step 1 — Audit the existing state
- Step 2 — Batch-rewrite the worst tier
- Step 3 — Append NOT FOR lines via script
- Step 4 — Archive the obvious redundancies
- Step 5 — Resolve trigger collisions
- Step 6 — Handle the YAML edge case
- Step 7 — Build the skills I didn't have
- The final state
- Keeping it clean
- Why this works (the meta-lesson)
- The cluster map
- What you can steal
I have 21 active Android apps, a few iOS ports, some web stuff, and a stack of CLI tools. Across all of them, Claude Code skills are the way I encode "how I want to work" so I don't repeat myself.
After a year, I had 173 skills. About 100 of them only fired when I typed the slash command. The rest needed me to remember exactly which phrase I'd used in the trigger list. That's not how skills are supposed to work — Claude should match your natural-language intent against the catalog and pick the right one. So I sat down, audited everything, and lifted a pattern from a public Anthropic reference repo that fixed it.
This is the writeup. Real metrics, real before/after, the four-part pattern, and the audit tooling I now run weekly.
The starting state
Total skills: 173
Excellent (4/4): 9
Good (3/4): 0
Needs work: 49
Poor: 28
Missing NOT FOR: 76
Trigger collisions: 15 phrases claimed by 11 skill pairs
Activation was unreliable. "Fix this crash" might fire debug-sudarshan, bug-hunter, systematic-debugging, or silent-failure-hunter, with no NOT FOR line to tell them apart.
The pattern
I'd been looking for a fix when anthropics/launch-your-agent appeared on GitHub. It's a reference implementation of a Claude Code skill that walks a founder through launching a Claude Managed Agent. The skill's own descriptions are written to a very specific shape, and the shape works. So I extracted the pattern:
description: "[Concrete domain sentence — what it does, with specifics like
file paths, library versions, canonical IDs.] Use when [explicit activation
condition]. Triggers on \"phrase the user actually types\", \"another phrase\",
\"/slash-command\". NOT FOR [neighbor case] (use `neighbor-skill`)."
Four parts:
- One concrete sentence with real specifics. Not "Helps with X." Instead: "Generate a privacy policy HTML page and publish to https://sudarshanchaudhari.github.io/[appname]-privacy-policy/ covering data collected, third-party SDKs, retention, deletion, contact, and GDPR alignment."
- Use when …: the explicit activation condition, as a literal phrase. Variants like Use after, Use before, and Use immediately when also work.
- Triggers on "…": comma-separated phrases the user actually types. Not abstract topics. Real surface forms.
- NOT FOR … (use `neighbor-skill`): every neighbor that might collide, with a redirect.
That fourth part is the magic. It teaches Claude both "fire on these phrases" and "don't fire when the neighbor is the right call." Most existing skill descriptions in the wild only do part 3. They miss the negative half of the activation rule.
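As a rough illustration of the four parts, here is a minimal sketch that assembles a description string from them. The helper name build_description and the example values are hypothetical, made up for this sketch; only the overall shape comes from the pattern above.

```python
def build_description(concrete, use_when, triggers, not_for):
    """Assemble a skill description from the four parts.

    not_for is a list of (neighbor_case, neighbor_skill) pairs.
    """
    # Part 3: every trigger is quoted, because these are literal surface forms.
    quoted = ", ".join(f'"{t}"' for t in triggers)
    # Part 4: every boundary names the neighbor skill to use instead.
    boundaries = " or ".join(f"{case} (use `{skill}`)" for case, skill in not_for)
    return (
        f"{concrete.rstrip('.')}. "
        f"Use when {use_when.rstrip('.')}. "
        f"Triggers on {quoted}. "
        f"NOT FOR {boundaries}."
    )


# Illustrative values only; not taken from a real SKILL.md.
example = build_description(
    concrete="Generate a privacy policy HTML page covering data collected, retention, and deletion",
    use_when="an app needs a privacy policy before store submission",
    triggers=["privacy policy", "/privacy-policy-gen"],
    not_for=[("compliance audits", "data-privacy-compliance")],
)
print(example)
```

Building the string from structured inputs makes it harder to forget the boundary: a skill with an empty not_for list stands out immediately.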
Look at the gold-tier skills in the launch-your-agent repo: RedTeam, FirstPrinciples, SystemsThinking, BitterPillEngineering.
Applying it to 173 skills in one session
I'm not going to write 173 descriptions by hand. Here's the actual process:
Step 1 — Audit the existing state
I wrote a scorer that extracts every skill description, parses YAML properly (not regex — quoted multi-line YAML scalars trip naive parsers), and checks each of the four parts.
import re

def score_desc(desc):
    """Score a description 0-4 and list which of the four parts are present."""
    score = 0
    present = []
    dl = desc.lower()
    # Part 1: concrete sentence (long enough, and ends a sentence early or runs long)
    if len(desc) >= 100 and ('.' in desc[:400] or len(desc) >= 200):
        score += 1
        present.append('concrete-sentence')
    # Part 2: Use when (or variant)
    if re.search(r'\buse\s+(when|after|before|on|for|at|as|immediately|whenever)\b', dl):
        score += 1
        present.append('use-when')
    # Part 3: quoted trigger phrases (or a long "Use when" clause as a fallback)
    if (re.search(r'triggers?\s+on[:\s].*"[^"]+"', desc, re.IGNORECASE)
            or re.search(r'use when\s+[a-z][^.]{20,}', desc, re.IGNORECASE)):
        score += 1
        present.append('triggers')
    # Part 4: NOT FOR with explicit neighbor redirect
    if 'not for' in dl and ('use `' in dl or 'use /' in dl):
        score += 1
        present.append('not-for-boundary')
    return score, present
Run it. Get a per-skill 0-4 score plus a list of which parts are missing. Group by tier.
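The grouping step could look something like the sketch below. It takes scorer output as plain data, so it runs on its own. The score-to-tier mapping is an assumption: the article only states that 4/4 is Excellent and 3/4 is Good, so the cut-offs for "Needs work" and "Poor" here are guesses, and the sample skill names are placeholders.

```python
from collections import defaultdict

# Assumed mapping; only the 4/4 and 3/4 labels are stated in the article.
TIER_BY_SCORE = {4: "Excellent", 3: "Good", 2: "Needs work", 1: "Poor", 0: "Poor"}
ALL_PARTS = ["concrete-sentence", "use-when", "triggers", "not-for-boundary"]


def group_by_tier(results):
    """Group skills by tier. results maps skill name -> (score, present_parts)."""
    tiers = defaultdict(list)
    for name, (score, present) in sorted(results.items()):
        # Report what is missing, since that is what drives the rewrite.
        missing = [part for part in ALL_PARTS if part not in present]
        tiers[TIER_BY_SCORE[score]].append((name, missing))
    return dict(tiers)


# Placeholder skills, shaped like the scorer's (score, present) output.
sample = {
    "skill-a": (4, ALL_PARTS),
    "skill-b": (3, ["concrete-sentence", "use-when", "triggers"]),
    "skill-c": (0, []),
}
report = group_by_tier(sample)
for tier, skills in report.items():
    print(tier, skills)
```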
Step 2 — Batch-rewrite the worst tier
The 28 "Poor" tier descriptions all looked alike: "Skill X for SudarshanTechLabs." Three words. No triggers. No neighbors.
I rewrote each one by hand, but quickly — read the body, extract the actual capability, write a description that hits the four-part pattern. Each one took 30-60 seconds.
Step 3 — Append NOT FOR lines via script
The 76 skills missing only the NOT FOR line got it appended by script from a {skill: not_for_line} map:
NOT_FOR = {
    'adr': 'NOT FOR runtime decision-making (just decide and proceed) or capturing one-off learnings (use `capture-learning`).',
    'agent-workflow': 'NOT FOR running existing agents (use `dispatching-parallel-agents`) or writing prompts (use `prompt-engineer`).',
    # ... 74 more
}
76 skills updated in 4 seconds. The map itself took 20 minutes to write, because I had to make a judgment call per skill about which neighbors mattered.
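The append step itself can be small. The sketch below works on the description string only and leaves reading and writing SKILL.md aside; the function name append_not_for and the sample description are hypothetical. It is written to be idempotent, on the assumption that the script may be run more than once.

```python
NOT_FOR_SAMPLE = {
    "adr": "NOT FOR runtime decision-making (just decide and proceed) "
           "or capturing one-off learnings (use `capture-learning`).",
}


def append_not_for(description, not_for_line):
    """Return the description with the NOT FOR line appended, unless one already exists."""
    if "not for" in description.lower():
        return description  # already has a boundary; leave it alone
    base = description.rstrip()
    if base and base[-1] not in ".!?":
        base += "."  # close the last sentence before appending
    return f"{base} {not_for_line}"


# Hypothetical existing description for the adr skill.
before = "Record an architecture decision. Use when a design choice needs a paper trail"
after = append_not_for(before, NOT_FOR_SAMPLE["adr"])
print(after)
```

Running it a second time on the result returns the string unchanged, which keeps a re-run from stacking duplicate boundaries.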
Step 4 — Archive the obvious redundancies
Eight skills were superseded but never deleted, in overlapping groups like privacy-policy-mega, privacy-policy-gen, and data-privacy-compliance; seo-blog-writer and new-blog; senior-code-reviewer-mega and review-feature.
I moved the superseded ones to ~/.claude/skills/_archived/, with an _archived/README.md.
Step 5 — Resolve trigger collisions
The audit script also detects when two or more skills claim the same trigger phrase. 15 collisions surfaced:
- "karpathy check" claimed by both karpathy-check (audit) and karpathy-coder (write-time enforcement)
- "check all apps" claimed by both find-anomalies and cross-app-parity
- "swiftui" claimed by both ios-macos-sudarshan (parent) and swiftui-patterns (sub-skill)
- ... 12 more
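A collision check of this kind can be sketched in a few lines. This is not the audit script itself, just one way to do it, assuming triggers are the quoted phrases between "Triggers on" and "NOT FOR". The two sample descriptions are invented for the example.

```python
import re
from collections import defaultdict


def extract_triggers(description):
    """Return the lowercase quoted phrases that follow 'Triggers on', up to 'NOT FOR'."""
    match = re.search(
        r"triggers?\s+on\b(.*?)(?:NOT FOR|$)",
        description,
        re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return set()
    return {phrase.strip().lower() for phrase in re.findall(r'"([^"]+)"', match.group(1))}


def find_collisions(descriptions):
    """Map each trigger phrase claimed by two or more skills to the sorted skill names."""
    owners = defaultdict(set)
    for skill, desc in descriptions.items():
        for phrase in extract_triggers(desc):
            owners[phrase].add(skill)
    return {phrase: sorted(skills) for phrase, skills in owners.items() if len(skills) > 1}


# Invented descriptions that reproduce one collision from the list above.
descriptions = {
    "karpathy-check": 'Audit code. Use when reviewing. Triggers on "karpathy check", '
                      '"/karpathy-check". NOT FOR writing code (use `karpathy-coder`).',
    "karpathy-coder": 'Enforce rules at write time. Use when writing code. '
                      'Triggers on "karpathy check", "karpathy mode".',
}
collisions = find_collisions(descriptions)
print(collisions)
```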
For each, I picked the skill that should primarily own the phrase, removed it from the loser, and made sure both skills had the right NOT FOR line.
Step 6 — Handle the YAML edge case
This is the part that bit me. Many older skills used YAML block-scalar form:
description: |
  This is a long description
  that spans multiple lines
  and looks tidy.
My first injection script naively prepended Use when X. directly after the | indicator:
description: |. Use when X.
  Original first line
  ...
Which is invalid YAML (text cannot follow the | indicator on the same line). Fortunately my ~/.claude/ directory is under git, so git checkout HEAD -- skills/<broken>/SKILL.md restored each broken file.
Step 7 — Build the skills I didn't have
Audit also surfaced gaps — capabilities I'd reach for but had no skill for. I scaffolded 12 new ones using the same four-part pattern:
- incident-postmortem: blameless RCA + GitHub issue + CHANGELOG entry
- keystore-rotate: Android Play App Signing upload-key rotation
- api-changelog: Keep-a-Changelog diff between git refs
- screenshot-set: Play Store + App Store screenshot capture
- cma-launch: port of launch-your-agent's flow to my stack
- lane-resume: ADE lane pickup protocol
- cross-skill-test: simulate which skill fires for a prompt
- skill-promote: promote drafts from auto-skill-reviewer.py
- play-listing-screenshot-compare: store listing drift detector
- secrets-scan-deep: portfolio-wide TruffleHog sweep
- cma-eval-suite: eval regression check for CMA agents
- agent-handoff: clean handoff to another session
- repo-decommission: end-of-life wrapper
- firebase-rotate: Firebase credential rotation per type
- store-rejection-fixer: triage + resubmission checklist
- ai-coding-rule-update: bump canonical versions in rules + propagate
- voice-to-spec: voice-note → idea → spec pipeline
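To make the cross-skill-test entry concrete, here is a minimal sketch of simulating which skill fires for a prompt. It assumes plain keyword overlap between the prompt and each skill's trigger phrases. The real tool's scoring weights are not described in this post, so this does not reproduce them, and the trigger phrases in the sample catalog are invented.

```python
import re

# Small assumed stopword list so filler words do not count as matches.
STOPWORDS = {"this", "the", "a", "an", "my", "to", "on", "for", "of"}


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_skills(prompt, triggers_by_skill, top=5):
    """Rank skills by how many prompt words appear in their trigger phrases."""
    prompt_words = tokenize(prompt) - STOPWORDS
    ranked = []
    for skill, triggers in triggers_by_skill.items():
        trigger_words = set()
        for phrase in triggers:
            trigger_words |= tokenize(phrase)
        matched = sorted(prompt_words & trigger_words)
        if matched:
            ranked.append((len(matched), skill, matched))
    # Highest score first; ties broken alphabetically by skill name.
    ranked.sort(key=lambda row: (-row[0], row[1]))
    return ranked[:top]


# Invented trigger phrases for illustration.
catalog = {
    "debug-sudarshan": ["fix crash", "app crashes on launch"],
    "bug-hunter": ["find bugs", "crash report"],
    "readme-gen": ["write readme"],
}
results = rank_skills("fix this crash", catalog)
for score, skill, matched in results:
    print(f"{skill} score={score} matched: {matched}")
```

A small gap between the top two scores is the signal to look for: it suggests the two skills still need a boundary between them.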
The final state
Total skills: 182 (170 original + 12 new, 8 archived)
Excellent (4/4): 182 ↑ from 9
Good (3/4): 0 ↓ from 73
Needs work: 0 ↓ from 49
Poor: 0 ↓ from 28
Trigger collisions: 0 ↓ from 15
YAML parse errors: 0
100% of skills at gold-tier. Every collision resolved. Every redundancy archived.
Keeping it clean
A snapshot in time means nothing if it rots in a week. Three things now keep the catalog clean:
1. A PreToolUse hook that scores new SKILL.md edits. Registered in settings.json for Write|Edit|MultiEdit, running ~/.claude/hooks/skill-quality-guard.py.
2. A weekly cron that re-runs the audit. Sundays at 9am, output goes to ~/.claude/logs/skill-audit-v2.log.
3. A cross-skill-test simulator that shows which skill would fire for a prompt:
$ python3 ~/.claude/skills/cross-skill-test/test.py "fix this crash"
PROMPT: 'fix this crash'
TOP 5 MATCHES:
1. debug-sudarshan score=15 matched: 'crash', 'fix'
2. bug-hunter score=10 matched: 'crash'
3. systematic-debugging score= 8 matched: 'crash'
✓ Clear winner — gap of 5 pts to runner-up
Plus a visual cluster map (~/.claude/skills/skill-cluster-map.html).
Why this works (the meta-lesson)
Skill descriptions are prompts. Claude reads them at activation time and picks the best match. Like any prompt, specificity wins:
- Concrete sentences beat abstract topics. "Generate a privacy policy HTML page and publish to GitHub Pages at the standard URL" beats "Helps with privacy policy."
- Triggers users actually type beat synonyms. "ANR" beats "application not responding." "Compose recomposition" beats "UI performance issues."
- Boundaries beat hope. A NOT FOR line redirecting to a neighbor teaches Claude both "fire on me" and "fire on them when …". The neighbor list itself becomes documentation for future-you.
Looking back at the 9 skills already at 4/4 before this cleanup, they were all the ones I'd written most recently — after I'd started internalizing what made skills reliable. The 164 others were just my prior shapes accumulating. There was no malicious intent, just drift.
This is why agents/skills need maintenance. Drift compounds. Every six months I'll run the same audit and bring whatever's slipped back into shape.
The cluster map
If you want to see the result visually, my generated cluster map looks like this when filtered to the release cluster:
release-sudarshan ──→ ship-check (NOT FOR pre-release verification only)
release-sudarshan ──→ document-release (NOT FOR doc sync)
release-sudarshan ──→ playstore-sudarshan
release-sudarshan ──→ store-listing (NOT FOR copy generation)
release-sudarshan ──→ changelog-gen (NOT FOR release notes)
ship-check ──→ release-sudarshan
document-release ──→ readme-gen
document-release ──→ changelog-gen
changelog-gen ──→ new-blog
changelog-gen ──→ document-release
playstore-sudarshan ──→ store-listing
store-listing ──→ changelog-gen
store-listing ──→ new-blog
screenshot-set ──→ playstore-sudarshan
screenshot-set ──→ store-listing
store-rejection-fixer ──→ ship-check
store-rejection-fixer ──→ playstore-sudarshan
incident-postmortem ──→ release-sudarshan
keystore-rotate ──→ release-sudarshan
Each edge is "if a user prompt could fire either of us, this one wins." Cluster boundaries become visible as the graph layout settles.
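One plausible way to derive edges like these is to read the redirects out of each NOT FOR line. The sketch below assumes an edge A ──→ B means A's description redirects to B with use `B`. That reading is an assumption based on the annotations in the listing, not a description of the actual generator, and the sample descriptions are invented.

```python
import re


def redirect_edges(descriptions):
    """Return sorted (source, target) pairs, one per backticked redirect after NOT FOR."""
    edges = set()
    for skill, desc in descriptions.items():
        # Only text after the first NOT FOR counts as a boundary.
        _, found, boundary = desc.partition("NOT FOR")
        if not found:
            continue
        for target in re.findall(r"use `([^`]+)`", boundary):
            if target != skill:  # ignore accidental self-references
                edges.add((skill, target))
    return sorted(edges)


# Invented descriptions for three skills from the release cluster.
sample = {
    "release-sudarshan": "Ship a release. NOT FOR pre-release verification only "
                         "(use `ship-check`) or doc sync (use `document-release`).",
    "ship-check": "Verify before release. NOT FOR the release itself (use `release-sudarshan`).",
    "readme-gen": "Generate a README.",
}
edges = redirect_edges(sample)
for source, target in edges:
    print(f"{source} ──→ {target}")
```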
What you can steal
If your own Claude Code setup has accumulated skills, the cheapest thing to do is run an audit against the four-part pattern. The audit script I wrote is sitting in ~/.claude/skills/skill-audit/audit.py; I might pull it out into a standalone repo if there's interest. The pattern itself you can lift right out of anthropics/launch-your-agent: read their SKILL.md files (RedTeam, FirstPrinciples, SystemsThinking, BitterPillEngineering, IterativeDepth, ExtractWisdom) and you'll see the shape.
A clean skill catalog isn't a one-shot project. It's a hygiene practice. Audit. Score. Archive. Test for collisions. Add boundaries. Then do it again next quarter.
The audit pattern was extracted on 2026-06-21 during a six-hour session that took my Claude Code catalog from 9 → 182 Excellent. The pattern itself is in anthropics/launch-your-agent (RedTeam, FirstPrinciples, SystemsThinking, BitterPillEngineering, IterativeDepth, ExtractWisdom).
Sudarshan Chaudhari
AI Systems Builder / Product Engineer
Bangkok, Thailand
Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.