June 21, 2026 · 10 min read

Cleaning Up 173 Claude Code Skills: From 9 Excellent to 182 Excellent

After a year of accumulating Claude Code skills, only 9 of 173 reliably activated on natural language. Here's the four-part pattern I lifted from anthropics/launch-your-agent, how I applied it to every skill in one session, and the audit tooling that keeps the catalog clean.

Claude Code · Developer Tools · AI Engineering · Solo Dev

I have 21 active Android apps, a few iOS ports, some web stuff, and a stack of CLI tools. Across all of them, Claude Code skills are the way I encode "how I want to work" so I don't repeat myself.

After a year, I had 173 skills. About 100 of them only fired when I typed the slash command. The rest needed me to remember exactly which phrase I'd used in the trigger list. That's not how skills are supposed to work — Claude should match your natural-language intent against the catalog and pick the right one. So I sat down, audited everything, and lifted a pattern from a public Anthropic reference repo that fixed it.

This is the writeup. Real metrics, real before/after, the four-part pattern, and the audit tooling I now run weekly.

The starting state

```
Total skills:    173
Excellent (4/4): 9
Good (3/4):      0
Needs work:      49
Poor:            28
Missing NOT FOR: 76
Trigger collisions: 15 phrases claimed by 11 skill pairs
```

Activation was unreliable. "Fix this crash" might fire `debug-sudarshan`, `bug-hunter`, `systematic-debugging`, or `silent-failure-hunter`, or just produce a plain "I'll help you debug" with no skill at all. Four skills competed for the same prompt, with no `NOT FOR` boundaries telling Claude when not to pick each one.

The pattern

I'd been looking for a fix when anthropics/launch-your-agent appeared on GitHub. It's a reference implementation of a Claude Code skill that walks a founder through launching a Claude Managed Agent. The skill's own descriptions are written to a very specific shape, and the shape works. So I extracted the pattern:

```yaml
description: "[Concrete domain sentence: what it does, with specifics like
  file paths, library versions, canonical IDs.] Use when [explicit activation
  condition]. Triggers on \"phrase the user actually types\", \"another phrase\",
  \"/slash-command\". NOT FOR [neighbor case] (use `neighbor-skill`)."
```

Four parts:

1. One concrete sentence with real specifics. Not "Helps with X." but "Generate a privacy policy HTML page and publish to `https://sudarshanchaudhari.github.io/[appname]-privacy-policy/` covering data collected, third-party SDKs, retention, deletion, contact, and GDPR alignment."
2. `Use when …`: an explicit activation condition, using that literal phrase. Variants like `Use after`, `Use before`, and `Use immediately when` also work.
3. `Triggers on "…"`: comma-separated phrases the user actually types. Not abstract topics; real surface forms.
4. ``NOT FOR … (use `neighbor-skill`)``: every neighbor that might collide, with a redirect.

That fourth part is the magic. Together with the triggers, it teaches Claude both "fire on these phrases" and "don't fire when the neighbor is the right call." Most existing skill descriptions in the wild only do part 3; they miss the negative half of the activation rule.
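Because the four parts always appear in the same order, they can be assembled mechanically once the judgment calls are made. A minimal sketch, assuming a hypothetical `build_description` helper (not part of the author's tooling) that produces a string in the shape of the template above; the example values are invented for illustration.

```python
def build_description(concrete, use_when, triggers, not_for):
    """Assemble a skill description in the four-part shape.

    concrete: one sentence with real specifics (paths, versions, IDs)
    use_when: the activation condition, without the leading "Use when"
    triggers: phrases the user actually types
    not_for:  (neighbor case, neighbor skill) pairs
    """
    # Part 1: the concrete sentence, ending in exactly one full stop.
    parts = [concrete.rstrip(".") + "."]
    # Part 2: the explicit activation condition.
    parts.append(f"Use when {use_when.rstrip('.')}.")
    # Part 3: quoted surface forms, so an auditor can find them.
    quoted = ", ".join(f'"{phrase}"' for phrase in triggers)
    parts.append(f"Triggers on {quoted}.")
    # Part 4: one redirect per neighbor that might collide.
    redirects = " or ".join(f"{case} (use `{skill}`)" for case, skill in not_for)
    parts.append(f"NOT FOR {redirects}.")
    return " ".join(parts)


desc = build_description(
    concrete="Write a blameless postmortem as a GitHub issue plus a CHANGELOG entry",
    use_when="a production incident has been resolved",
    triggers=["write a postmortem", "incident review", "/incident-postmortem"],
    not_for=[("shipping the fix itself", "release-sudarshan")],
)
print(desc)
```

A helper like this only guarantees the shape; choosing the specifics for part 1 and the neighbors for part 4 is still the real work.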

Look at the gold-tier skills in the launch-your-agent repo (`RedTeam`, `FirstPrinciples`, `SystemsThinking`, `BitterPillEngineering`): they all hit this shape. They also stand out as the most reliably activating skills in any Claude Code setup I've tried.

Applying it to 173 skills in one session

I'm not going to write 173 descriptions by hand. Here's the actual process:

Step 1 — Audit the existing state

I wrote a scorer that extracts every skill description, parses YAML properly (not regex — quoted multi-line YAML scalars trip naive parsers), and checks each of the four parts.

```python
import re


def score_desc(desc):
    """Score a skill description 0-4 against the four-part pattern."""
    score = 0
    present = []
    dl = desc.lower()
    # Part 1: concrete sentence (long enough, with a full stop early on)
    if len(desc) >= 100 and ('.' in desc[:400] or len(desc) >= 200):
        score += 1
        present.append('concrete-sentence')
    # Part 2: "Use when" (or a variant such as "Use after" / "Use before")
    if re.search(r'\buse\s+(when|after|before|on|for|at|as|immediately|whenever)\b', dl):
        score += 1
        present.append('use-when')
    # Part 3: quoted phrases after "Triggers on", or, as a fallback,
    # a long "Use when ..." clause that spells out the activation surface
    if (re.search(r'triggers?\s+on[:\s].*"[^"]+"', desc, re.IGNORECASE)
            or re.search(r'use when\s+[a-z][^.]{20,}', desc, re.IGNORECASE)):
        score += 1
        present.append('triggers')
    # Part 4: NOT FOR with an explicit neighbor redirect
    if 'not for' in dl and ('use `' in dl or 'use /' in dl):
        score += 1
        present.append('not-for-boundary')
    return score, present
```

Run it. Get a per-skill 0-4 score plus a list of which parts are missing. Group by tier.
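The grouping step can be sketched as follows, assuming the scores have already been collected into a `{skill_name: score}` dict. The tier cut-offs (4 = Excellent, 3 = Good, 2 = Needs work, 0-1 = Poor) are an assumption: the post names the tiers but not their exact boundaries. The example scores are made up.

```python
from collections import defaultdict

# Assumed cut-offs: the post names these tiers but not the exact boundaries.
TIERS = {4: "Excellent", 3: "Good", 2: "Needs work", 1: "Poor", 0: "Poor"}
TIER_ORDER = ["Poor", "Needs work", "Good", "Excellent"]


def group_by_tier(scores):
    """Map tier name -> sorted skill names, worst tier first."""
    groups = defaultdict(list)
    for skill, score in scores.items():
        groups[TIERS[score]].append(skill)
    # Only report tiers that actually contain skills.
    return {tier: sorted(groups[tier]) for tier in TIER_ORDER if groups[tier]}


report = group_by_tier({"adr": 4, "bug-hunter": 2, "seo-blog-writer": 0, "new-blog": 3})
for tier, skills in report.items():
    print(f"{tier:<11} {len(skills):>3}  {', '.join(skills)}")
```

Listing the worst tier first matches the order the cleanup works through: Poor descriptions get rewritten before anything else.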

Step 2 — Batch-rewrite the worst tier

The 28 "Poor" tier descriptions all looked alike: "Skill X for SudarshanTechLabs." A few words. No triggers. No neighbors.

I rewrote each one by hand, but quickly — read the body, extract the actual capability, write a description that hits the four-part pattern. Each one took 30-60 seconds.

Step 3 — Append NOT FOR lines via script

The 76 skills missing only the `NOT FOR` boundary line were a perfect script target. For each one, I knew its cluster (debug, planning, audit, etc.) and which neighbors it would collide with. I wrote a Python script that took a `{skill: not_for_line}` map and appended each line to its description in one pass.

```python
NOT_FOR = {
    'adr': 'NOT FOR runtime decision-making (just decide and proceed) or capturing one-off learnings (use `capture-learning`).',
    'agent-workflow': 'NOT FOR running existing agents (use `dispatching-parallel-agents`) or writing prompts (use `prompt-engineer`).',
    # ... 74 more
}
```

76 skills updated in 4 seconds. The map itself took 20 minutes to write — I had to make a judgment call per skill about which neighbors mattered.
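The append pass itself is short. A minimal sketch, assuming descriptions have already been parsed out of each SKILL.md into plain strings (Step 6 explains why editing raw file text is risky). The `append_not_for` name and the skip-if-already-present rule are assumptions, the latter added so that re-running the script never duplicates a boundary; the example descriptions are invented.

```python
def append_not_for(descriptions, not_for):
    """Return a copy of `descriptions` with NOT FOR lines appended.

    descriptions: {skill: description text, already parsed out of SKILL.md}
    not_for:      {skill: "NOT FOR ... (use `neighbor`)."}, like NOT_FOR
    """
    updated = dict(descriptions)
    for skill, line in not_for.items():
        desc = updated.get(skill)
        if desc is None:
            continue  # map entry for a skill that no longer exists
        if "not for" in desc.lower():
            continue  # boundary already present; keeps re-runs harmless
        updated[skill] = f"{desc.rstrip()} {line}"
    return updated


NOT_FOR_EXAMPLE = {
    "adr": "NOT FOR capturing one-off learnings (use `capture-learning`).",
    "agent-workflow": "NOT FOR running existing agents (use `dispatching-parallel-agents`).",
    "retired-skill": "NOT FOR anything (use `adr`).",
}
before = {
    "adr": "Record an architecture decision with its context and consequences.",
    "agent-workflow": "Design a multi-agent workflow. NOT FOR writing prompts (use `prompt-engineer`).",
}
after = append_not_for(before, NOT_FOR_EXAMPLE)
print(after["adr"])
```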

Step 4 — Archive the obvious redundancies

Eight skills were superseded but never deleted. Things like `privacy-policy-mega` (kept around after `privacy-policy-gen` + `data-privacy-compliance` replaced it), `seo-blog-writer` (overlaps `new-blog`), and `senior-code-reviewer-mega` (covered by `review-feature` + language-specific reviewers).

I moved them to `~/.claude/skills/_archived/` rather than deleting them outright, which leaves a 30-day recoverable window in case I missed something. I also wrote an `_archived/README.md` documenting why each was retired and what to use instead.
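Scripting the move keeps the README in sync with the folder. A minimal standard-library sketch; the `archive_skill` helper and the README line format are hypothetical, and the 30-day purge is not shown.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def archive_skill(skills_dir, name, reason, use_instead):
    """Move skills_dir/name into skills_dir/_archived/ and record why."""
    skills_dir = Path(skills_dir)
    archive = skills_dir / "_archived"
    archive.mkdir(exist_ok=True)
    target = archive / name
    if target.exists():
        raise FileExistsError(f"{target} is already archived")
    shutil.move(str(skills_dir / name), str(target))
    # One README line per retirement: what, when, why, and the replacement.
    with open(archive / "README.md", "a", encoding="utf-8") as readme:
        readme.write(f"- `{name}` (archived {date.today().isoformat()}): "
                     f"{reason} Use `{use_instead}` instead.\n")
    return target


if __name__ == "__main__":
    # Demo on a throwaway directory instead of the real ~/.claude/skills.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "seo-blog-writer").mkdir()
        moved = archive_skill(tmp, "seo-blog-writer", "Overlaps new-blog.", "new-blog")
        print(moved.relative_to(tmp))
        print((Path(tmp) / "_archived" / "README.md").read_text(encoding="utf-8"), end="")
```

Against a real catalog it would be called as `archive_skill(Path.home() / ".claude/skills", "seo-blog-writer", "Overlaps new-blog.", "new-blog")`.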

Step 5 — Resolve trigger collisions

The audit script also detects when two or more skills claim the same trigger phrase. 15 collisions surfaced:

- "karpathy check" claimed by both `karpathy-check` (audit) and `karpathy-coder` (write-time enforcement)
- "check all apps" claimed by both `find-anomalies` and `cross-app-parity`
- "swiftui" claimed by both `ios-macos-sudarshan` (parent) and `swiftui-patterns` (sub-skill)
- ... 12 more

For each, I picked the skill that should primarily own the phrase, removed it from the loser, and made sure both skills had the right `NOT FOR` boundary pointing at each other. Another 50-line script.
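Collision detection amounts to inverting the skill-to-phrases map. A minimal sketch, assuming trigger phrases have already been extracted into a `{skill: [phrases]}` dict; normalizing case and whitespace (so "SwiftUI " and "swiftui" collide) is an assumption about how phrases should be compared. The `find_collisions` name and the non-colliding example phrases are invented.

```python
from collections import defaultdict


def find_collisions(triggers):
    """Return {phrase: [skills]} for every phrase claimed by 2+ skills.

    triggers: {skill_name: [trigger phrases]}
    """
    owners = defaultdict(set)
    for skill, phrases in triggers.items():
        for phrase in phrases:
            # Normalize case and whitespace so near-identical phrases collide.
            owners[" ".join(phrase.lower().split())].add(skill)
    return {phrase: sorted(skills)
            for phrase, skills in sorted(owners.items())
            if len(skills) > 1}


collisions = find_collisions({
    "karpathy-check": ["karpathy check", "audit simplicity"],
    "karpathy-coder": ["Karpathy check", "write simply"],
    "ios-macos-sudarshan": ["swiftui", "xcode"],
    "swiftui-patterns": ["SwiftUI "],
})
for phrase, skills in collisions.items():
    print(f'"{phrase}" claimed by {" and ".join(skills)}')
```

Because owners are stored in a set, a skill that lists the same phrase twice does not collide with itself.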

Step 6 — Handle the YAML edge case

This is the part that bit me. Many older skills used YAML block-scalar form:

```yaml
description: |
  This is a long description
  that spans multiple lines
  and looks tidy.
```

My first injection script naively prepended `Use when X.` right after the `|`, producing:

```yaml
description: |. Use when X.
  Original first line
  ...
```

That is invalid YAML: nothing but an indentation or chomping indicator (or a comment) may follow `|` on its line. It broke 22 files before I caught it. Lesson: when munging YAML, parse it with a real library first, manipulate the parsed value, then re-emit. Don't regex.

Fortunately my `~/.claude/` directory is git-tracked. I ran `git checkout HEAD -- skills/<broken>/SKILL.md` for each of the 22, then wrote a smarter v2 script that handled all three description forms (single-line, quoted multi-line, block-scalar).
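The parse-then-re-emit approach can be sketched with the standard library alone, assuming the frontmatter has already been loaded by a real YAML parser (PyYAML's `yaml.safe_load`, for example). At that point a block-scalar description is plain text: the `|` indicator and the indentation are gone, so there is nothing left to trip over. The sketch re-emits every key using `json.dumps`, relying on the fact that JSON values are also valid YAML flow values. The `rewrite_frontmatter` name is hypothetical, and this is not the author's v2 script.

```python
import json


def rewrite_frontmatter(meta, clause):
    """Append `clause` to the description and re-emit the whole frontmatter.

    meta: the frontmatter dict a YAML parser returned. Keys are assumed to
    be plain identifiers such as `name` and `description`.
    """
    updated = dict(meta)
    # Collapse the block scalar's line breaks into ordinary spaces.
    text = " ".join(updated["description"].split())
    updated["description"] = f"{text} {clause.strip()}"
    # JSON values (strings, numbers, lists, true/false/null) are also valid
    # YAML flow values, so json.dumps handles all quoting and escaping.
    lines = [f"{key}: {json.dumps(value, ensure_ascii=False)}"
             for key, value in updated.items()]
    return "---\n" + "\n".join(lines) + "\n---\n"


# What a YAML parser returns for the block-scalar example above.
meta = {
    "name": "example-skill",
    "description": "This is a long description\nthat spans multiple lines\nand looks tidy.\n",
}
print(rewrite_frontmatter(meta, "Use when X."), end="")
```

The output is always one quoted line per key, so the `|.` failure cannot recur; the trade-off is that the tidy multi-line layout is lost.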

Step 7 — Build the skills I didn't have

Audit also surfaced gaps — capabilities I'd reach for but had no skill for. I scaffolded 12 new ones using the same four-part pattern:

- `incident-postmortem`: blameless RCA + GitHub issue + CHANGELOG entry
- `keystore-rotate`: Android Play App Signing upload-key rotation
- `api-changelog`: Keep-a-Changelog diff between git refs
- `screenshot-set`: Play Store + App Store screenshot capture
- `cma-launch`: port of `launch-your-agent`'s flow to my stack
- `lane-resume`: ADE lane pickup protocol
- `cross-skill-test`: simulate which skill fires for a prompt
- `skill-promote`: promote drafts from `auto-skill-reviewer.py`
- `play-listing-screenshot-compare`: store listing drift detector
- `secrets-scan-deep`: portfolio-wide TruffleHog sweep
- `cma-eval-suite`: eval regression check for CMA agents
- `agent-handoff`: clean handoff to another session
- `repo-decommission`: end-of-life wrapper
- `firebase-rotate`: Firebase credential rotation per type
- `store-rejection-fixer`: triage + resubmission checklist
- `ai-coding-rule-update`: bump canonical versions in rules + propagate
- `voice-to-spec`: voice-note → idea → spec pipeline
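Scaffolding a new skill is mostly writing its frontmatter. A minimal sketch, assuming the usual `~/.claude/skills/<name>/SKILL.md` layout with `name` and `description` frontmatter fields. The `scaffold_skill` helper, the stub body, and the refuse-without-`NOT FOR` check are hypothetical, and the example description is invented.

```python
import json
import tempfile
from pathlib import Path


def scaffold_skill(skills_dir, name, description):
    """Create skills_dir/name/SKILL.md with frontmatter and a stub body."""
    # Refuse descriptions that skip the boundary part of the pattern.
    if "not for" not in description.lower():
        raise ValueError(f"{name}: description has no NOT FOR boundary")
    skill_dir = Path(skills_dir) / name
    skill_dir.mkdir(parents=True, exist_ok=False)  # never overwrite a skill
    # json.dumps quotes the description; a JSON string literal is also a
    # valid YAML double-quoted scalar.
    text = (
        "---\n"
        f"name: {name}\n"
        f"description: {json.dumps(description, ensure_ascii=False)}\n"
        "---\n\n"
        f"# {name}\n\n"
        "TODO: the steps this skill should follow.\n"
    )
    path = skill_dir / "SKILL.md"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = scaffold_skill(
            tmp,
            "keystore-rotate",
            "Rotate the Android upload key used with Play App Signing. "
            "Use when the upload key is lost or leaked. "
            'Triggers on "rotate upload key", "/keystore-rotate". '
            "NOT FOR routine release builds (use `release-sudarshan`).",
        )
        print(path.read_text(encoding="utf-8"))
```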

The final state

```
Total skills:    182  (170 original + 12 new, 8 archived)
Excellent (4/4): 182  ↑ from 9
Good (3/4):      0    ↓ from 73
Needs work:      0    ↓ from 49
Poor:            0    ↓ from 28
Trigger collisions: 0  ↓ from 15
YAML parse errors:  0
```

100% of skills at gold-tier. Every collision resolved. Every redundancy archived.

Keeping it clean

A snapshot in time means nothing if it rots in a week. Three things now keep the catalog clean:

1. A PreToolUse hook that scores SKILL.md edits. It is registered in `settings.json` with the matcher `Write|Edit|MultiEdit`, warns if an edit drops a description below 3/4, and blocks the edit outright if the description is under 30 characters (too short to ever activate). The hook lives at `~/.claude/hooks/skill-quality-guard.py`.

2. A weekly cron job that re-runs the audit every Sunday at 9am and writes its output to `~/.claude/logs/skill-audit-v2.log`. If anything regresses, I know within a week.

3. A `cross-skill-test` tool I can invoke manually. Given a user prompt, it scores every skill's trigger overlap and shows the top matches, with a collision warning if the top two scores are within 20% of each other. Useful before merging any new skill.

```bash
$ python3 ~/.claude/skills/cross-skill-test/test.py "fix this crash"
PROMPT: 'fix this crash'

TOP 5 MATCHES:
  1. debug-sudarshan           score=15  matched: 'crash', 'fix'
  2. bug-hunter                score=10  matched: 'crash'
  3. systematic-debugging      score= 8  matched: 'crash'

✓ Clear winner — gap of 5 pts to runner-up
```
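A simplified version of that scoring, as a sketch only. The post does not say how `test.py` computes its scores, so the rule here (one point per prompt word shared with each trigger phrase) is an assumption, as is reading the 20% rule as `(top - runner_up) / top < 0.2`. Under that reading, the 15 vs 10 split above is a 33% gap, consistent with the "Clear winner" verdict. The trigger phrases below are invented, so the scores will not match the author's output.

```python
import re


def tokens(text):
    """Lower-cased word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def rank_skills(prompt, triggers, top_n=5):
    """Rank skills by word overlap between the prompt and their triggers.

    triggers: {skill_name: [trigger phrases]}
    Returns [(skill, score, matched_words)], best first.
    """
    prompt_words = tokens(prompt)
    ranked = []
    for skill, phrases in triggers.items():
        score, matched = 0, set()
        for phrase in phrases:
            hits = prompt_words & tokens(phrase)
            score += len(hits)  # one point per shared word, per phrase
            matched |= hits
        if score:
            ranked.append((skill, score, sorted(matched)))
    # Highest score first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def is_collision(ranked, threshold=0.2):
    """True when the runner-up is within `threshold` (20%) of the top score."""
    if len(ranked) < 2:
        return False
    top, runner_up = ranked[0][1], ranked[1][1]
    return (top - runner_up) / top < threshold


TRIGGERS = {
    "debug-sudarshan": ["fix this crash", "app crashes on launch", "fix bug"],
    "bug-hunter": ["hunt for bugs", "find the crash"],
    "systematic-debugging": ["debug systematically", "crash root cause"],
}
ranked = rank_skills("fix this crash", TRIGGERS)
for i, (skill, score, matched) in enumerate(ranked, start=1):
    print(f"{i}. {skill:<22} score={score:>2}  matched: {matched}")
print("collision" if is_collision(ranked) else "clear winner")
```

Plain word overlap misses inflections ("crashes" does not match "crash"), which is one more reason to list the exact surface forms users type.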

Plus a visual cluster map (`~/.claude/skills/skill-cluster-map.html`): a D3 force-directed graph where every NOT-FOR edge becomes a graph link. It reveals the cluster structure of the catalog at a glance: the debug cluster, the planning pipeline, the release flow, the ADE cluster. The map has 287 edges across 182 nodes. It's useful before adding a new skill, because you can spot whether your idea overlaps an existing cluster.
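The guard hook from the first item above could look roughly like this. It is a sketch, not the author's `skill-quality-guard.py`, and it rests on the Claude Code hooks contract as documented: the hook receives the tool call as JSON on stdin with `tool_name` and `tool_input`; exit code 2 blocks the call and feeds stderr back to Claude; other non-zero codes are non-blocking and show stderr to the user. It handles only `Write`, uses a deliberately naive frontmatter reader, and takes the scorer as a parameter so the four-part scorer from Step 1 can be plugged in, e.g. `main(lambda d: score_desc(d)[0])`.

```python
import json
import sys

MIN_CHARS = 30  # shorter than this, a description can never activate: block
MIN_SCORE = 3   # below 3/4: warn, but let the write through


def read_description(markdown):
    """Naive frontmatter reader (single-line or `|`/`>` block descriptions).

    Good enough for a read-only check in a sketch; anything that rewrites
    frontmatter should use a real YAML parser instead.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break  # end of frontmatter, no description found
        if not line.startswith("description:"):
            continue
        value = line[len("description:"):].strip()
        if value[:1] in ("|", ">"):
            block = []
            for nxt in lines[i + 1:]:
                if nxt.strip() == "---" or (nxt and not nxt[0].isspace()):
                    break  # block scalar ends at the next key or the fence
                block.append(nxt.strip())
            return " ".join(part for part in block if part)
        return value.strip("\"'")
    return ""


def decide(payload, score_fn):
    """Return (exit_code, message) for one PreToolUse payload.

    Only Write is handled: its tool_input carries the whole new file as
    "content". Edit/MultiEdit carry only the changed strings, so a fuller
    hook would apply them to the file on disk before scoring.
    """
    tool_input = payload.get("tool_input", {})
    path = tool_input.get("file_path", "")
    if payload.get("tool_name") != "Write" or not path.endswith("SKILL.md"):
        return 0, ""
    desc = read_description(tool_input.get("content", ""))
    if len(desc) < MIN_CHARS:
        return 2, f"{path}: description is {len(desc)} chars, too short to ever activate."
    score = score_fn(desc)
    if score < MIN_SCORE:
        return 1, f"{path}: description scores {score}/4 against the four-part pattern."
    return 0, ""


def main(score_fn):
    """Hook entry point: payload on stdin, verdict via exit code and stderr."""
    code, message = decide(json.load(sys.stdin), score_fn)
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Keeping `decide` free of I/O makes the blocking and warning rules testable without running Claude Code at all.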

Why this works (the meta-lesson)

Skill descriptions are prompts. Claude reads them at activation time and picks the best match. Like any prompt, specificity wins:

- Concrete sentences beat abstract topics. "Generate a privacy policy HTML page and publish to GitHub Pages at the standard URL" beats "Helps with privacy policy."
- Triggers users actually type beat synonyms. "ANR" beats "application not responding." "Compose recomposition" beats "UI performance issues."
- Boundaries beat hope. A `NOT FOR` line redirecting to a neighbor teaches Claude both "fire on me" and "fire on them when …". The neighbor list itself becomes documentation for future-you.

Looking back, the 9 skills already at 4/4 before this cleanup were all ones I'd written most recently, after I'd started internalizing what made skills reliable. The other 164 were just my older shapes accumulating. Nothing was broken on purpose; it was just drift.

This is why agents/skills need maintenance. Drift compounds. Every six months I'll run the same audit and bring whatever's slipped back into shape.

The cluster map

If you want to see the result visually, here is my generated cluster map filtered to the `release` cluster:

```
release-sudarshan ──→ ship-check (NOT FOR pre-release verification only)
release-sudarshan ──→ document-release (NOT FOR doc sync)
release-sudarshan ──→ playstore-sudarshan
release-sudarshan ──→ store-listing (NOT FOR copy generation)
release-sudarshan ──→ changelog-gen (NOT FOR release notes)
ship-check        ──→ release-sudarshan
document-release  ──→ readme-gen
document-release  ──→ changelog-gen
changelog-gen     ──→ new-blog
changelog-gen     ──→ document-release
playstore-sudarshan ──→ store-listing
store-listing     ──→ changelog-gen
store-listing     ──→ new-blog
screenshot-set    ──→ playstore-sudarshan
screenshot-set    ──→ store-listing
store-rejection-fixer ──→ ship-check
store-rejection-fixer ──→ playstore-sudarshan
incident-postmortem ──→ release-sudarshan
keystore-rotate   ──→ release-sudarshan
```

Each edge is "if a user prompt could fire either of us, this one wins." Cluster boundaries become visible as the graph layout settles.
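Edges like these can be pulled straight out of the descriptions. A minimal sketch, assuming each redirect is written as "(use `skill-name`)" after the `NOT FOR` marker, as in the pattern above; it emits the `{"nodes": [...], "links": [...]}` shape commonly fed to a D3 force layout. The `not_for_graph` name is hypothetical, the example descriptions are invented, and the real map generator may work differently.

```python
import json
import re

# The skill name inside each "(use `name`)" redirect.
REDIRECT_RE = re.compile(r"use\s+`([^`]+)`", re.IGNORECASE)
NOT_FOR_RE = re.compile(r"\bNOT FOR\b", re.IGNORECASE)


def not_for_graph(descriptions):
    """Build a D3-style {"nodes", "links"} graph from NOT FOR redirects.

    descriptions: {skill_name: description}
    Only text from the first "NOT FOR" onward is scanned, so backticked
    names earlier in a description do not become edges.
    """
    links = []
    for skill, desc in sorted(descriptions.items()):
        boundary = NOT_FOR_RE.search(desc)
        if boundary is None:
            continue
        for target in REDIRECT_RE.findall(desc, boundary.start()):
            link = {"source": skill, "target": target}
            if target != skill and link not in links:
                links.append(link)
    names = set(descriptions) | {link["target"] for link in links}
    return {"nodes": [{"id": name} for name in sorted(names)], "links": links}


graph = not_for_graph({
    "release-sudarshan": "Cut a signed release. NOT FOR doc sync (use `document-release`) "
                         "or release notes (use `changelog-gen`).",
    "document-release": "Sync docs after a release. NOT FOR changelogs (use `changelog-gen`).",
    "changelog-gen": "Generate a changelog. Pairs well with `new-blog`.",
})
print(json.dumps(graph, indent=2))
```

Note that `changelog-gen` produces no edge here: its mention of `new-blog` sits outside any `NOT FOR` clause.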

What you can steal

If your own Claude Code setup has accumulated skills, the cheapest thing to do is run an audit against the four-part pattern. The audit script I wrote lives at `~/.claude/skills/skill-audit/audit.py`: about 150 lines of Python that runs in under a second across 182 skills and prints a prioritized report.

I might pull it out into a standalone repo if there's interest. The pattern itself you can lift right out of anthropics/launch-your-agent — read their SKILL.md files (RedTeam, FirstPrinciples, SystemsThinking, BitterPillEngineering, IterativeDepth, ExtractWisdom) and you'll see the shape.

A clean skill catalog isn't a one-shot project. It's a hygiene practice. Audit. Score. Archive. Test for collisions. Add boundaries. Then do it again next quarter.

The audit pattern was extracted on 2026-06-21 during a six-hour session that took my Claude Code catalog from 9 to 182 Excellent skills. The pattern itself is in `anthropics/launch-your-agent`: `RedTeam`, `FirstPrinciples`, `SystemsThinking`, `BitterPillEngineering`, `IterativeDepth`, and `ExtractWisdom` are the reference skills to study. I may open-source the audit toolkit separately; reach out if that would be useful.

Sudarshan Chaudhari

AI Systems Builder / Product Engineer

Bangkok, Thailand

Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
