The Code I Write, I Don't Write at All
"Testing shows the presence of bugs, not their absence." — Edsger Dijkstra
Dijkstra said that in 1969, back when the bugs were in the code. In 2026, the bugs aren't in the code. They're in the process that generated it.
I remember the exact moment I understood this. It was 2 AM on a Tuesday — the kind of Tuesday that feels like it should be illegal — and I was staring at a pull request that Claude had written for me. Three hundred lines of TypeScript. Clean imports. Proper typing. Even had error handling, which, if you've spent any time with AI assistants, you know is the equivalent of finding a unicorn in your garage. I approved it. Merged it. Went to bed feeling productive.
The next morning, nothing worked.
Not "a few tests failed" nothing. Not "the linting is upset" nothing. Production-grade, catastrophic, "did we even have a working app yesterday?" nothing. Claude had rewritten a shared utility that six other files depended on, changed the return type from an object to an array (a reasonable refactor in isolation), and never updated the callers. Every import still compiled — TypeScript was happy, the linter was happy, Claude was happy. The runtime was not.
Here's the thing that haunted me: the code itself was good. If a human had written it, I'd have called it a solid refactor. The problem wasn't capability. It wasn't even a bug in the traditional sense. It was a process failure. Nobody told Claude to check the callers. Nobody told it to run the tests. Nobody told it to think about the blast radius of a type change in a shared module. And so it didn't.
This isn't a Claude problem. This is every AI coding assistant, everywhere, right now. GPT, Gemini, Copilot — they all share the same fundamental flaw. They're staggeringly good at generating code and absolutely terrible at engineering it. The capability is there. The discipline is not.
So I stopped writing code. I started writing guardrails.
The Discipline Problem
Let me be specific about what goes wrong, because "AI makes mistakes" is too vague to be useful.
The first time I gave Claude a real project — not a toy, not a tutorial, a production feature with a database and an API and a UI that real humans would touch — it did what any brilliant, overconfident new hire would do. It started coding immediately. No questions about requirements. No exploration of the existing codebase. No tests. Just pure, unbridled enthusiasm channeled directly into a file called feature.tsx that would eventually grow to 847 lines.
I've since catalogued the failure modes. There are five, and they show up in roughly this order.

First, the cowboy coding. You say "build X" and the AI starts typing before the sentence is over. It doesn't ask what X connects to, whether there's an existing pattern for X, or whether X even needs to be built (maybe there's already a library). It just... writes. Fast and confident and wrong.
Second, the test amnesia. I cannot overstate how consistently AI assistants skip tests. Not sometimes. Not when pressed for time. Always, unless specifically told. They'll write a beautiful function, declare it working, and move on. The function is untested. It has edge cases that would make a computer science professor weep. But it looks correct, and for an AI, looking correct is the same as being correct. (It is not.)
Third, the convention blindness. Every project has patterns — how you structure components, where you put utilities, what your API response format looks like. The AI ignores all of it. It'll create a new utility file right next to the one that already does the same thing. It'll invent a new error handling pattern when the codebase has a perfectly good one. Not out of rebellion. Out of ignorance. It simply doesn't know to look.
Fourth — and this is the subtle one — context amnesia. AI assistants operate in a conversation window that has a finite capacity. When it fills up, the system compresses earlier messages into a summary. Summaries are lossy. That architectural decision you made forty minutes ago? The one where you specifically said "use Server Components, not Client Components"? Gone. Compressed. Paraphrased into oblivion. The AI proceeds with a clean conscience and a blank memory to reinvent the same wheel, badly.
Fifth, the never-self-evaluating. No AI assistant I've ever used has spontaneously asked itself: "Wait. Is this actually good? Does this do what was asked? Did I miss anything?" They finish. They declare victory. They move on. The AI is a brilliant intern with amnesia. Every morning, it forgets everything you taught it yesterday.
Before I built my system, I once watched Claude refactor a contact form three times in the same session. Each time it "improved" the previous version. Each time it introduced a new bug. By the third iteration, the form didn't submit at all. Claude was satisfied with its work.
That was the last time I let the AI drive.
The Operating System
The shift happened when I stopped thinking about prompts and started thinking about environments.
You can spend your life crafting the perfect prompt. You can write paragraphs of instructions. You can say "please write tests" and "please check the existing code first" and "please don't rewrite things that already work." The AI will nod, agree, and then do whatever it wants. Not because it's defiant — because prompts are suggestions, and suggestions don't survive contact with a 200,000-token context window.
So I built something different. I built an operating system.
The core idea is embarrassingly simple: instead of telling the AI what to do, I made it impossible for it to do the wrong thing. Shell scripts that fire on every lifecycle event — session start, prompt submission, before any file edit, after any file edit, session end. Twelve of them. They watch. They block. They remind. They enforce.
The Hook System
Claude Code hooks are shell scripts that fire at specific lifecycle events. You configure them in ~/.claude/settings.json:
```json
{
  "hooks": {
    "PreToolUse": [{ "matcher": "Edit|Write", "command": "bash ~/.claude/hooks/superpowers-enforcer.sh" }],
    "PostToolUse": [
      { "matcher": "*", "command": "bash ~/.claude/hooks/context-monitor.sh" },
      { "matcher": "Edit", "command": "bash ~/.claude/hooks/post-edit-quality.sh" }
    ],
    "SessionStart": [{ "matcher": "", "command": "bash ~/.claude/hooks/session-start-context.sh" }]
  }
}
```
Each hook receives a JSON payload on stdin with the tool name, file path, session ID, and transcript path. A hook can allow, deny, or inject context into the conversation. This is the entire enforcement mechanism.
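Every hook in this essay follows the same read-decide-respond shape. Here's a minimal sketch of that skeleton — the payload fields match the ones just described, though the exact schema may vary by Claude Code version, and `handle_hook` is my own illustrative name:

```shell
#!/bin/bash
# Minimal hook skeleton: read the JSON payload from stdin,
# inspect it with jq, optionally respond with JSON on stdout.
handle_hook() {
  local input tool file
  input=$(cat)
  tool=$(echo "$input" | jq -r '.tool_name // empty')
  file=$(echo "$input" | jq -r '.tool_input.file_path // empty')

  # Silently allow anything that isn't an Edit
  [[ "$tool" != "Edit" ]] && return 0

  # Inject a note back into the conversation
  printf '{"hookSpecificOutput":{"hookEventName":"PostToolUse","additionalContext":"Edited: %s"}}\n' "$file"
}

# Simulate what Claude Code would send on stdin
echo '{"tool_name":"Edit","tool_input":{"file_path":"src/app.tsx"}}' | handle_hook
```

Everything that follows — the enforcer, the monitor, the scanner — is this skeleton with a different decision in the middle.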
Hook 1: The Thinking Enforcer
The one I'm proudest of. Before Claude can edit any file, this hook checks whether a thinking skill was invoked earlier in the session:
```bash
#!/bin/bash
# superpowers-enforcer.sh — HARD BLOCKS Edit/Write unless a thinking skill was invoked

INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // empty')

# Only check the Edit and Write tools
[[ "$TOOL_NAME" != "Edit" && "$TOOL_NAME" != "Write" ]] && exit 0

TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty')

SUPERPOWERS_INVOKED=false
if [[ -n "$TRANSCRIPT_PATH" && -f "$TRANSCRIPT_PATH" ]]; then
  # Check for any superpowers: thinking skill
  if grep -qE '"skill"[[:space:]]*:[[:space:]]*"superpowers:' "$TRANSCRIPT_PATH"; then
    SUPERPOWERS_INVOKED=true
  fi
  # Also accept TDD, frontend-design, and security-review
  if grep -qE '"skill"[[:space:]]*:[[:space:]]*"(tdd-workflow|frontend-design|security-review)"' "$TRANSCRIPT_PATH"; then
    SUPERPOWERS_INVOKED=true
  fi
fi

if [[ "$SUPERPOWERS_INVOKED" == "false" ]]; then
  cat << 'DENY'
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "🚫 WORKFLOW VIOLATION: No thinking skill invoked!\n\nBefore writing code, invoke one of:\n • superpowers:brainstorming\n • superpowers:writing-plans\n • superpowers:systematic-debugging\n • tdd-workflow\n • frontend-design"
  }
}
DENY
fi
exit 0
```
If Claude hasn't thought before it tries to write, the edit is denied. Not warned. Denied. Claude gets a system message explaining what happened, and it can't proceed until it goes back and thinks.
This single hook eliminated the number one failure mode: jumping to code before understanding the problem.
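The whole mechanism hinges on one grep against the transcript, so it's worth smoke-testing that regex in isolation before trusting it with your edits. A quick check — the transcript line below is a fabricated guess at what a skill invocation looks like on disk, not a real Claude Code transcript:

```shell
#!/bin/bash
# Verify the enforcer's transcript regex against a fabricated skill-invocation line
TRANSCRIPT=$(mktemp)
echo '{"skill": "superpowers:brainstorming", "args": {}}' > "$TRANSCRIPT"

if grep -qE '"skill"[[:space:]]*:[[:space:]]*"superpowers:' "$TRANSCRIPT"; then
  echo "skill detected: edit would be allowed"
else
  echo "no skill: edit would be denied"
fi
rm -f "$TRANSCRIPT"
# → skill detected: edit would be allowed
```

If the transcript format ever changes, this two-minute check tells you before the hook silently stops blocking anything.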
Hook 2: The Context Monitor
This one counts tool calls per session and warns before the context window fills up:
```bash
#!/bin/bash
# context-monitor.sh — Suggests /compact before the context window overflows

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"')
COUNTER_FILE="/tmp/claude-context-counter-${SESSION_ID}"

# Initialize or increment the per-session tool-call counter
if [[ -f "$COUNTER_FILE" ]]; then
  COUNT=$(cat "$COUNTER_FILE")
  COUNT=$((COUNT + 1))
else
  COUNT=1
fi
echo "$COUNT" > "$COUNTER_FILE"

# Also check the transcript size
TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty')
SIZE_KB=0
if [[ -n "$TRANSCRIPT_PATH" && -f "$TRANSCRIPT_PATH" ]]; then
  SIZE_KB=$(du -k "$TRANSCRIPT_PATH" | cut -f1)
fi

# Rough calibration: 40 calls ~ 60% context, 60 calls ~ 80%
if [[ "$COUNT" -ge 60 || "$SIZE_KB" -gt 500 ]]; then
  # Inject "CRITICAL: Run /compact NOW" into the conversation
  echo '{"hookSpecificOutput":{"hookEventName":"PostToolUse","additionalContext":"CRITICAL: Context >80%. Run /compact NOW."}}'
elif [[ "$COUNT" -ge 40 || "$SIZE_KB" -gt 300 ]]; then
  echo '{"hookSpecificOutput":{"hookEventName":"PostToolUse","additionalContext":"Context ~60%. Consider /compact soon."}}'
fi
exit 0
```
Forty tool calls (or 300 KB of transcript) puts you at roughly 60% — time to start wrapping up. Sixty means 80% — compact now or lose your memory. This prevents the context-amnesia failure mode.
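One housekeeping detail the hook above leaves open: counter files pile up in /tmp, one per session. A small sweep — runnable from a SessionEnd hook or a cron entry — keeps that tidy. A sketch; the 24-hour threshold is my own choice, and the name pattern matches what `context-monitor.sh` writes:

```shell
#!/bin/bash
# cleanup-counters.sh — delete per-session counter files untouched for 24h.
# -mmin +1440 selects files whose data was last modified more than a day ago.
find /tmp -maxdepth 1 -name 'claude-context-counter-*' -mmin +1440 -delete 2>/dev/null
```

Fresh counters from live sessions are untouched; only stale ones from long-dead sessions get swept.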
Hook 3: The Post-Edit Scanner
Every file modification gets scanned for quality issues:
```bash
#!/bin/bash
# post-edit-quality.sh — Scans every edited file for common problems

INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
[[ -z "$FILE_PATH" || ! -f "$FILE_PATH" ]] && exit 0

EXT="${FILE_PATH##*.}"
WARNINGS=""

# Check for console.log in JS/TS files
if [[ "$EXT" =~ ^(js|jsx|ts|tsx)$ ]]; then
  if grep -q "console\.log" "$FILE_PATH" 2>/dev/null; then
    WARNINGS+="⚠️ console.log detected - remove before commit\n"
  fi
fi

# Check file size (warn if > 400 lines)
LINES=$(wc -l < "$FILE_PATH" | tr -d ' ')
if [[ "$LINES" -gt 400 ]]; then
  WARNINGS+="⚠️ File has $LINES lines (>400) - consider splitting\n"
fi

# Check for hardcoded secrets
if grep -qE "(api[_-]?key|password|secret|token)[[:space:]]*[:=][[:space:]]*['\"][^'\"]+['\"]" "$FILE_PATH"; then
  WARNINGS+="🚨 SECURITY: Possible hardcoded secret detected!\n"
fi

# Surface any warnings back into the conversation
if [[ -n "$WARNINGS" ]]; then
  jq -n --arg ctx "$(echo -e "$WARNINGS")" \
    '{hookSpecificOutput: {hookEventName: "PostToolUse", additionalContext: $ctx}}'
fi
exit 0
```
No console.log gets through. No secret gets committed. No file grows past four hundred lines without a warning.
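The secret pattern deserves a closer look, because it's deliberately loose — it trades false positives for coverage. You can probe what it flags with a few sample lines (`check` is my own helper, and the pattern here is a POSIX-portable rendering of the scanner's regex; note it's case-sensitive, so a camelCase `apiKey` slips through):

```shell
#!/bin/bash
# Probe the secret-detection regex against sample lines
PATTERN="(api[_-]?key|password|secret|token)[[:space:]]*[:=][[:space:]]*['\"][^'\"]+['\"]"

check() {
  if echo "$1" | grep -qE "$PATTERN"; then
    echo "FLAGGED: $1"
  else
    echo "clean:   $1"
  fi
}

check 'api_key = "sk-live-1234"'    # → FLAGGED
check 'password: "hunter2"'         # → FLAGGED
check 'const keyName = "display"'   # → clean
```

Tuning the pattern against a corpus of your own code is ten minutes well spent; the cost of a false negative here is a leaked credential.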
The Complete Hook Architecture
Here are all twelve hooks and what they enforce:
| Hook | Event | What It Does |
|---|---|---|
| superpowers-enforcer | PreToolUse (Edit/Write) | Blocks edits unless a thinking skill was invoked |
| context-monitor | PostToolUse (all) | Warns at 60%/80% context usage |
| post-edit-quality | PostToolUse (Edit) | Scans for console.log, secrets, file size |
| session-start-context | SessionStart | Re-injects workflow rules after every restart |
| enforce-subagents | PostToolUse | Reminds to use parallel agents when possible |
| auto-score-reminder | PostToolUse | Reminds to run reflexion:reflect after implementation |
| ui-workflow-enforcer | PreToolUse (Edit) | Requires the frontend-design skill for UI files |
| research-enforcer | PreToolUse | Enforces Explore agents for research tasks |
| pre-git-push | PreToolUse (Bash) | Checks that tests pass before git push |
| pre-compact-backup | PreToolUse | Saves a context summary before compaction |
| cost-analyzer | PostToolUse | Tracks API cost per session |
| obsidian-file-placement | PostToolUse (Write) | Ensures docs go to the right Obsidian vault folder |
The Three-Phase Pipeline
The thinking feeds into a three-phase workflow that governs everything I build. Think, then build, then verify. Not as a suggestion — as an enforced pipeline.
Phase 1: THINK
The Think phase requires brainstorming (one question at a time, multiple choice when possible — faster for me, more structured for the AI) or a written plan with phases and dependencies.
Three Claude Code skills handle this. Skills are markdown files in ~/.claude/skills/ that Claude loads on demand:
```markdown
<!-- ~/.claude/skills/superpowers/brainstorming.md -->
---
name: brainstorming
description: Explore requirements before implementation
---

## Rules

1. Ask ONE question at a time
2. Present options as multiple choice (A/B/C/D)
3. Never skip to implementation
4. Summarize decisions before proceeding
```
When the hook enforcer blocks an edit, Claude is forced to invoke one of these skills first. The AI must think before it writes.
Phase 2: BUILD
The Build phase follows test-driven development: write the failing test first, write minimal code to pass it, refactor while the tests protect you.
CLAUDE.md workflow entry:
```
THINK → BUILD → VERIFY

1. THINK: superpowers:brainstorming → superpowers:writing-plans
2. BUILD: /tdd → implement → code-reviewer agent
3. VERIFY: reflexion:reflect (auto-scores, retries if < 4/5)
```
The /tdd command scaffolds test files first. The code-reviewer agent runs in a separate context window so it doesn't pollute the main conversation. Each agent has access to specific tools — read-only agents can't edit files, preventing accidental damage.
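For the curious, an agent definition is just a markdown file with a frontmatter block, and the tools list is what makes an agent read-only. Here's a sketch of what a reviewer agent might look like — the file path, description, and prompt are my own illustration, so check the Claude Code agents documentation for the exact fields:

```markdown
<!-- ~/.claude/agents/code-reviewer.md (illustrative) -->
---
name: code-reviewer
description: Reviews diffs for bugs, security issues, and convention drift
tools: Read, Grep, Glob
---

Review the changes you are given and report problems; do not fix them.
Without Edit or Write in your tools list, you cannot modify files.
```

The tool restriction is the safety mechanism: a reviewer that physically cannot edit files cannot "helpfully" break them.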
Phase 3: VERIFY
The Verify phase is where it gets interesting. I have a self-evaluation system — based on academic self-refinement research — that scores every piece of work on a weighted rubric:
| Criterion | Weight | What It Measures |
|---|---|---|
| Instruction Following | 30% | Did it do what I asked? |
| Completeness | 25% | Are all requirements met? |
| Quality | 25% | Is the code clean, tested, secure? |
| Reasoning | 10% | Does the approach make sense? |
| Coherence | 10% | Is it consistent with the codebase? |
If the score lands below 4 out of 5, the system presents the shortcomings as a multiple-choice list. I pick which ones matter. Claude iterates. We re-score. This loop continues until the work is actually good, not just finished.
The system grades its own homework. And if the homework is bad, it doesn't get turned in.
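The arithmetic behind the score is nothing exotic — a weighted sum over the five criteria from the table. A sketch of that collapse (the `score` helper is my own; the weights are the ones from the rubric above):

```shell
#!/bin/bash
# Collapse five per-criterion scores (each 0-5) into one weighted number.
# Weights match the rubric: instruction 30%, completeness 25%,
# quality 25%, reasoning 10%, coherence 10%.
score() {
  awk -v i="$1" -v c="$2" -v q="$3" -v r="$4" -v h="$5" \
    'BEGIN { printf "%.2f\n", 0.30*i + 0.25*c + 0.25*q + 0.10*r + 0.10*h }'
}

score 5 4 4 5 5   # → 4.50 (passes the 4/5 gate)
score 4 3 3 4 4   # → 3.50 (fails; triggers another iteration)
```

The weighting encodes a priority: a beautifully reasoned solution that ignores the instructions still fails, because instruction following alone carries 30% of the grade.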
The Three Plugin Ecosystems
Three plugin ecosystems divide the technical labor. Each solves a different category of failure mode:
1. Superpowers — Process Discipline
Handles how to think, plan, and debug. Prevents cowboy coding.
| Skill | Purpose |
|---|---|
| superpowers:brainstorming | Structured requirement exploration |
| superpowers:writing-plans | Multi-phase implementation plans |
| superpowers:systematic-debugging | Root-cause analysis with multiple-choice diagnosis |
| superpowers:verification-before-completion | Pre-commit sanity check |
| superpowers:dispatching-parallel-agents | Reminds you to parallelize independent tasks |
Repo: superpowers-marketplace/superpowers (community plugin)
2. Everything Claude Code — Specialized Agents
Provides specialized agents that run in their own context windows. Prevents convention blindness.
| Agent | What It Does |
|---|---|
| code-reviewer | Reviews code for bugs, security, quality |
| security-reviewer | OWASP Top 10, secret detection, input validation |
| architect | System design, scalability decisions |
| tdd-guide | Enforces test-first methodology |
| build-error-resolver | Fixes build/type errors with minimal diffs |
| planner | Complex feature planning with dependency analysis |
Each agent runs in a separate context window. This is critical — it means the code reviewer doesn't consume tokens from your main conversation. You can spawn multiple agents in parallel.
Repo: aashari/everything-claude-code
3. Reflexion — Quality Gates
Handles auto-scoring, multi-judge critique, and persistent learning. Prevents the "never self-evaluates" failure mode.
| Skill | Purpose |
|---|---|
| reflexion:reflect | Auto-scores output on a weighted rubric (0-5) |
| reflexion:critique | Multi-judge review with debate and consensus |
| reflexion:memorize | Saves insights to persistent memory |
The memorize skill is the one that still amazes me. Insights from reflections get saved to CLAUDE.md memory files. The system actually gets better over time — it learns from past mistakes and applies those learnings in future sessions.
The CLAUDE.md Configuration
The CLAUDE.md file is the project-level instruction set that Claude reads at the start of every session. It's where you define your workflow, your standards, and your expectations:
```markdown
# CLAUDE.md

## Workflow (3 Phases)

THINK → BUILD → VERIFY

1. THINK: superpowers:brainstorming → superpowers:writing-plans
2. BUILD: /tdd → implement → code-reviewer agent
3. VERIFY: reflexion:reflect (auto-scores, retries if < 4/5)

## Mandatory Checkpoints

### Before Code

- [ ] Brainstorming or plan invoked
- [ ] Checked for existing libraries/patterns

### During

- [ ] TDD: tests BEFORE implementation
- [ ] Immutability, no console.log
- [ ] No security vulnerabilities

### After

- [ ] reflexion:reflect score >= 4/5
- [ ] Visual verification if UI changes
- [ ] No debug code left
```
This file persists across sessions. Every time Claude starts a new conversation in your project, it reads this file and knows your rules. Combined with hooks that enforce those rules, the AI can't forget your standards even if the context window compresses.
The Voice
After I solved the process problem, the friction moved somewhere I didn't expect: my fingers.
I'd built this entire operating system — hooks, skills, agents, auto-scoring — and the bottleneck was... typing. Every task started with me laboriously typing a prompt that expanded into the same workflow. "Use brainstorming to explore this, then writing-plans, then TDD, then reflexion:reflect." Eighty-two words. Every. Single. Time.
So I did what any reasonable person would do. I stopped typing.
Wispr Flow is a speech-to-text tool that runs system-wide on macOS. You hold a key, you talk, and text appears wherever your cursor is. The promise is frictionless voice input. The reality, at least initially, was a comedy of errors. "Superpowers" became "super powers." "Claude" became "cloud." "Shadcn" became "shad CNN." Every technical term in my vocabulary — and my vocabulary is almost entirely technical terms — was being mangled by a model that had been trained on normal human speech, not the fever dreams of a developer who names things like reflexion:memorize.
I could have given up. Instead, I got curious. I exported my conversation history and analyzed it. (When your development environment is an AI, even your debugging process involves AI.) I found seven categories of speech recognition failures: compound technical terms that get split into separate words, proper nouns that get autocorrected into common words, camelCase that gets flattened, acronyms that get spelled out wrong, punctuation terms that get interpreted literally, command syntax that gets normalized, and mixed-language phrases that confuse the model.
The Wispr Flow Configuration
The fix was surgical. A custom dictionary and speech corrections in Wispr's settings:
Custom Dictionary (66 terms):
Claude Code, shadcn, CLAUDE.md, tsx, jsx, PostToolUse, PreToolUse,
reflexion:reflect, reflexion:critique, reflexion:memorize,
superpowers:brainstorming, superpowers:writing-plans, pnpm, Tailwind...
Speech Corrections (95 rules):
"cloud code" → "Claude Code"
"shad CN" → "shadcn"
"reflect on reflect" → "reflexion:reflect"
"super powers" → "superpowers:"
"tail wind" → "Tailwind"
Then came the part that changed everything. Trigger-phrase snippets. Thirteen core workflows and dozens of utilities. Short voice commands that expand into complete instructions. They fall into three categories:
Solo workflows — for when you're working alone with one Claude instance:
"build feature" → brainstorming → writing-plans → /tdd → implement → reflect (4/5+)
"fix bug" → systematic-debugging → multiple choice causes → failing test → fix
"quick change" → skip planning, just edit + quality check
"ship it" → review diff → critique/team review → clean commit (no push)
"use the workflow" → full THINK → BUILD → VERIFY chain, all skills enforced
Team workflows — for spawning multiple agents that talk to each other:
"team build" → contract-first agent team: upstream publishes interfaces,
downstream builds against them, lead integrates
"research team" → 3 specialized teammates: web-researcher (Firecrawl),
social-researcher (Grok/Twitter), docs-researcher (Context7)
"review team" → 3 reviewer teammates: quality (reflexion:critique),
security (OWASP), performance (bundle size, N+1 queries)
The team snippets use Agent Teams — a feature where multiple Claude instances coordinate via a shared task list and direct messages. Each teammate gets a specific skill assigned in their spawn prompt. The lead runs on Opus (expensive, good at coordination), teammates run on Sonnet (cheaper, good at execution). One file per teammate, never overlap.
Context workflows — for surviving long sessions and resuming work:
"where was I" → search persistent memory + git log for last session
"resume after compact" → recover from /compact by searching memory + backup logs
"save and compact" → summarize + save to memory before compacting
"run it" → check port 3000 → ask before killing → start dev server
"test and build" → lint → format check → build, fix anything that fails
These exist because context loss is the silent killer of AI-assisted development. Every /compact wipes the conversation to a summary. Every new session starts fresh. Without explicit memory tools, you lose architectural decisions, debugging context, and half-finished work. The context snippets bridge that gap — they force the AI to search persistent memory before continuing, so it doesn't reinvent what you already decided.

Two words. That's what I say into my microphone. "Build feature" expands into a sixty-word instruction that invokes brainstorming, then planning, then TDD, then code review, then reflexion scoring with a 4/5 threshold and iteration on failure. "Research team" spawns three agents with different search tools that report back independently. "Resume after compact" recovers my entire working context from persistent memory. Two words in, an entire orchestrated workflow out.
My development workflow starts with me saying two words into a microphone. Everything else is automated.
The philosophical weight of this didn't hit me until weeks later. I had built a system where the input was my voice, the process was automated, the quality was self-evaluated, and the output was production code. The human in the loop — me — had been reduced to the one thing humans are still irreplaceably good at: deciding what to build and whether the result is good enough.
The Meta-Developer
Here's the question I keep coming back to: what am I, now?
I don't write code anymore. Not really. I haven't manually typed a function in weeks. I don't debug by reading stack traces — I invoke a systematic debugging skill that diagnoses root causes and presents them as multiple choice. I don't write tests — a TDD agent scaffolds them. I don't review my own code — specialized reviewer agents with their own context windows do that. I don't even type my instructions — I say them into a microphone and they expand into structured workflows.
So what do I do?
I design systems. I architect the process by which code gets written. I decide what gets built. I evaluate whether the result meets the bar. I set the bar. And increasingly, I spend my time not on the code itself but on the meta-layer: improving the hooks, refining the skills, calibrating the scoring rubrics, teaching the voice model new words. I am building the machine that builds the software.
There's a blog post by Kailash Nadh — the CTO of Zerodha, India's largest stock broker — called "Code is Cheap." His thesis is elegant: code has never been cheaper to produce. AI has driven the marginal cost of a line of code toward zero. The question, then, is what becomes expensive when code becomes cheap?
Process. Discipline. Taste.
Those are the scarce resources now. Anyone can generate a thousand lines of TypeScript in thirty seconds. The hard part is knowing which thousand lines to generate, in what order, with what tests, following what patterns, and — this is the part most people miss — knowing when to stop. When the code is good enough. When the refactor is over-engineering. When the feature is done and the next one should start.
This is what I've started calling the meta-developer. Not a developer who writes code, but a developer who designs the systems that write code. Part architect, part manager, part quality engineer. The skill set isn't syntax or frameworks — it's communication. The ability to articulate what you want with enough precision that a machine can execute it. Sometimes that takes two words. Sometimes it takes two thousand. The craft is knowing which.
I think this is where software development is heading, whether we like it or not. The hand-coding era isn't ending tomorrow, but it's ending. The developers who thrive will be the ones who can operate at the meta-level: designing processes, enforcing standards, evaluating output, and iterating on the system itself. The ones who can't — who insist on writing every line by hand, who refuse to trust (but verify) the machine — will be the ones wondering why they're moving so slowly.
I don't say this with triumph. I say it with the same uneasy acceptance you feel when you realize the world has shifted under your feet and you have to shift with it.
Start Here
If this essay leaves you with one thing, let it be this.
You don't need twelve hooks. You don't need three meta-frameworks. You don't need a voice layer or auto-scoring or agent teams. Those are the result of months of iteration, and half of them might be over-engineered. (I suspect the cost analyzer is. I haven't checked.)
You need one guardrail. The one that says: think before you code.
Build it however you want — a hook, a checklist, a sticky note on your monitor, a rubber duck that stares at you judgmentally. The mechanism doesn't matter. The discipline does.
Everything else follows.
Tools & Resources
Everything I use is either open source or has a free tier. Here's the complete stack:
| Tool | What It Does | Link |
|---|---|---|
| Claude Code | Anthropic's CLI for AI-assisted development | github.com/anthropics/claude-code |
| Claude Code Hooks | Shell scripts that enforce workflow rules | Hooks documentation |
| Claude Code Skills | Markdown-based skill definitions | Skills documentation |
| CLAUDE.md | Project-level instruction files | CLAUDE.md documentation |
| Wispr Flow | Voice-to-text with custom dictionary | wispr.ai |
| Reflexion | Self-refinement framework (academic paper) | arxiv.org/abs/2303.17651 |
| Claude Mem | Persistent memory across sessions (MCP server) | github.com/anthropics/claude-mem |
| Superpowers | Brainstorming, planning, debugging skills plugin | github.com/superpowers-marketplace/superpowers |
| Everything Claude Code | 30+ skills: TDD, security, build fixes, and more | github.com/aashari/everything-claude-code |
Quick Start: Your First Hook
Create ~/.claude/hooks/think-first.sh:
```bash
#!/bin/bash
# think-first.sh — blocks file edits unless planning language appears in the conversation
INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // empty')
[[ "$TOOL_NAME" != "Edit" && "$TOOL_NAME" != "Write" ]] && exit 0

TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty')
if [[ -n "$TRANSCRIPT_PATH" && -f "$TRANSCRIPT_PATH" ]]; then
  if grep -qiE "plan|brainstorm|design|think" "$TRANSCRIPT_PATH"; then
    exit 0  # Thinking detected, allow the edit
  fi
fi

cat << 'EOF'
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "Think before you code. Describe your plan first."
  }
}
EOF
exit 0
```
Add it to ~/.claude/settings.json:
```json
{
  "hooks": {
    "PreToolUse": [{ "matcher": "Edit|Write", "command": "bash ~/.claude/hooks/think-first.sh" }]
  }
}
```
That's it. One file. One config entry. Claude now can't edit code without thinking first. Everything else in this essay is just iteration on that idea.
Replicate the Voice Layer
The hooks are the foundation. But if you read the Wispr Flow section and thought "I want that" — here's how to build your own voice-driven workflow from scratch. The entire process takes about an hour, and most of it is letting Claude Code do the work.
Step 1: Install the plugins.
Claude Code supports community plugins that add skills, hooks, and agent capabilities. The three I use:
```bash
# Skills for brainstorming, planning, debugging, and code review
claude install superpowers-marketplace/superpowers

# Self-refinement framework — reflect, critique, memorize
claude install alioshr/reflexion

# TDD, security review, build fixer, and 30+ other skills
claude install aashari/everything-claude-code
```
Each plugin drops a set of skill files into ~/.claude/plugins/. Claude Code discovers them automatically. You can verify with /skills — you should see things like superpowers:brainstorming, reflexion:critique, and everything-claude-code:tdd in the list.
Step 2: Set up persistent memory.
This is the piece most people skip, and it's the piece that makes everything else work. Without persistent memory, every Claude Code session starts from zero. With it, the AI can recall what you built yesterday, what architectural decisions you made, and what bugs you've already fixed.
I use Claude Mem — an MCP server that stores observations in a local database and exposes search, timeline, and retrieval tools to Claude Code. Install it as an MCP server in your Claude Code settings:
```jsonc
// ~/.claude/settings.json
{
  "mcpServers": {
    "claude-mem": {
      "command": "npx",
      "args": ["-y", "@anthropic/claude-mem", "server", "--db", "~/.claude/claude-mem.db"]
    }
  }
}
```
Once configured, Claude Code can search past sessions with claude-mem:search, browse a timeline with claude-mem:timeline, and save new observations with reflexion:memorize. The context snippets I described earlier — "where was I," "save and compact" — all depend on this.
Step 3: Let Claude Code generate your snippets.
Here's the part that surprised me. You don't need to write your Wispr Flow snippets from scratch. Claude Code can analyze your own conversation patterns and generate them for you.
Start a new session and say something like:
"Search my Claude Mem observations for this project. Look at the types of tasks I do most often — feature building, debugging, reviewing, deploying. For each common task, write a Wispr Flow snippet: a short trigger phrase (2-4 words) that expands into a detailed prompt with the right skills invoked in the right order. Output them as a markdown file I can reference."
Claude Code will search your memory, find your patterns, and generate snippets tailored to how you work. My thirteen snippets didn't come from a template — they came from Claude Code analyzing three weeks of my own conversations and identifying the workflows I repeated most often.
Step 4: Add them to Wispr Flow.
Open Wispr Flow settings → Snippets. For each snippet, add:
- Trigger: The short phrase (e.g., "build feature")
- Expansion: The full prompt Claude Code generated
Then add your technical vocabulary to the custom dictionary — every framework name, every tool name, every skill name that Wispr might mangle. My dictionary has 66 terms. Yours will be different.
Step 5: Iterate.
The first version of your snippets will be wrong. Not catastrophically wrong — just slightly off. The trigger phrase will be too similar to another one and Wispr will pick the wrong expansion. Or the expanded prompt will be missing a step you always do. Or it'll invoke a skill that doesn't exist in your plugin set.
This is normal. I've rewritten my snippets four times. Each iteration gets closer to the workflow that matches how I actually think. The goal isn't perfection on day one — it's a system that improves every week.
