Your agents are generating more error signal in a day
than your team did all year.

Fukura turns it into measured effectiveness. Every failed tool call, every retried command, every “try this instead” from Claude Code, Cursor, or your own scripts — captured locally, redacted, and made searchable. So the next agent (or the next engineer) doesn’t pay the same cost twice.

curl -sSL https://fukura.dev/install.sh | bash

Agents bang on tools. Nothing remembers.

A coding agent runs a hundred shell commands to land one pull request. Ninety of them fail. The ten fixes that worked are thrown away the moment the session ends — next week, a different agent on a different branch retries the same wrong command, hits the same Terraform error, and burns the same forty-five minutes.

Sentry sees production crashes. Datadog sees metrics. Your editor forgets everything. Nobody is recording the intermediate failure-and-fix loop that now dominates how software gets written. That loop is where the cost lives, and it’s invisible.

How it works

Fukura is a local-first CLI and an open spec (EKP). The CLI watches your shells and agent sessions, captures error/fix pairs, redacts secrets with deterministic rules, and stores them in a content-addressable repo you own. An optional hub (self-hosted or managed) aggregates across machines so your whole team queries the same memory.
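In a content-addressable repo, a record's key is the hash of its own bytes, so identical captures dedupe automatically. A minimal sketch, with an in-memory dict standing in for the on-disk store (Fukura's actual record shape and format are not shown here):

```python
import hashlib
import json

def store(repo: dict, record: dict) -> str:
    """Content-addressable put: the key is the SHA-256 of the
    canonical JSON bytes, so storing the same record twice is a no-op."""
    blob = json.dumps(record, sort_keys=True).encode()
    key = hashlib.sha256(blob).hexdigest()
    repo[key] = blob
    return key
```

Because the key is derived from canonicalized content, two machines that capture the same error/fix pair converge on the same key without coordinating.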

Capture

Shell hooks (zsh / bash / fish / PowerShell) + MCP tools catch every failing invocation. Redaction runs before anything leaves your machine.
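Deterministic redaction means a fixed rule set always maps the same input to the same output, with no runtime judgment calls. An illustrative two-rule version (these patterns are examples, not Fukura's shipped rule set):

```python
import re

# Illustrative rules: ordered, deterministic, applied before anything is stored.
RULES = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<aws-access-key>"),
    (re.compile(r"(?i)(api[_-]?key|token|secret)\s*[=:]\s*\S+"), r"\1=<redacted>"),
]

def redact(line: str) -> str:
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line
```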

Classify

EKP adapters turn raw stderr into a stable fingerprint so the same error from any tool, language, or agent collapses to the same key.
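The idea behind fingerprinting, roughly: strip the details that vary between runs, then hash what remains. A toy version (the normalization rules and key format here are assumptions, not the EKP spec):

```python
import hashlib
import re

def fingerprint(stderr: str) -> str:
    """Collapse run-specific details so identical failures share one key."""
    text = stderr.strip().lower()
    text = re.sub(r"(/[\w.\-]+)+", "<path>", text)  # file paths
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)    # addresses, object ids
    text = re.sub(r"\d+", "<n>", text)              # line numbers, counts
    return hashlib.sha256(text.encode()).hexdigest()[:16]
```

The same Terraform lock error reported from two branches, with different paths and line numbers, collapses to one key.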

Measure

Every attempt at a fix is recorded with its outcome. The hub surfaces success rates per fingerprint, human vs. agent splits, and unresolved patterns in real time.
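Under the hood this is a small group-by. A sketch, assuming each attempt is a (fingerprint, actor, succeeded) tuple (the real record shape may differ):

```python
from collections import defaultdict

def success_rates(attempts):
    """Per-(fingerprint, actor) success rate, so the same query
    answers both 'which errors hurt' and 'who fails on them'."""
    tally = defaultdict(lambda: [0, 0])  # (fingerprint, actor) -> [successes, total]
    for fp, actor, ok in attempts:
        tally[(fp, actor)][0] += int(ok)
        tally[(fp, actor)][1] += 1
    return {key: s / t for key, (s, t) in tally.items()}
```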

You can’t improve a loop you can’t measure.

Fukura’s effectiveness score, (successful next commands ÷ attempts) weighted by time-to-fix, is the first number that tells you whether your agent tooling is getting better or just getting louder. Ship it to a dashboard. Put it in a review. Sort your docs by it. The hub does the arithmetic; you decide what to do with the answer.
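One way to compute it, assuming a 1/(1 + minutes) weight so faster fixes count for more (an illustrative choice, not the published weighting):

```python
def effectiveness(attempts: list[dict]) -> float:
    """Success rate weighted by time-to-fix: an instant successful
    fix scores 1.0, a slow one scores less, a failure scores 0."""
    if not attempts:
        return 0.0
    weighted = sum(
        (1.0 if a["succeeded"] else 0.0) / (1.0 + a["minutes_to_fix"])
        for a in attempts
    )
    return weighted / len(attempts)
```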

Top 10 failures this week

Ranked by attempts × (1 − success rate). The concrete list every DevEx standup should end with.
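The ranking itself is one line of arithmetic. With hypothetical weekly tallies per fingerprint:

```python
def pain_score(attempts: int, successes: int) -> float:
    """attempts x (1 - success rate): high-volume, low-success
    fingerprints rank first."""
    return attempts * (1 - successes / attempts)

# Hypothetical tallies: fingerprint -> (attempts, successes)
week = {"tf-lock": (40, 4), "npm-eperm": (12, 11), "pytest-import": (25, 20)}
top = sorted(week, key=lambda k: pain_score(*week[k]), reverse=True)
```

Here "tf-lock" tops the list: forty attempts and a 10% success rate beat both rarer and better-understood failures.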

Human vs. agent split

Which fingerprints do agents disproportionately fail on? That’s where a one-paragraph note pays back the most.

Coverage %

Percent of active fingerprints that have at least one team note. The single number that says “how much of our firefighting is actually captured”.
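The arithmetic, assuming the hub exposes a note count per active fingerprint (a simplification of whatever the real API returns):

```python
def coverage(fingerprints: dict[str, int]) -> float:
    """Percent of active fingerprints with at least one team note."""
    if not fingerprints:
        return 0.0
    noted = sum(1 for n in fingerprints.values() if n > 0)
    return 100.0 * noted / len(fingerprints)
```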

Used by teams who run Claude Code and Cursor in production

Design-partner program is open — if your team ships with agent tooling and wants to measure it, say hi.

Install in one line. Self-host the hub when you’re ready.

The CLI is Apache-2.0 and runs offline. The hub is source-available commercial — free for individuals and small teams, paid past that. You can turn it on whenever the effectiveness data starts to matter.