Agents bang on tools. The cost is invisible. That’s the problem.

A year ago, a senior engineer produced a pull request by running fifty shell commands, ten of which failed. Today the same PR is produced by an agent running five hundred shell commands, four hundred of which failed. The number of failed-and-retried tool calls emitted per feature has increased by one to three orders of magnitude, and nobody is measuring it.

This is the signal fukura exists to capture. It turns out to be the most valuable data your team is throwing away.

Why the existing tools miss it

Observability stacks (Sentry, Datadog, OpenTelemetry) were built for runtime errors — a service emits a stack trace, the stack ingests it. Agents iterating on a local sandbox emit nothing a runtime tool wants to ingest. A terraform plan that 403’d because the session token expired is not a service exception. It is a failed tool call with actionable structure, and there is no product category for it.

Knowledge tools (Stack Overflow for Teams, Notion, Glean) were built for post-hoc capture. A human sits down, writes a Q&A, files it. That works for problems you remember long enough to write about. It does not work at agent-era volume, and it has no notion of “did this answer actually work?”

Vendor memory (Cursor memory, Claude Projects, Warp AI) is locked to one agent. Cursor can’t read Claude Code’s memory, Claude Code can’t read Cursor’s, Warp doesn’t even run agent tool calls. Your fleet is fragmented, your memory is fragmented, the recurrence problem gets worse every time you onboard a new agent.

What fukura does instead

Three things, and only three.

  1. Capture. Shell hooks + MCP tools feed every failing invocation through a redaction pipeline and into a content-addressable local store. No network round-trip, no vendor dependency, no config to turn on capture.
  2. Fingerprint. EKP adapters reduce the raw output to a stable hash that is the same across machines, users, and agents for “the same logical error.” This is the joining key that makes aggregation possible.
  3. Measure. Every attempt at a fix records its outcome. Success rate per fingerprint ranks which fixes actually worked, not which sound plausible. This is the metric you couldn’t compute before because the data wasn’t there.
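The fingerprinting step above can be sketched in a few lines. This is an illustrative stand-in, not fukura's actual EKP adapter logic: the normalization rules (strip timestamps, paths, hex ids, residual numbers, then hash) are hypothetical examples of what it takes to make "the same logical error" hash identically across machines and users.

```python
import hashlib
import re

def fingerprint(tool: str, exit_code: int, stderr: str) -> str:
    """Reduce a failing tool invocation to a stable fingerprint (sketch)."""
    text = stderr.lower()
    # Strip the volatile parts that differ per machine/user/run:
    text = re.sub(r"\d{4}-\d{2}-\d{2}[t ]\d{2}:\d{2}:\d{2}\S*", "<ts>", text)  # timestamps
    text = re.sub(r"/[\w./-]+", "<path>", text)                                # absolute paths
    text = re.sub(r"\b[0-9a-f]{8,}\b", "<hex>", text)                          # request ids, hashes
    text = re.sub(r"\d+", "<n>", text)                                         # remaining numbers
    key = f"{tool}|{exit_code}|{text.strip()}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Two machines, different paths and request ids, same logical error:
a = fingerprint("terraform", 1, "Error: 403 Forbidden at /home/alice/proj (request id 9f3a2b1c44)")
b = fingerprint("terraform", 1, "Error: 403 Forbidden at /tmp/ci/proj (request id 01dd4e5f90)")
assert a == b
```

The point of the exercise: once this key is stable, every downstream number (retry counts, note coverage, success rates) becomes a group-by.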

The ICP

The people for whom this clicks first are engineering-platform and DevEx leads at companies that have already deployed Claude Code or Cursor org-wide in the last twelve months. They are already being asked variants of “is this stuff actually working?” by their CFO or their VP Eng, and they do not have an answer because no tool has the data.

Fukura gives them the answer. “Our agent fleet retried these 15 failures 1,200 times last week. The top five had no linked note. Writing one note each cut retries by 60% over seven days.” That is the conversation the DevEx lead wants to have with their exec team, and the data that supports it doesn’t exist without fukura.
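A number like "15 failures retried 1,200 times, top five with no linked note" is a plain aggregation once every failure carries a fingerprint. A minimal sketch, assuming a flat list of captured events — the field names here are illustrative, not fukura's actual schema:

```python
from collections import Counter

# Hypothetical captured events: (fingerprint, has_linked_note)
events = [
    ("fp-tf-403", False), ("fp-tf-403", False), ("fp-tf-403", False),
    ("fp-npm-eacces", True), ("fp-npm-eacces", True),
    ("fp-py-modnotfound", False),
]

retries = Counter(fp for fp, _ in events)
linked = {fp for fp, has_note in events if has_note}

# Rank fingerprints by retry volume; flag the ones with no linked note.
report = [(fp, count, fp in linked) for fp, count in retries.most_common()]
for fp, count, has_note in report:
    print(f"{fp}: {count} retries, note={'yes' if has_note else 'NO'}")
```

The "note=NO" rows at the top of that ranking are exactly the list the DevEx lead hands to the team on Monday.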

Our honest moat

A schema is not a moat. EKP is CC-BY on purpose: adoption is the point. What defends fukura is:

  1. Cross-vendor data. No single-vendor product can ingest both Cursor and Claude Code failures at the same time. Fukura is the substrate both sit on.
  2. The effectiveness loop. Recording the next command (redacted) after a failure, joining it to outcome, and computing success rate per fingerprint is several careful engineering quarters of work — not a weekend for a competitor.
  3. Default-local privacy. By the time anyone sees a byte of your data, it has passed through redaction on the producer. Competitors who start from “ship stderr to a SaaS” have a harder time getting to that posture later.
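The third point — redaction on the producer, before a byte leaves the machine — can be sketched as a pattern pass over captured output. These three patterns are common token shapes chosen for illustration; a real producer-side redactor would need a much broader, maintained rule set, and this is not fukura's actual pattern list:

```python
import re

# Illustrative redaction rules (assumed shapes, not fukura's real list):
PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<aws-key>"),           # AWS access key id
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "<github-token>"),   # GitHub PAT
    (re.compile(r"(?i)bearer\s+[a-z0-9._-]+"), "<bearer>"),   # Authorization headers
]

def redact(line: str) -> str:
    """Replace secret-shaped substrings with placeholders before storage."""
    for pattern, placeholder in PATTERNS:
        line = pattern.sub(placeholder, line)
    return line

print(redact("curl -H 'Authorization: Bearer eyJhbGciOi.payload.sig' https://api.example.com"))
```

Running redaction at capture time, on the machine that produced the output, is what makes "default-local" more than a deployment detail: nothing downstream ever has the chance to see the raw token.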

We are not going to pretend the schema alone defends us. We think the honest story is stronger.

What’s next

Read Quickstart to get fukura on your laptop in ten minutes, or Deploy a hub if you’re ready to give the team a shared memory layer. If you’re still deciding, the competitor page walks through what each adjacent tool is actually missing.