Show HN: Agent Tinman – Autonomous failure discovery for LLM systems https://ift.tt/swIJ8mg
Hey HN,

I built Tinman because finding LLM failures in production is a pain in the ass. Traditional testing checks what you've already thought of. Tinman tries to find what you haven't.

It's an autonomous research agent that:

- Generates hypotheses about potential failure modes
- Designs and runs experiments to test them
- Classifies failures (reasoning errors, tool use, context issues, etc.)
- Proposes interventions and validates them via simulation

The core loop runs continuously. Each cycle informs the next.

Why now: With tools like OpenClaw/ClawdBot giving agents real system access, the failure surface is way bigger than "bad chatbot response." Tinman has a gateway adapter that connects to OpenClaw's WebSocket stream for real-time analysis as requests flow through.

Three modes:

- LAB: unrestricted research against dev
- SHADOW: observe production, flag issues
- PRODUCTION: human approval ...
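
For anyone who wants the hypothesize / experiment / classify / intervene loop in concrete terms, here is a rough Python sketch. This is not Tinman's actual code or API: every name in it (`Hypothesis`, `Finding`, `research_loop`, the toy `system` that truncates input to 20 characters) is made up for illustration, and the "simulation" is just a replay of the probe with the proposed fix applied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    description: str
    probe: str          # input sent to the system under test
    failure_class: str  # e.g. "reasoning", "tool_use", "context"


@dataclass
class Finding:
    hypothesis: Hypothesis
    observed: str
    intervention: str
    validated: bool


def research_loop(
    system: Callable[[str], str],
    seed: List[Hypothesis],
    is_failure: Callable[[Hypothesis, str], bool],
    propose_fix: Callable[[Hypothesis], str],
    simulate: Callable[[Hypothesis, str], bool],
    generate: Callable[[List[Finding]], List[Hypothesis]],
    max_cycles: int = 5,
) -> List[Finding]:
    """Run hypothesis -> experiment -> classify -> intervene cycles."""
    hypotheses = list(seed)
    all_findings: List[Finding] = []
    for _ in range(max_cycles):
        findings: List[Finding] = []
        for h in hypotheses:
            observed = system(h.probe)  # run the experiment
            if is_failure(h, observed):
                fix = propose_fix(h)
                findings.append(Finding(h, observed, fix, simulate(h, fix)))
        all_findings.extend(findings)
        # Each cycle informs the next: new hypotheses come from this cycle's findings.
        hypotheses = generate(findings)
        if not hypotheses:
            break
    return all_findings


def demo() -> List[Finding]:
    limit = 20

    def system(prompt: str) -> str:
        # Toy stand-in for an LLM system: silently drops input past `limit` chars.
        return prompt[:limit]

    def is_failure(h: Hypothesis, observed: str) -> bool:
        return len(observed) < len(h.probe)

    def propose_fix(h: Hypothesis) -> str:
        return f"chunk input into pieces of at most {limit} chars"

    def simulate(h: Hypothesis, fix: str) -> bool:
        # Replay the probe with the fix applied and check nothing is lost.
        chunks = [h.probe[i:i + limit] for i in range(0, len(h.probe), limit)]
        return "".join(system(c) for c in chunks) == h.probe

    def generate(findings: List[Finding]) -> List[Hypothesis]:
        # Narrow in on the boundary: retry each failure with a probe half as long.
        return [
            Hypothesis("shorter input still truncated?",
                       f.hypothesis.probe[: len(f.hypothesis.probe) // 2],
                       f.hypothesis.failure_class)
            for f in findings
        ]

    seed = [Hypothesis("long inputs get truncated", "x" * 50, "context")]
    return research_loop(system, seed, is_failure, propose_fix, simulate, generate)


if __name__ == "__main__":
    for f in demo():
        print(f.hypothesis.failure_class, len(f.hypothesis.probe), f.intervention, f.validated)
```

In the toy run the 50-character and 25-character probes both fail, the 12-character probe passes, and the loop stops because there is nothing left to follow up on. A real agent would presumably generate hypotheses with a model rather than a halving rule, but the control flow is the same idea.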