Show HN: Agent Tinman – Autonomous failure discovery for LLM systems https://ift.tt/swIJ8mg
Hey HN, I built Tinman because finding LLM failures in production is a pain in the ass. Traditional testing checks what you've already thought of; Tinman tries to find what you haven't.

It's an autonomous research agent that:

- Generates hypotheses about potential failure modes
- Designs and runs experiments to test them
- Classifies failures (reasoning errors, tool use, context issues, etc.)
- Proposes interventions and validates them via simulation

The core loop runs continuously. Each cycle informs the next.

Why now: with tools like OpenClaw/ClawdBot giving agents real system access, the failure surface is way bigger than "bad chatbot response." Tinman has a gateway adapter that connects to OpenClaw's WebSocket stream for real-time analysis as requests flow through.

Three modes:

- LAB: unrestricted research against dev
- SHADOW: observe production, flag issues
- PRODUCTION: human approval required

Tech:

- Python, async throughout
- Extensible GatewayAdapter ABC for any proxy/gateway
- Memory graph for tracking what was known when
- Works with OpenAI, Anthropic, Ollama, Groq, OpenRouter, Together

Install:

    pip install AgentTinman
    tinman init && tinman tui

GitHub: https://ift.tt/WDN7tjG
Docs: https://oliveskin.github.io/Agent-Tinman/
OpenClaw adapter: https://ift.tt/qQJwSZu

Apache 2.0. No telemetry, no paid tier. Feedback and contributions welcome.

February 1, 2026 at 12:17AM