Agent Eval Harness Finder
Catalog open-source agent evaluation harnesses and benchmarks (SWE-Bench, AgentBench, ToolBench, BIRD, GAIA, MAST, WebArena). Combine GitHub search results with a curated seed list, score each repository by quality signals (stars, recency, license), and parse each README for its scope and sample model scores.
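As an illustration only, the merge-and-score step might look like the sketch below. The weights, the one-year recency half-life, the set of permissive licenses, and the names `Repo`, `merge_candidates`, and `quality_score` are all assumptions made for this example; the description does not specify how the signals are actually combined.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Assumed set of licenses treated as fully permissive (hypothetical choice).
PERMISSIVE = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause"}


@dataclass
class Repo:
    name: str                  # "owner/repo"
    stars: int
    last_push: datetime        # timezone-aware timestamp of the last push
    license_id: Optional[str]  # SPDX-style identifier, or None if unlicensed


def merge_candidates(seed: List[str], search: List[str]) -> List[str]:
    """Union of seed list and search hits, seed first, case-insensitive dedupe."""
    seen = set()
    merged = []
    for name in seed + search:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            merged.append(name)
    return merged


def quality_score(repo: Repo, now: datetime) -> float:
    """Combine stars, recency, and license into a score in [0, 1]."""
    # Log-scaled stars, saturating at 100k stars (assumed cap).
    stars_part = min(math.log10(repo.stars + 1) / 5.0, 1.0)
    # Exponential decay with an assumed one-year half-life.
    age_days = max((now - repo.last_push).days, 0)
    recency_part = 0.5 ** (age_days / 365.0)
    # Permissive license scores 1.0, any other license 0.5, none 0.0.
    if repo.license_id is None:
        license_part = 0.0
    elif repo.license_id.lower() in PERMISSIVE:
        license_part = 1.0
    else:
        license_part = 0.5
    # Assumed weights; tune to taste.
    return 0.5 * stars_part + 0.3 * recency_part + 0.2 * license_part
```

Log-scaling the star count keeps a handful of very popular repositories from dominating the ranking, and the decay term favors harnesses that are still maintained.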