🧾 141 observations · 68% artifact-verified · re-tested continuously
We run the same input through every AI tool. You see exactly what came out.
Not opinions — controlled experiments. Every verdict links to the real input we used and the real output the tool produced. Readable by you, queryable by your agents.
📂 Test inputs downloadable📆 Every cell dated🤖 MCP for agents
LIVE FROM THE EVIDENCE GRAPHobservation #53 · Jun 12, 2026
✗ FailedLlamaParse × nested header handling
Grouped column headers in scanned tables lose relationships during extraction; column semantics drift.
Input — real file

Output — unretouched

621
tools tested hands-on
141
evidence cells
68%
artifact-verified
28
documented failures