🧾 141 observations · 68% artifact-verified · re-tested continuously

We run the same input through every AI tool. You see exactly what came out.

Not opinions — controlled experiments. Every verdict links to the real input we used and the real output the tool produced. Readable by you, queryable by your agents.

📂 Test inputs downloadable📆 Every cell dated🤖 MCP for agents
LIVE FROM THE EVIDENCE GRAPHobservation #53 · Jun 12, 2026
✗ FailedLlamaParse × nested header handling

Grouped column headers in scanned tables lose relationships during extraction; column semantics drift.

Input — real file
Extracted treatment table with diameter measurements.
Output — unretouched
Scanned bar chart of tree mortality by year and treatment.
get_evidence({ tool: "llamaparse" }) →
621
tools tested hands-on
141
evidence cells
68%
artifact-verified
28
documented failures