Epistamate Blog: Evidence, AI, and the research quality problem

July 2026 · AI verification · Evidence quality · Enterprise AI adoption

AI judging AI has a reliability problem. Recent research measured exactly how unreliable.

One judge, one question, fifty repeated trials: the verdict flipped 13.6 percent of the time. What two 2026 studies on LLM-as-a-judge reliability actually found, and what still works despite it.

8 min read

July 2026 · Evidence quality · AI research tools · Verification architecture

A model agreeing with another model is not the same as a claim being verified.

Cross-checking an AI answer with another model, or running a debate between several, has become the mark of a careful user. Scrutiny of what that practice is actually built on has not kept pace with how confidently it is trusted.

9 min read

June 2026 · Professional services · Evidence quality · Research methodology

In AI-assisted consulting work, the output that looks most confident is sometimes the work that most needs checking.

A BCG study of 758 consultants found AI made them 19 percentage points less likely to produce correct answers on tasks outside the AI's capability frontier, with no signal in the output to indicate which tasks those were.

10 min read

June 2026 · Agentic AI · Evidence quality · Research methodology

The compounding evidence problem: why agentic research pipelines fail quietly.

Agentic AI pipelines do not just produce errors. They inherit, transform, and propagate them. Each failure mode in a static system becomes a compounding one when the system is dynamic.

9 min read

June 2026 · Evidence quality · Research methodology · Compliance

A cited source and a supporting source are not the same thing. AI research tools treat them as if they are.

A citation resolving to a real paper is not the same as that paper supporting the claim attached to it. The distinction is now named and measurable, and most AI research tools fail it.

8 min read

June 2026 · AI governance · Compliance

Binding law vs non-binding guidance: a distinction AI research tools collapse, with compliance consequences.

AI policy research tools conflate binding regulations with voluntary codes and interpretive guidance. They surface both as equally authoritative. For compliance work, that conflation has direct legal consequences.

8 min read

June 2026 · Evidence quality · Research methodology

AI research tools erode the one expertise that can catch what they get wrong.

AI tools substitute for the research tasks that build domain expertise. As expertise atrophies, the capacity to catch AI errors declines. Meanwhile, AI training data incorporates more and more synthetic content. The two processes compound each other invisibly.

9 min read

June 2026 · Evidence quality · AI governance

The internet is eating itself: model collapse and the evidence base.

AI models train on AI-generated content at scale. The contamination is self-reinforcing. A new epidemiological model finds supercritical dynamics across all scenarios. What this means for anyone using AI tools to do research.

9 min read

June 2026 · Evidence quality · AI governance · Research methodology

Confident and wrong: why AI can't tell you when it doesn't know.

The series has examined what AI tools do to evidence from the outside. This article turns inward. Most language models have no honest mechanism for representing uncertainty, and the confidence in their outputs is a property of the format, not the evidence.

8 min read

June 2026 · Evidence quality · Academic publishing · AI governance

The peer review loop is breaking.

At ICLR 2026, one of the world's largest AI conferences, 21% of peer reviews were fully AI-generated, and more than half showed AI involvement. When AI writes papers and AI reviews them, the proxy-sovereign evaluation problem becomes concrete, and the evidence base that policy and regulatory research depends on starts to hollow out.

9 min read

May 2026 · Evidence quality · AI governance · Research methodology

Fragility is not falsehood. That's the harder problem.

A claim can be true, correctly cited, and pass peer review, yet still be evidentially fragile. A new paper from Lexsi Labs formalises this as the audit gap: the divergence between the evidence governance frameworks demand and the evidence current assurance methods can actually produce.

8 min read

May 2026 · Evidence quality · Academic publishing · Reference integrity

A citation used to be evidence. It isn't anymore.

A Lancet audit of 2.5 million biomedical papers found fabricated citations rising 12x in three years. The fake references look real: correct formatting, genuine author names, plausible titles. Here's what reference integrity checking actually requires, and why most tools don't do it.

7 min read

May 2026 · Evidence quality · Research methodology

Forty sources, one claim: the difference between corroboration and amplification.

When forty papers cite the same finding, that looks like consensus. Often it isn't. Source diversity, not source count, is what makes evidence reliable. Here's why most research tools can't tell the difference, and why it matters for anyone doing work that has to hold up.

8 min read
