Your analyst rebuilt last quarter's competitive landscape from scratch — again. A claim made it into the final brief because nobody tracked the conflicting source. The citation was a trade blog. The board approved it anyway, because there was no way to know.
This is not a research quality problem. It is a structural problem with how evidence-based work is done.
Epistamate never uses LLM self-reported confidence in its scores. It computes confidence from the evidence underneath — source credibility tier, cross-source agreement, adversarial challenge outcome, evidence recency.
A claim-based research system — for building knowledge you can defend.
A frontier LLM producing a research brief will report high confidence even when the underlying sources are weak, out of date, or non-existent. It has no mechanism to distinguish between a claim corroborated by three Tier 1 sources and a claim it generated from pretraining memory.
Every finding is a structured assertion — with a source credibility tier, provider consensus count, adversarial challenge outcome, and evidence age. The formula-computed score is deterministic. The 29% claim is shown alongside the 90% one. Nothing is averaged away.
In live sessions, LLM-reported aggregate confidence typically runs 87–92%. Formula-computed per-claim scores, reflecting actual source tier quality, range 11–70%. That gap is not a rounding error. It is the difference between research you can defend and research that sounds authoritative.
Epistamate structures research as a claim vault — every finding individually addressable, scored, and traceable. Contradictions are named. Gaps are tracked. The decision record is immutable.
Enter a research question. See how claims are extracted, scored, and challenged — on a real topic, with real scores.
Topic map, initial claim stubs, prior verified claims retrieved from the knowledge graph.
Multi-provider parallel retrieval. Each claim scored against the confidence formula. Gaps extracted as typed objects.
Mandatory phase. Claims that don't survive lose their Socratic bonus. Challenges are persisted, not discarded.
WEAK and UNVERIFIED claims excluded from Key Findings. Decision log entry immutably preserves the evidence state at time of decision.
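As a rough illustration of that last step, the sketch below maps formula scores to status bands and drops WEAK and UNVERIFIED claims from Key Findings. The thresholds, the `STRONG` and `MODERATE` labels, the `ScoredClaim` type, and the example claims are all assumptions made for illustration; the page names only the WEAK and UNVERIFIED bands and publishes no cut-offs.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the page names the WEAK and UNVERIFIED bands
# but does not publish thresholds, so these numbers are illustrative.
BANDS = [(0.70, "STRONG"), (0.40, "MODERATE"), (0.15, "WEAK")]


@dataclass(frozen=True)
class ScoredClaim:
    text: str
    score: float  # formula-computed confidence in [0, 1]


def status(claim: ScoredClaim) -> str:
    """Map a score to a status band; anything below every band is UNVERIFIED."""
    for floor, label in BANDS:
        if claim.score >= floor:
            return label
    return "UNVERIFIED"


def key_findings(claims: list[ScoredClaim]) -> list[ScoredClaim]:
    """Keep only claims that are neither WEAK nor UNVERIFIED."""
    return [c for c in claims if status(c) not in {"WEAK", "UNVERIFIED"}]


# Invented example claims, for illustration only.
claims = [
    ScoredClaim("Vendor A leads the segment", 0.90),
    ScoredClaim("Vendor B is exiting the market", 0.29),
    ScoredClaim("Vendor C raised prices", 0.05),
]
findings = key_findings(claims)
```

In this sketch the excluded claims are filtered from the findings list only; they remain in `claims`, in keeping with the idea that low-scoring claims stay visible rather than being discarded.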
The atomic unit is a claim, not a summary. Every finding is a structured, scored assertion — with a source tier, provider consensus count, adversarial challenge outcome, and evidence age. The 29% claim is shown alongside the 90% one. Epistamate never averages away what it doesn't know, and it never hides uncertainty behind confident prose.
Your documents count. An authoritative report you ingest contributes directly to confidence scores — outweighing what the model guesses from memory.
The same analyst rebuilds the same knowledge every quarter. Epistamate maintains a durable claim graph across sessions. Session five builds on sessions one through four. Contradictions are preserved, not discarded. The knowledge graph compounds with use.
Works for general professional research — not locked to academic literature like Elicit, Consensus, or Scite.
The best strategy research already works the way the engine works: individual claims are sourced and graded, contradictions are noted, gaps are named, and the final recommendation is honest about its confidence level. What it doesn't do is carry that structure forward to the next engagement, the next client, the next analyst who joins the team.
The structured brief a senior consultant produces for a board is a claim vault — it just doesn't look like one, and it evaporates when the project ends.
Evidence synthesis under time pressure, across contradictory sources. Gap tracking and contradiction detection are precisely what's missing from policy brief workflows.
Stop rebuilding the same landscape every engagement. A compounding knowledge graph means session five builds on sessions one through four, not from zero.
Binding law, guidance, drafts, and enforcement actions are not the same thing. Epistamate tracks them separately. The decision log is Article 12-compatible by design, not as a bolt-on.
Claims that trace back to a single source are amplification, not corroboration. The confidence formula's diversity weighting catches the difference.
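One way to picture the difference is to count distinct upstream origins instead of raw citations. This is a minimal sketch, not Epistamate's actual weighting: the `Citation` type, its `origin` field, the saturation point of three, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    url: str
    origin: str  # hypothetical field: the upstream source this citation derives from


def independent_sources(citations: list[Citation]) -> int:
    """Count distinct origins, so several reposts of one report count once."""
    return len({c.origin for c in citations})


def diversity_weight(citations: list[Citation], saturation: int = 3) -> float:
    """Illustrative weight in [0, 1] that saturates at `saturation` independent origins."""
    return min(independent_sources(citations), saturation) / saturation


# Three articles that all repeat one report: amplification.
amplified = [
    Citation("https://example.com/a", origin="report-x"),
    Citation("https://example.com/b", origin="report-x"),
    Citation("https://example.com/c", origin="report-x"),
]
# Three articles drawing on three separate reports: corroboration.
corroborated = [
    Citation("https://example.com/d", origin="report-x"),
    Citation("https://example.com/e", origin="report-y"),
    Citation("https://example.com/f", origin="report-z"),
]
```

Both lists contain three citations, but under these assumptions only the second reaches the full weight, because the first collapses to a single origin.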
Removing any one of the six properties degrades the system to something existing tools already do.
| Capability | FActScore / VeriScore | GraphRAG | MemGPT | Commercial tools | Std. LLM | Epistamate |
|---|---|---|---|---|---|---|
| Typed claim extraction | ✓ atomic | — | — | partial | — | ✓ |
| Multi-factor evidence confidence | retrieval-based | — | — | — | — | ✓ formula |
| Tier-enforced source hierarchy | — | — | — | — | — | ✓ |
| Adversarial challenge (pre-synthesis) | — | — | — | — | — | ✓ |
| Gap tracking (typed, persistent) | — | — | — | — | — | ✓ |
| Cross-session compounding | — | partial | partial | — | — | ✓ |
| Bidirectional operation | — | — | — | — | — | ✓ |
| Domain configurability (runtime) | — | partial | — | — | — | ✓ |
| Decision log (Article 12 compatible) | — | — | — | — | — | ✓ |
| Immutable audit snapshot | — | — | — | — | — | ✓ |
Local SQLite storage with vector similarity. No research data is transmitted to external servers beyond LLM API calls. Everything else stays on your machine.
Shared knowledge graphs, parallel research runs, consolidated decision logs. Appropriate for investment due diligence, legal research, regulatory compliance.
PostgreSQL-backed store for concurrent deployment. No architectural change to the reasoning pipeline — same formula, same audit trail, at scale.
A RAG system retrieves documents and passes them to an LLM, which produces a summary. Epistamate extracts structured claims from that retrieval, scores each one against a deterministic confidence formula based on source credibility tier, cross-source agreement, adversarial challenge outcome, and evidence recency — and tracks what is unknown alongside what is known. The output is a scored claim vault with a full provenance chain, not a synthesised summary. The knowledge graph persists across sessions; a RAG context window resets.
Epistamate never uses an LLM's self-reported confidence in its scoring. It computes confidence from the evidence underneath each claim: the credibility tier of cited sources, how many independent providers corroborated the claim, whether the claim survived adversarial challenge, and how recent the evidence is. A claim the model is highly confident about but that only cites a single Tier 3 source will score low. A claim from three Tier 1 documents with cross-provider consensus will score high. The formula separates rhetorical confidence from evidential quality.
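To make the idea concrete, here is a minimal sketch of a deterministic score built from the four factors named above. The weighted-sum form, the weights, the tier credibility table, the provider cap, and the recency half-life are all assumptions chosen for illustration; the actual formula is not published on this page.

```python
from dataclasses import dataclass

# All weights and tables below are illustrative assumptions, not Epistamate's values.
TIER_CREDIBILITY = {1: 1.0, 2: 0.6, 3: 0.3}
WEIGHTS = {"tier": 0.4, "consensus": 0.3, "challenge": 0.2, "recency": 0.1}


@dataclass(frozen=True)
class Evidence:
    source_tiers: tuple[int, ...]  # credibility tier of each cited source
    providers_agreeing: int        # independent providers that corroborated the claim
    survived_challenge: bool       # outcome of the adversarial phase
    age_days: int                  # age of the most recent evidence


def confidence(e: Evidence, max_providers: int = 3, half_life_days: int = 365) -> float:
    """Deterministic score in [0, 1]; the model's self-reported confidence is never an input."""
    if e.source_tiers:
        tier = sum(TIER_CREDIBILITY[t] for t in e.source_tiers) / len(e.source_tiers)
    else:
        tier = 0.0  # no cited sources at all
    consensus = min(e.providers_agreeing, max_providers) / max_providers
    challenge = 1.0 if e.survived_challenge else 0.0
    recency = 0.5 ** (e.age_days / half_life_days)  # halves every half_life_days
    parts = {"tier": tier, "consensus": consensus, "challenge": challenge, "recency": recency}
    return round(sum(WEIGHTS[k] * parts[k] for k in WEIGHTS), 3)


# A single year-old Tier 3 source that failed its challenge.
single_tier3 = Evidence((3,), providers_agreeing=1, survived_challenge=False, age_days=365)
# Three fresh Tier 1 sources with cross-provider consensus.
three_tier1 = Evidence((1, 1, 1), providers_agreeing=3, survived_challenge=True, age_days=0)
```

Because every input is a property of the evidence, the same inputs always yield the same score, however confident the model's prose sounds. Under these assumed weights, `single_tier3` lands well below `three_tier1`.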
Article 12 of the EU AI Act requires that high-risk AI systems maintain logs enabling post-hoc auditability — what the system did, on what basis, at what point in time. Epistamate's Decision Log mechanism produces exactly this: an immutable, timestamped record of the full evidence state at the moment a decision was logged — which claims were verified, which were contested, which gaps were acknowledged. This is the output of how the system works by default, not a compliance layer added afterward. It is not a legal certification; qualified legal counsel must assess applicability.
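A decision log entry of this kind could be sketched as a frozen record carrying a timestamp and a content hash of the evidence state. This is an illustrative assumption about shape only: `DecisionLogEntry`, `log_decision`, `is_intact`, and the example payload are hypothetical names and data, and a frozen dataclass plus a SHA-256 digest is one simple way to approximate immutability, not a description of the product's storage.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class DecisionLogEntry:
    decision: str
    logged_at: str  # ISO 8601 UTC timestamp
    snapshot: str   # canonical JSON of the evidence state
    digest: str     # SHA-256 of the snapshot, for tamper detection


def log_decision(decision: str, evidence_state: dict) -> DecisionLogEntry:
    """Freeze the evidence state as canonical JSON and fingerprint it."""
    snapshot = json.dumps(evidence_state, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()
    logged_at = datetime.now(timezone.utc).isoformat()
    return DecisionLogEntry(decision, logged_at, snapshot, digest)


def is_intact(entry: DecisionLogEntry) -> bool:
    """Recompute the digest to confirm the snapshot has not been altered."""
    return hashlib.sha256(entry.snapshot.encode("utf-8")).hexdigest() == entry.digest


# Invented example payload: claim and gap identifiers are placeholders.
entry = log_decision(
    "Approve market-entry brief",
    {"verified": ["claim-1"], "contested": ["claim-2"], "gaps": ["gap-1"]},
)
```

Serialising the state to a string at logging time means later changes to the live claim graph cannot alter what the entry recorded.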
Verified claims from session N are stored in a persistent knowledge graph. In session N+1, when a semantically similar question is asked, prior verified claims are retrieved and contribute directly to the new session — reducing re-derivation burden and increasing confidence scores for claims with existing corroboration. Contradictions between sessions are preserved as typed objects, not silently resolved. The graph accumulates with use; it does not reset.
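The retrieval step can be sketched as a similarity search over stored claims that returns only verified ones. The sketch assumes embeddings already exist (here they are toy two-dimensional vectors), and the `StoredClaim` type, the `retrieve_prior` function, and the 0.8 threshold are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredClaim:
    text: str
    embedding: tuple[float, ...]  # assumed to come from some embedding model
    verified: bool


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_prior(question_vec, graph, threshold=0.8):
    """Return verified claims from earlier sessions, most similar first."""
    scored = [(cosine(question_vec, c.embedding), c) for c in graph if c.verified]
    hits = [(s, c) for s, c in scored if s >= threshold]
    return [c for s, c in sorted(hits, key=lambda pair: pair[0], reverse=True)]


# Toy graph: A is verified and on-topic, B is verified but unrelated,
# C is on-topic but was never verified.
graph = [
    StoredClaim("A", (1.0, 0.0), verified=True),
    StoredClaim("B", (0.0, 1.0), verified=True),
    StoredClaim("C", (0.9, 0.1), verified=False),
]
prior = retrieve_prior((1.0, 0.1), graph)
```

Only claim A comes back here: B falls below the similarity threshold, and C is skipped because it was never verified.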
Yes. Documents you ingest are assigned a credibility tier within the source hierarchy. An authoritative report from a Tier 1 institution outweighs model-generated inferences. Your documents participate in the confidence formula directly — they are not just context passed to the LLM.
Epistamate's validated pipeline covers general professional research. The engine is domain-configurable at runtime — source trust hierarchies, claim type vocabularies, scoring weights, and output formats are parameters, not hardcoded logic. Gov/Policy and RegWatch vertical slices are in active development. The same binary can run policy research, investment due diligence, and regulatory compliance with no architectural change.
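Treating a domain as parameters, not code, might look like the sketch below. Every field name, the `regwatch` example values, and the rule that weights sum to 1.0 are assumptions for illustration; the real configuration schema is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Hypothetical runtime configuration; field names are illustrative."""
    name: str
    source_tiers: dict[str, int]       # source category -> credibility tier
    claim_types: tuple[str, ...]       # claim type vocabulary
    scoring_weights: dict[str, float]  # factor name -> weight
    output_format: str = "brief"

    def __post_init__(self) -> None:
        # Assumed sanity check: weights should form a proper weighted average.
        total = sum(self.scoring_weights.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"scoring weights must sum to 1.0, got {total}")


def tier_for(config: DomainConfig, category: str) -> int:
    """Look up a source category; unknown categories fall to the lowest tier."""
    lowest = max(config.source_tiers.values())
    return config.source_tiers.get(category, lowest)


# Invented regulatory-domain example.
regwatch = DomainConfig(
    name="regwatch",
    source_tiers={"binding_law": 1, "guidance": 2, "draft": 3},
    claim_types=("obligation", "deadline", "enforcement"),
    scoring_weights={"tier": 0.5, "consensus": 0.2, "challenge": 0.2, "recency": 0.1},
)
```

Swapping in a different `DomainConfig` changes the hierarchy, vocabulary, and weights without touching the pipeline code, which is the sense in which the same engine can serve several domains.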
If you recognise your domain in this site, we'd like to hear about the specific problem before we describe the solution.