At Aicadium, AI is not new to us. We build with it, advise on it, and put it to work every single day. So when we turned a critical eye on our own AI-generated research and stress-tested the outputs claim by claim, we were not surprised to find a few errors. What surprised us was how many there were, and how easily they slipped past us without the right tooling to catch them.
This is the story of SPARK, and how we stopped hoping AI research was right and built a way to know for sure.
When Stanford tested leading models on high-stakes legal research, somewhere between two-thirds and nine in ten of the answers contained a fabricated or incorrect claim. The harder the question, the worse it got
Multi-model usage is the norm. Optimising for one platform is no longer a viable strategy — buyers ask the same question across ChatGPT, Gemini, Perplexity, and Claude.
There is no confidence score, no source, and no conflict flag. Every claim arrives looking equally certain, whether it is solidly sourced or quietly invented, and telling them apart is left entirely to you.
AI makes it easy to produce a confident-sounding finding, but confidence is not the same thing as evidence. A single hallucinated fact rarely fails loudly. Instead, it travels quietly into every slide, model, and decision that gets built on top of it.
Provenance is moving from a nice-to-have to a board-level and regulatory expectation. The phrase "the AI said so" will not survive scrutiny from a regulator, a client, the CFO, or a peer reviewer. Every claim needs a source behind it, or at least an honest flag that one is missing.
SPARK stands for Self-verifying, Portable, Agentic, Researcher, Kit.
It takes whatever AI assistant your team already uses and turns it into a research partner that remembers what it found, checks it, and scores every fact. You bring the domain expertise and write the recipe. SPARK brings the rigour, so the work holds up when someone asks where a number came from.
Every claim is scored for confidence and traced to a source as it goes, so a month later you can just ask what changed and get the difference back instead of re-running the whole thing.
Instead of asking AI for research and hoping, you ask it for the receipt. Each commitment is a question a leader would actually ask of a finding, and SPARK answers it for every claim in the report, not just the report as a whole.
Three tiers sit next to the finding itself, so the evidence travels with the sentence. You see confidence where you read the claim, not buried in a footnote. (add in trust markers)
When two credible sources contradict each other, SPARK keeps both and flags the tension rather than silently averaging them into one tidy answer. Honest uncertainty beats false consensus.
Research isn’t a one-time artifact. SPARK re-runs against a stored fact file and reports exactly what changed, new facts, superseded claims, fresh conflicts, so a report stays alive instead of decaying into a snapshot
Reviewers, legal, and leadership start asking where a claim came from before any findings ship, whether those findings live in a deck, a memo, or a report. The teams who already log it will set the standard, and everyone else will be retrofitting.
Teams begin reporting trust and coverage the way they report any other quality metric. "How accurate is the intelligence we are acting on" turns into a standing question in the business review rather than an afterthought.
From dashboards and automated reports to personalisation engines, the trust layer becomes a gate rather than an optional check. If a claim has not been scored, it simply does not make it into the decision.
Select a form in the module settings to activate the subscribe form.