For leaders betting decisions on AI research

You can't make AI perfectly accurate, but you can make it reliable enough to trust

At Aicadium, AI is not new to us. We build with it, advise on it, and put it to work every single day. So when we turned a critical eye on our own AI-generated research and stress-tested the outputs claim by claim, we were not surprised to find a few errors. What surprised us was how many there were, and how easily they slipped past us without the right tooling to catch them.

This is the story of SPARK, and how we stopped hoping AI research was right and built a way to know for sure.

 

The numbers behind the problem

AI doesn't tell you when it's making things up, and you are making decisions based on it anyway

69–88%
of professional research queries returned a hallucination

When Stanford tested leading models on high-stakes legal research, somewhere between two-thirds and nine in ten of the answers contained a fabricated or incorrect claim. The harder the question, the worse it got.

1 in 3
answers from a top 2025 model were made up

Multi-model usage is the norm. Optimising for one platform is no longer a viable strategy, because buyers ask the same question across ChatGPT, Gemini, Perplexity, and Claude.

0
trust signals in a raw AI answer

There is no confidence score, no source, and no conflict flag. Every claim arrives looking equally certain, whether it is solidly sourced or quietly invented, and telling them apart is left entirely to you.

 

What we believe

Three convictions shaping how we think about AI research in 2026.

CONVICTION I

Unverified research is a liability, not a shortcut.

AI makes it easy to produce a confident-sounding finding, but confidence is not the same thing as evidence. A single hallucinated fact rarely fails loudly. Instead, it travels quietly into every slide, model, and decision that gets built on top of it. 

CONVICTION II

Every decision inherits the weakest claim beneath it.

A market-sizing model, an investment memo, a regulatory summary, a customer persona, a message written for one named individual: each one rests on assumptions a model surfaced. Get a single claim wrong and the whole conclusion is quietly compromised, however polished it looks on the surface.

CONVICTION III

If you cannot show your sources, you cannot defend the work.

Provenance is moving from a nice-to-have to a board-level and regulatory expectation. The phrase "the AI said so" will not survive scrutiny from a regulator, a client, the CFO, or a peer reviewer. Every claim needs a source behind it, or at least an honest flag that one is missing.

The framework

The framework that makes the research defensible

SPARK stands for Self-verifying, Portable, Agentic, Researcher, Kit. 

It takes whatever AI assistant your team already uses and turns it into a research partner that remembers what it found, checks it, and scores every fact. You bring the domain expertise and write the recipe. SPARK brings the rigour, so the work holds up when someone asks where a number came from.

Every claim is scored for confidence and traced to a source as it goes, so a month later you can just ask what changed and get the difference back instead of re-running the whole thing.

  • S: Self-verifying. Every claim is scored before you act on it.
  • P: Portable. Your intelligence travels across any AI or platform.
  • A: Agentic. It runs the research, and the refresh, on its own.
  • R: Researcher. Your expertise is captured as a reusable recipe.
  • K: Kit. The trust infrastructure is shared, not rebuilt each time.

The five commitments

Trust you can point at, line by line.

Instead of asking AI for research and hoping, you ask it for the receipt. Each commitment is a question a leader would actually ask of a finding, and SPARK answers it for every claim in the report, not just the report as a whole.

01
Sourcing and Provenance
“Where did this come from?”
A claim counts only if it traces to a real source retrieved during the run. Sources are then ranked by authority, from primary documentation down to anecdote. No citation is conjured from the model’s own fluency.
02
Reproducible trust scoring
“How sure are we?”
Every fact earns a score computed from source quality, corroboration, and conflict, and that score can be recomputed from a documented formula. Confidence becomes a number you can audit, not a vibe.
03
Trust tiers
“How do I read it at a glance?”

Three tiers sit next to the finding itself, so the evidence travels with the sentence. You see confidence where you read the claim, not buried in a footnote.

04
Conflict preservation
“What if the sources disagree?”

When two credible sources contradict each other, SPARK keeps both and flags the tension rather than silently averaging them into one tidy answer. Honest uncertainty beats false consensus.

05
Refresh & change tracking
“Does it still hold?”

Research isn’t a one-time artifact. SPARK re-runs against a stored fact file and reports exactly what changed (new facts, superseded claims, fresh conflicts), so a report stays alive instead of decaying into a snapshot.
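
To make commitments 01 to 04 concrete, here is a minimal Python sketch of how a claim might be scored, tiered, and checked for conflict. It is an illustration under stated assumptions, not SPARK's documented formula: the authority levels between "primary" and "anecdote", the weights, the tier names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical authority ranking, from primary documentation down to anecdote.
AUTHORITY = {"primary": 1.0, "official": 0.85, "analyst": 0.7, "press": 0.5, "anecdote": 0.2}


@dataclass
class Source:
    url: str        # where the claim was retrieved during the run
    authority: str  # key into AUTHORITY
    value: str      # what this source says about the claim


@dataclass
class Claim:
    text: str
    sources: list[Source] = field(default_factory=list)


def trust_score(claim: Claim) -> float:
    """Score in [0, 1] built from source quality, corroboration, and conflict.

    The weights (0.6, 0.4, 0.5) are assumptions chosen for illustration.
    """
    if not claim.sources:
        return 0.0  # no retrieved source means no credit
    quality = max(AUTHORITY[s.authority] for s in claim.sources)
    values = [s.value for s in claim.sources]
    top = max(values, key=values.count)           # most common reported value
    agreeing = values.count(top)
    corroboration = min(agreeing, 3) / 3          # saturates at three agreeing sources
    conflict = 1.0 - agreeing / len(values)       # share of sources that disagree
    score = 0.6 * quality + 0.4 * corroboration - 0.5 * conflict
    return round(max(0.0, min(1.0, score)), 2)


def trust_tier(score: float) -> str:
    """Map a score to one of three tiers (hypothetical names and cut-offs)."""
    if score >= 0.75:
        return "high"
    if score >= 0.45:
        return "medium"
    return "low"


def conflicts(claim: Claim) -> dict[str, list[str]]:
    """Keep every distinct value with its sources instead of averaging them away."""
    by_value: dict[str, list[str]] = {}
    for s in claim.sources:
        by_value.setdefault(s.value, []).append(s.url)
    return by_value if len(by_value) > 1 else {}
```

Because the score is a pure function of the logged sources, anyone holding the same fact file can recompute it, which is what makes the number auditable. When two sources disagree, `conflicts` returns both values side by side rather than picking a winner.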

SPARK in action

See the research run. Or build your own recipe.

From a single prompt to a report you can defend.

  • 1. Type the trigger phrase. "Run PRISM brief on [company]." One line. That's the entire instruction.
  • 2. SPARK runs 35–40 targeted queries. Filings, coverage, analyst notes, ownership, competitive signals: all sourced and logged automatically.
  • 3. Read the structured report. PRISM-scored, section-by-section, with citations and confidence flags throughout.
  • 4. Request the delta report. "What changed since last week?" SPARK compares, surfaces the material shifts, ignores the noise.
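
The delta report in the last step can be pictured as a comparison between the stored fact file and the latest run. The Python sketch below is a simplified illustration, assuming each fact file is a plain mapping from a fact identifier to its current value. The function name, the `dropped` category, and the sample values are hypothetical, and conflict detection is left out for brevity.

```python
def delta_report(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two fact files and list what changed between runs."""
    # Facts that appear only in the latest run.
    new = sorted(k for k in current if k not in previous)
    # Facts present in both runs whose value has changed.
    superseded = sorted(k for k in current if k in previous and current[k] != previous[k])
    # Facts the latest run could no longer find (an assumed extra category).
    dropped = sorted(k for k in previous if k not in current)
    return {"new": new, "superseded": superseded, "dropped": dropped}


# Made-up values, for illustration only.
last_week = {"market_size": "4.2B", "headcount": "1200"}
this_week = {"market_size": "4.6B", "headcount": "1200", "funding_round": "Series C"}
report = delta_report(last_week, this_week)
```

Unchanged facts such as `headcount` never appear in the report, which is the point: the reader sees the material shifts and nothing else.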

Every team researches differently. SPARK builds a recipe for all of them.

Describe your brief. SPARK asks the right questions, maps your evidence lanes, and assembles a deployable research skill, ready for any analyst on your team to run.

  • 1. Describe your brief. Tell SPARK what kind of research you do: sector, depth, output format, typical turnaround.
  • 2. SPARK asks the right questions. It walks you through evidence lanes, scoring weights, and output preferences, one decision at a time.
  • 3. Assembles, tests, and delivers. Your recipe ships as a deployable SPARK skill, ready for any analyst on your team to run.
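
One way to picture a recipe is as a small, validated configuration object that captures the decisions from the steps above. The Python sketch below is an assumption about shape only, not SPARK's actual skill format: the `Recipe` class, its field names, and the rule that scoring weights sum to 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical container for the choices a recipe captures."""
    trigger: str                       # the one-line phrase an analyst types
    evidence_lanes: tuple[str, ...]    # where the run should look for evidence
    scoring_weights: dict[str, float]  # relative weight of each scoring factor
    output_format: str = "structured report"

    def __post_init__(self) -> None:
        # Reject recipes that could not produce a meaningful run.
        if not self.evidence_lanes:
            raise ValueError("a recipe needs at least one evidence lane")
        total = sum(self.scoring_weights.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"scoring weights must sum to 1, got {total}")


brief = Recipe(
    trigger="Run PRISM brief on [company]",
    evidence_lanes=("filings", "coverage", "analyst notes", "ownership", "competitive signals"),
    scoring_weights={"source_quality": 0.5, "corroboration": 0.3, "conflict": 0.2},
)
```

Validating the recipe once, when it is created, means every analyst who runs it later inherits the same lanes and weights without re-deciding them.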
Three predictions

What we expect to be obvious by 2027, and uncomfortable to admit today.

Within 12 months

Source provenance becomes a line in every research sign-off.

Reviewers, legal, and leadership start asking where a claim came from before any findings ship, whether those findings live in a deck, a memo, or a report. The teams who already log it will set the standard, and everyone else will be retrofitting.

Within 18 months

A verified-research score joins the metrics leaders already track.

Teams begin reporting trust and coverage the way they report any other quality metric. "How accurate is the intelligence we are acting on" turns into a standing question in the business review rather than an afterthought.

Within 24 months

Decision systems stop acting on unverified claims.

From dashboards and automated reports to personalisation engines, the trust layer becomes a gate rather than an optional check. If a claim has not been scored, it simply does not make it into the decision.

Why this matters now

Verifying our own AI research used to take hours. Not anymore.

Without SPARK
50+ hours
per market research topic, even with AI
  • Manually verifying if each market claim is still current
  • Tracing each source back to where it actually came from
  • Deciding what is reliable and what is merely plausible
  • Cross-referencing analyst estimates that quietly disagree
  • Starting from scratch again at the next quarterly update
With SPARK
20 minutes
the same topic, verified, sourced, and tracked
  • Every fact scored and sourced automatically
  • Conflicting claims surfaced for you, never hidden
  • One trigger phrase to run the scan or refresh it later
  • Delta reports that show exactly what changed since last time
  • One verified baseline you can reuse across teams and topics