AI Assistants Show Major Issues in 45% of News Answers: Study Reveals Deep Sourcing Flaws

A cross-market study of 2,709 AI-assistant responses finds that 45% had at least one significant issue and 81% had some form of issue. Sourcing is the biggest weakness, with Google Gemini showing the most severe problems. Learn how AI assistants distort news, why it matters, and what needs to change for reliable AI-driven journalism.


Artificial-Intelligence Assistants and News: Alarming Findings on Accuracy & Source Integrity

In October 2025, an international study led by the European Broadcasting Union (EBU), in cooperation with the BBC and 22 public-service media organisations across 18 countries, evaluated how well major AI assistants answer news-related questions. (EBU) The results are sobering: of the responses studied, 45% had at least one significant issue, and 81% had some kind of issue. (Search Engine Journal)

Perhaps most striking: problems with sourcing emerged as the single largest category of failure. In particular, one assistant — Google Gemini — stood out for a very high rate of sourcing issues. (Search Engine Journal)

What this means is deep and multi-faceted: as AI assistants are increasingly used for obtaining news or summarising current events, the risk that those answers are inaccurate, misleading, or badly sourced is substantial.

In this article I’ll walk through:

  1. What the study did (methodology, scope)
  2. What it found (key statistics, breakdowns)
  3. Why this matters (for users, for news organisations, for society)
  4. Where the main failure modes lie (sourcing, accuracy, framing, opinion vs. fact)
  5. What the particular case of Gemini reveals
  6. What needs to be done (recommendations)
  7. Challenges and caveats
  8. Concluding thoughts

1. Study Overview: Scope, Methodology, and AI Assistants Reviewed

The EBU-BBC study, titled News Integrity in AI Assistants, represents one of the largest cross-market evaluations of consumer AI assistants answering news questions. (EBU)

Scope and participants

  • The study involved 22 public-service media organisations across 18 countries, covering 14 languages. (Al Jazeera)
  • It evaluated responses to a shared set of “core” news questions plus optional local questions, yielding 2,709 core responses (some sources put the overall total at roughly 3,000). (Search Engine Journal)
  • The AI assistants included the free/consumer versions of four major systems: OpenAI ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity AI. (Gizmodo)

Evaluation criteria
The responses were evaluated against five main criteria:

  • Accuracy – Are the facts correct?
  • Sourcing – Are sources cited, correctly attributed, verifiable?
  • Distinction between fact and opinion – Does the assistant clearly separate factual statements from commentary? (Yahoo News)
  • Editorialisation – Does the answer add undue spin or misrepresent the tone?
  • Context – Does the answer provide adequate background for the claim or event? (EBU)

Time-frame
The responses in this round were collected between 24 May and 10 June 2025, according to reporting on the study. (Search Engine Journal)

In short, the study aimed to simulate typical consumer use: people asking an AI assistant news questions in many languages and contexts, with evaluators then assessing how reliable the answers were.


2. Key Findings: How Often Are These AI Assistants Getting It Wrong?

Here are the headline numbers and additional detail.

Headline figures

  • 45% of responses contained at least one significant issue. (Gizmodo)
  • 81% of responses had some kind of issue, whether major or minor. (Search Engine Journal)
  • The most common category of major issue was sourcing, affecting 31% of responses. (Al Jazeera)
  • Accuracy problems (wrong or outdated facts) affected about 20% of responses. (Gizmodo)

Assistant-by-assistant performance
While all assistants had issues, the performance varied. The standout negative case:

  • Gemini: 76% of its responses were flagged for at least one significant issue, and 72% had serious sourcing problems. (Gizmodo)
  • The other assistants (ChatGPT, Copilot, Perplexity) had major issue rates of around 30-40% (versus Gemini’s ~76%). (Gizmodo)

Examples of problems

  • Some responses referenced outdated or incorrect facts — e.g., an assistant naming Pope Francis as the current pope months after his death. (Gizmodo)
  • Sourcing errors included missing or non-existent citations, or mis-attribution of quotes. (EBU)

Cross-language & cross-market consistency
Importantly, the researchers noted that these issues were “systemic”: they occurred across languages, territories, and platforms. (EBU)


3. Why This Matters: Implications for Users, News, and Democracy

The fact that nearly half of answers may have significant issues—and that over 80% have some issues—carries several layers of implications.

For users / consumers of news

  • Many people increasingly rely on AI assistants as a first stop for news: instead of browsing multiple websites, a user might ask “What’s the latest on X?” and accept the assistant’s answer.
  • If that answer is flawed (mis-sourced, partially incorrect, lacking context), the user may not realise. AI summaries often sound authoritative even when wrong, which can lead to misinformed decisions or beliefs.
  • As the report suggests: when people don’t know what to trust, they may end up trusting nothing — which has implications for information literacy. (Gizmodo)

For news organisations and journalism

  • These AI assistants increasingly act as intermediaries: summarising, interpreting, or even re-presenting news for end-users. That diminishes traffic to original sources, and can reduce the context, nuance, and depth a professional journalist provides.
  • When AI responses misrepresent a story or misattribute quotes, the reputational risk can extend to the news outlet whose material has been summarised or cited. The study warns that such misattributions may damage trust in original news publishers. (ComplexDiscovery)
  • Newsrooms may need to adapt: by providing clearer machine-readable source information, or by working with AI platforms to ensure correct citation and context.

For society and democracy

  • News integrity underpins democratic discourse. If large swathes of users consume news via AI assistants that present flawed information, this may increase misinformation, polarization, or general mistrust in news. The EBU’s foreword warned of this risk. (Al Jazeera)
  • When AI systems summarise or simplify news, there is a risk of flattening nuance, losing context, or failing to distinguish opinion vs fact — which can undermine critical thinking and the broader information ecosystem.

Thus, while AI assistants promise convenience and instant access, the trade-off in reliability appears non-trivial based on these findings.


4. Main Failure Modes: Where Do AI Assistants Go Wrong?

Breaking down the errors helps us understand how these systems fail, which is essential for mitigation.

4.1 Sourcing Problems

By far the dominant issue. What falls under “sourcing problems”?

  • Missing citations altogether (no source given, even when statements are factual).
  • Mis-attributed sources: quoting an outlet that never made the statement.
  • Unsupported claims: a reference is given but when checked, the claim doesn’t appear in the source.
  • Non-verifiable or ambiguous references (e.g., “a study shows…” with no link, date or author).

Statistics: Serious sourcing issues appeared in ~31% of all responses. For Gemini the rate was ~72%. (EBU)

Why this is problematic:

  • Users may trust the answer because it appears well referenced, but the references may be bogus or inaccurate.
  • The presence of a citation gives a veneer of legitimacy — even when the underlying claim is flawed.
  • If a cited source is not accessible (paywall, behind login), it becomes hard for a user to verify, increasing risk.

4.2 Accuracy Errors (Facts, Data, Outdated Information)

Examples include:

  • Wrong names, wrong dates, incorrectly described events.
  • Outdated information: e.g., describing a law or regulation that has already changed.
  • Hallucinated facts: plausible-sounding but false statements.

In the study: around 20% of responses had accuracy problems. (EBU)

Why this is problematic:

  • Unlike sourcing errors (which the user might check), factual errors directly mislead.
  • Many users will not cross-check; they assume the assistant is correct.
  • When multiple users get misled, misinformation can propagate.

4.3 Context & Framing Failures

Even when facts are correct, the way an answer is framed (or not) matters. Context failures include:

  • Omitting relevant background, leading to misleading simplifications.
  • Failure to distinguish opinion or speculation from fact.
  • Poor differentiation between what is certain vs what is under dispute.

The study found that context failures made up ~14% of problematic responses. (Al Jazeera)

Why this matters:

  • News isn’t just a list of facts; the interpretation, timeline, actors, and motivations matter for understanding.
  • Without context, users can draw the wrong inference even if the answer seems factually correct on the surface.

4.4 Editorialisation & Opinion vs. Fact Issues

Some responses mingled factual statements with commentary, or presented opinions as facts. This lack of clear separation muddies user understanding and is a real risk. The study flagged it as another axis of failure. (Yahoo News)

4.5 Over-confidence / Non-Refusal

Another subtle but important issue: these assistants rarely decline to answer. Even when information is uncertain, they often respond with confident language rather than acknowledging gaps. The study flagged this as a risk of “over-confidence bias”. (ComplexDiscovery)


5. The Case of Gemini: What Went Wrong?

Google Gemini emerged in the study as the worst-performing assistant among those evaluated. Some numbers:

  • Around 76% of its answers had at least one significant issue. (Search Engine Journal)
  • Around 72% of its answers had serious sourcing problems. (EBU)

What does this tell us?

  • While all assistants demonstrated failings, Gemini was a clear outlier with far higher rates of sourcing issues.
  • It suggests that even among major platforms, there is wide variability in how well news-queries are handled.
  • The fact that sourcing problems dominate for Gemini indicates that the system may be generating plausible-looking answers without sufficiently verifying or correctly citing the underlying content.

It is worth noting that the study assessed the consumer/free versions of the assistants, which may have deliberate limitations. But given that these are the versions many users access, the performance is worrying.

Additionally, the cross-market nature of the study means that Gemini’s high error rate was consistent across languages and geographies, suggesting systemic rather than isolated problems.


6. What Needs to Be Done: Recommendations & Mitigation

Given the significant error rates identified, what can be done — by AI-developers, news organisations, users, and regulators?

6.1 For AI Developers

  • Improve source transparency: Ensure that when a claim is made, the citation is clearly presented, accessible, and verifiable (link, date, author).
  • Strengthen sourcing validation: Use pipelines that check whether a referenced source actually contains the stated claim.
  • Improve fact-checking mechanisms: For news queries (which by nature involve dynamic events), update knowledge bases frequently, and allow statements like “as of [date]” or “pending confirmation”.
  • Introduce uncertainty/decline: When the assistant is unsure or the data is incomplete, the model should say so rather than produce a confident but possibly wrong answer.
  • Distinguish facts vs opinions clearly: The user should be told when the answer is summarising facts, commentary, or speculation.
  • Multi-lingual consistency: Since the study shows problems are cross-language, the systems should ensure equal quality across languages and markets.
  • Publish performance metrics: As with software reliability, AI assistants should publish transparency reports on error rates, broken down by language, market, and topic. The study calls for exactly this. (Al Jazeera)
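The sourcing-validation idea above can be sketched as a simple post-generation check: before an answer is returned, verify that each cited claim actually appears in (or closely matches) the fetched source text. The snippet below is a deliberately crude lexical stand-in for real verification (a production system would need semantic matching); all function and variable names are hypothetical:

```python
from difflib import SequenceMatcher

def claim_supported(claim: str, source_text: str, threshold: float = 0.8) -> bool:
    """Return True if the claim closely matches some passage of the source.

    Slides a claim-sized word window over the source and fuzzy-compares;
    a crude illustration of a sourcing-validation gate, not a real verifier.
    """
    claim_words = claim.lower().split()
    source_words = source_text.lower().split()
    window = len(claim_words)
    if window == 0 or len(source_words) < window:
        return False
    claim_str = " ".join(claim_words)
    for i in range(len(source_words) - window + 1):
        passage = " ".join(source_words[i:i + window])
        if SequenceMatcher(None, claim_str, passage).ratio() >= threshold:
            return True
    return False

source = ("The study evaluated 2,709 responses and found that 45 percent "
          "contained at least one significant issue.")
print(claim_supported("45 percent contained at least one significant issue", source))  # True
print(claim_supported("the study found no significant issues at all", source))         # False
```

A pipeline built this way could suppress or flag any sentence whose citation fails the check, rather than presenting it with unwarranted confidence.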

6.2 For News Organisations & Publishers

  • Provide machine-readable metadata: Structured data that AI systems can parse can improve correct citation and referencing of news articles.
  • Monitor how their content is used by AI assistants: If mis-attribution or distortion is found, engage with the platforms.
  • Educate users about the difference between reading full articles vs relying solely on AI summaries.
  • Collaborate with AI firms: Work to set standards for news-answering systems, perhaps via public-service media alliances or industry initiatives.
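For the machine-readable metadata point above, one widely used approach today is schema.org markup embedded as JSON-LD in the article page, which gives an AI system explicit publisher, author, and date fields to cite. A minimal sketch (all values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example headline of the original report",
  "datePublished": "2025-06-10",
  "dateModified": "2025-06-11",
  "author": { "@type": "Person", "name": "Jane Reporter" },
  "publisher": { "@type": "NewsMediaOrganization", "name": "Example Broadcaster" },
  "url": "https://www.example.org/news/example-article"
}
```

Clear `datePublished`/`dateModified` fields in particular could help assistants avoid the outdated-information errors the study documents.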

6.3 For Users / Consumers

  • Use AI assistant responses as starting points, not definitive sources: Always check original sources or trusted outlets when possible.
  • Be especially cautious when reading AI-based summaries of fast-moving or complex news (e.g., evolving legislation, war, elections).
  • Look for cited sources: If no reliable source is given, treat the answer with caution.
  • Know the limitations: AI assistants may deliver plausible but wrong answers; “sounds good” does not guarantee correctness.
  • Use media-literacy skills: Ask, “Is this a fact? Where did it come from? Is the assistant conflating opinions or missing context?”

6.4 For Regulators / Policymakers

  • Consider frameworks for transparency and accountability of AI systems that act as news intermediaries.
  • Support research into AI performance in real-world news settings and encourage publication of findings.
  • Encourage or mandate standards for AI news assistants: citing sources, distinguishing fact vs opinion, up-to-date information.
  • Promote public-service media collaboration and oversight of AI usage in news summarisation.

7. Challenges and Caveats

While the findings are significant, it is also important to consider limitations and caveats.

  • The study focused on consumer versions of the assistants, which may differ from enterprise or paid versions.
  • It used a fixed number of questions in specific languages and contexts: while broad, it may not cover all possible news-query types or niche domains.
  • Rapidly changing news means that even a “correct” answer may become outdated quickly: the dynamic nature of news is a structural challenge for any system.
    • The study found outdated information in about 20% of responses. (Gizmodo)
  • Some errors may stem not solely from the assistant but from users’ ambiguous or poorly-formed questions. AI assistants are dependent on prompt quality as well.
  • Differences across languages may affect performance; while the study covered 14 languages, performance in less-common languages or markets may differ further.
  • Improvements may already be underway: the study captures a snapshot; future versions of the assistants might improve. But the current results show that as of now, reliability is a major issue.

8. Concluding Thoughts

The promise of AI assistants for news is compelling: instant answers, multilingual access, rapid summarisation of events. Yet this study by the EBU, BBC and partner organisations suggests that we are not yet at the point where these tools can be trusted uncritically for news consumption. With 45% of responses containing major issues and more than 80% having some issue, the margin for error is far too large for high-stakes contexts.

The dominance of sourcing problems is especially notable: when citations are unreliable or flawed, the user loses the ability to verify, which is a core pillar of news literacy. Accuracy errors and context omissions further compound the risk.

Among assistants tested, Gemini emerged as a particularly weak performer in this evaluation, highlighting that platform matters. But all major tools showed non-trivial error rates.

For the future, the path forward is clear: AI developers must build with transparency, verifiability, uncertainty-awareness, and rigorous sourcing in mind. News organisations must provide structured, machine-friendly metadata and actively monitor how their content is used. Users must remain vigilant and treat AI answers as aids, not authorities. And regulators should consider frameworks to ensure accountability in an era where AI increasingly mediates news consumption.

In short: AI assistants are a powerful tool — but right now, when it comes to answering news questions, they remain imperfect. Until we see meaningful improvement in sourcing, accuracy, and context, one should use them with caution when it comes to news.
