As artificial intelligence makes deeper inroads into manufacturing - from design assistance and quality control to predictive maintenance and procurement - large language models (LLMs) like ChatGPT or Claude often look like a shortcut to productivity. Yet a major barrier to their dependable adoption is hallucination - the tendency of an LLM to generate plausible-sounding but incorrect or entirely fabricated information.
Hallucinations arise because LLMs aren’t logic engines or fact checkers; they are probabilistic text models trained to predict the next token. When the model lacks confidence or context, it can “fill in gaps” with invented or partially inaccurate content.
Researchers at Cornell distinguish “extrinsic hallucinations”, output that cannot be verified against any source or ground truth, from “intrinsic hallucinations”, where the output directly contradicts the source material the model was given.
In sectors like manufacturing, where workflows rely on standards, specifications, compliance documents and precise facts, the risks are real. Imagine an LLM tool advising design teams on material tolerances or supply chains and confidently delivering a spec that doesn’t exist or misattributing a revision change.
Left unchecked, errors like these can lead to production defects, regulatory exposure or incorrect ordering. Indeed, some estimates suggest that even state-of-the-art models hallucinate between 2.5% and 8.5% of the time, and in complex domains the error rate can exceed 15%, according to research by Vectara, which operates a Hallucination Leaderboard.
The risks are such that a recent media report flagged Google Search’s AI Overviews feature as producing “confidently wrong” answers that mislead users and divert traffic from legitimate sources.
For manufacturing decision support, that kind of error margin is intolerable. What’s worse, even when models are augmented with external knowledge sources, hallucination persists, especially if the query is loosely anchored or if the underlying retrieval is weak.
Mitigation efforts exist: better prompt design, chain-of-thought reasoning, output filtering, human-in-the-loop review and domain fine-tuning. But none of these eliminates hallucinations entirely.
However, one promising architectural approach is retrieval-augmented generation (RAG), where the model draws from a curated, validated set of documents in real time. By grounding responses in known source content, you restrict the model’s freedom to “invent”.
RAG works by combining a search engine and a language model in one process: the retrieval component identifies and extracts the most relevant documents or passages related to a query, while the generation component uses that retrieved information to construct an answer.
This means the model’s responses are not drawn solely from general statistical patterns in its training data but are anchored in specific, verifiable evidence from trusted materials. For highly regulated or precision-driven industries like manufacturing, this matters: it helps ensure that AI outputs align with the technical and compliance realities of production, safety and supplier management.
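To make that loop concrete, below is a minimal Python sketch of the retrieval-then-generation pattern. The document snippets, word-overlap scoring and prompt wording are illustrative assumptions for the example only, not VendorPilot’s implementation; a production system would use a proper search index and pass the assembled prompt to an LLM.

```python
# Minimal sketch of the RAG pattern described above. The document store,
# overlap-based scoring and prompt format are illustrative assumptions,
# not a description of any particular product's implementation.

from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # e.g. the specification or policy the text came from
    text: str


# Toy "document store": in practice this would be the enterprise's own
# specifications, vendor reports and compliance documents, indexed for search.
PASSAGES = [
    Passage("SPEC-1042 rev C", "Flange bolts must be torqued to 85 Nm plus or minus 5 Nm."),
    Passage("Supplier policy v3", "All critical suppliers require ISO 9001 certification."),
    Passage("QA manual s.4", "Incoming material lots are sampled at AQL 1.0."),
]


def retrieve(query, passages, top_k=2):
    """Rank passages by simple word overlap with the query (a stand-in for
    the vector or keyword search a production retriever would use)."""
    query_words = set(query.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(query_words & set(p.text.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(query, retrieved):
    """Assemble a prompt that tells the model to answer only from the
    retrieved evidence and to say so when the evidence is missing."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in retrieved)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you cannot answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What torque is specified for the flange bolts?"
    prompt = build_prompt(question, retrieve(question, PASSAGES))
    print(prompt)  # this prompt would then be sent to the LLM of your choice
```

The key point is the order of operations: relevant, attributed passages are found first, and the model is only asked to answer once that evidence is in front of it.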
How VendorPilot addresses hallucination risk
Bendi’s VendorPilot is architected to minimise hallucinations by design. Unlike generic LLM tools, it draws answers only from source documents the enterprise supplies, such as specifications, technical documents, vendor reports and internal policies. Because VendorPilot never ventures beyond your domain material, the risk of fabricating out-of-domain content is greatly reduced.
Additionally, VendorPilot implements a RAG agent: when a query arrives, it retrieves relevant documents into the LLM’s context before prompting it to generate an answer. In effect, the model is not guessing from general knowledge but being tightly ‘scaffolded’ by your specific source content.
This combination of speed and accuracy is critical: narrowing the context significantly reduces ambiguity and, with it, the chance of hallucinations slipping through. By combining controlled document ingestion with a RAG architecture, VendorPilot offers supply chain users a more reliable, auditable AI assistant: one that plays within bounds rather than venturing into creative fiction.
To explore how VendorPilot can help your business harness AI safely and accurately, get in touch with the Bendi team today. Whether you’re streamlining supplier data, improving compliance reporting or reducing manual workload in procurement, VendorPilot ensures every answer your teams receive is grounded in the truth of your own documents.
(Photo credit: Aedrian Salazar via Unsplash)


