Large language models (LLMs)—AI systems like GPT-4, Claude, and others that generate text, answer questions, translate, etc.—have been hailed as revolutionary. But “god-like” expectations are being pulled back as their limitations become more visible. While the Economist article outlines many of these, several additional points and trends are also important to understand.

What the Original Article Covers (Briefly)
- High Expectations vs Reality: Many users and developers expected these models to push boundaries toward general intelligence, but results have been mixed, especially in tasks requiring deep reasoning, precision, or domain expertise.
- Growing Critique: Experts are pointing out hallucinations (false or misleading outputs), a lack of robust reasoning, and models' tendency to state answers with unwarranted confidence.
- Emerging Alternatives: There’s growing interest in smaller, more focused models; efforts to improve reliability, guardrails, and transparency; and more scrutiny from regulators, academic researchers, and the public.
What’s Often Under-Reported or Needs More Emphasis
Here are dimensions the article didn’t fully explore or that are emerging strongly in parallel, which help explain why confidence is waning:
- Benchmarking & Overfitting to Tests: Many LLMs perform impressively on standardized benchmarks but poorly when faced with novel, ambiguous, or adversarial tasks. Overfitting to public benchmarks means success in the lab doesn’t always translate to robust performance in deployment.
- Abstract Reasoning & Generalization Weaknesses: LLMs are strong at language prediction, pattern matching, and tasks with abundant training examples, but their performance drops on abstract reasoning, causal inference, counterfactuals, and unusual combinations of inputs.
- Bias, Hallucinations, and Social Risk: False or misleading outputs (hallucinations) remain a major challenge, especially in high-stakes contexts (medical, legal, scientific). Bias (cultural, political, linguistic) can creep in via training data, and users may overtrust models, which compounds the risk.
- Environmental & Economic Costs: Large models require significant computational resources for training, maintenance, and updates, which means energy costs, a sizable carbon footprint, and heavy hardware demands. Building, training, and hosting them is expensive, and the returns appear to taper as models grow; smaller, specialized models are looking more appealing in many corporate settings because of speed, cost, and control.
- Regulation, Governance, and Trust Issues: As adoption grows, so does concern over how these models are governed. Who controls data sources? How are outputs audited for fairness or correctness? What recourse do users have when an AI model causes harm (misinformation, defamation)? Regulatory frameworks are still fragmented.
- Expectation vs Use-Case Mismatch: Many early expectations leaned toward models being “thinking machines” or generalist AI assistants, but users often need domain-specific reliability and consistency. When LLMs are used outside their strengths (e.g., nuanced policy drafting, ethical judgments, scientific proofs), the problems become more visible.
Why Faith Is Waning: Underlying Drivers
Putting all these pieces together, here are the causes of declining “faith”:
- Repeated Surprises / Failures: When high expectations repeatedly hit boundaries, confidence erodes. Public examples of failures amplify this.
- Hype vs Maturity Gap: The rapid marketing, media coverage, venture funding, and promotional rhetoric have created a gap between what’s promised (or implied) and what’s reliably delivered.
- Competitive Pressure: As more players enter with smaller, cheaper, easier-to-control models, the huge monolithic model approach seems less universally optimal.
- User Fatigue & Trust Erosion: Users growing wary of misinformation, overconfidence in “AI answers,” challenges in getting consistent, trustworthy and transparent behavior from LLMs.

What Could Rebuild Confidence—or Adjust It Responsibly
Here are some paths forward that may help restore a more grounded, realistic trust in LLMs:
- Improved Evaluation & Transparency: Better benchmarks, more real-world stress tests, open disclosures of training data or domain limitations.
- Hybrid Models & Complementary Systems: Using LLMs alongside symbolic reasoning, domain-specific models, human oversight.
- Smaller / Specialized Models: Models tailored for specific tasks or domains may perform more reliably and inexpensively.
- Robust Guardrails: Better uncertainty quantification and “don’t know” outputs rather than confident gibberish; mechanisms to audit truthfulness and bias (a minimal abstention sketch follows this list).
- Regulatory Oversight: Standards, liability frameworks, ethics boards, possibly “AI safety” infrastructures so that harms can be mitigated.
- User Education: Helping users understand what LLMs can and can’t do; promoting “healthy skepticism” rather than blind reliance.
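To make the “don’t know” idea in the guardrails point concrete, here is a minimal Python sketch of one common pattern, sometimes called self-consistency abstention: sample the model several times and refuse to answer unless a clear majority agrees. The function names, the agreement threshold, and the stand-in sampler are illustrative assumptions, not any particular vendor’s API; real guardrail systems typically combine this with calibrated confidence scores, retrieval, or external fact checks.

```python
from collections import Counter
from typing import Callable


def answer_or_abstain(
    sample: Callable[[str], str],  # any function returning one LLM completion for a prompt
    prompt: str,
    n_samples: int = 5,            # assumed sample count; tune per task
    min_agreement: float = 0.8,    # assumed threshold; tune per task
) -> str:
    """Sample several answers and abstain unless a clear majority agrees.

    Low agreement across samples is a crude proxy for low model confidence:
    if the answers scatter, the model is probably guessing, so we return
    "I don't know" instead of a confident wrong answer.
    """
    answers = [sample(prompt).strip().lower() for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer if count / n_samples >= min_agreement else "I don't know"


if __name__ == "__main__":
    # Toy usage with a stand-in sampler; a real system would call an LLM API here.
    import random

    fake_sampler = lambda _prompt: random.choice(["paris", "paris", "paris", "lyon"])
    print(answer_or_abstain(fake_sampler, "What is the capital of France?"))
```

The trade-off is cost: several samples per question buy fewer confidently wrong answers at the price of extra inference.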
FAQs: What People Commonly Ask
1. What do we mean by “god-like” LLMs?
A shorthand for the expectations some have placed on LLMs: that they could mimic human intelligence broadly, reason like humans, make few or no mistakes, understand deeply, and so on. In practice, LLMs are extremely powerful in many ways, but they fall short of many “superhuman” expectations, especially in reasoning, consistency, ethics, and domain-specific contexts.
2. What is a “hallucination” in LLMs?
A hallucination is output that sounds plausible but is false or ungrounded: wrong facts, invented references, or misleading claims. It remains one of the biggest reliability issues.
3. Are LLMs getting worse, or are people just noticing their limits more?
More the latter. LLMs keep improving in scale and capability, but as they’re used in more real-world settings (outside labs), their failures and limits become more evident, especially under ambiguous or adversarial conditions.
4. Do smaller models solve these problems?
Not all of them, but many. Smaller or specialized models often cost less, are quicker, more efficient, easier to control, and can be tuned more precisely for specific tasks. For many applications, they may offer better cost-benefit tradeoffs than giant generalist LLMs.
5. Should we stop dreaming of AGI (Artificial General Intelligence)?
Probably not stop dreaming, but adjust expectations. The path to AGI (if it happens) is likely more complex than simply scaling up LLMs; other architectures, methods of reasoning, or different paradigms (symbolic AI, neuro-symbolic hybrids, etc.) may be needed.
6. What are the risks if we keep overtrusting LLMs?
Many: misinformation; errors in critical settings like healthcare, law, and policy; reinforced biases; economic and social harms (e.g., job displacement or misallocation of resources); environmental costs; and erosion of public trust in AI broadly.
7. Can regulation help, and what might it look like?
Yes, regulation can help. Possible elements include transparency about training data and sources, standards for safe/ethical behavior, liability when AI causes harm, truth/audit requirements, possibly certifications for reliable models.
Conclusion: What We Should Believe In, and How to Proceed
It’s healthy that “faith” in LLMs is waning—not because LLMs aren’t powerful, but because reality is catching up to hype. The greatest danger is when either enthusiasts or critics overshoot: claiming LLMs are perfect, or that they’re useless. The future likely lies in pragmatic realism: using LLMs where they shine; guarding where they fail; combining them with other systems; and maintaining oversight and humility.
For users, developers, and regulators, the goal isn’t to reject LLMs—but to reshape our relationship with them: less awe, more understanding; less blind belief, more informed trust.

Sources: The Economist


