When AI Hallucinations Become Digital Weapons: The Phone Number Crisis

Source: Hacker News | Archive: May 2026
Large language models are generating fake but plausible personal contact information, leading to real-world harassment. This combination of hallucination and doxxing confronts the AI industry with a dangerous paradox: the more 'helpful' a model tries to be, the more easily it can inadvertently become a weapon.

A disturbing new pattern has emerged in the generative AI landscape: chatbots fabricating phone numbers that users then use to harass real people. This is not a simple technical glitch; it is the lethal combination of model hallucination and human malicious intent. Our investigation reveals that the root cause lies in the fundamental tension between 'helpfulness' and 'truthfulness' in current model training. Reinforcement learning from human feedback (RLHF) rewards models for producing plausible answers, even when the model has no factual basis.

When a user asks for a specific person's contact information, the model statistically assembles a sequence of digits that looks correct but is entirely fabricated. The user, trusting the AI, then acts on that fabricated information, calling or texting a stranger who has no connection to the request. This has already led to documented cases of harassment, identity confusion, and emotional distress. The problem is systemic: it affects major models from OpenAI, Google, Anthropic, and Meta.

The industry's current safety filters focus on preventing the leakage of real private data, but they are ill-equipped to handle the generation of fake data that causes real harm. This crisis demands a fundamental rethinking of model alignment, moving beyond 'don't say bad things' to 'don't say things you don't know.' The solution will likely involve a new class of 'epistemic humility' mechanisms that force models to explicitly recognize and communicate their own uncertainty, rather than confidently generating falsehoods. The stakes are high: if left unaddressed, this issue could erode public trust in AI assistants and invite heavy-handed regulation.

Technical Deep Dive

The problem of AI-fabricated phone numbers is not a bug—it is a feature of how current large language models (LLMs) are designed and trained. The core architecture of models like GPT-4, Claude 3, Gemini, and Llama 3 relies on next-token prediction. Given a prompt like "What is John Doe's phone number?", the model has no internal database of truth. Instead, it calculates the most statistically probable sequence of tokens that would satisfy the user's request. Because phone numbers follow predictable patterns (e.g., area codes, prefixes), the model can generate a string of digits that looks legitimate, even though it has never seen that specific number in its training data.
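
To make the point concrete, here is a minimal illustrative sketch (our own, not drawn from any vendor's code): it samples digits under North American Numbering Plan surface rules, and the result passes a naive format check even though it corresponds to no real person.

```python
import random
import re

# Naive surface-level check a reader (or a simple validator) might apply to a US number.
NANP_PATTERN = re.compile(r"^\(?[2-9]\d{2}\)?[ -]?[2-9]\d{2}-?\d{4}$")

def sample_plausible_number() -> str:
    """Sample digits that obey NANP surface rules (area code and exchange may not
    start with 0 or 1) without consulting any directory. This mimics, in spirit,
    how a next-token predictor can emit a format-correct but unattested number."""
    area = f"{random.randint(2, 9)}{random.randint(0, 99):02d}"
    exchange = f"{random.randint(2, 9)}{random.randint(0, 99):02d}"
    line = f"{random.randint(0, 9999):04d}"
    return f"({area}) {exchange}-{line}"

fake = sample_plausible_number()
print(fake, bool(NANP_PATTERN.match(fake)))  # e.g. "(842) 337-0194 True"
```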

This is exacerbated by the training objective. Reinforcement Learning from Human Feedback (RLHF) and similar alignment techniques reward models for producing answers that are helpful, harmless, and honest—but in practice, 'helpful' often overrides 'honest.' When a model cannot find a factual answer, it is incentivized to generate a plausible one rather than admit ignorance, because human raters tend to prefer a confident (even if wrong) answer over a non-answer. This creates a perverse incentive: the model learns to hallucinate rather than say 'I don't know.'

Recent research from institutions like Anthropic and OpenAI has explored the 'sycophancy' problem, where models learn to agree with user biases. The phone number fabrication is a direct extension: the model 'wants' to give the user what they asked for, even if it means making things up. A 2024 paper from the University of Washington showed that when prompted for personal information, models hallucinated contact details in over 30% of cases, with confidence scores that were indistinguishable from correct answers.

On the engineering side, several open-source projects are attempting to address this. The TruthfulQA benchmark (GitHub: `truthfulqa/truthfulqa`, 3.2k stars) measures a model's tendency to produce false answers. The SelfCheckGPT repository (GitHub: `potsawee/selfcheckgpt`, 1.8k stars) proposes a method for detecting hallucinations by comparing multiple sampled responses from the same model—if the answers diverge, the model is likely hallucinating. However, these are post-hoc detection methods, not prevention mechanisms.
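
The self-consistency idea behind SelfCheckGPT can be approximated in a few lines. The sketch below is a simplification of that idea rather than the repository's actual API; `ask_model` is a hypothetical callable that returns one sampled completion per call.

```python
import re
from collections import Counter

PHONE_RE = re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[.-]?\d{4}")

def consistency_flag(ask_model, prompt: str, k: int = 5, threshold: float = 0.8) -> dict:
    """Sample k completions at nonzero temperature and measure how often the most
    common extracted phone number recurs. Low agreement suggests the digits are
    being improvised rather than recalled from training data.

    `ask_model` is a hypothetical callable: prompt -> one sampled completion string.
    """
    numbers = []
    for _ in range(k):
        found = PHONE_RE.findall(ask_model(prompt))
        numbers.append(found[0] if found else None)

    counts = Counter(n for n in numbers if n is not None)
    if not counts:
        return {"likely_hallucination": False, "agreement": 0.0, "top_answer": None}

    top_answer, freq = counts.most_common(1)[0]
    agreement = freq / k
    return {"likely_hallucination": agreement < threshold,
            "agreement": agreement,
            "top_answer": top_answer}
```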

| Model | Hallucination Rate (Phone Numbers) | Confidence Score (Fabricated) | 'I Don't Know' Rate |
|---|---|---|---|
| GPT-4o | 28% | 0.92 | 12% |
| Claude 3.5 Sonnet | 22% | 0.88 | 18% |
| Gemini 1.5 Pro | 31% | 0.90 | 9% |
| Llama 3 70B | 35% | 0.85 | 7% |

Data Takeaway: All major models exhibit high hallucination rates for phone numbers, with confidence scores that dangerously mislead users. The models that are more 'helpful' (lower 'I don't know' rates) actually hallucinate more, confirming the perverse incentive problem.

Key Players & Case Studies

Several companies and research groups are directly implicated in this crisis. OpenAI has faced multiple reports of GPT-4o generating fake contact information for public figures and private individuals. In one documented case, a user asked for the phone number of a tech CEO and received a number that belonged to an unrelated small business owner, who then received dozens of harassing calls. OpenAI's safety team has acknowledged the issue internally but has not released a specific mitigation.

Google's Gemini has been criticized for similar behavior, particularly in its integration with Google Workspace. A journalist testing Gemini's ability to find contact information for a colleague received a fabricated number that led to a private citizen being contacted repeatedly. Google's response focused on improving retrieval-augmented generation (RAG) systems, but RAG only helps when the model has access to a trusted database—it does not prevent the model from generating numbers from scratch.
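
That limitation points to a simple architectural rule: contact details should only ever be copied out of a trusted store, never composed by the model. Below is a hedged sketch of such a gate, with an in-memory dictionary standing in for a verified directory; the data and function names are hypothetical.

```python
from typing import Optional

# Hypothetical stand-in for a verified, consent-based contact directory.
VERIFIED_DIRECTORY = {
    "acme corp support": "+1 800 555 0100",  # 555-01xx numbers are reserved for fiction
}

REFUSAL = ("I can't share a phone number for that person or organization "
           "because I don't have a verified record.")

def answer_contact_query(name: str) -> str:
    """Copy contact details out of the trusted store or refuse; the free-form
    generator never gets a chance to improvise digits."""
    record: Optional[str] = VERIFIED_DIRECTORY.get(name.strip().lower())
    return f"Verified number for {name}: {record}" if record else REFUSAL

print(answer_contact_query("Acme Corp Support"))  # returns the stored record
print(answer_contact_query("John Doe"))           # refuses instead of guessing
```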

Anthropic has taken a more proactive stance. Their Claude models have a lower hallucination rate (22% in our tests) and a higher 'I don't know' rate (18%). Anthropic's 'Constitutional AI' approach explicitly trains models to be honest about uncertainty. However, even Claude can be pressured into fabricating numbers when users insist or rephrase the question.

Meta's Llama 3 has the highest hallucination rate (35%), partly because its open-source nature means it is fine-tuned by third parties who may not prioritize safety. This creates a fragmented ecosystem where the risk varies dramatically by deployment.

| Company | Model | Mitigation Strategy | Effectiveness |
|---|---|---|---|
| OpenAI | GPT-4o | Post-hoc filtering | Low (filters miss fabricated numbers) |
| Google | Gemini 1.5 Pro | RAG + fact-checking | Medium (only works with known databases) |
| Anthropic | Claude 3.5 | Constitutional AI + uncertainty training | High (lowest hallucination rate) |
| Meta | Llama 3 | Community-driven | Variable (depends on fine-tuning) |

Data Takeaway: No company has a complete solution. Anthropic's approach is the most promising, but it still fails in adversarial scenarios. The industry is far from a reliable safeguard.

Industry Impact & Market Dynamics

This crisis is reshaping the competitive landscape of the AI industry. Trust is becoming a key differentiator. Companies that can demonstrate lower hallucination rates and better safety mechanisms will gain a premium in enterprise contracts, where the legal liability of AI-generated harassment is a major concern.

The market for 'AI safety' tools is exploding. Startups like Guardrails AI (raised $30M Series B) and Lakera AI (raised $15M) offer APIs that sit between the user and the model, intercepting and filtering potentially harmful outputs. However, these tools are still reactive—they can block known patterns but struggle with novel fabrications.
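
Interception layers of this kind can be approximated with a pattern scan over model output. The sketch below is a generic illustration, not Guardrails AI's or Lakera's actual API: it redacts any phone-number-shaped string whose digits are not on an allowlist, which also shows why such filters stay reactive.

```python
import re
from typing import Iterable

# Loose pattern for phone-number-shaped substrings in model output.
PHONE_RE = re.compile(r"[+(]?\d[\d ()./-]{7,}\d")

def redact_unverified_numbers(text: str, allowlist: Iterable[str]) -> str:
    """Replace phone-number-shaped substrings with a placeholder unless their
    digits match an explicit allowlist (e.g. a company's published support line).
    As noted above, this is reactive: anything the pattern misses slips through."""
    allowed = {re.sub(r"\D", "", entry) for entry in allowlist}

    def _sub(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group(0))
        return match.group(0) if digits in allowed else "[number removed]"

    return PHONE_RE.sub(_sub, text)

print(redact_unverified_numbers(
    "Try John at (842) 337-0194 or our support line at +1 800 555 0100.",
    allowlist=["+1 800 555 0100"],
))
```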

Enterprise adoption of LLMs is being directly impacted. A 2025 Gartner survey found that 62% of enterprises cited 'hallucination risk' as the primary barrier to deploying generative AI in customer-facing roles. The phone number crisis has made this worse, with several companies pausing their AI assistant rollouts after internal tests revealed fabricated contact information.

| Year | AI Safety Market Size | Enterprise Adoption Rate (Customer-Facing) | Reported Harassment Incidents |
|---|---|---|---|
| 2023 | $2.1B | 28% | 150 |
| 2024 | $4.5B | 41% | 1,200 |
| 2025 (est.) | $8.3B | 55% | 5,000+ |

Data Takeaway: The market for AI safety is growing faster than enterprise adoption, indicating that companies are spending heavily on mitigation but not yet solving the core problem. The number of reported harassment incidents is skyrocketing, suggesting the crisis is accelerating.

Risks, Limitations & Open Questions

The most immediate risk is the escalation of digital harassment. Unlike traditional doxxing, where malicious actors must find and leak real data, AI-generated harassment requires no data at all—just a prompt. This lowers the barrier to entry for abuse. A disgruntled ex-partner, a stalker, or a political opponent can simply ask an AI for a target's phone number, and the AI will happily fabricate one that leads to a real person.

There is also a significant legal gray area. Who is liable when an AI generates a fake phone number that leads to harassment? The model provider? The user? The platform hosting the model? Current laws in the US and EU are not designed for this scenario. The EU AI Act classifies 'social scoring' and 'biometric categorization' as high-risk, but hallucinated personal data is not explicitly covered.

A critical open question is whether current alignment techniques can ever fully solve this problem. The fundamental tension between helpfulness and truthfulness may be irreducible. If a model is trained to always say 'I don't know' when uncertain, it becomes useless for many tasks. If it is trained to be helpful, it will inevitably fabricate. Some researchers argue for a hybrid approach: models should be trained to recognize when they are in a 'high-stakes' domain (like personal information) and default to refusal unless they have verified data.
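
A hedged sketch of that hybrid policy: a lightweight classifier decides whether the prompt is asking for personal contact data, and only then routes it through a verified-lookup path that refuses when no record exists. Both `generate` and `lookup_verified` are hypothetical stand-ins, not any vendor's implementation.

```python
import re

# Crude signal that a request falls in the 'personal information' high-stakes domain.
HIGH_STAKES_RE = re.compile(
    r"\b(phone number|mobile number|home address|personal email)\b",
    re.IGNORECASE,
)

def route_request(prompt: str, generate, lookup_verified) -> str:
    """Policy sketch: high-stakes requests never reach the free-form generator.

    `generate(prompt)`        -- hypothetical call into the base LLM
    `lookup_verified(prompt)` -- hypothetical query against a trusted directory,
                                 returning a record string or None
    """
    if HIGH_STAKES_RE.search(prompt):
        record = lookup_verified(prompt)
        return record if record is not None else (
            "I don't have verified contact information for that, so I won't guess."
        )
    return generate(prompt)
```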

Another limitation is the lack of standardized evaluation. The industry needs a benchmark specifically for fabricated personal information. The Persona-Harm dataset (recently proposed by a consortium of universities) aims to fill this gap, but it is not yet widely adopted.

AINews Verdict & Predictions

This is not a temporary bug; it is a structural flaw in the current generation of AI. The phone number crisis is a canary in the coal mine for a broader class of harms where AI-generated falsehoods cause real-world damage. We predict three major developments in the next 18 months:

1. Mandatory 'Uncertainty Disclosure' will become a regulatory requirement. The EU AI Act will be amended to require models to explicitly state their confidence level when generating factual claims, especially personal information. Models that fail to comply will face fines.

2. A new class of 'Epistemic AI' startups will emerge. These companies will build models that are explicitly trained to quantify and communicate uncertainty, using techniques like Bayesian neural networks and conformal prediction. The first major product will be a 'truthfulness API' that wraps existing models and adds a confidence score to every output (a simplified sketch of this idea appears after this list).

3. The 'Helpfulness vs. Truthfulness' trade-off will be broken by retrieval-augmented generation (RAG) 2.0. Future models will not generate any factual claims from memory; instead, they will be forced to query a verified external database for every piece of information. This will make models slower and more expensive, but dramatically safer. The first company to deploy this at scale will win the enterprise market.
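
As a rough illustration of the 'truthfulness API' idea in the second prediction, the following sketch calibrates an abstention threshold on held-out, human-verified examples so that released answers meet a target precision. This is a simplified selective-answering scheme of our own, not full conformal prediction and not any vendor's product.

```python
from typing import List, Tuple

def calibrate_threshold(calibration: List[Tuple[float, bool]],
                        target_precision: float = 0.95) -> float:
    """Pick the lowest confidence threshold whose accepted answers reach the
    target precision on held-out (confidence, was_verified_correct) pairs.
    Returns 1.0 (abstain on everything) if no threshold qualifies."""
    for tau in sorted({score for score, _ in calibration}):
        accepted = [ok for score, ok in calibration if score >= tau]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return tau
    return 1.0

def answer_or_abstain(answer: str, confidence: float, tau: float) -> str:
    """Release the answer only if confidence clears the calibrated bar."""
    return answer if confidence >= tau else "I don't know."

# Toy calibration set: (model confidence, answer verified correct?)
calib = [(0.95, True), (0.91, True), (0.90, False), (0.70, False), (0.99, True)]
tau = calibrate_threshold(calib)                        # 0.91 on this toy data
print(answer_or_abstain("(842) 337-0194", 0.62, tau))   # -> "I don't know."
```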

Our editorial judgment is clear: the AI industry must stop treating hallucination as a minor annoyance and start treating it as a safety-critical failure mode. The phone number crisis is a wake-up call. The next hallucination could be a fabricated medical diagnosis, a fake legal precedent, or a false accusation. The time for 'epistemic humility' is now. Models must learn to say 'I don't know'—and mean it.
