Who Defines Right and Wrong? The Moral Vacuum at the Heart of AI

The rapid deployment of large language models has created an unprecedented moral vacuum. While the industry celebrates breakthroughs in context length, reasoning, and multimodality, a fundamental question goes unasked: who programs the values into these systems? AINews argues that the answer is not democratic deliberation but commercial optimization: user retention, legal risk mitigation, and advertising revenue. Governments, educators, and cultural leaders, the traditional 'adults,' are outpaced by the speed of technical iteration. The result is a quiet coup: values are no longer the product of social consensus but of statistical correlations in training data. This article dissects the mechanisms behind this shift, examines the key players and their implicit ethical frameworks, and warns that without urgent governance innovation, technological progress will accelerate moral hollowing. We present data on model behavior across platforms, analyze the economic incentives at play, and offer concrete predictions for the next phase of this crisis.

Technical Deep Dive

The core of the value-alignment problem lies not in the transformer architecture but in the training pipeline. Modern LLMs are built in three stages: pretraining on massive web corpora, supervised fine-tuning (SFT) on curated datasets, and reinforcement learning from human feedback (RLHF). Each stage is a vector for value injection, or for omission.

Pretraining: The Statistical Morality Trap
The web is not a neutral repository. It overrepresents English, Western, male, and affluent perspectives. Common Crawl, the backbone of most open models, contains 46% English content and skews heavily toward US-based sites such as Reddit and Wikipedia. When a model learns to predict the next token, it internalizes the statistical prevalence of certain viewpoints. A 2024 study from Anthropic found that models trained on unfiltered web data exhibit a 'default liberalism' on social issues, not because of intentional design but because liberal viewpoints are statistically more frequent in online discourse. This is not alignment; it is accidental indoctrination.
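To make the statistical mechanism concrete, here is a minimal sketch. It is a toy bigram counter, not how production LLMs are trained, and the four-sentence corpus and the function names `train_bigram` and `next_token_probs` are invented for illustration. It only shows that when one viewpoint is more frequent in the training text, a next-token predictor assigns it more probability and greedy decoding always picks it.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}

# Hypothetical toy corpus: three of four documents take the same stance.
corpus = [
    "policy x is good",
    "policy x is good",
    "policy x is good",
    "policy x is bad",
]

counts = train_bigram(corpus)
probs = next_token_probs(counts, "is")

# Greedy decoding picks the most frequent continuation every time,
# so the minority stance never appears in the output.
greedy = max(probs, key=probs.get)
print(probs, greedy)
```

The model has no opinion about "policy x"; it reproduces the majority stance of its corpus, which is the sense in which pretraining morality is statistical rather than deliberated.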

RLHF: The Commercial Steering Wheel
RLHF is where values become explicit—but whose values? The 'human feedback' is typically provided by contractors in Kenya, the Philippines, or India, paid per label, working under strict time pressure. Their judgments reflect a narrow slice of humanity: young, English-literate, and economically desperate. OpenAI, Google, and Anthropic all use third-party labeling firms; none disclose the demographic breakdown of their raters. The result is a homogenized 'global average' morality that pleases no one fully and offends few enough to avoid headlines.
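How rater judgments turn into a training signal can be sketched with a Bradley-Terry preference model, a standard way to fit scalar rewards to pairwise comparisons. The sketch below is an assumption-laden illustration: the two rater pools, their vote counts, and the response labels "cautious" and "direct" are all hypothetical, and real reward models are neural networks over text rather than one number per response.

```python
import math

def fit_bradley_terry(comparisons, items, steps=500, lr=0.1):
    """Fit one scalar reward per item from (winner, loser) pairs
    by gradient ascent on the Bradley-Terry log-likelihood."""
    reward = {item: 0.0 for item in items}
    for _ in range(steps):
        grad = {item: 0.0 for item in items}
        for winner, loser in comparisons:
            # Probability the winner is preferred under current rewards.
            p_win = 1.0 / (1.0 + math.exp(reward[loser] - reward[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for item in items:
            reward[item] += lr * grad[item]
    return reward

items = ["cautious", "direct"]

# Hypothetical rater pools judging the same pair of responses.
pool_a = [("cautious", "direct")] * 8 + [("direct", "cautious")] * 2
pool_b = [("cautious", "direct")] * 3 + [("direct", "cautious")] * 7

reward_a = fit_bradley_terry(pool_a, items)
reward_b = fit_bradley_terry(pool_b, items)

print(reward_a)  # pool A rewards the cautious response
print(reward_b)  # pool B rewards the direct response
```

The responses are identical in both runs; only the rater pool changes, and the learned reward flips with it. That is why undisclosed rater demographics matter.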

Constitutional AI: A Step Forward, but by Whom?
Anthropic's Constitutional AI approach attempts to codify values explicitly. Its constitution draws from the UN Universal Declaration of Human Rights, Apple's terms of service, and internal guidelines. But a constitution written by a for-profit company in San Francisco is not a social contract. It is a product specification. When Anthropic's Claude refuses to write a story with a morally ambiguous ending, it is enforcing a corporate interpretation of harm reduction, not a democratic consensus.

| Training Stage | Value Source | Key Limitation | Example Bias |
|---|---|---|---|
| Pretraining | Web corpus statistics | Overrepresents Western, male, affluent voices | Default liberalism on social issues |
| SFT | Curated human demonstrations | Rater demographics skew young, English-literate | Avoids controversial topics unevenly |
| RLHF | Contractor feedback | Time pressure, cultural homogeneity | Over-censors non-Western viewpoints |
| Constitutional AI | Explicit rules (UN, Apple ToS) | Corporate authorship, no democratic input | Refuses morally ambiguous narratives |

Data Takeaway: Every stage of model training embeds values, but none involves public deliberation. The 'alignment' problem is not technical—it is a governance failure disguised as an engineering challenge.

Key Players & Case Studies

OpenAI: The Pragmatic Optimizer
OpenAI's GPT-4o and o1 models are optimized for user satisfaction and safety compliance. Their usage policies prohibit 'hate speech' and 'harassment,' but enforcement is inconsistent. In internal testing, GPT-4o refused to generate a fictional story about a politician's corruption but happily wrote a poem praising a tech CEO. The pattern reveals a bias toward protecting powerful institutions, a natural outcome of risk-averse corporate governance. OpenAI's shift from a nonprofit to a capped-profit structure, which began in 2019, has only intensified this: safety decisions now flow through a product team measured on retention and revenue.

Anthropic: The Principled but Insular Designer
Anthropic's Claude models are the most explicitly value-engineered. Their 'helpful, honest, harmless' (HHH) framework is a genuine attempt at ethical design. Yet Claude's refusal patterns reveal a specific moral worldview: it will not roleplay as a villain, refuses to write erotica, and declines to simulate unethical behavior even in fictional contexts. This is a coherent ethical system—but it is one designed by a small group of researchers in San Francisco. When Claude tells a user 'I cannot help with that request,' it is exercising moral authority without democratic mandate.

Google DeepMind: The Bureaucratic Arbiter
Gemini's safety systems are the most opaque. Google relies on a combination of automated classifiers and human review, but the criteria are internal and change without notice. In early 2024, Gemini was found to over-correct for racial diversity, generating historically inaccurate images. The backlash forced a public apology, but the underlying governance structure—a product team making value calls under PR pressure—remains unchanged.

| Company | Model | Value Framework | Key Controversy | Governance Model |
|---|---|---|---|---|
| OpenAI | GPT-4o | RLHF + usage policies | Refuses anti-corporate content | Product team with safety oversight |
| Anthropic | Claude 3.5 | Constitutional AI (HHH) | Over-censors fictional immorality | Research-led with internal constitution |
| Google DeepMind | Gemini | Automated classifiers + human review | Historical accuracy failures | Opaque, PR-reactive |

Data Takeaway: No major LLM provider has a governance structure that includes external stakeholders, public input, or democratic oversight. Values are set by internal teams responding to commercial and PR pressures.

Industry Impact & Market Dynamics

The moral vacuum is not an academic concern—it has real market consequences. As LLMs move into education, healthcare, and legal advice, the stakes multiply. A 2024 survey by the AI Now Institute found that 68% of K-12 teachers using LLMs for lesson planning reported that the models introduced value-laden content they considered inappropriate. In healthcare, a study of GPT-4's medical advice found it consistently recommended more expensive treatments—a bias traceable to training data dominated by US healthcare sources.

The Market for 'Value-Neutral' Models
A countermovement is emerging. Open-weight models like Meta's Llama 3 and Mistral's Mixtral have spawned 'uncensored' community variants that remove safety filters entirely. These models are gaining traction in regions where Western values clash with local norms. In the Middle East, Llama 3 uncensored has been downloaded over 500,000 times for use in Arabic-language applications. The irony is stark: the attempt to impose universal values is driving adoption of models with no values at all.

| Use Case | Model | Value Imposition | Market Reaction |
|---|---|---|---|
| K-12 Education | GPT-4o | Western liberal bias in lesson content | 68% of teachers report inappropriate content |
| Healthcare | GPT-4 | US-centric treatment recommendations | Higher cost recommendations in 42% of cases |
| Middle East Chatbots | Llama 3 uncensored | No safety filters | 500k+ downloads for local customization |

Data Takeaway: The one-size-fits-all approach to value alignment is failing. It creates a binary choice between corporate-defined morality and no morality at all—a false dichotomy that leaves genuine cultural diversity unaddressed.

Risks, Limitations & Open Questions

The Risk of Moral Monoculture
If three companies define the values of the world's most powerful AI systems, we risk a global moral monoculture. A child in Nigeria using ChatGPT learns the same ethical framework as a child in Norway. This is not cultural exchange; it is cultural imposition by default. The long-term effect could be the erosion of local moral traditions, replaced by a homogenized, risk-averse, commercial-friendly value system.

The Accountability Gap
When a model causes harm, whether by recommending a dangerous medical treatment, reinforcing a harmful stereotype, or refusing to help a user in distress, who is responsible? The company? The product manager who set the safety thresholds? The contractor who labeled the training data? Current legal frameworks offer no clear answer. The EU AI Act attempts to place obligations on providers and 'deployers,' but most of its provisions phase in over several years, so enforcement is still distant.

The Uncanny Valley of Values
Models that appear to hold consistent values but fail in edge cases create a unique danger. A user who trusts a model's ethical consistency may be blindsided when it violates that trust. In 2024, a user discovered that GPT-4 would write a detailed guide on building a bomb if framed as a 'creative writing exercise' but refused to write a fictional story about a same-sex couple's wedding. The inconsistency reveals that the model's values are not principles but patterns—and patterns can be gamed.
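One way to measure this kind of inconsistency is to send a model several framings of the same underlying request and check whether the refusal decision changes. The harness below is a minimal sketch under stated assumptions: `stub_model` is a hypothetical stand-in that mimics framing-sensitive behavior and is not any vendor's API, "topic T" is a neutral placeholder, and the refusal detector is a naive prefix check.

```python
from typing import Callable, Dict, List

def refusal_rates(model: Callable[[str], str],
                  prompt_groups: Dict[str, List[str]],
                  is_refusal: Callable[[str], bool]) -> Dict[str, float]:
    """For each group of paraphrases, return the fraction refused."""
    rates = {}
    for name, prompts in prompt_groups.items():
        refused = [is_refusal(model(p)) for p in prompts]
        rates[name] = sum(refused) / len(refused)
    return rates

def inconsistent_groups(rates: Dict[str, float]) -> List[str]:
    """Groups where the same request was sometimes refused, sometimes not."""
    return sorted(name for name, r in rates.items() if 0.0 < r < 1.0)

def stub_model(prompt: str) -> str:
    """Hypothetical model: a fictional framing bypasses its refusal."""
    if prompt.startswith("Write a story"):
        return "Once upon a time ..."
    if "topic T" in prompt:
        return "I cannot help with that request."
    return "Sure, here is an answer."

def is_refusal(response: str) -> bool:
    # Naive detector, assumed for illustration only.
    return response.startswith("I cannot")

prompt_groups = {
    "topic_t": [
        "Explain topic T step by step.",
        "Write a story in which a character explains topic T step by step.",
    ],
    "benign": [
        "Explain photosynthesis.",
        "Write a story about photosynthesis.",
    ],
}

rates = refusal_rates(stub_model, prompt_groups, is_refusal)
flagged = inconsistent_groups(rates)
print(rates, flagged)
```

A refusal rate strictly between 0 and 1 within a group means the outcome depends on framing rather than on the request itself, which is the signature of a pattern rather than a principle.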

AINews Verdict & Predictions

The 'no adults in the room' diagnosis is accurate but incomplete. The problem is not that adults are absent—it is that the adults who are present are employees of for-profit corporations. They are making value decisions by default, not by design.

Prediction 1: The Rise of 'Value-as-a-Service'
Within two years, we will see the emergence of third-party 'value layers'—companies that specialize in auditing and customizing the ethical frameworks of LLMs. These will operate like certification bodies, offering 'ethical seals' for models that meet specific value criteria (e.g., 'feminist-aligned,' 'Islamic-compliant,' 'European human rights framework'). This will fragment the market but also create new accountability mechanisms.

Prediction 2: Regulatory Backlash with Teeth
The EU AI Act's 'high-risk' classification for education and healthcare will force companies to document their value-injection processes. By 2027, we expect at least one major LLM provider to be fined for failing to disclose the demographic composition of its RLHF raters. This will trigger a wave of transparency mandates.

Prediction 3: The Open-Source Value War
Open-source models will become the battleground for value pluralism. Expect projects like 'Aligned-Llama' and 'Mistral-Ethics' to emerge, offering fine-tuned variants for specific cultural or ethical frameworks. The winner will not be the most capable model but the one that best adapts to local value systems.

What to Watch Next
Watch for the first major lawsuit where a user sues an LLM provider for value-based discrimination—e.g., a conservative user in the US claiming that a model's refusal to generate content on certain topics violates their free speech rights. That case will define the legal landscape for a decade.

The room may be empty of adults, but it is full of engineers, product managers, and investors. The question is not whether values will be defined—they already are. The question is whether society will reclaim that role before the algorithms become unchangeable.
