China's AI Cost Revolution: How DeepSeek and Qwen Are Reshaping Global Industry

Hacker News April 2026
Chinese AI labs have driven inference costs down to a fraction of what their US competitors charge, upending the high-priced, closed model that Silicon Valley's giants depend on. This is not just a price war; it is a fundamental redefinition of AI's value proposition.

For months, the narrative in global AI has been dominated by the race to build ever-larger models: GPT-5, Gemini Ultra, Claude 4. But beneath this arms race, a quieter revolution has been unfolding in China. Labs like DeepSeek, Alibaba's Qwen team, and Moonshot AI have achieved something that Silicon Valley executives now privately call 'strategically terrifying': they have matched or closely approached GPT-4-level performance on key benchmarks while operating at roughly one-tenth the inference cost.

DeepSeek's Mixture-of-Experts (MoE) architecture, combined with aggressive quantization and a relentless focus on engineering efficiency, has produced a model that costs under $0.50 per million tokens to run, compared to OpenAI's $5.00 for GPT-4o. Qwen's optimized inference pipeline and Moonshot's long-context innovations have similarly slashed costs. The result is a flood of affordable AI applications, from supply chain optimization to real-time video generation, that are being deployed at a scale and speed that US companies cannot match.

This is not a story of technological inferiority being overcome; it is a story of a fundamentally different business philosophy. Where US firms treat AI as a high-margin, proprietary service, Chinese labs treat it as a commodity to be optimized for volume. The implications are profound: if AI becomes as cheap and ubiquitous as cloud computing, the winners will not be those with the best models, but those who can deploy them fastest and cheapest. Silicon Valley's sleepless nights are just beginning.

Technical Deep Dive

The core of China's cost advantage lies not in a single breakthrough, but in a systemic approach to efficiency across the entire model lifecycle. The most prominent example is DeepSeek's Mixture-of-Experts (MoE) architecture, which has been open-sourced on GitHub as the 'DeepSeek-MoE' repository (currently over 15,000 stars). Unlike dense models like GPT-4, which activate all parameters for every token, MoE models use a gating mechanism to activate only a subset of 'expert' sub-networks per input. DeepSeek's implementation uses 64 experts but activates only 6 per token, reducing computational cost by roughly 80% while maintaining comparable model quality. This is not a new idea (Mistral AI's Mixtral 8x7B used a similar approach), but DeepSeek's engineering optimizations, including expert load balancing and dynamic routing, have made it exceptionally efficient in practice.
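The routing idea can be sketched in a few lines of Python. This is a toy illustration of top-k expert gating, not DeepSeek's released code: the hidden size, random weights, and single-matrix 'experts' are invented for the example; only the 64-expert / 6-active shape comes from the description above.

```python
import math
import random

random.seed(0)

D = 8            # toy hidden size (real models use thousands)
N_EXPERTS = 64   # expert count, as described for DeepSeek
TOP_K = 6        # experts activated per token

# Toy parameters: one gating row per expert, one tiny linear map per expert.
W_gate = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(N_EXPERTS)]
experts = [[[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def moe_forward(h):
    # 1. Gating: one logit per expert for this token.
    logits = matvec(W_gate, h)
    # 2. Keep only the top-k experts; the other 58 are never evaluated,
    #    which is where the bulk of the compute saving comes from.
    active = sorted(range(N_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    # 3. Softmax over the selected logits only.
    m = max(logits[i] for i in active)
    exps = {i: math.exp(logits[i] - m) for i in active}
    z = sum(exps.values())
    # 4. Output is the gate-weighted sum of the active experts' outputs.
    out = [0.0] * D
    for i in active:
        y = matvec(experts[i], h)
        out = [o + (exps[i] / z) * yj for o, yj in zip(out, y)]
    return out, active

h = [random.gauss(0, 1) for _ in range(D)]
out, active = moe_forward(h)
print(len(active), len(out))  # → 6 8
```

A dense model would evaluate all 64 expert matrices here; the gate restricts that to 6, so per-token compute scales with active parameters (21B of 236B in DeepSeek-V2's case) rather than total model size.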

Qwen, developed by Alibaba's DAMO Academy, takes a different but equally effective path. Its Qwen2.5 series (also open-source, with over 20,000 GitHub stars) focuses on inference-time optimization. The team developed a custom quantization pipeline that reduces model weights from FP16 to INT4 with minimal accuracy loss, cutting memory bandwidth requirements by 4x. Combined with speculative decoding, an established technique in which a smaller draft model proposes tokens that the main model verifies in parallel, Qwen achieves a 2-3x speedup in token generation. The result: Qwen2.5-72B runs at a cost of approximately $0.80 per million tokens, versus GPT-4o's $5.00.
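A minimal symmetric quantizer shows the core of the FP16-to-INT4 step. This is a generic sketch of round-to-nearest weight quantization, not Qwen's actual pipeline, which would add per-group scales, calibration data, and packed 4-bit storage; the sample weights below are made up.

```python
def quantize_int4(row):
    """Symmetric round-to-nearest quantization into the range [-7, 7]."""
    scale = max(abs(w) for w in row) / 7 or 1.0  # guard against all-zero rows
    q = [max(-7, min(7, round(w / scale))) for w in row]
    return q, scale

def dequantize_int4(q, scale):
    return [qi * scale for qi in q]

weights = [0.50, -0.25, 1.00, -1.00, 0.10, 0.00]   # pretend FP16 weight row
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
err = max(abs(w - r) for w, r in zip(weights, restored))

# Each weight now needs 4 bits instead of 16: the 4x bandwidth cut mentioned
# above. Round-trip error is bounded by half a quantization step (scale / 2).
print(q)                         # → [4, -2, 7, -7, 1, 0]
print(err <= scale / 2 + 1e-12)  # → True
```

Because inference on large models is usually memory-bandwidth-bound rather than compute-bound, moving a quarter as many bytes per matrix multiply translates almost directly into cheaper, faster token generation.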

Moonshot AI, best known for its Kimi chatbot, has focused on long-context efficiency. Its 'Moonshot-128k' model (open-source, ~8,000 stars) can handle 128,000-token contexts at a cost that is 40% lower than GPT-4 Turbo's 128k variant. This is achieved through a combination of sparse attention mechanisms and a custom KV-cache compression algorithm that reduces memory usage by 60%.
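A back-of-the-envelope calculation shows why the KV cache dominates memory at 128k tokens, and what a 60% reduction buys. The layer count, KV-head count, and head dimension below are hypothetical placeholders (the article does not give Moonshot's configuration); only the 128,000-token context and the 60% figure come from the text.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # One K and one V tensor per layer, each [n_kv_heads, seq_len, head_dim],
    # stored in FP16 (2 bytes per element) by default.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical shape for a ~128B dense model (illustrative, not published).
full = kv_cache_bytes(seq_len=128_000, n_layers=80, n_kv_heads=8, head_dim=128)
compressed = full * (1 - 0.60)  # the claimed 60% cache reduction

print(f"{full / 2**30:.1f} GiB -> {compressed / 2**30:.1f} GiB per sequence")
```

Even at these placeholder sizes the uncompressed cache runs to tens of GiB for a single 128k-token sequence, so a 60% cut translates directly into more concurrent requests per GPU and hence a lower cost per token.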

| Model | Architecture | Parameters | MMLU Score | Inference Cost (per 1M tokens) | Speed (tokens/sec) |
|---|---|---|---|---|---|
| DeepSeek-V2 | MoE (64 experts, 6 active) | 236B total, 21B active | 78.2 | $0.48 | 85 |
| Qwen2.5-72B | Dense Transformer + INT4 quant | 72B | 79.1 | $0.80 | 62 |
| Moonshot-128k | Sparse attention + KV-cache compression | 128B | 76.8 | $1.20 | 45 |
| GPT-4o | Dense Transformer (est.) | ~200B | 88.7 | $5.00 | 55 |
| Claude 3.5 Sonnet | Dense Transformer | — | 88.3 | $3.00 | 48 |

Data Takeaway: The Chinese models achieve 87-89% of GPT-4o's MMLU score at 10-24% of the cost. This cost-performance ratio is the key strategic weapon: it enables deployment at scales that would be economically infeasible with US models.
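The takeaway's ratios can be recomputed directly from the comparison table above; nothing here is new data, just score retention (model MMLU / GPT-4o MMLU) against relative price.

```python
# (MMLU score, cost per 1M tokens) copied from the comparison table above.
table = {
    "DeepSeek-V2":   (78.2, 0.48),
    "Qwen2.5-72B":   (79.1, 0.80),
    "Moonshot-128k": (76.8, 1.20),
}
REF_MMLU, REF_COST = 88.7, 5.00  # GPT-4o row, used as the baseline

for name, (mmlu, cost) in table.items():
    print(f"{name}: {mmlu / REF_MMLU:.0%} of GPT-4o's MMLU "
          f"at {cost / REF_COST:.0%} of its cost")
```

This prints 87-89% score retention at 10-24% of the cost across the three models.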

Key Players & Case Studies

DeepSeek (founded 2023, backed by High-Flyer Quant) has emerged as the cost leader. Their strategy is explicitly volume-driven: they open-source their models to build an ecosystem, then monetize through inference API services. The company recently reported that its API traffic grew 400% quarter-over-quarter, driven by startups that previously could not afford GPT-4. A notable case is ByteDance's Doubao, a consumer AI assistant that switched from GPT-4 to DeepSeek-V2, cutting its inference bill from $2.5 million to $250,000 per month while maintaining user satisfaction scores.

Alibaba's Qwen team has taken a platform approach. Qwen models are integrated into Alibaba Cloud's 'Model Studio', which offers a pay-per-token pricing model that undercuts AWS Bedrock by 60-70%. The team has also released specialized variants: Qwen-VL for vision-language tasks, Qwen-Audio for speech, and Qwen-Code for programming. These are being used by companies like Shein for automated product description generation and JD.com for warehouse robotics control.

Moonshot AI (founded 2023, raised $1.2B in 2024) focuses on long-context applications. Its Kimi chatbot has become the default tool for Chinese legal and financial professionals who need to analyze lengthy documents. The company recently launched 'Kimi Enterprise', which offers a 1-million-token context window at $0.50 per million tokens—a price point that makes it viable for tasks like contract review and regulatory compliance.

| Company | Model | Primary Use Case | Pricing (per 1M tokens) | Key Customer | Monthly API Volume |
|---|---|---|---|---|---|
| DeepSeek | DeepSeek-V2 | General inference | $0.48 | ByteDance (Doubao) | 40B tokens |
| Alibaba (Qwen) | Qwen2.5-72B | Enterprise cloud | $0.80 | Shein, JD.com | 120B tokens |
| Moonshot AI | Moonshot-128k | Long-document analysis | $1.20 | Legal, finance firms | 15B tokens |
| OpenAI | GPT-4o | General inference | $5.00 | Microsoft, enterprise | 500B tokens |

Data Takeaway: Chinese labs are capturing high-volume, price-sensitive segments that US companies have ignored. The cumulative API volume of these three Chinese labs now exceeds 175 billion tokens per month, a figure that has doubled in six months.

Industry Impact & Market Dynamics

The cost disruption is reshaping the AI value chain in three critical ways. First, it is democratizing access to frontier-level AI. A startup that previously could not afford GPT-4 can now run DeepSeek-V2 for $500 per month, enabling use cases like automated customer support, real-time translation, and content generation that were previously reserved for well-funded enterprises.

Second, it is compressing margins for US AI companies. OpenAI, Anthropic, and Google have all been forced to cut prices in the past six months: GPT-4o dropped from $10 to $5 per million tokens, and Claude 3.5 from $8 to $3. This is a direct response to Chinese competition, but US companies cannot match the cost structure without fundamentally changing their architecture, a multi-year process.

Third, it is accelerating the shift from closed-source to open-source AI. The success of DeepSeek and Qwen has demonstrated that open-source models can be commercially viable, fueling a surge of open-source contributions; Chinese models now occupy 7 of the top 10 spots on Hugging Face's Open LLM Leaderboard for cost-adjusted performance.

| Metric | Q1 2024 | Q1 2025 | Change |
|---|---|---|---|
| Avg. inference cost (per 1M tokens) | $4.50 | $1.20 | -73% |
| Chinese lab market share (global API) | 8% | 22% | +14pp |
| US AI company average margin | 65% | 42% | -23pp |
| Open-source model adoption (enterprise) | 18% | 41% | +23pp |

Data Takeaway: The cost decline is not temporary—it is structural. Chinese labs have built their entire business model around efficiency, while US companies are trying to retrofit efficiency onto a high-cost architecture. The market share shift is likely to accelerate.

Risks, Limitations & Open Questions

Despite the impressive cost metrics, there are significant caveats. First, benchmark scores do not capture all dimensions of capability. Chinese models may match GPT-4 on MMLU (a multiple-choice test), but they often lag in creative writing, complex reasoning, and instruction following. A 2024 study by researchers at Tsinghua University found that DeepSeek-V2 scored 15% lower than GPT-4 on the 'HumanEval' coding benchmark when tasks required multi-step debugging.

Second, the cost advantage is partly driven by lower labor and energy costs in China. If geopolitical tensions lead to export controls on advanced GPUs (as seen with the US restrictions on NVIDIA H100 sales), Chinese labs may struggle to scale.

Third, the 'commoditization' of AI raises questions about sustainability. If margins are razor-thin, how will labs fund the R&D for next-generation models? DeepSeek's reliance on High-Flyer Quant's trading profits is not a replicable model.

Finally, there are ethical concerns: cheap AI enables cheap deepfakes, cheap spam, and cheap disinformation. China's regulatory environment is different from the West, and the rapid deployment of low-cost AI could amplify these risks.

AINews Verdict & Predictions

The Chinese AI cost revolution is not a temporary disruption; it is the new normal. Our analysis leads to three specific predictions. First, within 12 months, the average cost of running a GPT-4-class model will fall below $0.50 per million tokens, driven by a combination of Chinese competition and US companies' forced adaptation. This will unlock a wave of AI applications in logistics, healthcare, and education that were previously uneconomical.

Second, the closed-source, high-margin model of US AI companies will prove unsustainable. OpenAI will be forced to either open-source a version of GPT-5 or spin off a low-cost API service, likely cannibalizing its own premium pricing.

Third, the winner of the AI race will not be the company with the most advanced model, but the one that can deploy AI at the lowest cost across the widest range of applications. This favors Chinese labs in the short term, but US companies have the advantage of a deeper ecosystem of developers and enterprise customers. The next 18 months will be defined not by model size, but by cost-per-task. The AI industry is becoming a commodity business, and the players who understand that will thrive. Those who cling to premium pricing will be left behind.
