Kuaishou's $36 Billion AI Gamble: Can Video Generation Rewrite Its Destiny?

Kuaishou's recently disclosed financial planning documents reveal a seismic shift in corporate strategy. The short-video giant has allocated approximately 260 billion RMB (roughly $36 billion USD) for capital expenditures in 2026, a staggering 110 billion RMB increase over 2025 projections. Crucially, this budget is almost entirely earmarked for the development of its proprietary Kling video generation model and the massive AI computing infrastructure required to support it.

This move is not merely an incremental technology investment but a conscious, high-stakes gamble to transform Kuaishou from a content aggregation and distribution platform into an AI-native video ecosystem. The core thesis is that generative AI represents the only viable path to break through the current ceiling of user growth and engagement in China's hyper-competitive short-video market. Kuaishou aims to embed Kling's capabilities into every facet of its platform—from AI-assisted content creation for its 700 million monthly active users, to hyper-personalized recommendation engines, virtual streamers, and entirely new forms of interactive commerce and local services.

The financial scale of this commitment is extraordinary, representing a significant portion of Kuaishou's total market capitalization and guaranteed to pressure near-term profitability. Success hinges on Kling achieving not just parity but demonstrable superiority in video generation, world modeling, and agent-based interactivity compared to global leaders like OpenAI's Sora and domestic rivals. Failure could see Kuaishou exhaust its financial reserves without securing a defensible position in the next era of digital media.

Technical Deep Dive

Kuaishou's Kling model is a multi-modal architecture specifically engineered for the temporal and spatial complexities of video. While official technical whitepapers remain limited, analysis of its public demos and research publications from Kuaishou's AI lab (Y-tech) points to a diffusion transformer (DiT) backbone, similar to the approach popularized for video by Sora. However, Kling appears to incorporate several novel adaptations for the short-form, high-engagement video domain.

A key differentiator is its training data pipeline. Unlike models trained primarily on curated film and stock footage, Kling is rumored to be trained on a massive, proprietary corpus of Kuaishou's own user-generated content (UGC). This dataset, encompassing billions of short videos with rich metadata (likes, comments, watch time, creator info), provides a unique signal for understanding "viral" visual patterns, human-centric actions, and real-world scene dynamics that resonate with mass audiences. The model likely employs a video VQ-VAE for efficient tokenization, compressing raw video frames into a discrete latent space. The transformer then operates on these spatiotemporal tokens, learning to predict sequences.
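To make the tokenization step concrete, here is a minimal sketch of the vector-quantization lookup at the heart of a VQ-VAE: each continuous latent vector is replaced by the index of its nearest codebook entry. Nothing here reflects Kling's actual implementation. The shapes, codebook size, and function names are illustrative assumptions, and the random arrays stand in for a trained encoder and a learned codebook.

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (T, H, W, D) continuous encoder outputs for T frames (assumed layout)
    codebook: (K, D) code vectors (learned in a real VQ-VAE, random here)
    returns:  (T, H, W) integer token ids
    """
    flat = latents.reshape(-1, latents.shape[-1])  # (N, D)
    # Squared Euclidean distance from every latent to every code vector.
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    return dists.argmin(axis=1).reshape(latents.shape[:-1])

def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look the token ids back up to recover quantized latent vectors."""
    return codebook[tokens]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 8))      # hypothetical: 512 codes of dimension 8
latents = rng.normal(size=(4, 6, 6, 8))   # hypothetical: 4 frames, 6x6 latent grid

tokens = quantize(latents, codebook)      # discrete spatiotemporal tokens
sequence = tokens.reshape(-1)             # flattened sequence a transformer would consume
print(tokens.shape, sequence.shape)
```

The flattened `sequence` is what the passage calls spatiotemporal tokens; a production system would also need positional information for the time, height, and width axes.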

For world modeling and agent simulation, a stated goal, Kling is likely exploring a hybrid approach. This could involve integrating a physics-informed neural network layer to enforce basic real-world constraints (object permanence, gravity) with a large language model module for narrative coherence and instruction following. JEPA (Joint Embedding Predictive Architecture), an architecture proposed by Yann LeCun with open-source implementations released by Meta (I-JEPA, V-JEPA), focuses on learning world models by predicting latent representations rather than pixels; it is a relevant research direction that Kuaishou's team is said to be investigating.

A critical technical hurdle is inference cost and latency. Generating a 60-second 1080p video with high fidelity requires immense computational power. Kuaishou's massive capex is partly directed at building a custom inference stack, potentially using a mixture-of-experts (MoE) architecture for Kling to reduce the number of parameters active during generation. The company is also investing heavily in its own AI chips, codenamed "Streaming Silicon," designed to optimize the tensor operations specific to video diffusion models.
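The MoE idea mentioned above can be illustrated with a small sketch of top-k expert routing: a gate scores every expert for a given token, only the k best are evaluated, and their outputs are mixed. This is a generic textbook formulation, not Kling's design. The expert count, dimensions, and function names are assumptions chosen for illustration.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Route one token through the top-k experts.

    x:         (D,) token representation
    gate_w:    (D, E) gating weights, one score per expert
    expert_ws: (E, D, D) one linear layer per expert (a stand-in for a full FFN)
    """
    logits = x @ gate_w                        # (E,) gate score per expert
    top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over the selected experts only
    # Only the k selected experts are evaluated; the rest are skipped entirely.
    out = sum(w * (x @ expert_ws[e]) for w, e in zip(weights, top))
    return out, top

def active_fraction(num_experts: int, k: int) -> float:
    """Share of expert parameters touched per token."""
    return k / num_experts

rng = np.random.default_rng(0)
D, E = 16, 8                                   # hypothetical sizes
x = rng.normal(size=D)
gate_w = rng.normal(size=(D, E))
expert_ws = rng.normal(size=(E, D, D))

out, chosen = moe_forward(x, gate_w, expert_ws, k=2)
print(out.shape, sorted(chosen.tolist()), active_fraction(E, 2))
```

With 8 experts and k=2, each token touches a quarter of the expert parameters, which is the sense in which MoE lowers the active parameter count while total model capacity stays large.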

| Model | Reported Max Duration | Key Architecture | Primary Training Data | Notable Capability |
|---|---|---|---|---|
| Kuaishou Kling | 120 seconds (target) | Diffusion Transformer (DiT) + Proprietary Modules | Kuaishou UGC + Licensed Content | Stylization for "Kuaishou aesthetic", real-time interaction hooks |
| OpenAI Sora | 60 seconds | Diffusion Transformer | Curated video, synthetic data | Photorealistic generation, complex camera motion |
| Runway Gen-3 | 10 seconds | Custom diffusion pipeline | Cinematic & artistic datasets | High control fidelity, director mode tools |
| Pika 1.0 | 10 seconds | Modified latent diffusion | Diverse web video | Ease of use, text & image to video |

Data Takeaway: The table shows Kling targeting the longest duration and the most domain-specific training data (UGC), while Sora's listed strengths, photorealism and complex camera motion, mark the realism and physics gap Kling still has to close. Its success depends on closing this quality gap while leveraging its unique data advantage.

Key Players & Case Studies

The strategic landscape for Kling is defined by intense competition on three fronts: global foundational model leaders, domestic Chinese tech giants, and vertical AI video startups.

OpenAI's Sora remains the qualitative benchmark. Its ability to generate physically plausible, minute-long narratives sets the bar Kling must meet. However, Sora's commercial availability and pricing for the Chinese market are uncertain, creating a window of opportunity. Meta's Make-A-Video and Google's Lumiere represent formidable research power but have been slower to productize, focusing on foundational research over immediate platform integration.

Domestically, the competition is ferocious. ByteDance (Douyin/TikTok) is Kuaishou's arch-rival and is pursuing a similar AI video strategy. While ByteDance has been less explicit about a single mega-model, it is advancing rapidly on multiple fronts: its Dreamina (formerly Doubao's video feature) is already integrated into CapCut, its video editing app, creating a powerful creator toolchain. ByteDance's cloud division also offers AI services, creating potential for broader ecosystem lock-in. Tencent is leveraging its strength in gaming and social (WeChat) to develop interactive AI agents and virtual worlds, an adjacent but overlapping battlefield. Alibaba and Baidu are focusing their large models (Qwen, Ernie) more on enterprise and search applications, but possess the cloud infrastructure that could support video generation at scale.

Startups like Pika and Runway demonstrate the power of focused, user-friendly products. Pika's rapid growth to millions of users shows the hunger for accessible video generation tools. Kuaishou's challenge is to match this ease of use while offering deeper, platform-native integration.

The most instructive case study is Midjourney's success in image generation. It dominated not merely through model quality but by building a vibrant community and workflow on Discord. Kuaishou must replicate this within its own app, turning Kling into a social fabric, not just a tool. Researchers like Kaiming He (formerly at Facebook AI Research, now at MIT) have emphasized the importance of "data flywheels," in which product interactions are used to improve the model. Kuaishou's closed loop of creation, distribution, and feedback on its platform is its single biggest potential advantage if it can instrument Kling to learn continuously from user interactions.

Industry Impact & Market Dynamics

Kuaishou's bet is a catalyst that will accelerate several tectonic shifts across the tech and media industries.

1. The Commoditization of Content Creation: If Kling succeeds in enabling high-quality, personalized video creation for hundreds of millions of users, the volume of video content will explode. This will further depress CPMs for generic, non-AI-assisted content while elevating the value of truly unique human creativity or hyper-engaging AI-human collaborations. The role of the "creator" will bifurcate into AI Directors (prompt engineers, stylists, editors) and Human Performers (providing authenticity, emotion, and real-world skill).

2. The Rise of the AI-Native Business Model: Kuaishou is exploring monetization avenues beyond advertising:
- AI Creator Subscriptions: Tiered access to advanced Kling features (longer videos, custom avatars, exclusive styles).
- Generative Advertising: Brands can generate bespoke video ads featuring their products in scenarios tailored to individual users.
- Virtual Streamer Economy: Management and monetization of AI-generated influencers, with Kuaishou taking a platform fee.
- Interactive Commerce: AI agents that can demonstrate products in real-time within a live stream, answer questions, and close sales.

| Potential Revenue Stream | Estimated Addressable Market (China, 2027) | Key Dependency |
|---|---|---|
| AI Creator Tools & Subscriptions | $5-8 Billion | Kling's ease of use & output quality vs. competitors |
| Generative Video Ad Tech | $15-20 Billion | Advertiser adoption, measurement standards |
| Virtual Streamer/Agent Services | $3-5 Billion | User engagement with AI personalities, emotional connection |
| AI-Powered E-commerce Solutions | $10-15 Billion | Integration with logistics & payment, trust in AI recommendations |

Data Takeaway: The market potential is vast, but each stream depends on overcoming significant technical and adoption hurdles. Generative ads represent the largest near-term opportunity, but also the most competitive.
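As a quick sanity check on the table, the four ranges can be summed and set against the $36 billion capex figure. The sketch below uses only the numbers quoted in this article. Note the caveat built into the comparison: these are estimates for the whole Chinese market in 2027, not revenue Kuaishou itself would capture, so the ratio is illustrative rather than a return calculation.

```python
# Addressable-market ranges from the table above, in billions of USD (low, high).
streams = {
    "AI Creator Tools & Subscriptions": (5, 8),
    "Generative Video Ad Tech": (15, 20),
    "Virtual Streamer/Agent Services": (3, 5),
    "AI-Powered E-commerce Solutions": (10, 15),
}
capex_usd_bn = 36  # 2026 capex figure cited in the article

total_low = sum(low for low, _ in streams.values())     # 33
total_high = sum(high for _, high in streams.values())  # 48

# Share of the combined market attributed to generative ads at each end of the range.
ads_low, ads_high = streams["Generative Video Ad Tech"]
ad_share_low = ads_low / total_low
ad_share_high = ads_high / total_high

print(f"Combined addressable market: ${total_low}-{total_high}B")
print(f"Generative ads share: {ad_share_high:.0%}-{ad_share_low:.0%}")
print(f"One year of capex vs. market: {capex_usd_bn / total_high:.2f}x-{capex_usd_bn / total_low:.2f}x")
```

On these figures the combined market is $33-48 billion, so a single year of planned capex is on the same order as the entire estimated market across all players.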

3. Infrastructure Arms Race: The $36 billion expenditure will flow largely to semiconductor and cloud providers. While Kuaishou is developing its own chips, it will still rely on NVIDIA (or domestic alternatives like Biren or Moore Threads) for high-end GPUs in the short term. This investment signals a broader trend of major internet platforms internalizing the AI stack to control costs and differentiate performance, potentially reshaping the cloud market.

4. Regulatory Recalibration: Mass generation of synthetic video will force rapid evolution of China's already-strict cyber governance. Expect mandatory invisible watermarking, real-time content verification systems, and new regulations around AI-generated likeness and intellectual property. Kuaishou's close alignment with regulatory bodies will be as critical as its technical execution.

Risks, Limitations & Open Questions

Financial Burn Rate: The sheer scale of investment will obliterate profitability for the foreseeable future. Kuaishou's operating margin, which had recently turned positive, will likely plunge back deep into negative territory. If AI-driven revenue growth lags expectations, investor patience could evaporate, leading to a severe capital crunch.

The "Good Enough" Problem: Kling doesn't need to be better than Sora in all aspects; it needs to be *good enough* for the Kuaishou use case and deeply integrated. However, if open-source models (like Stable Video Diffusion or future iterations) advance rapidly, the value of a proprietary, costly model diminishes. The open-source community is relentlessly closing the gap with closed models.

Cultural & Authenticity Risks: Kuaishou's brand is built on "real life" and authenticity from China's heartland. An over-reliance on slick, AI-generated content could alienate its core user base. Striking the right balance—using AI to enhance, not replace, human expression—is a profound product design challenge.

Technical Debt & Agility: Building a monolithic, all-encompassing model like Kling is a bet against the trend towards smaller, specialized models. If the future lies in a swarm of efficient task-specific models, Kuaishou's giant, general-purpose architecture could become slow to iterate and expensive to fine-tune.

Open Questions:
1. Can Kling achieve a true "world model" for interactive agents, or will it remain a sophisticated video parlor trick?
2. Will ByteDance's more modular, product-led approach ultimately prove more agile and cost-effective than Kuaishou's monolithic bet?
3. How will the platform manage the inevitable flood of AI-generated spam, misinformation, and copycat content?
4. Can Kuaishou attract and retain the top-tier AI research talent needed to compete with global labs, given the intense competition for such specialists?

AINews Verdict & Predictions

Kuaishou's $36 billion gamble is a necessary, albeit perilous, move. The company correctly identifies that its previous growth playbook is exhausted. In the face of ByteDance's overwhelming scale and Tencent's ecosystem power, a disruptive technological leap is the only viable path to long-term relevance. However, the execution risk is monumental.

Our Predictions:

1. By Q4 2025, Kling will achieve feature parity with the public version of Sora in short-form (sub-30 second) video quality, but will struggle with longer-term coherence. Its unique "Kuaishou style" filters and templates will see rapid adoption by its creator base, driving a measurable uptick in daily video uploads.

2. The first major revenue success will come from generative advertising, not consumer subscriptions. Within 18 months, over 30% of video ads on Kuaishou will be partially or fully AI-generated, boosting ad engagement metrics and allowing for cheaper creative production for SMEs.

3. Financial pressure will force a strategic partnership by late 2026. The capital demands will be too great. Kuaishou will form a joint venture with a major cloud provider (like Tencent Cloud or Baidu AI Cloud) or a semiconductor firm to share the infrastructure burden, ceding some control in exchange for sustainability.

4. The true battleground will shift to "AI-Native Social Formats." The winner won't be the company with the best video generator, but the one that invents the first killer social interaction powered by AI agents—a hybrid of live streaming, gaming, and social networking that is impossible without real-time generative AI. Kuaishou's deep live-streaming DNA gives it a fighting chance here.

Final Verdict: Kuaishou has placed a valid bet on the only table that matters for its future. The odds are long, and the cost of failure is corporate oblivion. However, the cost of not betting—of slowly being eroded by rivals and technological change—was certain decline. This move transforms Kuaishou from a media company into a high-stakes AI R&D lab with a massive user base. We predict a turbulent three years ahead, with significant financial losses, but a 60% probability that by 2028, Kuaishou emerges as a fundamentally different, leaner, and AI-powered leader in a redefined interactive video landscape. Watch the monthly active user engagement metrics and the developer ecosystem around Kling's API; these will be the leading indicators of success or failure long before the financials turn positive.
