Sora Stalled, Kling Thrives: The AI Video Race Demands Product Grit Over Flashy Demos

May 2026
OpenAI's Sora once defined the cutting edge of AI video generation, but it has stalled in the lab. Kuaishou's Kling, by contrast, has surged ahead by prioritizing product integration and cost efficiency, revealing that endurance—not raw speed—wins the AI video race.

The AI video generation landscape has entered a brutal endurance phase, where the ability to ship a usable product matters more than flashy demos. OpenAI's Sora, launched to global acclaim for its stunning temporal coherence, has effectively stalled. Internal strategic confusion and an over-reliance on scaling laws without solving core product issues—like long-form temporal consistency and inference cost—have left it as a glorified tech demo. Meanwhile, Kuaishou's Kling has executed a quiet but devastating counterattack. By focusing on controllability, real-time editing, and low-cost inference, Kling has embedded itself into the daily workflow of short-video creators, achieving real commercial traction. This contrast is not a story of technological superiority but of product philosophy. Sora aimed for a perfect, long-form generation out of the gate; Kling started with short, editable, and cheap clips. The result is a market where Kling is generating revenue and user lock-in, while Sora remains a promise. The 'treadmill' effect is clear: in AI video, the runner who can keep iterating and integrating into real-world pipelines will outlast the one who tries to sprint to perfection. The industry must now reckon with the fact that model architecture is only half the battle; the other half is ruthless productization.

Technical Deep Dive

The divergence between Sora and Kling is rooted in fundamentally different architectural and engineering choices. Sora, as described in OpenAI's technical report, is a diffusion transformer (DiT) that operates on spacetime patches. It compresses videos into a latent space and then models the temporal and spatial dynamics jointly. This approach is elegant and theoretically powerful, enabling Sora to generate videos up to a minute long with remarkable object persistence and scene coherence. However, this architecture comes with a crippling computational cost. Generating a single minute of high-resolution video with Sora is estimated to require thousands of GPU hours, making it economically unviable for mass deployment. The model's reliance on scaling laws—the belief that simply increasing model size, data, and compute will yield proportional improvements—has hit a wall. Sora struggles with the 'long-tail' of temporal consistency: while it can maintain a character's appearance for 10 seconds, it often fails over 30 seconds, producing subtle but uncanny glitches like flickering textures or objects that morph into unrelated shapes. This is not a problem that more compute alone can solve; it requires architectural innovations like hierarchical temporal modeling or explicit memory mechanisms, which OpenAI has not publicly pursued.
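To make the cost argument concrete, here is a rough sketch of how the number of spacetime patches grows with clip length, and how full self-attention over those patches grows faster still. Every number in `PatchConfig` (frame rate, compression strides, patch sizes) is a hypothetical placeholder chosen for illustration; OpenAI has not published Sora's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchConfig:
    """Hypothetical latent and patch sizes; none of these come from OpenAI."""
    fps: int = 24                # source frame rate (assumed)
    height: int = 1080           # source height in pixels (assumed)
    width: int = 1920            # source width in pixels (assumed)
    temporal_stride: int = 4     # frames merged into one latent frame (assumed)
    spatial_stride: int = 8      # pixels merged into one latent cell (assumed)
    patch_t: int = 1             # latent frames per patch (assumed)
    patch_hw: int = 2            # latent cells per patch side (assumed)


def spacetime_tokens(seconds: float, cfg: PatchConfig = PatchConfig()) -> int:
    """Count the spacetime patches (tokens) for a clip of the given length."""
    latent_frames = int(seconds * cfg.fps) // cfg.temporal_stride
    latent_h = cfg.height // cfg.spatial_stride
    latent_w = cfg.width // cfg.spatial_stride
    return (
        (latent_frames // cfg.patch_t)
        * (latent_h // cfg.patch_hw)
        * (latent_w // cfg.patch_hw)
    )


def relative_attention_cost(seconds: float, baseline_seconds: float = 10.0) -> float:
    """Full self-attention scales with tokens squared; return cost vs. the baseline."""
    ratio = spacetime_tokens(seconds) / spacetime_tokens(baseline_seconds)
    return ratio ** 2


if __name__ == "__main__":
    for s in (10, 30, 60):
        print(s, spacetime_tokens(s), relative_attention_cost(s))
```

Under these assumptions a 60-second clip has six times the tokens of a 10-second clip, and joint attention over all of them costs roughly 36 times as much. This is one reason single-shot long-form generation is expensive regardless of the exact patch sizes.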

Kling, on the other hand, takes a pragmatic approach. While Kuaishou has not released a full technical paper, the model is understood to combine a 3D Variational Autoencoder (VAE) with a latent diffusion model, similar to Stable Video Diffusion but optimized for short clips (5-15 seconds). Kling's key innovation is its 'split-and-stitch' inference pipeline. Instead of generating a long video in one shot, it generates short segments and uses a lightweight temporal alignment module to stitch them together. This reduces inference cost by an estimated 80% compared to a monolithic DiT of similar quality and allows for real-time editing. Kling also integrates a ControlNet-style conditioning mechanism that lets creators specify camera motion, subject appearance, and background via simple text prompts or reference images. This focus on controllability, rather than raw length, matches the needs of short-video creators, who need assets they can produce and iterate on quickly.
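The general idea of generating short segments and stitching them can be sketched in a few lines. This is not Kuaishou's method, which is unpublished; a plain linear cross-fade over the overlapping frames stands in for the 'temporal alignment module'. The names `split_and_stitch`, `crossfade`, and `generate_segment` are hypothetical, and a frame is reduced to a single float so the example stays self-contained.

```python
from typing import Callable, List

Frame = float  # stand-in for an image; a real frame would be a pixel array


def crossfade(prev_tail: List[Frame], next_head: List[Frame]) -> List[Frame]:
    """Linearly blend two equally long runs of overlapping frames."""
    n = len(prev_tail)
    blended = []
    for i, (a, b) in enumerate(zip(prev_tail, next_head)):
        w = (i + 1) / (n + 1)  # weight of the newer segment rises across the overlap
        blended.append((1 - w) * a + w * b)
    return blended


def split_and_stitch(
    generate_segment: Callable[[int, int], List[Frame]],
    total_frames: int,
    segment_frames: int,
    overlap: int,
) -> List[Frame]:
    """Generate overlapping short segments and stitch them into one clip."""
    if not 0 <= overlap < segment_frames:
        raise ValueError("overlap must be non-negative and smaller than segment_frames")
    video: List[Frame] = []
    index = 0
    while len(video) < total_frames:
        segment = generate_segment(index, segment_frames)
        if video and overlap:
            # Blend the end of the clip so far with the start of the new segment.
            video[-overlap:] = crossfade(video[-overlap:], segment[:overlap])
            video.extend(segment[overlap:])
        else:
            video.extend(segment)
        index += 1
    return video[:total_frames]


if __name__ == "__main__":
    # Toy generator: segment k is a run of frames with constant value k.
    clip = split_and_stitch(lambda k, n: [float(k)] * n, 10, 4, 2)
    print(clip)
```

Because each segment is generated independently, a creator can regenerate one segment without paying for the whole clip again, which is the property that makes this style of pipeline attractive for editing.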

| Model | Max Video Length | Est. Inference Cost (USD per 10 s clip) | Temporal Consistency (score, 1-10) | Controllability (score, 1-10) | Open Source Status |
|---|---|---|---|---|---|
| Sora (OpenAI) | 60s | $50+ (GPU hours) | 8 | 3 | No |
| Kling (Kuaishou) | 15s (stitchable) | $0.50 | 7 | 9 | No |
| Stable Video Diffusion (Stability AI) | 4s | $0.05 | 5 | 6 | Yes (GitHub: Stability-AI/generative-models, 25k+ stars) |
| Runway Gen-3 Alpha | 10s | $1.00 | 7 | 7 | No |

Data Takeaway: The table reveals a clear trade-off between video length and practical usability. Sora leads in raw capability but is orders of magnitude too expensive for real-world use. Kling achieves a 'good enough' quality at a cost that enables mass deployment. The open-source alternative, Stable Video Diffusion, lags in quality but offers a foundation for customization. The winner is not the most powerful model, but the one that balances quality, cost, and control.
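As a quick check on the 'orders of magnitude' claim, the table's per-clip estimates can be converted into per-minute costs and ratios. The figures below are copied from the table and remain the article's estimates, not measurements. The Sora value is a lower bound, and the Stable Video Diffusion value is extrapolated beyond its 4-second maximum clip length.

```python
# Estimated cost in USD per 10-second clip, copied from the table above.
COST_PER_10S = {
    "Sora": 50.00,                    # listed as "$50+", so treat as a lower bound
    "Kling": 0.50,
    "Stable Video Diffusion": 0.05,   # extrapolated; the model tops out at 4 s clips
    "Runway Gen-3 Alpha": 1.00,
}


def cost_per_minute(cost_per_10s: float) -> float:
    """Scale a per-10-second estimate to one minute of generated video."""
    return cost_per_10s * 6


def cost_ratio(model: str, baseline: str = "Kling") -> float:
    """How many times more expensive `model` is than `baseline`."""
    return COST_PER_10S[model] / COST_PER_10S[baseline]


if __name__ == "__main__":
    for name, cost in COST_PER_10S.items():
        print(f"{name}: ${cost_per_minute(cost):.2f}/min, {cost_ratio(name):.1f}x Kling")
```

On these estimates Sora costs at least $300 per generated minute, about 100 times Kling's $3.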

Key Players & Case Studies

The Sora vs. Kling narrative is a case study in contrasting corporate strategies. OpenAI, under Sam Altman, has historically prioritized breakthrough research and massive scale. Sora was born from this culture: a moonshot project that aimed to redefine video generation. However, OpenAI's internal priorities have been fragmented. The company has simultaneously pursued GPT-5, voice mode for ChatGPT, and enterprise AI agents, leaving Sora without a dedicated product team. The result is a model that is technically impressive but has no clear path to market. OpenAI has not released a public API or a consumer-facing product for Sora, and internal leaks suggest that the team is struggling to reduce inference costs below $10 per minute of video, a price point that rules out any viable mass-market business model. Sora has become 'demoware': a tool used to secure funding and media attention, but not to generate revenue.

Kuaishou, the parent company of Kling, operates in a different universe. As the second-largest short-video platform in China (behind Douyin/TikTok), Kuaishou has a deep understanding of creator workflows. Kling was not developed in a research lab in isolation; it was built in close collaboration with Kuaishou's internal creator tools team. The model was trained on Kuaishou's proprietary dataset of over 100 million short videos, which gives it an inherent advantage in understanding the visual language of short-form content—fast cuts, dynamic camera movements, and meme-like aesthetics. Kuaishou also invested heavily in inference optimization, using a combination of model distillation and quantization to run Kling on consumer-grade GPUs (e.g., NVIDIA RTX 4090) at near-real-time speeds. This allowed Kuaishou to launch Kling as a free tier within its video editing app, directly competing with traditional editing tools. Within three months of launch, Kling was used to generate over 500 million short clips, according to Kuaishou's internal metrics. This is not a theoretical advantage; it is a data flywheel. Every clip generated provides feedback that improves the model, creating a moat that Sora cannot easily cross.

| Company | Product | Launch Date | Monthly Active Users (est.) | Revenue Model | Key Differentiator |
|---|---|---|---|---|---|
| OpenAI | Sora | Feb 2024 (demo) | <100k (limited access) | None (free demos) | Long-form coherence |
| Kuaishou | Kling | Jun 2024 | 15M | Freemium (ads + subscription) | Workflow integration |
| Runway | Gen-3 Alpha | Jun 2024 | 1M | Subscription ($15-$95/mo) | Professional editing tools |
| Pika Labs | Pika 2.0 | Apr 2024 | 500k | Freemium | Community and ease of use |

Data Takeaway: The user numbers tell the story. Kling has 15 million monthly active users, orders of magnitude more than Sora's limited testers. Kuaishou's strategy of embedding AI generation into an existing, popular platform has created a distribution advantage that is nearly impossible to replicate. Sora, despite its technical superiority, has no distribution channel.

Industry Impact & Market Dynamics

The 'treadmill' effect is reshaping the entire AI video market. The initial hype cycle (2023-2024) was dominated by 'wow factor' demos—models that could generate a convincing 10-second clip of a dog flying a plane. Investors poured over $1.5 billion into AI video startups in 2024 alone. But the market is now entering a consolidation phase. The winners will not be those with the best model, but those who can achieve the lowest cost per usable minute of video. This is a classic 'cost curve' race. Kuaishou has already driven the cost of generating a 10-second clip down to $0.50, and internal roadmaps suggest they aim for $0.10 by the end of 2025. At that price, AI-generated video becomes cheaper than stock footage for many use cases, opening up massive markets in advertising, social media content, and even low-budget filmmaking.
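'Cost per usable minute' is worth spelling out, because creators discard many generations before keeping one. The sketch below divides raw generation cost by an acceptance rate. The 25% acceptance rate is a hypothetical figure for illustration, not one reported by Kuaishou, while the $0.50 and $0.10 clip prices are the article's figures.

```python
def cost_per_usable_minute(
    cost_per_clip: float, clip_seconds: float, acceptance_rate: float
) -> float:
    """Cost of one minute of footage the creator actually keeps.

    acceptance_rate is the fraction of generated clips that are usable.
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    clips_per_minute = 60 / clip_seconds
    return cost_per_clip * clips_per_minute / acceptance_rate


def percent_reduction(old: float, new: float) -> float:
    """Relative price drop from old to new, as a percentage."""
    return (old - new) / old * 100


if __name__ == "__main__":
    ACCEPTANCE = 0.25  # hypothetical: one in four generated clips is kept
    today = cost_per_usable_minute(0.50, 10, ACCEPTANCE)
    target = cost_per_usable_minute(0.10, 10, ACCEPTANCE)
    print(f"today: ${today:.2f}, target: ${target:.2f}")
    print(f"price drop: {percent_reduction(0.50, 0.10):.0f}%")
```

The design point is that acceptance rate multiplies the headline price: at one keeper in four, a $0.50 clip works out to $12 per usable minute, so improvements in controllability lower the effective cost just as price cuts do.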

This has profound implications for the competitive landscape. OpenAI, with its massive compute budget, could theoretically match Kling's cost, but it lacks the product DNA to do so efficiently. The company's strength is in research, not in building user-facing tools for non-technical creators. Meanwhile, Chinese companies like Kuaishou and ByteDance (which is developing its own AI video model, Jimeng) have a structural advantage: they already own the distribution platforms and the creator ecosystems. They can afford to subsidize AI generation in the short term to build market share, a playbook they perfected in the short-video wars of the 2010s.

The market is also bifurcating. On one side, there is the 'professional' segment (film, advertising, high-end content) where quality and controllability are paramount. Here, Runway and Pika Labs are competing with Sora, offering subscription-based tools that integrate with traditional editing software like Adobe Premiere Pro. On the other side is the 'mass market' segment (social media, UGC, marketing) where cost and speed are king. Kling dominates this space. The danger for Sora is that it falls into a no-man's land: too expensive for mass market, not controllable enough for professionals.

| Market Segment | Key Players | Average Spend per User/Year | Growth Rate (YoY) | Primary Metric |
|---|---|---|---|---|
| Professional (Film/Ad) | Runway, Pika, Sora | $500-$5,000 | 40% | Quality & Control |
| Mass Market (Social/UGC) | Kling, Jimeng, CapCut AI | $0-$100 | 150% | Cost & Speed |
| Enterprise (Training/Sim) | Synthesia, HeyGen | $10,000-$100,000 | 60% | Customization & Reliability |

Data Takeaway: The mass market segment is growing at 150% year-over-year, nearly four times the professional segment's 40%. Kling is well positioned to capture this growth. Sora's failure to address this segment is a strategic error that will be difficult to correct.

Risks, Limitations & Open Questions

Despite Kling's success, the AI video industry faces existential risks. The most pressing is the 'uncanny valley' problem. Even the best models, including Kling, produce artifacts that break immersion—fingers that morph, backgrounds that flicker, physics that defy gravity. For professional use, these errors are unacceptable. Kuaishou has mitigated this by limiting generation length to 15 seconds, but this is a crutch, not a solution. The industry needs a breakthrough in temporal consistency that goes beyond scaling.

Second, the regulatory environment is tightening. The EU's AI Act and China's own generative AI regulations require watermarking and provenance tracking for AI-generated content. Kling already embeds invisible watermarks, but compliance costs are rising. More concerning is the potential for misuse: deepfakes, disinformation, and non-consensual synthetic content. Kuaishou has implemented content moderation filters, but these are imperfect. A major scandal involving AI-generated video on a platform like Kuaishou could trigger a regulatory backlash that stifles the entire market.

Third, the open-source threat looms. Stability AI's Stable Video Diffusion, while inferior in quality, is free and customizable. A community-driven model that approaches Kling's quality could disrupt the commercial market overnight. The recent release of Mochi 1 (by Genmo, with open weights under the Apache 2.0 license) shows that the gap is closing. If a truly competitive open-source model emerges, Kuaishou's cost advantage could evaporate.

Finally, there is the question of Sora's resurrection. OpenAI has the resources and talent to pivot. If the company decides to prioritize video generation, acquires a distribution channel (e.g., by partnering with a platform like YouTube or Instagram), and optimizes for cost, it could re-enter the race. But this would require a fundamental cultural shift at OpenAI, from research-first to product-first. Given the company's current trajectory, this seems unlikely within the next 12 months.

AINews Verdict & Predictions

Sora is not dead, but it is in a coma. OpenAI's failure to productize its most impressive demo in over two years is a strategic blunder of historic proportions. The company has squandered its first-mover advantage by treating video generation as a research project rather than a product. Kling, meanwhile, has won the first round by being ruthlessly pragmatic. It is not the best model, but it is the best product.

Prediction 1: Within 12 months, Kuaishou will launch a paid API for Kling that undercuts all competitors by at least 50%, forcing Runway and Pika to either consolidate or pivot to niche professional markets.

Prediction 2: OpenAI will eventually spin off Sora into a separate entity or partner with a major platform (likely Microsoft or Instagram) to gain distribution, but this will be a defensive move, not a market-leading one.

Prediction 3: The next major breakthrough in AI video will not come from a model architecture change, but from a 'video operating system'—a platform that combines generation, editing, and distribution into a single, seamless workflow. Kuaishou is already building this. OpenAI is not.

What to watch: The open-source community. If a model like Stable Video Diffusion 2.0 or a new entrant achieves Kling-level quality at zero cost, the entire commercial landscape will be upended. For now, Kling is the king, but the treadmill never stops.
