Sora Stalled, Kling Thrives: The AI Video Race Demands Product Grit Over Flashy Demos

May 2026
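OpenAI's Sora once defined the state of the art in AI video generation, but it is now stalled in the lab. By contrast, Kuaishou's Kling is growing rapidly by prioritizing product integration and cost efficiency, showing that endurance, not raw speed, wins the AI video race.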

The AI video generation landscape has entered a brutal endurance phase, where the ability to ship a usable product matters more than flashy demos. OpenAI's Sora, launched to global acclaim for its stunning temporal coherence, has effectively stalled. Internal strategic confusion and an over-reliance on scaling laws without solving core product issues—like long-form temporal consistency and inference cost—have left it as a glorified tech demo. Meanwhile, Kuaishou's Kling has executed a quiet but devastating counterattack. By focusing on controllability, real-time editing, and low-cost inference, Kling has embedded itself into the daily workflow of short-video creators, achieving real commercial traction. This contrast is not a story of technological superiority but of product philosophy. Sora aimed for a perfect, long-form generation out of the gate; Kling started with short, editable, and cheap clips. The result is a market where Kling is generating revenue and user lock-in, while Sora remains a promise. The 'treadmill' effect is clear: in AI video, the runner who can keep iterating and integrating into real-world pipelines will outlast the one who tries to sprint to perfection. The industry must now reckon with the fact that model architecture is only half the battle; the other half is ruthless productization.

Technical Deep Dive

The divergence between Sora and Kling is rooted in fundamentally different architectural and engineering choices. Sora, as described in OpenAI's technical report, is a diffusion transformer (DiT) that operates on spacetime patches. It compresses videos into a latent space and then models the temporal and spatial dynamics jointly. This approach is elegant and theoretically powerful, enabling Sora to generate videos up to a minute long with remarkable object persistence and scene coherence. However, this architecture comes with a crippling computational cost. Generating a single minute of high-resolution video with Sora is estimated to require thousands of GPU hours, making it economically unviable for mass deployment. The model's reliance on scaling laws—the belief that simply increasing model size, data, and compute will yield proportional improvements—has hit a wall. Sora struggles with the 'long-tail' of temporal consistency: while it can maintain a character's appearance for 10 seconds, it often fails over 30 seconds, producing subtle but uncanny glitches like flickering textures or objects that morph into unrelated shapes. This is not a problem that more compute alone can solve; it requires architectural innovations like hierarchical temporal modeling or explicit memory mechanisms, which OpenAI has not publicly pursued.
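To make the cost argument concrete, the following is a rough sketch of how the number of spacetime patches, and with it the self-attention cost, grows with clip length. The frame rate, resolution, compression strides, and patch sizes are illustrative assumptions, not figures from OpenAI's technical report, and attention cost is approximated as quadratic in token count.

```python
def spacetime_tokens(seconds, fps=24, height=1080, width=1920,
                     temporal_stride=4, spatial_stride=8,
                     patch_t=1, patch_hw=2):
    """Count spacetime patches for a clip (all default values are hypothetical)."""
    # Latent-space dimensions after an assumed video compressor.
    latent_t = (seconds * fps) // temporal_stride
    latent_h = height // spatial_stride
    latent_w = width // spatial_stride
    # One token per spacetime patch in the latent volume.
    return (latent_t // patch_t) * (latent_h // patch_hw) * (latent_w // patch_hw)


def relative_attention_cost(seconds, baseline_seconds=10):
    """Self-attention cost relative to a baseline clip, assuming O(n^2) in tokens."""
    ratio = spacetime_tokens(seconds) / spacetime_tokens(baseline_seconds)
    return ratio ** 2


for s in (10, 30, 60):
    print(s, spacetime_tokens(s), relative_attention_cost(s))
```

Under these assumptions a 60-second clip has six times the tokens of a 10-second clip, so full joint attention costs about 36 times as much. This is one way to see why generating long clips in a single pass becomes expensive quickly.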

Kling, on the other hand, takes a pragmatic approach. While Kuaishou has not released a full technical paper, the model is understood to combine a 3D Variational Autoencoder (VAE) with a latent diffusion model, similar to Stable Video Diffusion but optimized for short clips (5-15 seconds). Kling's key innovation is its 'split-and-stitch' inference pipeline. Instead of generating a long video in one shot, it generates short segments and uses a lightweight temporal alignment module to stitch them together. This reduces inference cost by an estimated 80% compared to a monolithic DiT of similar quality and allows for real-time editing. Kling also integrates a ControlNet-style conditioning mechanism that lets creators specify camera motion, subject appearance, and background via simple text prompts or reference images. This focus on controllability, rather than raw length, matches the needs of short-video creators who need quick, iterable assets.
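Since Kuaishou has not published the pipeline, the following is only a minimal sketch of the general split-and-stitch idea: generate short overlapping segments, then blend the overlapping frames. A plain linear crossfade stands in for the temporal alignment module, `generate_segment` is a hypothetical placeholder for the generator, and each frame is reduced to a single number to keep the example small.

```python
def generate_segment(index, num_frames):
    """Hypothetical stand-in for a short-clip generator: one value per frame."""
    return [float(index)] * num_frames


def stitch(segments, overlap):
    """Join segments, linearly crossfading the `overlap` frames shared by neighbors."""
    if overlap < 1:
        raise ValueError("overlap must be at least 1 frame")
    out = list(segments[0])
    for seg in segments[1:]:
        tail = out[-overlap:]
        head = seg[:overlap]
        blended = []
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # weight ramps toward the new segment
            blended.append((1 - w) * tail[i] + w * head[i])
        out[-overlap:] = blended      # replace the old tail with the blend
        out.extend(seg[overlap:])     # append the rest of the new segment
    return out


segments = [generate_segment(i, num_frames=6) for i in range(3)]
video = stitch(segments, overlap=2)
print(len(video), video)
```

Each segment is generated independently, so per-segment cost stays fixed no matter how long the final video is. The trade-off is that consistency across segment boundaries depends entirely on the quality of the blending step.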

| Model | Max Video Length | Inference Cost (est. per 10s clip) | Temporal Consistency (1-10) | Controllability (1-10) | Open Source Status |
|---|---|---|---|---|---|
| Sora (OpenAI) | 60s | $50+ | 8 | 3 | No |
| Kling (Kuaishou) | 15s (stitchable) | $0.50 | 7 | 9 | No |
| Stable Video Diffusion (Stability AI) | 4s | $0.05 | 5 | 6 | Yes (GitHub: Stability-AI/generative-models, 25k+ stars) |
| Runway Gen-3 Alpha | 10s | $1.00 | 7 | 7 | No |

Data Takeaway: The table reveals a clear trade-off between video length and practical usability. Sora leads in raw capability, but at an estimated $50 or more per 10-second clip it costs roughly 100 times as much as Kling, far too expensive for real-world use. Kling achieves 'good enough' quality at a cost that enables mass deployment. The open-source alternative, Stable Video Diffusion, lags in quality but offers a foundation for customization. The winner is not the most powerful model, but the one that balances quality, cost, and control.
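The cost gap in the table can be worked through with a few lines of arithmetic. The sketch below uses the table's estimated per-clip figures as given, treats Sora's "$50+" as a lower bound, and assumes cost scales linearly with clip length, which is a simplification.

```python
# Estimated cost per 10-second clip in USD, copied from the table above.
# Sora's "$50+" is treated as a lower bound.
COST_PER_10S = {
    "Sora": 50.00,
    "Kling": 0.50,
    "Stable Video Diffusion": 0.05,
    "Runway Gen-3 Alpha": 1.00,
}


def cost_per_minute(cost_per_10s):
    """Scale a per-10-second estimate to one minute, assuming linear pricing."""
    return cost_per_10s * 6


def cost_ratio(model, baseline="Kling"):
    """How many times more expensive `model` is than `baseline` per clip."""
    return COST_PER_10S[model] / COST_PER_10S[baseline]


for name, cost in COST_PER_10S.items():
    print(f"{name}: ${cost_per_minute(cost):.2f}/min, {cost_ratio(name):.1f}x Kling")
```

On these estimates, a minute of Sora output costs at least $300 against $3 for Kling, a ratio of about 100 to 1.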

Key Players & Case Studies

The Sora vs. Kling narrative is a case study in contrasting corporate strategies. OpenAI, under Sam Altman, has historically prioritized breakthrough research and massive scale. Sora was born from this culture: a moonshot project that aimed to redefine video generation. However, OpenAI's internal priorities have been fragmented. The company has simultaneously pursued GPT-5, voice mode for ChatGPT, and enterprise AI agents, leaving Sora without a dedicated product team. The result is a model that is technically impressive but has no clear path to market. OpenAI has not released a public API or a consumer-facing product for Sora, and internal leaks suggest that the team is struggling to reduce inference costs below $10 per minute of video, a price point that kills any viable business model. Sora has become 'demoware': a tool used to secure funding and media attention, but not to generate revenue.

Kuaishou, the parent company of Kling, operates in a different universe. As the second-largest short-video platform in China (behind Douyin/TikTok), Kuaishou has a deep understanding of creator workflows. Kling was not developed in a research lab in isolation; it was built in close collaboration with Kuaishou's internal creator tools team. The model was trained on Kuaishou's proprietary dataset of over 100 million short videos, which gives it an inherent advantage in understanding the visual language of short-form content—fast cuts, dynamic camera movements, and meme-like aesthetics. Kuaishou also invested heavily in inference optimization, using a combination of model distillation and quantization to run Kling on consumer-grade GPUs (e.g., NVIDIA RTX 4090) at near-real-time speeds. This allowed Kuaishou to launch Kling as a free tier within its video editing app, directly competing with traditional editing tools. Within three months of launch, Kling was used to generate over 500 million short clips, according to Kuaishou's internal metrics. This is not a theoretical advantage; it is a data flywheel. Every clip generated provides feedback that improves the model, creating a moat that Sora cannot easily cross.
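Kuaishou's actual distillation and quantization recipe is not public. As a generic illustration of the quantization half of that claim, the sketch below applies symmetric per-tensor int8 quantization to a handful of made-up weights and measures the rounding error. Real deployments typically use per-channel scales and calibration data, which this example omits.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (int values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]


weights = [0.82, -1.27, 0.003, 0.5]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Storing weights as 8-bit integers instead of 32-bit floats cuts their memory footprint by about four times, at the price of a per-weight error of at most half a quantization step. That kind of saving is what makes fitting a large model onto a consumer GPU plausible.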

| Company | Product | Launch Date | Monthly Active Users (est.) | Revenue Model | Key Differentiator |
|---|---|---|---|---|---|
| OpenAI | Sora | Feb 2024 (demo) | <100k (limited access) | None (free demos) | Long-form coherence |
| Kuaishou | Kling | Jun 2024 | 15M | Freemium (ads + subscription) | Workflow integration |
| Runway | Gen-3 Alpha | Jun 2024 | 1M | Subscription ($15-$95/mo) | Professional editing tools |
| Pika Labs | Pika 2.0 | Apr 2024 | 500k | Freemium | Community and ease of use |

Data Takeaway: The user numbers tell the story. Kling has an estimated 15 million monthly active users, at least 150 times Sora's fewer than 100,000 limited-access testers. Kuaishou's strategy of embedding AI generation into an existing, popular platform has created a distribution advantage that is nearly impossible to replicate. Sora, despite its technical superiority, has no distribution channel.

Industry Impact & Market Dynamics

The 'treadmill' effect is reshaping the entire AI video market. The initial hype cycle (2023-2024) was dominated by 'wow factor' demos: models that could generate a convincing 10-second clip of a dog flying a plane. Investors poured over $1.5 billion into AI video startups in 2024 alone. But the market is now entering a consolidation phase. The winners will not be those with the best model, but those who can achieve the lowest cost per usable minute of video. This is a classic 'cost curve' race. Kuaishou has already driven the cost of generating a 10-second clip down to $0.50, and internal roadmaps reportedly targeted $0.10 by the end of 2025. At that price, AI-generated video becomes cheaper than stock footage for many use cases, opening up massive markets in advertising, social media content, and even low-budget filmmaking.

This has profound implications for the competitive landscape. OpenAI, with its massive compute budget, could theoretically match Kling's cost, but it lacks the product DNA to do so efficiently. The company's strength is in research, not in building user-facing tools for non-technical creators. Meanwhile, Chinese companies like Kuaishou and ByteDance (which is developing its own AI video model, Jimeng) have a structural advantage: they already own the distribution platforms and the creator ecosystems. They can afford to subsidize AI generation in the short term to build market share, a playbook they perfected in the short-video wars of the 2010s.

The market is also bifurcating. On one side, there is the 'professional' segment (film, advertising, high-end content) where quality and controllability are paramount. Here, Runway and Pika Labs are competing with Sora, offering subscription-based tools that integrate with traditional editing software like Adobe Premiere Pro. On the other side is the 'mass market' segment (social media, UGC, marketing) where cost and speed are king. Kling dominates this space. The danger for Sora is that it falls into a no-man's land: too expensive for mass market, not controllable enough for professionals.

| Market Segment | Key Players | Average Spend per User/Year | Growth Rate (YoY) | Primary Metric |
|---|---|---|---|---|
| Professional (Film/Ad) | Runway, Pika, Sora | $500-$5,000 | 40% | Quality & Control |
| Mass Market (Social/UGC) | Kling, Jimeng, CapCut AI | $0-$100 | 150% | Cost & Speed |
| Enterprise (Training/Sim) | Synthesia, HeyGen | $10,000-$100,000 | 60% | Customization & Reliability |

Data Takeaway: The mass market segment is growing at 150% year-over-year, nearly four times the professional segment's 40%. Kling is well positioned to capture this growth. Sora's failure to address this segment is a strategic error that will be difficult to correct.
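To show what those growth rates imply if they persisted, the sketch below compounds an arbitrary starting index of 100 for each segment. Holding the table's year-over-year rates constant for three years is an assumption made purely for illustration, not a forecast from the source.

```python
def project(index_start, yoy_growth_pct, years):
    """Compound a starting index by a fixed year-over-year growth rate."""
    factor = 1 + yoy_growth_pct / 100
    return [index_start * factor ** y for y in range(years + 1)]


# Growth rates from the table above; the starting index of 100 is arbitrary.
mass_market = project(100, 150, 3)
professional = project(100, 40, 3)
print(mass_market)
print(professional)
```

Growth of 150% means the index multiplies by 2.5 each year, so after three years the mass-market index would stand above 1,500 while the professional index would be under 300.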

Risks, Limitations & Open Questions

Despite Kling's success, the AI video industry faces existential risks. The most pressing is the 'uncanny valley' problem. Even the best models, including Kling, produce artifacts that break immersion—fingers that morph, backgrounds that flicker, physics that defy gravity. For professional use, these errors are unacceptable. Kuaishou has mitigated this by limiting generation length to 15 seconds, but this is a crutch, not a solution. The industry needs a breakthrough in temporal consistency that goes beyond scaling.

Second, the regulatory environment is tightening. The EU's AI Act and China's own generative AI regulations require watermarking and provenance tracking for AI-generated content. Kling already embeds invisible watermarks, but compliance costs are rising. More concerning is the potential for misuse: deepfakes, disinformation, and non-consensual synthetic content. Kuaishou has implemented content moderation filters, but these are imperfect. A major scandal involving AI-generated video on a platform like Kuaishou could trigger a regulatory backlash that stifles the entire market.

Third, the open-source threat looms. Stability AI's Stable Video Diffusion, while inferior in quality, is free and customizable. A community-driven model that approaches Kling's quality could disrupt the commercial market overnight. The release of Mochi 1 (by Genmo, with open weights under the permissive Apache 2.0 license) shows that the gap is closing. If a truly competitive open-source model emerges, Kuaishou's cost advantage could evaporate.

Finally, there is the question of Sora's resurrection. OpenAI has the resources and talent to pivot. If the company decides to prioritize video generation, acquires a distribution channel (e.g., by partnering with a platform like YouTube or Instagram), and optimizes for cost, it could re-enter the race. But this would require a fundamental cultural shift at OpenAI, from research-first to product-first. Given the company's current trajectory, this seems unlikely within the next 12 months.

AINews Verdict & Predictions

Sora is not dead, but it is in a coma. OpenAI's failure to productize its most impressive demo in the more than two years since its February 2024 debut is a strategic blunder of historic proportions. The company has squandered its first-mover advantage by treating video generation as a research project rather than a product. Kling, meanwhile, has won the first round by being ruthlessly pragmatic. It is not the best model, but it is the best product.

Prediction 1: Within 12 months, Kuaishou will launch a paid API for Kling that undercuts all competitors by at least 50%, forcing Runway and Pika to either consolidate or pivot to niche professional markets.

Prediction 2: OpenAI will eventually spin off Sora into a separate entity or partner with a major platform (likely Microsoft or Instagram) to gain distribution, but this will be a defensive move, not a market-leading one.

Prediction 3: The next major breakthrough in AI video will not come from a model architecture change, but from a 'video operating system'—a platform that combines generation, editing, and distribution into a single, seamless workflow. Kuaishou is already building this. OpenAI is not.

What to watch: The open-source community. If a model like Stable Video Diffusion 2.0 or a new entrant achieves Kling-level quality at zero cost, the entire commercial landscape will be upended. For now, Kling is the king, but the treadmill never stops.
