Sora Stalled, Kling Thrives: The AI Video Race Demands Product Grit Over Flashy Demos

May 2026
OpenAI's Sora once defined the frontier of AI video generation, but it has stalled in the lab. Kuaishou's Kling, by contrast, has advanced by prioritizing product integration and cost efficiency, revealing that endurance, not raw speed, wins the AI video race.

The AI video generation landscape has entered a brutal endurance phase, where the ability to ship a usable product matters more than flashy demos. OpenAI's Sora, launched to global acclaim for its stunning temporal coherence, has effectively stalled. Internal strategic confusion and an over-reliance on scaling laws without solving core product issues—like long-form temporal consistency and inference cost—have left it as a glorified tech demo. Meanwhile, Kuaishou's Kling has executed a quiet but devastating counterattack. By focusing on controllability, real-time editing, and low-cost inference, Kling has embedded itself into the daily workflow of short-video creators, achieving real commercial traction. This contrast is not a story of technological superiority but of product philosophy. Sora aimed for a perfect, long-form generation out of the gate; Kling started with short, editable, and cheap clips. The result is a market where Kling is generating revenue and user lock-in, while Sora remains a promise. The 'treadmill' effect is clear: in AI video, the runner who can keep iterating and integrating into real-world pipelines will outlast the one who tries to sprint to perfection. The industry must now reckon with the fact that model architecture is only half the battle; the other half is ruthless productization.

Technical Deep Dive

The divergence between Sora and Kling is rooted in fundamentally different architectural and engineering choices. Sora, as described in OpenAI's technical report, is a diffusion transformer (DiT) that operates on spacetime patches. It compresses videos into a latent space and then models the temporal and spatial dynamics jointly. This approach is elegant and theoretically powerful, enabling Sora to generate videos up to a minute long with remarkable object persistence and scene coherence. However, this architecture comes with a crippling computational cost. Generating a single minute of high-resolution video with Sora is estimated to require thousands of GPU hours, making it economically unviable for mass deployment. The model's reliance on scaling laws—the belief that simply increasing model size, data, and compute will yield proportional improvements—has hit a wall. Sora struggles with the 'long-tail' of temporal consistency: while it can maintain a character's appearance for 10 seconds, it often fails over 30 seconds, producing subtle but uncanny glitches like flickering textures or objects that morph into unrelated shapes. This is not a problem that more compute alone can solve; it requires architectural innovations like hierarchical temporal modeling or explicit memory mechanisms, which OpenAI has not publicly pursued.
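
To make the cost argument concrete, the sketch below counts spacetime patches for a short and a long clip and compares the number of token pairs a full self-attention layer would have to score. All dimensions and patch sizes are hypothetical placeholders (OpenAI has not published Sora's latent shapes), so only the scaling behavior is meaningful: tokens grow linearly with duration, while full attention grows quadratically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentVideo:
    """Shape of a compressed (latent) video. All values here are assumptions."""
    frames: int
    height: int
    width: int


def spacetime_tokens(video: LatentVideo, patch_t: int, patch_h: int, patch_w: int) -> int:
    """Count non-overlapping spacetime patches; each patch becomes one token."""
    return (video.frames // patch_t) * (video.height // patch_h) * (video.width // patch_w)


def attention_pairs(tokens: int) -> int:
    """Full self-attention scores every token against every token."""
    return tokens * tokens


# Hypothetical latent shapes for a 10-second and a 60-second clip.
short_clip = LatentVideo(frames=40, height=64, width=64)
long_clip = LatentVideo(frames=240, height=64, width=64)

n_short = spacetime_tokens(short_clip, patch_t=2, patch_h=2, patch_w=2)
n_long = spacetime_tokens(long_clip, patch_t=2, patch_h=2, patch_w=2)

token_ratio = n_long / n_short
pair_ratio = attention_pairs(n_long) / attention_pairs(n_short)
print(f"tokens: {n_short} vs {n_long} ({token_ratio:.0f}x)")
print(f"attention pairs: {pair_ratio:.0f}x")
```

Under these assumed shapes, a clip six times longer needs six times the tokens but thirty-six times the attention pairs, which is one reason joint spatiotemporal modeling becomes expensive as clips get longer.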

Kling, on the other hand, takes a pragmatic approach. While Kuaishou has not released a full technical paper, the model is understood to be a 3D Variational Autoencoder (VAE) combined with a latent diffusion model, similar to Stable Video Diffusion but optimized for short clips (5-15 seconds). Kling's key innovation is its 'split-and-stitch' inference pipeline. Instead of generating a long video in one shot, it generates short segments and uses a lightweight temporal alignment module to stitch them together. This dramatically reduces inference cost (by an estimated 80% compared to a monolithic DiT of similar quality) and allows for real-time editing. Kling also integrates a ControlNet-style conditioning mechanism that lets creators specify camera motion, subject appearance, and background via simple text prompts or reference images. This focus on controllability, rather than raw length, aligns with the needs of short-video creators, who need assets they can produce and iterate on quickly.
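
The general idea of generating short segments and joining them can be illustrated with a toy example. The sketch below is not Kuaishou's pipeline, which has not been published. It swaps in a plain linear crossfade over overlapping frames where the real system reportedly uses a learned temporal alignment module, and every name in it (`Frame`, `crossfade`, `stitch`) is hypothetical.

```python
from typing import List

Frame = List[float]  # toy frame: a flat list of pixel intensities


def crossfade(a: Frame, b: Frame, weight: float) -> Frame:
    """Blend two frames; weight=0 returns a, weight=1 returns b."""
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]


def stitch(segments: List[List[Frame]], overlap: int) -> List[Frame]:
    """Join segments, blending the last `overlap` frames of the running
    video with the first `overlap` frames of each following segment."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not segments:
        return []
    if any(len(seg) < overlap for seg in segments):
        raise ValueError("every segment must contain at least `overlap` frames")

    video = list(segments[0])
    for seg in segments[1:]:
        if overlap > 0:
            tail, head = video[-overlap:], seg[:overlap]
            # Weights ramp from near 0 to near 1 across the overlap window.
            video[-overlap:] = [
                crossfade(t, h, (i + 1) / (overlap + 1))
                for i, (t, h) in enumerate(zip(tail, head))
            ]
        video.extend(seg[overlap:])
    return video


# Two 4-frame segments (dark, then bright) that share a 2-frame overlap.
dark = [[0.0] for _ in range(4)]
bright = [[1.0] for _ in range(4)]
stitched = stitch([dark, bright], overlap=2)
print([round(frame[0], 2) for frame in stitched])
```

The appeal of this pattern is that each segment can be generated, cached, or regenerated independently, so an edit to one part of a video does not force the whole clip to be recomputed.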

| Model | Max Video Length | Inference Cost (est. per 10s clip) | Temporal Consistency (1-10) | Controllability (1-10) | Open Source Status |
|---|---|---|---|---|---|
| Sora (OpenAI) | 60s | $50+ (GPU hours) | 8 | 3 | No |
| Kling (Kuaishou) | 15s (stitchable) | $0.50 | 7 | 9 | No |
| Stable Video Diffusion (Stability AI) | 4s | $0.05 | 5 | 6 | Yes (GitHub: Stability-AI/generative-models, 25k+ stars) |
| Runway Gen-3 Alpha | 10s | $1.00 | 7 | 7 | No |

Data Takeaway: The table reveals a clear trade-off between video length and practical usability. Sora leads in raw capability but is orders of magnitude too expensive for real-world use. Kling achieves a 'good enough' quality at a cost that enables mass deployment. The open-source alternative, Stable Video Diffusion, lags in quality but offers a foundation for customization. The winner is not the most powerful model, but the one that balances quality, cost, and control.
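
As a quick check on the table's cost column, the sketch below converts the estimated per-10-second figures into per-minute costs and expresses each as a multiple of Kling's. The inputs are the table's own estimates (Sora's "$50+" is treated as a lower bound); the dictionary and function names are illustrative only.

```python
# Estimated inference cost per 10-second clip in USD, taken from the table above.
EST_COST_PER_10S_USD = {
    "Sora": 50.00,  # listed as "$50+", so treat this as a lower bound
    "Kling": 0.50,
    "Stable Video Diffusion": 0.05,
    "Runway Gen-3 Alpha": 1.00,
}


def cost_per_minute(cost_per_10s: float) -> float:
    """Six 10-second clips make up one minute of video."""
    return cost_per_10s * 6


per_minute = {name: cost_per_minute(cost) for name, cost in EST_COST_PER_10S_USD.items()}
ratio_vs_kling = {
    name: cost / EST_COST_PER_10S_USD["Kling"] for name, cost in EST_COST_PER_10S_USD.items()
}

for name in EST_COST_PER_10S_USD:
    print(f"{name}: ${per_minute[name]:.2f}/min, {ratio_vs_kling[name]:g}x Kling")
```

On these estimates, a minute of Sora output costs at least $300 against $3 for Kling, a gap of two orders of magnitude.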

Key Players & Case Studies

The Sora vs. Kling narrative is a case study in contrasting corporate strategies. OpenAI, under Sam Altman, has historically prioritized breakthrough research and massive scale. Sora was born from this culture: a moonshot project that aimed to redefine video generation. However, OpenAI's internal priorities have been fragmented. The company has simultaneously pursued GPT-5, voice mode for ChatGPT, and enterprise AI agents, leaving Sora without a dedicated product team. The result is a model that is technically impressive but has no clear path to market. OpenAI has not released a public API or a consumer-facing product for Sora, and internal leaks suggest that the team is struggling to reduce inference costs below $10 per minute of video, a price point that kills any viable business model. Sora has become 'demo-ware': a tool used to secure funding and media attention, but not to generate revenue.

Kuaishou, the parent company of Kling, operates in a different universe. As the second-largest short-video platform in China (behind Douyin/TikTok), Kuaishou has a deep understanding of creator workflows. Kling was not developed in a research lab in isolation; it was built in close collaboration with Kuaishou's internal creator tools team. The model was trained on Kuaishou's proprietary dataset of over 100 million short videos, which gives it an inherent advantage in understanding the visual language of short-form content—fast cuts, dynamic camera movements, and meme-like aesthetics. Kuaishou also invested heavily in inference optimization, using a combination of model distillation and quantization to run Kling on consumer-grade GPUs (e.g., NVIDIA RTX 4090) at near-real-time speeds. This allowed Kuaishou to launch Kling as a free tier within its video editing app, directly competing with traditional editing tools. Within three months of launch, Kling was used to generate over 500 million short clips, according to Kuaishou's internal metrics. This is not a theoretical advantage; it is a data flywheel. Every clip generated provides feedback that improves the model, creating a moat that Sora cannot easily cross.
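
Quantization, one of the two optimization techniques named above, can be illustrated in a few lines. Kuaishou has not disclosed which scheme it uses, so the sketch below shows only the generic idea with symmetric per-tensor int8 quantization: weights are stored as small integers plus one floating-point scale, trading a bounded rounding error for a smaller memory footprint. Function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    if not weights:
        return [], 1.0
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is recovered to within half a quantization step (`scale / 2`), while storage drops from 32 bits per value to 8.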

| Company | Product | Launch Date | Monthly Active Users (est.) | Revenue Model | Key Differentiator |
|---|---|---|---|---|---|
| OpenAI | Sora | Feb 2024 (demo) | <100k (limited access) | None (free demos) | Long-form coherence |
| Kuaishou | Kling | Jun 2024 | 15M | Freemium (ads + subscription) | Workflow integration |
| Runway | Gen-3 Alpha | Jun 2024 | 1M | Subscription ($15-$95/mo) | Professional editing tools |
| Pika Labs | Pika 2.0 | Apr 2024 | 500k | Freemium | Community and ease of use |

Data Takeaway: The user numbers tell the story. Kling has 15 million monthly active users, orders of magnitude more than Sora's limited testers. Kuaishou's strategy of embedding AI generation into an existing, popular platform has created a distribution advantage that is nearly impossible to replicate. Sora, despite its technical superiority, has no distribution channel.

Industry Impact & Market Dynamics

The 'treadmill' effect is reshaping the entire AI video market. The initial hype cycle (2023-2024) was dominated by 'wow factor' demos—models that could generate a convincing 10-second clip of a dog flying a plane. Investors poured over $1.5 billion into AI video startups in 2024 alone. But the market is now entering a consolidation phase. The winners will not be those with the best model, but those who can achieve the lowest cost per usable minute of video. This is a classic 'cost curve' race. Kuaishou has already driven the cost of generating a 10-second clip down to $0.50, and internal roadmaps suggest they aim for $0.10 by the end of 2025. At that price, AI-generated video becomes cheaper than stock footage for many use cases, opening up massive markets in advertising, social media content, and even low-budget filmmaking.

This has profound implications for the competitive landscape. OpenAI, with its massive compute budget, could theoretically match Kling's cost, but it lacks the product DNA to do so efficiently. The company's strength is in research, not in building user-facing tools for non-technical creators. Meanwhile, Chinese companies like Kuaishou and ByteDance (which is developing its own AI video model, Jimeng) have a structural advantage: they already own the distribution platforms and the creator ecosystems. They can afford to subsidize AI generation in the short term to build market share, a playbook they perfected in the short-video wars of the 2010s.

The market is also bifurcating. On one side, there is the 'professional' segment (film, advertising, high-end content) where quality and controllability are paramount. Here, Runway and Pika Labs are competing with Sora, offering subscription-based tools that integrate with traditional editing software like Adobe Premiere Pro. On the other side is the 'mass market' segment (social media, UGC, marketing) where cost and speed are king. Kling dominates this space. The danger for Sora is that it falls into a no-man's land: too expensive for mass market, not controllable enough for professionals.

| Market Segment | Key Players | Average Spend per User/Year | Growth Rate (YoY) | Primary Metric |
|---|---|---|---|---|
| Professional (Film/Ad) | Runway, Pika, Sora | $500-$5,000 | 40% | Quality & Control |
| Mass Market (Social/UGC) | Kling, Jimeng, CapCut AI | $0-$100 | 150% | Cost & Speed |
| Enterprise (Training/Sim) | Synthesia, HeyGen | $10,000-$100,000 | 60% | Customization & Reliability |

Data Takeaway: The mass market segment is growing at 150% year-over-year, nearly four times the professional segment's 40%. Kling is well positioned to capture this growth. Sora's failure to address this segment is a strategic error that will be difficult to correct.

Risks, Limitations & Open Questions

Despite Kling's success, the AI video industry faces existential risks. The most pressing is the 'uncanny valley' problem. Even the best models, including Kling, produce artifacts that break immersion—fingers that morph, backgrounds that flicker, physics that defy gravity. For professional use, these errors are unacceptable. Kuaishou has mitigated this by limiting generation length to 15 seconds, but this is a crutch, not a solution. The industry needs a breakthrough in temporal consistency that goes beyond scaling.

Second, the regulatory environment is tightening. The EU's AI Act and China's own generative AI regulations require watermarking and provenance tracking for AI-generated content. Kling already embeds invisible watermarks, but compliance costs are rising. More concerning is the potential for misuse: deepfakes, disinformation, and non-consensual synthetic content. Kuaishou has implemented content moderation filters, but these are imperfect. A major scandal involving AI-generated video on a platform like Kuaishou could trigger a regulatory backlash that stifles the entire market.

Third, the open-source threat looms. Stability AI's Stable Video Diffusion, while inferior in quality, is free and customizable. A community-driven model that approaches Kling's quality could disrupt the commercial market overnight. The release of Mochi 1 by Genmo, with open weights under the permissive Apache 2.0 license, shows that the gap is closing. If a truly competitive open-source model emerges, Kuaishou's cost advantage could evaporate.

Finally, there is the question of Sora's resurrection. OpenAI has the resources and talent to pivot. If the company decides to prioritize video generation, acquires a distribution channel (e.g., by partnering with a platform like YouTube or Instagram), and optimizes for cost, it could re-enter the race. But this would require a fundamental cultural shift at OpenAI, from research-first to product-first. Given the company's current trajectory, this seems unlikely within the next 12 months.

AINews Verdict & Predictions

Sora is not dead, but it is in a coma. OpenAI's failure to productize its most impressive demo in over a year is a strategic blunder of historic proportions. The company has squandered its first-mover advantage by treating video generation as a research project rather than a product. Kling, meanwhile, has won the first round by being ruthlessly pragmatic. It is not the best model, but it is the best product.

Prediction 1: Within 12 months, Kuaishou will launch a paid API for Kling that undercuts all competitors by at least 50%, forcing Runway and Pika to either consolidate or pivot to niche professional markets.

Prediction 2: OpenAI will eventually spin off Sora into a separate entity or partner with a major platform (likely Microsoft or Instagram) to gain distribution, but this will be a defensive move, not a market-leading one.

Prediction 3: The next major breakthrough in AI video will not come from a model architecture change, but from a 'video operating system'—a platform that combines generation, editing, and distribution into a single, seamless workflow. Kuaishou is already building this. OpenAI is not.

What to watch: The open-source community. If a model like Stable Video Diffusion 2.0 or a new entrant achieves Kling-level quality at zero cost, the entire commercial landscape will be upended. For now, Kling is the king, but the treadmill never stops.


