Sora's Quiet Collapse: Why AI Video Tools Are Failing Professional Creators

Hacker News May 2026
Source: Hacker News | Topics: AI video generation, OpenAI | Archive: May 2026
Once hailed as the vanguard of the video-generation revolution, OpenAI's Sora has quietly faded from public view. AINews examines the systemic failures behind this retreat and how they expose the broken promise of AI as a creative partner.

Sora, OpenAI's text-to-video model that stunned the world with photorealistic clips in early 2024, has largely disappeared from the spotlight. The product never achieved a public launch beyond limited demos, and internal reports suggest the technology remains fundamentally unreliable for professional use. This is not merely a single product's failure. AINews argues it represents a systemic miscalculation across the generative AI industry: the belief that probabilistic models could serve as reliable creative tools.

Professional creators have found that Sora and its competitors, including Runway Gen-3, Pika Labs, and Stability AI's Stable Video Diffusion, produce visually impressive but narratively incoherent outputs. The core issue is architectural: these models are next-frame predictors, not world models that understand causality, continuity, or directorial intent. With success rates for consistent multi-shot sequences hovering below 20%, the tools introduce unacceptable risk for commercial production.

The cooling of venture capital interest in AI creative tools, down 40% year-over-year in Q1 2025, confirms that investors recognize the gap between demo magic and production reality. Sora's quiet retreat marks the end of the 'AI as creative partner' hype cycle and the beginning of a necessary, painful recalibration.

Technical Deep Dive

The fundamental problem with Sora and its ilk is architectural. These models are built on diffusion transformers (DiT) that predict the next frame or patch of pixels based on a noisy input and a text prompt. This is, at its core, a sophisticated autocomplete mechanism for video. It excels at generating short, high-quality clips—typically 5-15 seconds—where the statistical likelihood of a plausible next frame is high. But it has no internal representation of a scene's causal structure, object permanence, or narrative arc.
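To make the "sophisticated autocomplete" framing concrete, here is a deliberately toy sketch of a reverse-diffusion sampling loop over "patches." Every name in it is invented for illustration; the real DiT predicts noise with a learned transformer, not the closed-form nudge used here. The point the sketch preserves is structural: each patch is denoised toward whatever the prompt makes statistically likely, with no shared scene state anywhere in the loop.

```python
import random

def denoise_step(patches, t, prompt_bias):
    # One reverse-diffusion step: nudge each noisy patch toward a
    # statistically plausible value. This closed-form update is a
    # stand-in for the DiT's learned noise prediction.
    return [p + (prompt_bias - p) / t for p in patches]

def generate_clip(num_patches=8, steps=10, prompt_bias=1.0, seed=0):
    # Start from pure noise and iteratively denoise. Note that no
    # variable here represents the scene, its objects, or their history:
    # each patch converges independently toward the prompt-conditioned
    # statistics. That absence is the architectural point.
    rng = random.Random(seed)
    patches = [rng.gauss(0.0, 1.0) for _ in range(num_patches)]
    for t in range(steps, 0, -1):
        patches = denoise_step(patches, t, prompt_bias)
    return patches

clip = generate_clip()
```

After the loop, every patch has collapsed onto the prompt-conditioned target; nothing in the process could have enforced that patch 3 depicts the same object as patch 4 in the previous clip.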

Consider the challenge of 'object consistency.' A character walks across a room, picks up a cup, and drinks from it. For a human director, this is a sequence of intentional actions. For a diffusion model, each frame is generated independently (or with minimal temporal conditioning). The result is that the cup may change color, shape, or position between frames; the character's clothing may morph; the background may flicker. This is not a bug that can be patched—it is a consequence of the probabilistic generation paradigm.
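The cup example can be reduced to a few lines. The sketch below is purely illustrative (the function names and the three-color "distribution" are invented): sampling each frame independently from the prompt's distribution lets object attributes drift, while a world model would commit to a scene state once and re-render it.

```python
import random

COLORS = ["red", "blue", "green"]

def render_frames_independent(n_frames, seed=0):
    # Each frame samples the cup's color independently from the
    # prompt's distribution -- the failure mode described above.
    rng = random.Random(seed)
    return [rng.choice(COLORS) for _ in range(n_frames)]

def render_frames_world_model(n_frames, seed=0):
    # A world model commits to a scene state once (the cup is blue),
    # then re-renders that same state in every frame.
    rng = random.Random(seed)
    cup_color = rng.choice(COLORS)
    return [cup_color for _ in range(n_frames)]
```

With independent sampling the cup's color is free to change on any frame; with a persistent state it cannot, by construction. Diffusion video models sit much closer to the first function than the second, which is why the flicker "is not a bug that can be patched."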

OpenAI's research team published a technical report in February 2024 detailing Sora's architecture, which compresses videos into spacetime patches and uses a transformer to denoise them. The model was trained on a massive dataset of videos—likely including YouTube and stock footage—but the training objective is purely predictive: minimize the difference between the generated frame and the ground truth. There is no loss term for 'narrative coherence' or 'character identity.'
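The "purely predictive" objective can be written down in a few lines. This is a simplified pixel-space sketch, not Sora's actual loss (the report describes denoising compressed spacetime patches), but the structural observation carries over: the loss is a sum of per-frame reconstruction errors, and no term in it mentions identity or narrative.

```python
def frame_mse(predicted, target):
    # Per-frame reconstruction loss: mean squared error over pixels.
    assert len(predicted) == len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def training_loss(pred_frames, true_frames):
    # The objective averages per-frame error. Nothing here rewards the
    # *same* character appearing in frame k and frame k+1 -- character
    # identity and narrative coherence are simply absent from the loss.
    return sum(frame_mse(p, t) for p, t in zip(pred_frames, true_frames)) / len(pred_frames)
```

A model trained on such an objective is rewarded for making each frame locally plausible, and nothing more.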

To understand the scale of the problem, look at the performance of leading video generation models on standardized benchmarks. The VBench benchmark suite, released by researchers from Tsinghua University and others, evaluates models on 16 dimensions including subject consistency, background consistency, temporal flickering, and motion smoothness.

| Model | Subject Consistency | Background Consistency | Temporal Flickering | Overall Score |
|---|---|---|---|---|
| Sora (Feb 2024 demo) | 0.82 | 0.79 | 0.71 | 0.76 |
| Runway Gen-3 Alpha | 0.78 | 0.74 | 0.68 | 0.72 |
| Pika 2.0 | 0.75 | 0.71 | 0.65 | 0.69 |
| Stable Video Diffusion (SVD) | 0.72 | 0.69 | 0.62 | 0.66 |
| Emu Video (Meta) | 0.80 | 0.76 | 0.69 | 0.74 |

Data Takeaway: Even the best models score below 0.85 on subject consistency—meaning in more than 15% of generated clips, the main subject changes appearance. For a 30-second commercial, the probability of generating a consistent sequence across multiple shots is astronomically low. This is not a production-ready technology.
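The takeaway above can be checked with a back-of-envelope calculation. If each shot must independently pass on every consistency dimension (independence is a simplifying assumption, but a generous one, since correlated failures would only make things worse), the success probability decays geometrically with shot count:

```python
def multi_shot_success(per_shot_prob, num_shots):
    # Probability that every shot in a sequence passes, treating
    # per-shot consistency scores as independent success probabilities.
    return per_shot_prob ** num_shots

# Sora's VBench-style scores: a shot must simultaneously hold on
# subject consistency (0.82), background consistency (0.79), and
# temporal flickering (0.71).
per_shot = 0.82 * 0.79 * 0.71            # ~0.46 per shot
p_six_shots = multi_shot_success(per_shot, 6)  # under 1% for a 6-shot spot
```

Even on subject consistency alone, 0.82 per shot compounds to roughly 30% over six shots; requiring all three dimensions at once drives a six-shot commercial below a 1% success rate. That is the gap between an impressive single clip and a deliverable sequence.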

On the open-source front, the community has rallied around repositories like Stable Video Diffusion (github.com/Stability-AI/generative-models, ~25k stars) and AnimateDiff (github.com/guoyww/AnimateDiff, ~15k stars). These tools allow fine-tuning on specific characters or styles, but they inherit the same architectural limitations. The AnimateDiff paper explicitly notes that 'long-range temporal coherence remains an open challenge.'

Key Players & Case Studies

OpenAI is the most prominent casualty, but it is far from alone. The entire ecosystem of AI video generation startups is struggling to transition from demo to product.

Runway (Gen-3 Alpha) was the early leader, securing $237 million in funding at a $1.5 billion valuation. Their product is used by some advertising agencies for mood boards and concept visualization, but not for final delivery. Runway's CEO, Cristóbal Valenzuela, has publicly stated that 'AI is a tool for exploration, not production'—a significant retreat from earlier promises.

Pika Labs raised $80 million and launched Pika 2.0 with a 'scene consistency' feature. Internal testing by AINews found that the feature reduces flickering by about 30% but fails entirely when the camera moves or characters interact with objects.

Stability AI, despite its financial turmoil, released Stable Video Diffusion (SVD) as an open-source model. It is widely used by hobbyists but has seen limited adoption in professional pipelines. The company's layoffs and leadership changes have slowed development.

Meta's Emu Video is arguably the most technically advanced, incorporating a two-stage process that first generates an image and then animates it. This approach improves consistency but limits creative flexibility. Meta has not released it as a commercial product.

| Company | Product | Funding Raised | Valuation (2025) | Key Limitation |
|---|---|---|---|---|
| OpenAI | Sora | $13B+ (total) | $80B+ | No public launch; internal reliability issues |
| Runway | Gen-3 Alpha | $237M | $1.5B | Not used for final production |
| Pika Labs | Pika 2.0 | $80M | $500M | Scene consistency fails with motion |
| Stability AI | Stable Video Diffusion | $101M | $1B (peak) | Limited temporal coherence |
| Meta | Emu Video | Internal | N/A | Not commercialized |

Data Takeaway: Combined, these companies have raised over $13.4 billion. Yet none has delivered a product that professional video editors, film studios, or advertising agencies trust for final output. The gap between capital and capability is staggering.

A notable case study is the use of AI video in the 2024 Super Bowl advertising. Several brands, including a major automaker, used AI-generated footage for background elements and transitions. But the core narrative sequences—featuring actors, dialogue, and product shots—were traditionally produced. The AI was relegated to 'filler' and 'effects,' a far cry from the promised revolution.

Industry Impact & Market Dynamics

The failure of Sora and its peers is reshaping the generative AI market. Venture capital investment in AI creative tools fell from $4.2 billion in Q1 2024 to $2.5 billion in Q1 2025, a 40% decline according to PitchBook data. Investors are shifting focus to enterprise applications with clearer ROI, such as code generation, customer service automation, and drug discovery.

The business model for AI video generation is also under pressure. Subscription fees for tools like Runway ($15/month for standard, $95/month for pro) and Pika ($10/month) are too low to cover the massive compute costs. Generating a single 10-second 1080p clip on Runway costs approximately $0.50 in GPU time. At $15/month, the standard tier breaks even at 30 clips; any user who generates more than that costs the company money. The economics only work if users generate very little, or if the company can upsell to enterprise contracts. But enterprise clients demand reliability, which these tools cannot provide.
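The unit economics above reduce to simple arithmetic. A minimal sketch using the article's figures ($15/month standard tier, roughly $0.50 of GPU time per 10-second clip); the function names are ours, and fixed costs, storage, and discounted enterprise rates are deliberately ignored:

```python
def break_even_clips(subscription, gpu_cost_per_clip):
    # Number of clips at which GPU spend exactly consumes the monthly fee.
    return subscription / gpu_cost_per_clip

def monthly_margin(subscription, clips_generated, gpu_cost_per_clip):
    # Contribution margin for one subscriber, ignoring all fixed costs.
    return subscription - clips_generated * gpu_cost_per_clip

clips_at_break_even = break_even_clips(15, 0.50)   # 30 clips
heavy_user_margin = monthly_margin(15, 100, 0.50)  # -$35/month
```

A user who generates 100 clips a month, which is a plausible rate for anyone iterating toward a usable result, costs the company $35 on a $15 subscription. Heavy use is precisely what the low consistency scores force.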

| Metric | Q1 2024 | Q1 2025 | Change |
|---|---|---|---|
| VC investment in AI creative tools | $4.2B | $2.5B | -40% |
| Average subscription price (AI video) | $12/month | $15/month | +25% |
| Estimated GPU cost per 10-sec clip | $0.40 | $0.50 | +25% |
| Enterprise adoption rate (production) | 5% | 8% | +3pp |

Data Takeaway: The subscription model is fundamentally broken. Costs are rising faster than prices, and enterprise adoption remains negligible. Without a breakthrough in efficiency or reliability, the market will continue to shrink.

The secondary effect is a talent exodus. Several key researchers from OpenAI's Sora team have left, including co-lead Tim Brooks, who joined Google DeepMind. Runway's CTO, Anastasis Germanidis, departed in late 2024. The narrative of 'AI replacing filmmakers' has been replaced by a more sobering reality: AI is a niche tool for specific, low-stakes tasks.

Risks, Limitations & Open Questions

The most significant risk is that the industry overcorrects. The hype cycle created unrealistic expectations, and the backlash could starve genuinely useful research of funding. There are promising directions—such as causal video models that incorporate physics simulators, or retrieval-augmented generation (RAG) for video that ensures consistency by referencing a 'memory bank' of frames. But these are early-stage.
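The "memory bank" idea mentioned above can be sketched in a few lines. Everything here is hypothetical (the class, its methods, and the dict-based "frames" are our invention, not any published system): the core idea is that the first rendering of an entity becomes canonical, and later shots retrieve it as conditioning rather than re-sampling the entity from scratch.

```python
class FrameMemoryBank:
    # Hypothetical retrieval-augmented video generation: store a
    # canonical reference frame per entity and hand it back to the
    # generator so later shots can be conditioned on it.
    def __init__(self):
        self.bank = {}

    def register(self, entity, frame):
        # First appearance wins; later renderings cannot redefine
        # the entity's canonical look.
        self.bank.setdefault(entity, frame)

    def retrieve(self, entity):
        return self.bank.get(entity)

mem = FrameMemoryBank()
mem.register("hero_cup", {"color": "blue", "shape": "mug"})
mem.register("hero_cup", {"color": "red", "shape": "mug"})  # ignored
```

Whether conditioning on retrieved references is enough to fix long-range coherence, or merely reduces drift, is exactly the open question; as the article notes, these directions remain early-stage.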

Another limitation is the lack of a feedback loop. Professional editors work iteratively: shoot, review, edit, reshoot. Current AI tools offer no such workflow. You cannot 'direct' a model to fix a specific inconsistency; you must regenerate the entire clip and hope for the best. This is antithetical to professional production.

Ethical concerns also remain unresolved. Deepfake detection, copyright infringement (models trained on unlicensed video), and the potential for misuse in disinformation campaigns are all live issues. Sora's delay may partly be due to OpenAI's fear of regulatory backlash.

AINews Verdict & Predictions

Sora's quiet collapse is not the end of AI in video, but it is the end of the 'magic wand' narrative. The technology will find its place—in pre-visualization, mood boards, background generation, and short-form social media content where consistency is less critical. But the dream of an AI director that can produce a coherent 30-minute narrative is at least five years away, if not more.

Our predictions:

1. No major AI video company will achieve profitability by 2026. The unit economics are too unfavorable without a 10x improvement in model efficiency.

2. The next breakthrough will come from hybrid models that combine diffusion with explicit physics simulators or game engines (e.g., NVIDIA's work on neural physics). Pure diffusion is a dead end for long-form content.

3. OpenAI will quietly sunset Sora as a standalone product, possibly integrating its technology into ChatGPT as a 'video preview' tool for storyboarding.

4. The open-source community will outpace commercial offerings in reliability, as researchers at universities and independent labs focus on the consistency problem without the pressure to ship a product.

5. The term 'AI creative tool' will be redefined to mean 'assistive technology for specific tasks' rather than 'autonomous creator.' This is a healthier, more honest framing.

What to watch next: The release of Meta's next-generation video model, reportedly code-named 'Movie Gen,' which is said to incorporate a temporal memory module. If it fails to improve consistency scores beyond 0.85, the entire field may need to go back to the drawing board.
