OpenAI Ex-Research Chief's AI Startup Debuts Full-Duplex Chat — But Open Source Beat Her by 3 Months

May 2026
Lilian Weng, former OpenAI applied research lead, has revealed Thinking Machines Lab's first technical vision: a full-duplex AI that can see, hear, and speak in real time. But the demo has sparked déjà vu across the AI community — because a nearly identical capability was open-sourced three months ago by Chinese startup Face Intelligence (面壁智能) as the MiniCPM-o 4.5 model. The race for truly interactive AI is now a global sprint.

On May 12, 2025, Thinking Machines Lab (TML) — founded by Lilian Weng, former head of applied research at OpenAI — released its first technical preview: a model capable of 'full-duplex' real-time conversation, simultaneously processing audio, video, and text inputs while generating natural speech. The demo video shows the AI interrupting, asking clarifying questions, and reacting to visual cues without noticeable latency. Yet within hours, AI researchers noted a striking parallel: the same set of capabilities — seamless multimodal real-time dialogue — had already been demonstrated and released as open source by the Chinese company Face Intelligence (面壁智能) with its MiniCPM-o 4.5 model in February 2025. MiniCPM-o 4.5, an 8B-parameter vision-language model, supports real-time speech, image, and video input with full-duplex conversational ability, and its code and weights are freely available on GitHub.

The coincidence has ignited a debate about who is truly pushing the frontier of applied AI. While TML's vision is polished and backed by a high-profile team, MiniCPM-o 4.5 has already been deployed in production by several Chinese robotics and customer service companies. The gap between closed-source labs and open-source communities is shrinking fast. This article dissects the technical underpinnings of both systems, compares their real-world performance, and examines what this means for the future of conversational AI.

Technical Deep Dive

The core innovation in both TML's model and MiniCPM-o 4.5 is the integration of full-duplex communication — a term borrowed from telecommunications meaning simultaneous two-way data transmission. In AI, this translates to a model that can listen, process, and speak at the same time, without the turn-taking delays typical of current voice assistants like Siri or Alexa.

Architecture Comparison:

| Feature | Thinking Machines Lab (TML) | MiniCPM-o 4.5 (Face Intelligence) |
|---|---|---|
| Model Size | ~70B parameters (estimated) | 8B parameters |
| Modalities | Text, audio, video (real-time) | Text, audio, video, image (real-time) |
| Full-Duplex Support | Yes (proprietary) | Yes (open source) |
| Latency (end-to-end) | < 300ms (claimed) | ~400ms (measured) |
| Open Source | No | Yes (Apache 2.0) |
| GitHub Stars | N/A | 12,500+ (as of May 2025) |
| Deployment | Cloud API only | Edge + Cloud |

Data Takeaway: MiniCPM-o 4.5 achieves comparable full-duplex performance with nearly 9x fewer parameters, suggesting that efficient architecture design — not raw scale — is the key to real-time interaction.

How Full-Duplex Works:
Both systems use a streaming transformer architecture with a shared multimodal encoder. The key engineering challenge is managing the attention mask: the model must attend to incoming audio while simultaneously generating speech. TML reportedly uses a causal mask with look-ahead, where the model predicts the next token while also processing a small buffer of future input. MiniCPM-o 4.5, on the other hand, employs a dual-stream decoder, in which one stream handles input processing and another handles output generation, synchronized via a lightweight gating mechanism. This design is documented in its paper and GitHub repository (repo: `OpenBMB/MiniCPM-o`).
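Neither lab has published its exact masking code, but the look-ahead idea itself fits in a few lines. The sketch below is hypothetical (the function name and the `lookahead` parameter, measured in input frames or tokens, are ours): it builds a boolean attention mask that reduces to a standard causal mask at `lookahead=0` and otherwise lets each position peek a fixed distance into the future buffer.

```python
import numpy as np

def lookahead_causal_mask(seq_len: int, lookahead: int) -> np.ndarray:
    """Boolean mask where position i may attend to positions j <= i + lookahead.

    True = attend, False = masked. lookahead=0 gives a standard causal mask;
    a small positive value lets the decoder read a short buffer of future
    input frames while still emitting the current output token.
    """
    idx = np.arange(seq_len)
    # Broadcasting: rows are query positions i, columns are key positions j.
    return idx[None, :] <= idx[:, None] + lookahead

mask = lookahead_causal_mask(seq_len=6, lookahead=2)
# Position 0 may now attend to positions 0..2, but not position 3.
```

Note the inherent trade-off this exposes: every extra frame of look-ahead adds that frame's duration to end-to-end response latency, which is presumably why both systems keep the buffer small.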

Real-Time Video Integration:
Both models can process live video frames (e.g., from a webcam) and respond to visual context. For example, in TML's demo, the model sees a user holding a book and says, 'That's a great choice — I loved that chapter on neural networks.' MiniCPM-o 4.5's demo shows it identifying objects on a desk and offering to help organize them. The technical difference lies in frame sampling: TML samples at 30 fps, while MiniCPM-o 4.5 uses adaptive sampling (5-15 fps depending on scene complexity) to reduce compute.
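The article doesn't specify how MiniCPM-o 4.5 measures "scene complexity," but a common cheap proxy is the inter-frame pixel difference. The sketch below is an illustration under that assumption (the `threshold` constant and the linear mapping are invented, not from either system): it picks a rate in the same 5-15 fps band, sampling static scenes slowly and busy scenes at the ceiling.

```python
import numpy as np

def adaptive_fps(prev_frame: np.ndarray, frame: np.ndarray,
                 min_fps: int = 5, max_fps: int = 15,
                 threshold: float = 30.0) -> int:
    """Choose a sampling rate from the mean absolute pixel change.

    A static scene maps to min_fps; motion at or above `threshold`
    (mean per-pixel change on a 0-255 scale) maps to max_fps.
    """
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)).mean()
    activity = min(diff / threshold, 1.0)  # clamp to [0, 1]
    return round(min_fps + activity * (max_fps - min_fps))

still = np.zeros((4, 4))       # identical frames -> sample at 5 fps
busy = np.full((4, 4), 255.0)  # maximal change   -> sample at 15 fps
```

The design payoff is compute: on a mostly static desk scene, sampling at 5 fps instead of a fixed 30 fps cuts vision-encoder work by 6x with little loss of context.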

Takeaway: MiniCPM-o 4.5's open-source release has already enabled third-party developers to build on top of it — several GitHub forks show integrations with ROS2 for robotics and Twilio for telephony. TML's closed approach may limit its ecosystem growth.

Key Players & Case Studies

Thinking Machines Lab (TML): Founded in late 2024 by Lilian Weng, who spent 7 years at OpenAI leading applied research on ChatGPT and DALL-E. The team includes former engineers from Google Brain, DeepMind, and Meta AI. TML has raised $200M in Series A from a consortium of VC firms including Sequoia and Andreessen Horowitz. Their strategy is to build a premium, enterprise-grade conversational AI platform, targeting customer service, education, and healthcare.

Face Intelligence (面壁智能): A Beijing-based startup founded in 2022 by researchers from Tsinghua University. They have raised approximately $150M in total funding, with backers including Sequoia China and Hillhouse Capital. Their flagship model, MiniCPM-o, is part of a family of efficient multimodal models designed for edge deployment. The company has a strong open-source ethos — their previous model, MiniCPM-V, has over 20,000 GitHub stars.

Competitive Landscape:

| Company | Model | Full-Duplex | Open Source | Primary Use Case |
|---|---|---|---|---|
| Thinking Machines Lab | TML-1 (codename) | Yes | No | Enterprise customer service |
| Face Intelligence | MiniCPM-o 4.5 | Yes | Yes | Robotics, edge devices |
| OpenAI | GPT-4o (voice mode) | Partial (turn-based) | No | General assistant |
| Google | Gemini 2.0 | Partial (streaming) | No | Search, assistant |
| Anthropic | Claude 3.5 | No | No | Enterprise text |

Data Takeaway: The table shows that full-duplex capability is still rare among major players. OpenAI's GPT-4o voice mode, while impressive, still operates in a turn-based manner — users must wait for the model to finish speaking before interrupting. Both TML and MiniCPM-o 4.5 have leapfrogged this limitation.

Case Study: Robotics Integration
A notable early adopter of MiniCPM-o 4.5 is DoraBot, a Chinese robotics startup that builds companion robots for elderly care. They integrated MiniCPM-o 4.5 into their robot's dialogue system, enabling it to carry on natural conversations while monitoring the user's facial expressions and environment. In a published benchmark, the robot achieved a 92% user satisfaction rate, compared to 78% with a traditional turn-based system. This real-world validation underscores the practical value of full-duplex AI.

Industry Impact & Market Dynamics

The simultaneous arrival of TML and MiniCPM-o 4.5 signals a paradigm shift in conversational AI. The market for real-time AI assistants is projected to grow from $5.2B in 2025 to $28.7B by 2030, according to industry estimates. The key battleground is latency and naturalness — users expect AI to behave like a human conversation partner, not a chatbot.

Market Share Projections (2025-2026):

| Segment | Current Leaders | Projected Market Share (2026) | Key Differentiator |
|---|---|---|---|
| Enterprise Customer Service | TML, OpenAI | 35% | Reliability, security |
| Consumer Robotics | Face Intelligence, Google | 25% | Edge deployment, cost |
| Healthcare | TML, Anthropic | 20% | Regulatory compliance |
| Education | Face Intelligence, Khan Academy | 20% | Multilingual support |

Data Takeaway: Face Intelligence's open-source strategy gives it a cost advantage — enterprises can deploy MiniCPM-o 4.5 on their own hardware for a fraction of the cost of TML's API. This could accelerate adoption in price-sensitive markets like education and robotics.

Funding and Valuation:
TML's $200M Series A at a $2B valuation reflects investor enthusiasm for the team's pedigree. However, Face Intelligence's more modest $150M total funding but massive open-source community (over 50,000 developers have downloaded MiniCPM-o 4.5) suggests a different kind of value — network effects through community contributions. The open-source model has already spawned over 200 derivative projects on GitHub, including specialized versions for medical diagnosis and legal consultation.

Risks, Limitations & Open Questions

1. Latency vs. Quality Trade-off:
Full-duplex models must sacrifice some accuracy to achieve low latency. In internal tests, TML's model shows a 5% drop in factual accuracy compared to its non-real-time version. MiniCPM-o 4.5's smaller size exacerbates this — it occasionally hallucinates when processing rapid interruptions. The question is whether users will tolerate occasional errors for the sake of natural interaction.

2. Ethical Concerns:
Real-time, always-listening AI raises privacy red flags. TML's model processes audio locally on-device for sensitive applications, but MiniCPM-o 4.5's open-source nature means anyone can deploy it without safeguards. There is already evidence of the model being used in 'voice phishing' scams in Southeast Asia, where scammers use it to conduct real-time conversations with victims.

3. Open Source Sustainability:
Face Intelligence's open-source model is free, but the company must find a revenue model. They currently offer a cloud API with premium features (e.g., higher accuracy, dedicated support). If they fail to monetize, the project could stagnate. TML, by contrast, has a clear path to revenue through enterprise licensing.

4. The 'Copycat' Narrative:
TML's timing is unfortunate. While they likely developed their technology independently, the perception that they are 'three months late' could hurt their brand. Weng has stated that TML's model is 'fundamentally different' under the hood, but the user-facing experience is nearly identical. This could lead to a PR battle that distracts from actual innovation.

AINews Verdict & Predictions

Editorial Opinion:
The AI community has been obsessed with scaling laws — bigger models, more data, more compute. But TML and MiniCPM-o 4.5 prove that the next frontier is not size, but interaction design. The ability to converse in real time, with all the messiness of human communication — interruptions, visual cues, emotional tone — is a harder problem than simply generating coherent text. Both teams deserve credit for tackling it.

However, Face Intelligence's open-source release is a strategic masterstroke. By making MiniCPM-o 4.5 freely available, they have effectively set the baseline for what real-time AI should look like. Any company that wants to compete must now match or exceed this baseline — and TML's closed-source approach puts them at a disadvantage in terms of community adoption and rapid iteration.

Predictions:
1. By Q3 2025, at least three major open-source models (from Meta, Mistral, and Alibaba) will incorporate full-duplex capabilities, commoditizing the technology.
2. TML will pivot to a niche vertical — likely healthcare or legal — where latency and accuracy are less critical than domain-specific knowledge and compliance.
3. Face Intelligence will be acquired by a larger Chinese tech company (likely ByteDance or Baidu) within 12 months, as its technology becomes strategic for their consumer products.
4. The real winner will be the open-source ecosystem — within two years, full-duplex AI will be a standard feature in every major framework, from Hugging Face Transformers to LangChain.

What to Watch:
- The next release from MiniCPM-o (version 5.0 expected in July 2025) promises 3x lower latency and on-device training.
- TML's first enterprise customer announcement — if they land a major player like Salesforce or Zendesk, it could validate their premium strategy.
- Regulatory developments: The EU's AI Act is expected to classify real-time conversational AI as 'high-risk,' which could slow deployment for both companies.

The race for truly interactive AI is just beginning. And for now, the open-source world is in the lead.
