OpenAI Ex-CTO's AI Startup Debuts Full-Duplex Chat — But Open Source Beat Her by 3 Months

May 2026
Lilian Weng, former OpenAI applied research lead, has revealed Thinking Machines Lab's first technical vision: a full-duplex AI that can see, hear, and speak in real time. But the demo has sparked déjà vu across the AI community, because a nearly identical capability was open-sourced three months earlier by Chinese startup Face Intelligence as MiniCPM-o 4.5. The race for truly interactive AI is now a global sprint.

On May 12, 2025, Thinking Machines Lab (TML), founded by Lilian Weng, former head of applied research at OpenAI, released its first technical preview: a model capable of 'full-duplex' real-time conversation that processes audio, video, and text inputs simultaneously while generating natural speech. The demo video shows the AI interrupting, asking clarifying questions, and reacting to visual cues without noticeable latency.

Yet within hours, AI researchers noted a striking parallel: the same set of capabilities, seamless multimodal real-time dialogue, had been demonstrated and released as open source in February 2025 by the Chinese company Face Intelligence (面壁智能) with its MiniCPM-o 4.5 model. MiniCPM-o 4.5, an 8B-parameter vision-language model, supports real-time speech, image, and video input with full-duplex conversational ability, and its code and weights are freely available on GitHub.

The coincidence has ignited a debate about who is truly pushing the frontier of applied AI. While TML's vision is polished and backed by a high-profile team, MiniCPM-o 4.5 has already been deployed in production by several Chinese robotics and customer service companies. The gap between closed-source labs and open-source communities is shrinking fast. This article dissects the technical underpinnings of both systems, compares their real-world performance, and examines what this means for the future of conversational AI.

Technical Deep Dive

The core innovation in both TML's model and MiniCPM-o 4.5 is the integration of full-duplex communication — a term borrowed from telecommunications meaning simultaneous two-way data transmission. In AI, this translates to a model that can listen, process, and speak at the same time, without the turn-taking delays typical of current voice assistants like Siri or Alexa.
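To make the contrast with turn-taking concrete, here is a minimal sketch of full-duplex behavior using Python's `asyncio`. It is a toy illustration, not either vendor's implementation: the `listen` and `speak` coroutines, the `<interrupt>` marker, and the token lists are all hypothetical. The point is only that input keeps being consumed while output is being produced, and that an interruption can cut the output short.

```python
import asyncio


async def listen(incoming, barge_in, log):
    """Consume input chunks as they arrive, even while the speaker is active."""
    for chunk in incoming:
        await asyncio.sleep(0)  # yield control, simulating streaming arrival
        log.append(("heard", chunk))
        if chunk == "<interrupt>":  # hypothetical barge-in marker
            barge_in.set()


async def speak(reply_tokens, barge_in, log):
    """Emit reply tokens one at a time, stopping as soon as the user barges in."""
    for token in reply_tokens:
        if barge_in.is_set():
            log.append(("stopped", token))
            return
        log.append(("said", token))
        await asyncio.sleep(0)  # yield so the listener can run between tokens


async def full_duplex_turn(incoming, reply_tokens):
    """Run listening and speaking concurrently instead of one after the other."""
    log = []
    barge_in = asyncio.Event()
    await asyncio.gather(
        listen(incoming, barge_in, log),
        speak(reply_tokens, barge_in, log),
    )
    return log


log = asyncio.run(
    full_duplex_turn(
        ["uh", "<interrupt>", "wait"],
        ["Sure", "here", "is", "the", "answer"],
    )
)
for event in log:
    print(event)
```

In a turn-based assistant the equivalent of `speak` would run to completion before `listen` resumed; here the reply is abandoned partway through while the listener continues to receive input.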

Architecture Comparison:

| Feature | Thinking Machines Lab (TML) | MiniCPM-o 4.5 (Face Intelligence) |
|---|---|---|
| Model Size | ~70B parameters (estimated) | 8B parameters |
| Modalities | Text, audio, video (real-time) | Text, audio, video, image (real-time) |
| Full-Duplex Support | Yes (proprietary) | Yes (open source) |
| Latency (end-to-end) | < 300ms (claimed) | ~400ms (measured) |
| Open Source | No | Yes (Apache 2.0) |
| GitHub Stars | N/A | 12,500+ (as of May 2025) |
| Deployment | Cloud API only | Edge + Cloud |

Data Takeaway: MiniCPM-o 4.5 delivers full-duplex interaction with roughly 9x fewer parameters (8B versus an estimated 70B), while its measured latency trails TML's claimed figure by roughly 100 ms or more. This suggests that efficient architecture design, not raw scale, is the key to real-time interaction.

How Full-Duplex Works:
Both systems use a streaming transformer architecture with a shared multimodal encoder. The key engineering challenge is managing the attention mask: the model must attend to incoming audio while simultaneously generating speech. TML reportedly uses causal masking with look-ahead, where the model predicts the next token while also processing a small buffer of future input. MiniCPM-o 4.5, on the other hand, employs a dual-stream decoder, where one stream handles input processing and another handles output generation, synchronized via a lightweight gating mechanism. The dual-stream design is documented in the company's paper and GitHub repository (repo: `OpenBMB/MiniCPM-o`).
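The two ideas described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not code from either system: `lookahead_causal_mask` and `gated_fusion` are hypothetical names, the mask treats the look-ahead buffer as a fixed number of future positions, and the gate is modeled as a single sigmoid-weighted blend of the two streams' hidden states.

```python
import math


def lookahead_causal_mask(seq_len, lookahead):
    """Return mask[i][j] == True when position i may attend to position j.

    With lookahead=0 this is the standard causal mask. A positive lookahead
    additionally exposes that many future positions (the assumed input buffer).
    """
    return [
        [j <= i + lookahead for j in range(seq_len)]
        for i in range(seq_len)
    ]


def gated_fusion(h_in, h_out, gate_logit):
    """Blend input-stream and output-stream states with a scalar sigmoid gate.

    Assumed form: gate * h_in + (1 - gate) * h_out. The real gating mechanism
    may differ; this only shows the general shape of such a synchronizer.
    """
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    return [gate * a + (1.0 - gate) * b for a, b in zip(h_in, h_out)]


mask = lookahead_causal_mask(seq_len=4, lookahead=1)
for row in mask:
    print(["x" if allowed else "." for allowed in row])

fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_logit=0.0)
print(fused)
```

A larger look-ahead gives the model more context before it commits to output, but every extra buffered position adds waiting time, which is one way the latency and quality trade-off shows up at the architecture level.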

Real-Time Video Integration:
Both models can process live video frames (e.g., from a webcam) and respond to visual context. For example, in TML's demo, the model sees a user holding a book and says, 'That's a great choice — I loved that chapter on neural networks.' MiniCPM-o 4.5's demo shows it identifying objects on a desk and offering to help organize them. The technical difference lies in frame sampling: TML samples at 30 fps, while MiniCPM-o 4.5 uses adaptive sampling (5-15 fps depending on scene complexity) to reduce compute.
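As a rough sketch of what adaptive sampling could look like, the snippet below maps a scene-complexity score to a frame rate in the 5-15 fps range cited above. The linear mapping, the frame-difference complexity measure, and the function names are assumptions made for illustration; the article does not describe how MiniCPM-o 4.5 actually scores scene complexity.

```python
def frame_difference(prev_frame, curr_frame):
    """Mean absolute pixel change between two 8-bit grayscale frames, in [0, 1]."""
    total = sum(abs(a - b) for a, b in zip(prev_frame, curr_frame))
    return total / (255.0 * len(curr_frame))


def adaptive_fps(complexity, min_fps=5.0, max_fps=15.0):
    """Linearly map a complexity score in [0, 1] to a sampling rate in fps."""
    clamped = min(max(complexity, 0.0), 1.0)
    return min_fps + clamped * (max_fps - min_fps)


# Flattened toy frames: a static scene and a scene where every pixel changed.
static_score = frame_difference([10, 10, 10, 10], [10, 10, 10, 10])
busy_score = frame_difference([0, 0, 0, 0], [255, 255, 255, 255])

for label, score in [("static", static_score), ("busy", busy_score)]:
    fps = adaptive_fps(score)
    print(f"{label}: {fps:.0f} fps, {30.0 / fps:.0f}x fewer frames than 30 fps")
```

Under these assumptions a static scene is sampled at 5 fps and a rapidly changing one at 15 fps, so the vision encoder handles between one sixth and one half of the frames a fixed 30 fps pipeline would.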

Takeaway: MiniCPM-o 4.5's open-source release has already enabled third-party developers to build on top of it — several GitHub forks show integrations with ROS2 for robotics and Twilio for telephony. TML's closed approach may limit its ecosystem growth.

Key Players & Case Studies

Thinking Machines Lab (TML): Founded in late 2024 by Lilian Weng, who spent 7 years at OpenAI leading applied research on ChatGPT and DALL-E. The team includes former engineers from Google Brain, DeepMind, and Meta AI. TML has raised $200M in Series A from a consortium of VC firms including Sequoia and Andreessen Horowitz. Their strategy is to build a premium, enterprise-grade conversational AI platform, targeting customer service, education, and healthcare.

Face Intelligence (面壁智能): A Beijing-based startup founded in 2022 by researchers from Tsinghua University. They have raised approximately $150M in total funding, with backers including Sequoia China and Hillhouse Capital. Their flagship model, MiniCPM-o, is part of a family of efficient multimodal models designed for edge deployment. The company has a strong open-source ethos — their previous model, MiniCPM-V, has over 20,000 GitHub stars.

Competitive Landscape:

| Company | Model | Full-Duplex | Open Source | Primary Use Case |
|---|---|---|---|---|
| Thinking Machines Lab | TML-1 (codename) | Yes | No | Enterprise customer service |
| Face Intelligence | MiniCPM-o 4.5 | Yes | Yes | Robotics, edge devices |
| OpenAI | GPT-4o (voice mode) | Partial (turn-based) | No | General assistant |
| Google | Gemini 2.0 | Partial (streaming) | No | Search, assistant |
| Anthropic | Claude 3.5 | No | No | Enterprise text |

Data Takeaway: The table shows that full-duplex capability is still rare among major players. OpenAI's GPT-4o voice mode, while impressive, still operates in a turn-based manner — users must wait for the model to finish speaking before interrupting. Both TML and MiniCPM-o 4.5 have leapfrogged this limitation.

Case Study: Robotics Integration
A notable early adopter of MiniCPM-o 4.5 is DoraBot, a Chinese robotics startup that builds companion robots for elderly care. They integrated MiniCPM-o 4.5 into their robot's dialogue system, enabling it to carry on natural conversations while monitoring the user's facial expressions and environment. In a published benchmark, the robot achieved a 92% user satisfaction rate, compared to 78% with a traditional turn-based system. This real-world validation underscores the practical value of full-duplex AI.

Industry Impact & Market Dynamics

The arrival of TML's preview and MiniCPM-o 4.5 within three months of each other signals a paradigm shift in conversational AI. The market for real-time AI assistants is projected to grow from $5.2B in 2025 to $28.7B by 2030, according to industry estimates. The key battleground is latency and naturalness: users expect AI to behave like a human conversation partner, not a chatbot.
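For context, the growth rate implied by that projection can be worked out directly. The short calculation below assumes steady compound growth over the five years from 2025 to 2030, which the cited estimates do not specify; `cagr` is simply a helper name used here.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value, and span."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


rate = cagr(start_value=5.2, end_value=28.7, years=2030 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the projection works out to roughly 41% growth per year.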

Market Share Projections (2025-2026):

| Segment | Current Leaders | Projected Share of Total Market (2026) | Key Differentiator |
|---|---|---|---|
| Enterprise Customer Service | TML, OpenAI | 35% | Reliability, security |
| Consumer Robotics | Face Intelligence, Google | 25% | Edge deployment, cost |
| Healthcare | TML, Anthropic | 20% | Regulatory compliance |
| Education | Face Intelligence, Khan Academy | 20% | Multilingual support |

Data Takeaway: Face Intelligence's open-source strategy gives it a cost advantage — enterprises can deploy MiniCPM-o 4.5 on their own hardware for a fraction of the cost of TML's API. This could accelerate adoption in price-sensitive markets like education and robotics.

Funding and Valuation:
TML's $200M Series A at a $2B valuation reflects investor enthusiasm for the team's pedigree. Face Intelligence has raised a more modest $150M in total, but its large open-source community (over 50,000 developers have downloaded MiniCPM-o 4.5) suggests a different kind of value: network effects through community contributions. The open-source model has already spawned over 200 derivative projects on GitHub, including specialized versions for medical diagnosis and legal consultation.

Risks, Limitations & Open Questions

1. Latency vs. Quality Trade-off:
Full-duplex models must sacrifice some accuracy to achieve low latency. In internal tests, TML's model shows a 5% drop in factual accuracy compared to its non-real-time version. MiniCPM-o 4.5's smaller size exacerbates this — it occasionally hallucinates when processing rapid interruptions. The question is whether users will tolerate occasional errors for the sake of natural interaction.

2. Ethical Concerns:
Real-time, always-listening AI raises privacy red flags. TML's model processes audio locally on-device for sensitive applications, but MiniCPM-o 4.5's open-source nature means anyone can deploy it without safeguards. There is already evidence of the model being used in 'voice phishing' scams in Southeast Asia, where scammers use it to conduct real-time conversations with victims.

3. Open Source Sustainability:
Face Intelligence's open-source model is free, but the company must find a revenue model. They currently offer a cloud API with premium features (e.g., higher accuracy, dedicated support). If they fail to monetize, the project could stagnate. TML, by contrast, has a clear path to revenue through enterprise licensing.

4. The 'Copycat' Narrative:
TML's timing is unfortunate. While they likely developed their technology independently, the perception that they are 'three months late' could hurt their brand. Weng has stated that TML's model is 'fundamentally different' under the hood, but the user-facing experience is nearly identical. This could lead to a PR battle that distracts from actual innovation.

AINews Verdict & Predictions

Editorial Opinion:
The AI community has been obsessed with scaling laws — bigger models, more data, more compute. But TML and MiniCPM-o 4.5 prove that the next frontier is not size, but interaction design. The ability to converse in real time, with all the messiness of human communication — interruptions, visual cues, emotional tone — is a harder problem than simply generating coherent text. Both teams deserve credit for tackling it.

However, Face Intelligence's open-source release is a strategic masterstroke. By making MiniCPM-o 4.5 freely available, they have effectively set the baseline for what real-time AI should look like. Any company that wants to compete must now match or exceed this baseline — and TML's closed-source approach puts them at a disadvantage in terms of community adoption and rapid iteration.

Predictions:
1. By Q3 2025, at least three major open-source models (from Meta, Mistral, and Alibaba) will incorporate full-duplex capabilities, commoditizing the technology.
2. TML will pivot to a niche vertical — likely healthcare or legal — where latency and accuracy are less critical than domain-specific knowledge and compliance.
3. Face Intelligence will be acquired by a larger Chinese tech company (likely ByteDance or Baidu) within 12 months, as its technology becomes strategic for their consumer products.
4. The real winner will be the open-source ecosystem — within two years, full-duplex AI will be a standard feature in every major framework, from Hugging Face Transformers to LangChain.

What to Watch:
- The next MiniCPM-o release (version 5.0, expected in July 2025), which promises 3x lower latency and on-device training.
- TML's first enterprise customer announcement — if they land a major player like Salesforce or Zendesk, it could validate their premium strategy.
- Regulatory developments: The EU's AI Act is expected to classify real-time conversational AI as 'high-risk,' which could slow deployment for both companies.

The race for truly interactive AI is just beginning. And for now, the open-source world is in the lead.
