OpenAI Ex-Research Lead's AI Startup Debuts Full-Duplex Chat — But Open Source Beat Her by 3 Months

May 2026
Lilian Weng, former OpenAI applied research lead, has revealed Thinking Machines Lab's first technical vision: a full-duplex AI that can see, hear, and speak in real time. But the demo has sparked déjà vu across the AI community — because a nearly identical capability was open-sourced three months ago by Chinese startup Face Intelligence with its MiniCPM-o 4.5 model. The race for truly interactive AI is now a global sprint.

On May 12, 2025, Thinking Machines Lab (TML) — founded by Lilian Weng, former head of applied research at OpenAI — released its first technical preview: a model capable of 'full-duplex' real-time conversation, simultaneously processing audio, video, and text inputs while generating natural speech. The demo video shows the AI interrupting, asking clarifying questions, and reacting to visual cues without noticeable latency. Yet within hours, AI researchers noted a striking parallel: the same set of capabilities — seamless multimodal real-time dialogue — had been demonstrated and released as open source by the Chinese company Face Intelligence (面壁智能) with its MiniCPM-o 4.5 model in February 2025. MiniCPM-o 4.5, an 8B-parameter vision-language model, supports real-time speech, image, and video input with full-duplex conversational ability, and its code and weights are freely available on GitHub.

The coincidence has ignited a debate about who is truly pushing the frontier of applied AI. While TML's vision is polished and backed by a high-profile team, MiniCPM-o 4.5 has already been deployed in production by several Chinese robotics and customer service companies. The gap between closed-source labs and open-source communities is shrinking fast.

This article dissects the technical underpinnings of both systems, compares their real-world performance, and examines what this means for the future of conversational AI.

Technical Deep Dive

The core innovation in both TML's model and MiniCPM-o 4.5 is the integration of full-duplex communication — a term borrowed from telecommunications meaning simultaneous two-way data transmission. In AI, this translates to a model that can listen, process, and speak at the same time, without the turn-taking delays typical of current voice assistants like Siri or Alexa.
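The practical payoff of removing turn-taking can be seen with a toy latency model. Nothing below reflects either system's internals; it simply contrasts the 'dead air' a user hears when processing starts only after they stop speaking versus when processing overlaps with their speech. Both functions and their parameters are invented for illustration.

```python
def half_duplex_gap(user_speech_s: float, processing_s: float) -> float:
    """Silence before a turn-based assistant replies: it only begins
    processing after the user has finished talking."""
    return processing_s

def full_duplex_gap(user_speech_s: float, processing_s: float) -> float:
    """A full-duplex model processes while listening, so only the
    processing time not already overlapped with speech remains as a gap."""
    return max(0.0, processing_s - user_speech_s)

# 3 s of user speech followed by 0.4 s of model-side processing:
print(half_duplex_gap(3.0, 0.4))  # 0.4 s of dead air
print(full_duplex_gap(3.0, 0.4))  # 0.0 -> the reply can begin immediately
```

The toy model makes the design goal concrete: the sub-second latencies both teams advertise only feel instantaneous because computation is hidden under the user's own speech.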

Architecture Comparison:

| Feature | Thinking Machines Lab (TML) | MiniCPM-o 4.5 (Face Intelligence) |
|---|---|---|
| Model Size | ~70B parameters (estimated) | 8B parameters |
| Modalities | Text, audio, video (real-time) | Text, audio, video, image (real-time) |
| Full-Duplex Support | Yes (proprietary) | Yes (open source) |
| Latency (end-to-end) | < 300ms (claimed) | ~400ms (measured) |
| Open Source | No | Yes (Apache 2.0) |
| GitHub Stars | N/A | 12,500+ (as of May 2025) |
| Deployment | Cloud API only | Edge + Cloud |

Data Takeaway: MiniCPM-o 4.5 achieves comparable full-duplex performance with nearly 9x fewer parameters, suggesting that efficient architecture design — not raw scale — is the key to real-time interaction.

How Full-Duplex Works:
Both systems use a streaming transformer architecture with a shared multimodal encoder. The key engineering challenge is managing the attention mask — the model must attend to incoming audio while simultaneously generating speech. TML reportedly uses causal masking with look-ahead, predicting the next token while also processing a small buffer of future input. MiniCPM-o 4.5, on the other hand, employs a dual-stream decoder: one stream handles input processing while another handles output generation, synchronized via a lightweight gating mechanism. This is documented in the paper and GitHub repository (repo: `OpenBMB/MiniCPM-o`).
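The dual-stream idea can be sketched in a few lines of Python. This is a toy illustration, not Face Intelligence's actual implementation: the `DualStreamDecoder` class, its `speak_threshold` gate, and the energy-based interruption test are all invented for clarity.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DualStreamDecoder:
    """Toy sketch of a dual-stream full-duplex loop (illustrative only).

    One stream ingests incoming audio frames while the other emits
    speech tokens; a scalar gate decides, per step, whether the output
    stream keeps talking or yields to the user.
    """
    speak_threshold: float = 0.5   # hypothetical gate cutoff
    transcript: List[str] = field(default_factory=list)

    def gate(self, input_energy: float) -> bool:
        # If incoming audio is loud (the user is talking), stop speaking.
        return input_energy < self.speak_threshold

    def step(self, audio_frame: float, planned_token: str) -> str:
        """Consume one input frame and decide what the output stream emits."""
        if self.gate(audio_frame):
            self.transcript.append(planned_token)  # keep generating speech
            return planned_token
        self.transcript.append("<yield>")          # interruption: go silent
        return "<yield>"

decoder = DualStreamDecoder()
# Quiet input -> the model speaks; loud input (interruption) -> it yields.
events = [decoder.step(e, t)
          for e, t in [(0.1, "Hello"), (0.9, "there"), (0.2, "yes?")]]
print(events)  # ['Hello', '<yield>', 'yes?']
```

The real systems replace this scalar gate with learned attention over both streams, but the control flow — every step consumes input and decides whether to emit — is the essence of full-duplex decoding.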

Real-Time Video Integration:
Both models can process live video frames (e.g., from a webcam) and respond to visual context. For example, in TML's demo, the model sees a user holding a book and says, 'That's a great choice — I loved that chapter on neural networks.' MiniCPM-o 4.5's demo shows it identifying objects on a desk and offering to help organize them. The technical difference lies in frame sampling: TML samples at 30 fps, while MiniCPM-o 4.5 uses adaptive sampling (5-15 fps depending on scene complexity) to reduce compute.

Takeaway: MiniCPM-o 4.5's open-source release has already enabled third-party developers to build on top of it — several GitHub forks show integrations with ROS2 for robotics and Twilio for telephony. TML's closed approach may limit its ecosystem growth.

Key Players & Case Studies

Thinking Machines Lab (TML): Founded in late 2024 by Lilian Weng, who spent 7 years at OpenAI leading applied research on ChatGPT and DALL-E. The team includes former engineers from Google Brain, DeepMind, and Meta AI. TML has raised $200M in Series A from a consortium of VC firms including Sequoia and Andreessen Horowitz. Their strategy is to build a premium, enterprise-grade conversational AI platform, targeting customer service, education, and healthcare.

Face Intelligence (面壁智能): A Beijing-based startup founded in 2022 by researchers from Tsinghua University. They have raised approximately $150M in total funding, with backers including Sequoia China and Hillhouse Capital. Their flagship model, MiniCPM-o, is part of a family of efficient multimodal models designed for edge deployment. The company has a strong open-source ethos — their previous model, MiniCPM-V, has over 20,000 GitHub stars.

Competitive Landscape:

| Company | Model | Full-Duplex | Open Source | Primary Use Case |
|---|---|---|---|---|
| Thinking Machines Lab | TML-1 (codename) | Yes | No | Enterprise customer service |
| Face Intelligence | MiniCPM-o 4.5 | Yes | Yes | Robotics, edge devices |
| OpenAI | GPT-4o (voice mode) | Partial (turn-based) | No | General assistant |
| Google | Gemini 2.0 | Partial (streaming) | No | Search, assistant |
| Anthropic | Claude 3.5 | No | No | Enterprise text |

Data Takeaway: The table shows that full-duplex capability is still rare among major players. OpenAI's GPT-4o voice mode, while impressive, still operates in a turn-based manner — users must wait for the model to finish speaking before interrupting. Both TML and MiniCPM-o 4.5 have leapfrogged this limitation.

Case Study: Robotics Integration
A notable early adopter of MiniCPM-o 4.5 is DoraBot, a Chinese robotics startup that builds companion robots for elderly care. They integrated MiniCPM-o 4.5 into their robot's dialogue system, enabling it to carry on natural conversations while monitoring the user's facial expressions and environment. In a published benchmark, the robot achieved a 92% user satisfaction rate, compared to 78% with a traditional turn-based system. This real-world validation underscores the practical value of full-duplex AI.

Industry Impact & Market Dynamics

The simultaneous arrival of TML and MiniCPM-o 4.5 signals a paradigm shift in conversational AI. The market for real-time AI assistants is projected to grow from $5.2B in 2025 to $28.7B by 2030, according to industry estimates. The key battleground is latency and naturalness — users expect AI to behave like a human conversation partner, not a chatbot.

Market Share Projections (2025-2026):

| Segment | Current Leaders | Projected Market Share (2026) | Key Differentiator |
|---|---|---|---|
| Enterprise Customer Service | TML, OpenAI | 35% | Reliability, security |
| Consumer Robotics | Face Intelligence, Google | 25% | Edge deployment, cost |
| Healthcare | TML, Anthropic | 20% | Regulatory compliance |
| Education | Face Intelligence, Khan Academy | 20% | Multilingual support |

Data Takeaway: Face Intelligence's open-source strategy gives it a cost advantage — enterprises can deploy MiniCPM-o 4.5 on their own hardware for a fraction of the cost of TML's API. This could accelerate adoption in price-sensitive markets like education and robotics.

Funding and Valuation:
TML's $200M Series A at a $2B valuation reflects investor enthusiasm for the team's pedigree. However, Face Intelligence's more modest $150M total funding but massive open-source community (over 50,000 developers have downloaded MiniCPM-o 4.5) suggests a different kind of value — network effects through community contributions. The open-source model has already spawned over 200 derivative projects on GitHub, including specialized versions for medical diagnosis and legal consultation.

Risks, Limitations & Open Questions

1. Latency vs. Quality Trade-off:
Full-duplex models must sacrifice some accuracy to achieve low latency. In internal tests, TML's model shows a 5% drop in factual accuracy compared to its non-real-time version. MiniCPM-o 4.5's smaller size exacerbates this — it occasionally hallucinates when processing rapid interruptions. The question is whether users will tolerate occasional errors for the sake of natural interaction.

2. Ethical Concerns:
Real-time, always-listening AI raises privacy red flags. TML's model processes audio locally on-device for sensitive applications, but MiniCPM-o 4.5's open-source nature means anyone can deploy it without safeguards. There is already evidence of the model being used in 'voice phishing' scams in Southeast Asia, where scammers use it to conduct real-time conversations with victims.

3. Open Source Sustainability:
Face Intelligence's open-source model is free, but the company must find a revenue model. They currently offer a cloud API with premium features (e.g., higher accuracy, dedicated support). If they fail to monetize, the project could stagnate. TML, by contrast, has a clear path to revenue through enterprise licensing.

4. The 'Copycat' Narrative:
TML's timing is unfortunate. While they likely developed their technology independently, the perception that they are 'three months late' could hurt their brand. Weng has stated that TML's model is 'fundamentally different' under the hood, but the user-facing experience is nearly identical. This could lead to a PR battle that distracts from actual innovation.

AINews Verdict & Predictions

Editorial Opinion:
The AI community has been obsessed with scaling laws — bigger models, more data, more compute. But TML and MiniCPM-o 4.5 prove that the next frontier is not size, but interaction design. The ability to converse in real time, with all the messiness of human communication — interruptions, visual cues, emotional tone — is a harder problem than simply generating coherent text. Both teams deserve credit for tackling it.

However, Face Intelligence's open-source release is a strategic masterstroke. By making MiniCPM-o 4.5 freely available, they have effectively set the baseline for what real-time AI should look like. Any company that wants to compete must now match or exceed this baseline — and TML's closed-source approach puts them at a disadvantage in terms of community adoption and rapid iteration.

Predictions:
1. By Q3 2025, at least three major open-source models (from Meta, Mistral, and Alibaba) will incorporate full-duplex capabilities, commoditizing the technology.
2. TML will pivot to a niche vertical — likely healthcare or legal — where latency and accuracy are less critical than domain-specific knowledge and compliance.
3. Face Intelligence will be acquired by a larger Chinese tech company (likely ByteDance or Baidu) within 12 months, as its technology becomes strategic for their consumer products.
4. The real winner will be the open-source ecosystem — within two years, full-duplex AI will be a standard feature in every major framework, from Hugging Face Transformers to LangChain.

What to Watch:
- The next release from MiniCPM-o (version 5.0 expected in July 2025) promises 3x lower latency and on-device training.
- TML's first enterprise customer announcement — if they land a major player like Salesforce or Zendesk, it could validate their premium strategy.
- Regulatory developments: The EU's AI Act is expected to classify real-time conversational AI as 'high-risk,' which could slow deployment for both companies.

The race for truly interactive AI is just beginning. And for now, the open-source world is in the lead.
