Technical Deep Dive
The disconnect between the technical prowess of Zhipu AI and MiniMax and their market reception starts with their core architectures. Both companies build on transformer-based models, but their engineering choices reveal different strategic bets.
Zhipu AI's GLM Architecture
Zhipu's GLM (General Language Model) family, particularly GLM-130B and the more recent GLM-4, combines bidirectional attention over the input context with autoregressive generation. Unlike GPT-style models, which use unidirectional (causal) attention throughout, GLM's architecture lets the model draw on context from both directions during training, which gives it an edge in tasks requiring deep comprehension, such as long-document analysis and complex reasoning. The latest iteration handles up to 128k tokens of context, a capability that builds on this design together with optimized sparse attention patterns and FlashAttention-2 integration. This is technically impressive: benchmarks show GLM-4 outperforming GPT-4 on several Chinese-language reasoning tasks.
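The difference between the two attention patterns can be made concrete with a small sketch. The code below builds a causal mask of the kind GPT-style models use, and a prefix-style mask in which context tokens attend to each other in both directions while generated tokens stay autoregressive. The function names and the `prefix_len` parameter are illustrative assumptions, not taken from Zhipu's code.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def prefix_bidirectional_mask(n, prefix_len):
    """Hypothetical prefix-style mask.

    Context positions (index < prefix_len) attend to the whole context in both
    directions; generated positions attend to the context plus earlier
    generated positions, so generation remains autoregressive.
    """
    return [[j < prefix_len or j <= i for j in range(n)] for i in range(n)]


def render(mask):
    """Draw a mask as text: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if cell else "." for cell in row) for row in mask)


if __name__ == "__main__":
    print(render(causal_mask(5)))
    print()
    print(render(prefix_bidirectional_mask(5, prefix_len=3)))
```

In the second mask the first three rows are identical: every context token sees the full context, which is the property the bidirectional design relies on.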
MiniMax's Multimodal Approach
MiniMax, in contrast, has focused on a modular, multi-expert architecture for its video generation model, Hailuo AI. Their system uses a cascading pipeline: a text-to-image diffusion model (based on a latent diffusion transformer), followed by a temporal attention layer that generates frame sequences, and finally a super-resolution module. The key innovation is the use of a 'motion prior' network trained on a massive dataset of video clips, which allows for more coherent and physically plausible motion compared to frame-by-frame generation. Their voice cloning technology, meanwhile, employs a fine-tuned VALL-E variant with a speaker encoder that can clone a voice from just a 3-second sample, achieving near-zero-shot performance with a mean opinion score (MOS) of 4.2 out of 5.
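The order of the three stages can be sketched as a shape-only pipeline. No model is run here: each stage only records how the clip's dimensions would change. All names, resolutions, frame counts, and the upscaling factor are hypothetical placeholders, not MiniMax's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Shape of a video tensor: number of frames and spatial resolution."""
    frames: int
    height: int
    width: int


def text_to_keyframe(prompt, height=64, width=64):
    # Stage 1 (stand-in for the text-to-image diffusion model): one keyframe.
    return Clip(frames=1, height=height, width=width)


def temporal_expand(clip, num_frames=16):
    # Stage 2 (stand-in for the temporal attention layer): keyframe -> frame sequence.
    return Clip(frames=num_frames, height=clip.height, width=clip.width)


def super_resolve(clip, scale=4):
    # Stage 3 (stand-in for the super-resolution module): upscale every frame.
    return Clip(frames=clip.frames, height=clip.height * scale, width=clip.width * scale)


def generate(prompt):
    """Run the cascade in order: keyframe, then motion, then upscaling."""
    return super_resolve(temporal_expand(text_to_keyframe(prompt)))


if __name__ == "__main__":
    print(generate("a cat walking through snow"))
```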
Benchmark Performance
| Model | MMLU (English) | C-Eval (Chinese) | Long-Context (128k) Accuracy | Video Generation FVD Score | Voice Cloning MOS |
|---|---|---|---|---|---|
| GLM-4 | 86.4 | 78.2 | 92.1% | N/A | N/A |
| MiniMax Hailuo | N/A | N/A | N/A | 12.3 | 4.2 |
| GPT-4o | 88.7 | 72.5 | 90.5% | N/A | N/A |
| Sora (OpenAI) | N/A | N/A | N/A | 10.8 | N/A |
Data Takeaway: While Zhipu's GLM-4 leads on the Chinese-language C-Eval benchmark (78.2 versus 72.5), it still trails GPT-4o on general English knowledge (86.4 versus 88.7 on MMLU). On the Fréchet Video Distance (FVD) metric, a key measure of video quality where lower is better, MiniMax's Hailuo scores 12.3 against Sora's 10.8, so it trails Sora, but the gap is small. The voice cloning MOS of 4.2 out of 5 is strong, although the table offers no competitor figure to compare it against. However, these technical wins have not translated into revenue.
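For readers unfamiliar with FVD, the idea is to fit a Gaussian to features of real videos and another to features of generated videos, then measure the Fréchet distance between the two. The real metric uses high-dimensional features from a pretrained video network and full covariance matrices. The sketch below is a deliberately simplified one-dimensional version, where the squared distance reduces to (μ1 − μ2)² + (σ1 − σ2)². The sample values are made up for illustration.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(real, generated):
    """Squared Fréchet distance between 1-D Gaussians fitted to two samples.

    Simplified stand-in for FVD: real FVD works on multi-dimensional
    video features, not on scalar values like these.
    """
    mu_r, mu_g = fmean(real), fmean(generated)
    sd_r, sd_g = pstdev(real), pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


if __name__ == "__main__":
    real = [0.0, 1.0, 2.0, 3.0]       # made-up "real video" feature values
    close = [0.1, 1.1, 2.1, 3.1]      # generator whose statistics nearly match
    far = [2.0, 3.0, 4.0, 5.0]        # generator whose statistics are shifted
    print(frechet_distance_1d(real, close))
    print(frechet_distance_1d(real, far))
```

A generator whose feature statistics sit closer to the real data gets the smaller score, which is why a lower FVD is read as better quality.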
Relevant Open-Source Repositories
- ZhipuAI/GLM-130B (GitHub, 40k+ stars): The open-source release of GLM-130B has been a major contribution, allowing researchers to fine-tune and deploy the model locally. The repo includes detailed training scripts, inference optimizations, and a quantization toolkit.
- MiniMax-AI/Hailuo (GitHub, 12k+ stars): The inference code and pre-trained weights for the Hailuo video generation model. The repo is notable for its efficient implementation of the temporal attention module, which reduces memory usage by 30% compared to naive implementations.
Key Players & Case Studies
Zhipu AI: The Academic Prodigy
Founded by a team from Tsinghua University, Zhipu has positioned itself as the 'Chinese GPT' with a strong emphasis on research. Their strategy has been to lead on benchmarks and open-source contributions, building trust with the developer community. Their enterprise product, GLM-Enterprise, offers API access with custom fine-tuning, targeting sectors like finance and legal. Pricing remains aggressive: at ¥0.8 per 1M tokens for the base model, it is significantly cheaper than OpenAI's API, but the low price has not yet translated into high-margin revenue. The company has raised over $1 billion in funding from investors like Alibaba, Tencent, and Sequoia China, but its annualized revenue is estimated at only $50-80 million, a fraction of its $4 billion peak valuation.
MiniMax: The Product-First Challenger
MiniMax, led by former SenseTime executive Yan Junjie, has taken a more product-centric approach. Their consumer app, 'Glow', offers AI-powered video and voice creation, and has seen rapid user growth—over 10 million monthly active users. The monetization strategy relies on a freemium model: users get 5 free video generations per day, with a premium subscription at ¥30/month for unlimited usage. While user numbers are impressive, conversion rates are low, estimated at under 2%. Enterprise API access is also available, but the primary use case remains content creation for social media influencers, a market with thin margins and high churn.
Competitive Landscape Comparison
| Company | Primary Product | User Base | Revenue Model | Estimated Monthly Revenue | Valuation (Pre-Crash) |
|---|---|---|---|---|---|
| Zhipu AI | GLM-API, ChatGLM | 5M MAU (ChatGLM) | API calls, enterprise contracts | $4-6M | $4B |
| MiniMax | Glow app, Hailuo API | 10M MAU (Glow) | Freemium subscriptions, API | $2-3M | $2.5B |
| Baidu (ERNIE Bot) | ERNIE Bot, API | 100M MAU | Freemium, ads, enterprise | $30-40M | $50B (public) |
| ByteDance (Doubao) | Doubao app | 50M MAU | Freemium, ads | $15-20M | N/A (private) |
Data Takeaway: Zhipu and MiniMax are far smaller than Baidu and ByteDance in both user base and revenue. On the figures above, the incumbents are roughly 2.5 to 20 times larger, depending on the pairing. The startups' valuations, however, were inflated by private market exuberance. The A-share market is now demanding a reality check.
Industry Impact & Market Dynamics
The crash of Zhipu and MiniMax's A-share valuations has sent shockwaves through China's AI startup ecosystem. It signals a fundamental shift in investor sentiment from 'growth at all costs' to 'profitability first.' This is particularly devastating for the dozens of other Chinese LLM startups—such as Baichuan, 01.AI, and Stepfun—that were hoping to follow the same path.
Market Data: AI Startup Funding in China
| Year | Total AI Startup Funding (China) | Number of Deals | Average Deal Size | Median Valuation at IPO (AI Companies) |
|---|---|---|---|---|
| 2023 | $12B | 150 | $80M | $3.5B |
| 2024 | $8B | 100 | $80M | $2.0B |
| 2025 (Q1) | $1.5B | 25 | $60M | N/A (no IPOs) |
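Data Takeaway: Funding dropped by 33% from 2023 to 2024 ($12B to $8B), and the number of deals fell by the same proportion (150 to 100), leaving the average deal size unchanged at $80M. The median valuation at IPO for AI companies fell from $3.5B to $2.0B, reflecting the market's growing skepticism.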
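The year-over-year changes can be recomputed directly from the table. The short sketch below uses only the 2023 and 2024 rows; the dictionary layout and function names are illustrative choices.

```python
# Figures copied from the funding table (USD billions, deal counts).
FUNDING = {
    2023: {"total_usd_bn": 12.0, "deals": 150, "median_ipo_valuation_usd_bn": 3.5},
    2024: {"total_usd_bn": 8.0, "deals": 100, "median_ipo_valuation_usd_bn": 2.0},
}


def pct_change(old, new):
    """Percentage change from old to new (negative means a decline)."""
    return (new - old) / old * 100.0


def yoy(metric, start=2023, end=2024):
    """Year-over-year percentage change for one column of the table."""
    return pct_change(FUNDING[start][metric], FUNDING[end][metric])


def average_deal_size_usd_m(year):
    """Average deal size in USD millions: total funding divided by deal count."""
    row = FUNDING[year]
    return row["total_usd_bn"] * 1000.0 / row["deals"]


if __name__ == "__main__":
    for metric in ("total_usd_bn", "deals", "median_ipo_valuation_usd_bn"):
        print(f"{metric}: {yoy(metric):.1f}%")
    for year in FUNDING:
        print(f"{year} average deal size: ${average_deal_size_usd_m(year):.0f}M")
```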
The core problem is the 'AI subsidy trap.' To compete with giants like Baidu and ByteDance, which can afford to offer free AI services as loss leaders for their advertising ecosystems, startups like Zhipu and MiniMax must also offer free tiers. This creates a user base that is price-sensitive and unlikely to convert to paying customers. The unit economics are brutal: a single video generation on MiniMax's Hailuo costs approximately ¥0.50 in inference, while the average revenue per user (ARPU) from subscriptions is only ¥0.30 per month. A user who generates even one video a month therefore costs more to serve than the average user brings in.
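The arithmetic behind this can be laid out in a few lines. The sketch below uses the article's figures (¥0.50 per generation, ¥0.30 monthly ARPU, ¥30 subscription price). The usage level of 10 generations per month in the demo is an assumption chosen purely for illustration, and the model ignores API revenue and all costs other than inference.

```python
COST_PER_GENERATION_CNY = 0.50   # inference cost per video (from the article)
ARPU_CNY_PER_MONTH = 0.30        # subscription revenue per user per month (from the article)
SUBSCRIPTION_PRICE_CNY = 30.0    # premium plan price per month (from the article)


def implied_conversion_rate(arpu=ARPU_CNY_PER_MONTH, price=SUBSCRIPTION_PRICE_CNY):
    """Share of users who must be paying for ARPU to equal the stated figure."""
    return arpu / price


def break_even_generations(arpu=ARPU_CNY_PER_MONTH, cost=COST_PER_GENERATION_CNY):
    """Average generations per user per month at which inference cost equals ARPU."""
    return arpu / cost


def monthly_margin_per_user(generations_per_month,
                            arpu=ARPU_CNY_PER_MONTH,
                            cost=COST_PER_GENERATION_CNY):
    """Average revenue minus inference cost per user per month, in CNY."""
    return arpu - generations_per_month * cost


if __name__ == "__main__":
    print(f"implied conversion rate: {implied_conversion_rate():.1%}")
    print(f"break-even generations per user per month: {break_even_generations():.2f}")
    # Assumed usage level, for illustration only.
    print(f"margin at 10 generations per month: ¥{monthly_margin_per_user(10):.2f}")
```

On these figures the break-even point is 0.6 generations per user per month, far below what the free tier of 5 generations per day allows.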
Risks, Limitations & Open Questions
1. The 'Model-as-a-Product' Fallacy: The biggest risk is that both companies are selling technology, not solutions. A better benchmark score does not automatically translate to a better product. Enterprise customers care about reliability, security, and integration, not just MMLU scores. Zhipu and MiniMax lack the sales force and customer support infrastructure of established enterprise software vendors.
2. The Open-Source Paradox: While open-sourcing models builds community goodwill, it also commoditizes their core technology. Competitors can take GLM-130B and fine-tune it for their own use, reducing the incentive to pay for Zhipu's API.
3. Regulatory Hurdles: China's new AI regulations require all generative AI services to pass a security review and obtain a license. This process is slow and unpredictable, creating uncertainty for public market investors.
4. Talent Retention: The high burn rate means both companies may need to cut costs, potentially leading to layoffs. This could trigger a talent exodus to better-funded competitors like ByteDance or Alibaba.
AINews Verdict & Predictions
Verdict: The market is right to be skeptical. Zhipu and MiniMax are technically brilliant but commercially immature. Their A-share listing attempts are premature, and the valuation crash is a necessary correction.
Predictions:
1. Downround or Delayed IPO: Zhipu and MiniMax will be forced to either accept a significantly lower valuation (down 50-60% from peak) or postpone their IPO for 12-18 months while they focus on improving unit economics.
2. Pivot to Vertical Solutions: Within the next year, both companies will pivot away from general-purpose models to focus on specific high-margin verticals. Zhipu will likely double down on financial services (e.g., automated report generation, risk analysis), while MiniMax will target the gaming and short-video content creation industry.
3. Consolidation Wave: The failed IPOs will trigger a wave of consolidation. Larger players like Baidu or ByteDance may acquire distressed AI startups at bargain prices. Zhipu, with its strong research team, is a prime acquisition target for Alibaba.
4. The 'Profitability Deadline': By Q3 2026, both companies must demonstrate a clear path to positive gross margin on their core products. If they fail, they will face a liquidity crisis as VC funding dries up.
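To put the first prediction in concrete terms, a 50-60% cut can be applied to the pre-crash valuations from the comparison table. This is plain arithmetic on the article's own figures, not a forecast model; the function name is an illustrative choice.

```python
# Pre-crash valuations from the competitive landscape table (USD billions).
PEAK_VALUATION_USD_BN = {"Zhipu AI": 4.0, "MiniMax": 2.5}


def post_cut_range(peak, min_cut=0.50, max_cut=0.60):
    """Return (low, high) valuation after a cut of between min_cut and max_cut."""
    return peak * (1.0 - max_cut), peak * (1.0 - min_cut)


if __name__ == "__main__":
    for company, peak in PEAK_VALUATION_USD_BN.items():
        low, high = post_cut_range(peak)
        print(f"{company}: ${low:.2f}B to ${high:.2f}B (peak ${peak:.1f}B)")
```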
The homecoming to the A-share market was supposed to be a victory lap. Instead, it has become a brutal reality check. The message from investors is clear: show us the money, not just the models.