Why NetEase Zhiqi Ditched Token Pricing: AI Value Shifts From Compute To Outcomes

For years, the enterprise AI market has operated on a simple premise: sell tokens, charge per query, and let customers figure out the value. NetEase Zhiqi, the B2B arm of NetEase, has broken that mold. Instead of pricing AI by the token, the company has integrated large language models into the core of its communication platform—voice calls, messaging, customer service workflows—so that every interaction becomes an execution trigger for an intelligent agent. The pricing model has flipped: customers now pay for completed tasks, improved resolution rates, and quantifiable business outcomes rather than raw compute consumption.

This is not a minor pricing tweak. It represents a fundamental rethinking of what enterprise AI should be. In the old model, vendors profited from inefficiency—more tokens meant more revenue. In NetEase Zhiqi's model, the vendor is incentivized to make AI as effective and efficient as possible, because revenue is tied to results, not volume. The company's communication stack, which already handles billions of interactions annually across sectors like fintech, e-commerce, and healthcare, now runs inference as a native function rather than an add-on service.

The significance extends beyond one company. If this model gains traction, it could force the entire enterprise AI ecosystem—from cloud providers to SaaS platforms—to rethink value propositions. The era of selling GPU time and token counts may be giving way to an era where AI vendors must prove their worth in concrete business metrics. NetEase Zhiqi's move is a bet that the real moat in enterprise AI is not model size, but the ability to translate technology into measurable ROI for the customer.

Technical Deep Dive

NetEase Zhiqi's architectural shift is deceptively simple in concept but technically demanding. The company has taken its existing real-time communication (RTC) and customer engagement platform—which handles voice, video, and messaging at scale—and embedded inference engines directly into the data plane. Instead of routing audio or text to a separate AI service that returns a response, the platform now runs lightweight LLM inference on the same infrastructure that manages the communication stream.

Architecture details: The system uses a hybrid approach. For latency-sensitive tasks like real-time voice transcription and intent detection, it employs distilled versions of open-source models (Qwen2.5-7B and Llama-3.1-8B) quantized to 4-bit precision, running on custom ASICs within NetEase's edge nodes. For complex reasoning tasks (multi-turn negotiation, compliance checks, escalation decisions), the platform falls back to larger models (Qwen2.5-72B) hosted on dedicated GPU clusters. The key innovation is a "smart router" that classifies each interaction's complexity in under 5 milliseconds and dispatches it to the appropriate inference tier. This tiered approach keeps p95 latency below 200ms (95% of interactions complete in under 200ms) while controlling costs.
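The article does not describe how the smart router scores complexity. The sketch below is a minimal illustration, assuming a cheap heuristic (message length, turn count, and a hypothetical keyword list) in place of whatever classifier NetEase Zhiqi actually uses. The tier labels, weights, keywords, and the 0.4 threshold are all invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical labels for the two inference tiers described above.
    EDGE_SMALL = "edge-int4-7b/8b"
    GPU_LARGE = "gpu-72b"


# Hypothetical markers for tasks the article assigns to the large tier
# (negotiation, compliance checks, escalation decisions).
COMPLEX_MARKERS = {"negotiate", "compliance", "escalate", "dispute", "refund"}


@dataclass
class Interaction:
    text: str
    turn_count: int  # turns so far in the conversation


def complexity_score(item: Interaction) -> float:
    """Return a heuristic complexity score in [0, 1].

    Only string splitting and arithmetic are involved, so it fits easily in
    a millisecond-scale budget; a production router would more likely use a
    small trained classifier.
    """
    words = [w.strip(".,?!").lower() for w in item.text.split()]
    marker_hits = sum(1 for w in words if w in COMPLEX_MARKERS)
    length_term = min(len(words) / 100, 1.0)
    turn_term = min(item.turn_count / 10, 1.0)
    marker_term = min(marker_hits / 2, 1.0)
    return 0.2 * length_term + 0.3 * turn_term + 0.5 * marker_term


def route(item: Interaction, threshold: float = 0.4) -> Tier:
    """Send to the large-model tier only when the score clears the threshold."""
    if complexity_score(item) >= threshold:
        return Tier.GPU_LARGE
    return Tier.EDGE_SMALL


if __name__ == "__main__":
    print(route(Interaction("What are your opening hours?", turn_count=1)))
    print(route(Interaction("I want to dispute this charge and escalate to compliance", turn_count=6)))
```

A single threshold keeps the common path on the cheap tier. In practice the threshold would presumably be tuned on labeled traffic, since misrouting a hard case to the small tier is what hurts task completion.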

Open-source contributions: NetEase Zhiqi has released a related optimization toolkit on GitHub called "InferEdge" (currently at 3,200 stars), which provides quantization-aware training scripts and a custom CUDA kernel for efficient attention computation on edge devices. The repo has seen active development with 12 releases in the past quarter, and the team claims a 2.3x throughput improvement over standard vLLM deployment for 7B-class models.
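The article says the edge tier runs 4-bit models and that InferEdge ships quantization-aware training scripts, but it gives no details of their scheme. As a generic illustration only (this is not InferEdge's code or API), symmetric per-group 4-bit quantization maps each group of weights onto the 16 integer levels -8..7, with one floating-point scale per group:

```python
def quantize_int4(weights: list[float], group_size: int = 4) -> tuple[list[int], list[float]]:
    """Symmetric per-group quantization to signed 4-bit codes in [-8, 7]."""
    codes: list[int] = []
    scales: list[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # The largest-magnitude weight maps to +/-7; an all-zero group gets scale 1.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes: list[int], scales: list[float], group_size: int = 4) -> list[float]:
    """Recover approximate weights by multiplying each code by its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    w = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.7]
    codes, scales = quantize_int4(w)
    print(codes, scales)
    print(dequantize_int4(codes, scales))
```

Storing a 4-bit code per weight plus one scale per group is what cuts weight memory roughly fourfold versus 16-bit storage. Quantization-aware training, which the toolkit reportedly provides, goes further by simulating this rounding during training so the model can adapt to it.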

Performance benchmarks: Internal testing shows the tiered system achieves a 94.7% task completion rate on a proprietary benchmark of 10,000 customer service scenarios, compared to 91.2% for a single-model approach using GPT-4o-mini. Cost per interaction dropped 47% relative to that GPT-4o-mini baseline, a saving the company attributes to reduced GPU hours.

| Metric | NetEase Zhiqi Tiered System | Single-Model (GPT-4o-mini) | Single-Model (Llama-3.1-70B) |
|---|---|---|---|
| Task Completion Rate | 94.7% | 91.2% | 93.1% |
| Latency (p95) | 180ms | 420ms | 650ms |
| Cost per 1M Interactions | $1,240 | $2,340 | $3,100 |
| GPU Utilization | 68% | 55% | 42% |

Data Takeaway: On NetEase Zhiqi's internal benchmark, the tiered architecture delivers a 47% cost reduction and a 57% p95 latency improvement over the GPT-4o-mini baseline while also improving task completion. This suggests that for many enterprise use cases, smaller, specialized models running on optimized infrastructure can outperform monolithic cloud-based solutions, especially when latency and cost are critical.
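The percentages in the takeaway can be recomputed directly from the table. The short script below does that arithmetic against both baselines, using only the figures reported above.

```python
# Figures copied from the benchmark table above.
SYSTEMS = {
    "tiered": {"completion_pct": 94.7, "p95_ms": 180, "cost_per_1m_usd": 1240},
    "gpt-4o-mini": {"completion_pct": 91.2, "p95_ms": 420, "cost_per_1m_usd": 2340},
    "llama-3.1-70b": {"completion_pct": 93.1, "p95_ms": 650, "cost_per_1m_usd": 3100},
}


def pct_reduction(baseline: float, new: float) -> float:
    """Percentage drop from baseline to new."""
    return (baseline - new) / baseline * 100


def compare(baseline: str, candidate: str = "tiered") -> dict[str, float]:
    """Cost and latency reductions plus completion gain of candidate vs baseline."""
    b, c = SYSTEMS[baseline], SYSTEMS[candidate]
    return {
        "cost_reduction_pct": pct_reduction(b["cost_per_1m_usd"], c["cost_per_1m_usd"]),
        "latency_reduction_pct": pct_reduction(b["p95_ms"], c["p95_ms"]),
        "completion_gain_pts": c["completion_pct"] - b["completion_pct"],
    }


if __name__ == "__main__":
    for name in ("gpt-4o-mini", "llama-3.1-70b"):
        print(name, {k: round(v, 1) for k, v in compare(name).items()})
    # Cost of a single interaction on the tiered system, in USD.
    print(SYSTEMS["tiered"]["cost_per_1m_usd"] / 1_000_000)
```

Against GPT-4o-mini this gives about 47% lower cost and 57% lower p95 latency, matching the takeaway; against Llama-3.1-70B the same table implies roughly 60% and 72%.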

Key Players & Case Studies

NetEase Zhiqi is not the only company experimenting with outcome-based pricing, but it is the first major Chinese B2B vendor to fully commit to this model. The company's parent, NetEase, brings a unique advantage: deep experience in both consumer AI (NetEase Youdao for education, NetEase Music for recommendations) and enterprise communication (NetEase Cloud Communication, which powers services for 100,000+ business clients).

Competing approaches:
- Zendesk has introduced "AI agent" pricing at $99 per agent per month alongside per-resolution charges, creating a hybrid model.
- Intercom charges per resolution for its Fin AI agent, but the resolution definition is narrow (ticket closed).
- Salesforce Einstein GPT remains on a per-user license model, with no outcome-based component.
- Twilio charges per API call for its CustomerAI, still token-adjacent.

NetEase Zhiqi's differentiation is its deep integration: because the AI is embedded in the communication layer, it can track outcomes across the entire customer journey—from first contact to resolution to follow-up—rather than just a single interaction.
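The article does not specify how outcomes are tracked across a journey. One plausible shape, sketched here under the assumption of a hypothetical event schema (journey_id, channel, kind, timestamp) and a hypothetical seven-day follow-up window, is to group events by journey and count a resolution as durable only if the customer does not come back on any channel within that window:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    journey_id: str
    channel: str  # e.g. "voice", "chat", "email"
    kind: str     # hypothetical event kinds: "contact", "resolved", "reopened"
    at: datetime


def journey_outcomes(events: list[Event],
                     follow_up_window: timedelta = timedelta(days=7)) -> dict[str, dict]:
    """Summarize each journey across every channel it touched."""
    journeys: dict[str, list[Event]] = {}
    for ev in sorted(events, key=lambda e: e.at):
        journeys.setdefault(ev.journey_id, []).append(ev)

    summary: dict[str, dict] = {}
    for jid, evs in journeys.items():
        channels = sorted({e.channel for e in evs})
        resolution = next((e for e in evs if e.kind == "resolved"), None)
        if resolution is None:
            summary[jid] = {"resolved": False, "channels": channels,
                            "hours_to_resolve": None, "durable": False}
            continue
        # A new contact or reopen after the first resolution, inside the
        # window, means the issue was not actually settled.
        came_back = any(
            e.kind in ("contact", "reopened")
            and resolution.at < e.at <= resolution.at + follow_up_window
            for e in evs
        )
        summary[jid] = {
            "resolved": True,
            "channels": channels,
            "hours_to_resolve": (resolution.at - evs[0].at).total_seconds() / 3600,
            "durable": not came_back,
        }
    return summary


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    log = [
        Event("A", "voice", "contact", t0),
        Event("A", "chat", "resolved", t0 + timedelta(hours=2)),
        Event("B", "email", "contact", t0),
        Event("B", "email", "resolved", t0 + timedelta(hours=1)),
        Event("B", "voice", "contact", t0 + timedelta(days=2)),
    ]
    for jid, row in journey_outcomes(log).items():
        print(jid, row)
```

Journey B here would count as a closed ticket under a narrow "ticket closed" definition, but the cross-channel view flags it as a non-durable resolution because the customer called back two days later.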

| Company | Pricing Model | Outcome Metric | Integration Depth |
|---|---|---|---|
| NetEase Zhiqi | Outcome-based | Task completion, CSAT, resolution time | Full-stack (voice, chat, email) |
| Zendesk | Hybrid (agent + resolution) | Ticket closure | Chat + email only |
| Intercom | Per resolution | Ticket closure | Chat + email only |
| Salesforce | Per-user license | None (license-based) | CRM ecosystem |
| Twilio | Per API call | None (usage-based) | Communication APIs |

Data Takeaway: Among the vendors compared here, NetEase Zhiqi is the only one offering a fully outcome-based model with full-stack integration. Competitors either use hybrid models that still protect usage revenue or lack the infrastructure to measure outcomes across channels. This gives NetEase Zhiqi a first-mover advantage in the outcome-pricing space, but also means it bears more risk if outcomes underperform.

Case study: Ping An Insurance
One early adopter is Ping An Insurance, which deployed NetEase Zhiqi's platform for claims processing. The system handles 2.3 million interactions per month, with AI agents managing first-level triage. After six months, Ping An reported a 34% reduction in average handling time, a 28% increase in first-contact resolution, and a 12% improvement in customer satisfaction scores. NetEase Zhiqi's revenue from the account is directly tied to these metrics, not to the number of tokens consumed.

Industry Impact & Market Dynamics

The shift from token-based to outcome-based pricing could be as disruptive to enterprise AI as the shift from on-premise to SaaS was to enterprise software. The logic is compelling: if AI is truly a productivity multiplier, vendors should be paid for the productivity gains they deliver, not for the compute they consume.

Market data: The global enterprise AI market is projected to reach $185 billion by 2028, according to industry estimates. Currently, roughly 80% of enterprise AI spending goes to infrastructure and compute (GPUs, cloud instances), with the remaining 20% on application-layer services. Outcome-based pricing could flip this ratio, as vendors take on more of the compute risk and charge based on results.

| Year | Token-Based Pricing Market Share | Outcome-Based Pricing Market Share | Total Enterprise AI Spend (USD) |
|---|---|---|---|
| 2024 | 78% | 5% | $102B |
| 2025 (est.) | 70% | 12% | $128B |
| 2026 (est.) | 58% | 25% | $155B |
| 2028 (est.) | 40% | 42% | $185B |

Data Takeaway: If these estimates hold, outcome-based pricing could overtake token-based pricing by 2028, capturing 42% of enterprise AI spend versus 40%. That share amounts to roughly $77.7 billion of the projected $185 billion, shifted away from pure compute providers toward value-added AI services.
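The dollar figures behind these shares follow from multiplying each share by total spend. The sketch below uses only the table's estimates and also reports the residual share that the table does not attribute to either pricing model.

```python
# (year, token-based share, outcome-based share, total spend in USD billions),
# copied from the table above; 2025 onward are estimates.
ROWS = [
    (2024, 0.78, 0.05, 102),
    (2025, 0.70, 0.12, 128),
    (2026, 0.58, 0.25, 155),
    (2028, 0.40, 0.42, 185),
]


def implied_spend(rows: list[tuple[int, float, float, int]]) -> dict[int, dict[str, float]]:
    """Convert market shares into implied spend (USD billions) per year."""
    out: dict[int, dict[str, float]] = {}
    for year, token_share, outcome_share, total in rows:
        out[year] = {
            "token_usd_b": token_share * total,
            "outcome_usd_b": outcome_share * total,
            # Share the table leaves unattributed to either model.
            "unattributed_share": 1 - token_share - outcome_share,
        }
    return out


if __name__ == "__main__":
    for year, row in implied_spend(ROWS).items():
        print(year, {k: round(v, 2) for k, v in row.items()})
```

For 2028 this gives about $77.7 billion of outcome-based spend against about $74.0 billion token-based, which is the basis for saying outcome-based pricing would be the larger of the two by then.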

Second-order effects:
1. GPU demand may stabilize: If vendors optimize for outcomes rather than token volume, they will invest in model distillation, quantization, and efficient architectures rather than simply buying more H100s. This could moderate the explosive GPU demand growth.
2. Open-source models gain ground: Outcome-based pricing favors smaller, specialized models that can run on commodity hardware. This is a tailwind for the open-source ecosystem (Llama, Qwen, Mistral) and a headwind for proprietary API-based models.
3. New metrics and auditing: Outcome-based contracts require auditable metrics. This creates a new market for third-party AI performance auditors and standardized benchmarks for task completion, accuracy, and customer satisfaction.

Risks, Limitations & Open Questions

Measurement challenges: Defining and measuring "outcomes" is fraught with difficulty. Is a resolved ticket a good outcome if the customer was unhappy with the resolution? Should vendors be penalized for factors outside their control (e.g., product quality issues that generate more support tickets)? NetEase Zhiqi's approach uses a composite score of task completion, CSAT, and resolution time, but this is still a work in progress.
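NetEase Zhiqi's actual weights and normalization are not disclosed. A minimal sketch of a composite outcome score, assuming hypothetical weights of 0.5/0.3/0.2, a 1 to 5 CSAT scale, and a 30-minute resolution-time target, might look like this:

```python
def composite_score(task_completed: bool, csat: float, resolution_minutes: float,
                    target_minutes: float = 30.0,
                    weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Blend completion, satisfaction, and speed into a score in [0, 1].

    All weights and targets here are hypothetical placeholders.
    """
    if not 1 <= csat <= 5:
        raise ValueError("csat must be on a 1-5 scale")
    if resolution_minutes <= 0:
        raise ValueError("resolution_minutes must be positive")
    completion = 1.0 if task_completed else 0.0
    csat_norm = (csat - 1) / 4                              # map 1..5 onto 0..1
    speed = min(1.0, target_minutes / resolution_minutes)   # full credit at or under target
    w_done, w_csat, w_speed = weights
    return w_done * completion + w_csat * csat_norm + w_speed * speed


if __name__ == "__main__":
    print(composite_score(True, 5, 15))
    print(composite_score(True, 3, 60))
    print(composite_score(False, 1, 120))
```

The second example illustrates the measurement problem raised above: a completed task with a middling CSAT of 3 and a slow, hour-long resolution still scores 0.75 under these weights, so the choice of weights largely decides whether such an interaction counts as a success.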

Adverse incentives: Outcome-based pricing could lead to perverse behaviors. A vendor might optimize for easy tasks that are quick to resolve, neglecting complex cases that require human intervention. NetEase Zhiqi has addressed this by weighting outcomes by difficulty, but the weighting algorithm itself introduces new risks of gaming.
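The weighting algorithm itself is not published. The sketch below assumes three hypothetical difficulty bands with weights 1, 2, and 4 to show why weighting changes the incentive: two vendors with identical raw completion rates can score very differently once hard cases count for more.

```python
# Hypothetical difficulty weights; the real scheme is not disclosed.
DIFFICULTY_WEIGHTS = {"easy": 1.0, "medium": 2.0, "hard": 4.0}


def raw_completion(cases: list[tuple[str, bool]]) -> float:
    """Share of cases completed, ignoring difficulty."""
    return sum(ok for _, ok in cases) / len(cases)


def weighted_completion(cases: list[tuple[str, bool]]) -> float:
    """Share of total difficulty weight that was completed."""
    total = sum(DIFFICULTY_WEIGHTS[d] for d, _ in cases)
    done = sum(DIFFICULTY_WEIGHTS[d] for d, ok in cases if ok)
    return done / total


if __name__ == "__main__":
    # Vendor A resolves every easy case but abandons the hard ones.
    vendor_a = [("easy", True)] * 8 + [("hard", False)] * 2
    # Vendor B misses some easy cases but resolves the hard ones.
    vendor_b = [("easy", True)] * 6 + [("easy", False)] * 2 + [("hard", True)] * 2
    for name, cases in (("A", vendor_a), ("B", vendor_b)):
        print(name, raw_completion(cases), weighted_completion(cases))
```

Both vendors complete 80% of cases, but the weighted scores are 0.5 and 0.875. The same sketch also shows where gaming enters: whoever assigns the difficulty labels controls the weights, so labeling would need to be independent of the vendor being paid.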

Enterprise resistance: Many enterprises are accustomed to predictable, usage-based pricing. Outcome-based models introduce variable costs tied to performance, which can be difficult to budget for. Large enterprises with procurement departments may resist this shift.

Technical limitations: The tiered architecture works well for customer service and communication use cases, but may not generalize to other enterprise AI applications like code generation, document analysis, or data analytics. NetEase Zhiqi's model is tightly coupled to its communication platform, limiting its addressable market.

Ethical concerns: If AI agents are incentivized to close tickets quickly, they may cut corners—providing incomplete answers, misleading customers, or failing to escalate critical issues. NetEase Zhiqi has implemented guardrails and human-in-the-loop oversight, but the tension between efficiency and quality is inherent in outcome-based models.
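The guardrails are not described in detail. A minimal sketch of one common pattern, assuming hypothetical inputs (a model confidence score, detected topic tags, and whether the customer asked for a person), is an escalation check that runs before any attempt to close the ticket:

```python
# Hypothetical topics that always go to a human, whatever the outcome score.
SENSITIVE_TOPICS = {"fraud", "medical", "legal", "complaint"}


def should_escalate(confidence: float, topics: set[str],
                    customer_asked_for_human: bool,
                    threshold: float = 0.75) -> tuple[bool, str]:
    """Return (escalate?, reason); checks run in order of precedence."""
    if customer_asked_for_human:
        return True, "customer requested a human"
    flagged = topics & SENSITIVE_TOPICS
    if flagged:
        return True, "sensitive topic: " + ", ".join(sorted(flagged))
    if confidence < threshold:
        return True, "low model confidence"
    return False, "agent may proceed"


if __name__ == "__main__":
    print(should_escalate(0.92, {"billing"}, False))
    print(should_escalate(0.95, {"billing", "fraud"}, False))
    print(should_escalate(0.60, {"shipping"}, False))
```

The design choice in this sketch is that the check takes no input from the outcome score, so the incentive to close quickly has no say in whether a case reaches a human.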

AINews Verdict & Predictions

NetEase Zhiqi's move is bold, risky, and exactly what the enterprise AI market needs. The token-pricing model has created a misalignment of incentives: vendors profit when AI is inefficient, and customers bear all the risk of poor outcomes. Outcome-based pricing aligns incentives far more closely, because vendors only get paid when customers get value.

Our predictions:
1. Within 18 months, at least three major US-based enterprise AI vendors will announce outcome-based pricing pilots. The competitive pressure from NetEase Zhiqi, combined with growing customer frustration over unpredictable AI costs, will force the market to move.
2. The open-source model ecosystem will see accelerated investment in efficiency. As outcome-based pricing gains traction, the ability to run models cheaply becomes a competitive advantage. Expect significant funding for startups focused on model compression, quantization, and edge inference.
3. A new category of "AI outcome auditor" will emerge. Third-party firms will specialize in verifying that AI vendors are delivering the outcomes they claim, similar to how SOC 2 audits verify security practices today.
4. NetEase Zhiqi will face a make-or-break moment in the next 12 months. If its early enterprise customers (like Ping An) renew and expand their contracts, the model will gain credibility. If churn is high, the industry will view outcome-based pricing as a marketing gimmick.

What to watch: The next earnings call from NetEase will be critical. Look for disclosure of customer retention rates, average contract value, and any mention of outcome-based pricing as a driver of revenue growth. Also watch for GitHub activity on InferEdge—if the community adoption accelerates, it signals that the technical approach is resonating.

NetEase Zhiqi has thrown down a gauntlet. The enterprise AI market will never be the same.

Frequently Asked Questions

What is the main point of the company's release, "Why NetEase Zhiqi Ditched Token Pricing: AI Value Shifts From Compute To Outcomes"?

For years, the enterprise AI market has operated on a simple premise: sell tokens, charge per query, and let customers figure out the value. NetEase Zhiqi, the B2B arm of NetEase…

From the angle of "NetEase Zhiqi outcome-based AI pricing explained," why is this release worth paying attention to?

NetEase Zhiqi's architectural shift is deceptively simple in concept but technically demanding. The company has taken its existing real-time communication (RTC) and customer engagement platform—which handles voice, video…

Regarding "how does NetEase Zhiqi measure AI task completion," what follow-on effects might this release have?

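Going forward, it is usually worth continuing to watch user growth, product penetration, ecosystem partnerships, competitor responses, and feedback from capital markets and the developer community.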