حروب الوكلاء: لماذا معظم مساعدي الذكاء الاصطناعي محكوم عليهم بالفشل بسبب تحديث النموذج التالي

May 2026
AI agentArchive: May 2026
تحول سوق وكلاء الذكاء الاصطناعي إلى سباق وحشي من التشابه. يطلق موفرو النماذج ومنصات الفيديو وشركات المحتوى منتجات متطابقة تقريبًا — أغلفة رقيقة حول واجهات برمجة تطبيقات LLM. يكشف تحليلنا أن الجيل التالي من النماذج الأساسية سيجعل معظم هؤلاء الوكلاء قديمين.
The article body is currently shown in English by default. You can generate the full version in this language on demand.

The AI Agent landscape is experiencing a severe case of commoditization. Hundreds of companies—from OpenAI and Google to ByteDance and Notion—have launched agents that perform the same core functions: web browsing, file processing, task orchestration. The vast majority are simple API wrappers with custom UI and prompt templates, offering zero technical differentiation. This homogeneity is dangerous because the underlying LLMs are rapidly absorbing agent capabilities. GPT-5 and Gemini Ultra 2 now natively handle function calling, multi-step reasoning, and tool use—the very features agents were supposed to provide. The result is a pincer movement: price wars from below among identical products, and obsolescence from above as base models improve. The only survivors will be agents that embed themselves into vertical workflows—enterprise compliance, medical records, industrial control—where general-purpose models cannot easily generalize. AINews identifies three defensible strategies: proprietary data pipelines, hardware integration, and regulatory lock-in. Without one of these, agent startups face extinction within 18 months.

Technical Deep Dive

The Thin Wrapper Problem

At its core, the vast majority of today's AI agents follow the same architectural pattern: a user-facing chat interface → an LLM API call (GPT-4o, Claude 3.5, Gemini) → a function-calling layer → external tool APIs (web search, file storage, calendar). The 'agentic' behavior is achieved through prompt engineering—specifically, system prompts that instruct the LLM to break down tasks, call functions, and maintain state. This is not a moat; it's a configuration file.

Consider the typical agent stack:

| Layer | Common Implementation | Differentiation Potential |
|---|---|---|
| UI | React/Next.js frontend | Low (cosmetic only) |
| Orchestration | LangChain, AutoGPT, or custom Python | Medium (workflow design) |
| LLM Backend | GPT-4o, Claude 3.5, Gemini 1.5 Pro | None (commodity API) |
| Tool Integration | REST APIs for Google, Slack, Notion | Low (standardized endpoints) |
| Memory/State | Vector DB (Pinecone, Chroma) | Low (open-source solutions) |

The only layer where real differentiation could exist is the orchestration logic—how the agent plans, executes, and recovers from errors. But even here, open-source frameworks like LangChain (70k+ GitHub stars) and AutoGPT (160k+ stars) have democratized the patterns. A startup's 'proprietary' orchestration is often just a tweaked version of these libraries.

The Foundation Model Cannibalization

The existential threat to agents is that foundation models are rapidly absorbing their value proposition. GPT-4o's function calling, released in June 2024, already allowed models to output structured JSON for tool invocation. GPT-5 (expected late 2025) reportedly integrates multi-step planning as a native capability—meaning the model itself can decompose a task like 'book a flight and hotel' into sub-steps without external orchestration. Google's Gemini Ultra 2 has demonstrated 'agentic' behavior directly in its API, including persistent memory and tool use without a wrapper.

| Capability | GPT-4 (2023) | GPT-4o (2024) | GPT-5 (Expected 2025) |
|---|---|---|---|
| Function Calling | Manual JSON output | Native structured output | Implicit planning |
| Multi-step Reasoning | Prompt-dependent | Improved chain-of-thought | Built-in task decomposition |
| Tool Use | Requires external code | API-level tool definitions | Self-discovered tools |
| Memory | Context window only | 128K context | Persistent, stateful sessions |

Data Takeaway: Each generation of foundation models eliminates a layer of the agent stack. By GPT-5, the 'orchestration' layer—the supposed core of an agent—becomes a model-level primitive. Startups that only provide orchestration are building on sand.

The GitHub Evidence

A scan of trending AI repositories reveals the commoditization. The most popular agent frameworks—CrewAI (25k stars), AutoGPT (160k stars), BabyAGI (20k stars)—are all variations of the same loop: plan, execute, observe, replan. None have a technical moat. Even Microsoft's Copilot Studio and OpenAI's own GPTs are essentially managed versions of these patterns. The real innovation is happening at the model layer (e.g., Meta's Llama 3.1 405B with native tool use) or at the application layer with proprietary data (e.g., Harvey for legal, Cursor for code).

Key Players & Case Studies

The Three Fronts of the Agent War

The market can be divided into three categories of entrants, each with distinct advantages and vulnerabilities.

| Category | Examples | Core Strategy | Vulnerability |
|---|---|---|---|
| Model Vendors | OpenAI (GPTs), Google (Vertex AI Agent Builder), Anthropic (Claude Agents) | Own the model; offer agents as upsell | Cannibalize own API revenue; agents are loss leaders |
| Video/Content Platforms | ByteDance (Doubao Agent), YouTube (AI summaries), Notion (Notion AI) | Leverage existing user base and content | Limited to platform; no cross-domain utility |
| Pure-Play Agent Startups | Adept AI, Cognition AI (Devin), MultiOn | Build best-in-class orchestration | No model or data moat; highest obsolescence risk |

Model Vendors: OpenAI's GPTs (launched November 2023) were the first major attempt to productize agents. Users can create custom agents with instructions, knowledge files, and tool integrations. But GPTs are essentially a thin UI over GPT-4—they have no unique capabilities beyond what the model already offers. Adoption has been lukewarm; most users still prefer the raw chat interface. Google's Vertex AI Agent Builder is more enterprise-focused, offering integration with Google Workspace and BigQuery. But it's still a wrapper—any improvements to Gemini directly reduce the need for the agent layer.

Video/Content Platforms: ByteDance's Doubao Agent is a fascinating case. It integrates deeply with Douyin (TikTok's Chinese version) and Toutiao, allowing users to create agents that search videos, summarize content, and even generate short clips. The moat here is the proprietary video index—no other agent can access Douyin's internal video metadata. Similarly, Notion AI's agent features are valuable because they operate on the user's own documents. These platform-bound agents have a defensible niche, but they are tethered to their parent ecosystem and cannot become general-purpose assistants.

Pure-Play Startups: Cognition AI's Devin (launched March 2024) was hailed as the first AI software engineer. It can plan, code, debug, and deploy. But its underlying model is still an API call to GPT-4 or Claude. The differentiation is in the specialized toolchain—terminal, browser, code editor—and the long-context memory. However, as models improve, the need for a separate agent for coding diminishes. GitHub Copilot is already adding agentic features directly into VS Code. Devin's window of advantage is narrow.

Case Study: The Rise and Stall of AutoGPT

AutoGPT exploded in popularity in April 2023, reaching 160k GitHub stars within weeks. It promised autonomous goal-achieving agents. But within six months, the project stalled. The reason: GPT-4's improvements made AutoGPT's prompt-based planning redundant. Users found that simply asking GPT-4 to 'research and write a report on X' worked as well as AutoGPT's multi-step loop. The project's maintainers admitted that the core loop was 'a workaround for limitations that no longer exist.' This is a cautionary tale for every agent startup.

Industry Impact & Market Dynamics

The Commoditization Spiral

The agent market is trapped in a race to the bottom. Because there is no technical differentiation, competition is purely on price and features. The result: a proliferation of free tiers, low subscription fees ($10–$30/month), and feature bloat. Most agents lose money on every user because API costs for multi-step tasks are high, but they cannot raise prices due to competition.

| Agent Product | Monthly Price | API Cost per User (est.) | Profitability |
|---|---|---|---|
| OpenAI GPTs | Free (with ChatGPT Plus) | $5–$15 | Negative (subsidized) |
| Google Vertex AI Agent | $0.10 per session | $0.08 | Thin margin |
| Adept AI | $30 | $15–$25 | Negative |
| Notion AI | $10 add-on | $3–$8 | Positive (bundled) |

Data Takeaway: Only platform-bundled agents (Notion, Google Workspace) can achieve profitability because the agent is a feature, not the product. Standalone agents are burning capital.

The Funding Reality

VCs have poured over $2 billion into agent startups since 2023, according to PitchBook data. But the returns are unclear. Adept AI raised $350M at a $1B valuation in March 2024, but has struggled to find product-market fit. Cognition AI raised $175M at a $2B valuation, but faces the same existential threat. The market is pricing these companies as if they will become the next operating system, but the technical reality is that they are thin wrappers.

The Second-Order Effect: API Revenue Erosion

Ironically, the biggest loser from the agent boom may be the model vendors themselves. As agents proliferate, they increase API usage—but they also commoditize the model layer. If every agent uses GPT-4o, then no agent has an advantage. OpenAI's own GPTs compete with third-party agents, creating a channel conflict. The long-term equilibrium may be that model vendors offer agents as a loss leader to drive API adoption for enterprise customers, while pure-play agents get squeezed out.

Risks, Limitations & Open Questions

The Hallucination Amplification Problem

Agents amplify the core weakness of LLMs: hallucination. A single-step query has a 5–10% hallucination rate; a multi-step agent that makes 10 API calls has a compounded error rate of 40–65%. No agent has solved this. The common approach—using a 'verifier' model to check outputs—adds cost and latency without eliminating errors. For high-stakes domains (legal, medical, finance), this is unacceptable.

The Security Nightmare

Agents require broad permissions: access to email, files, calendars, and external APIs. This creates a massive attack surface. A compromised agent could exfiltrate data or execute unauthorized actions. The industry has no standard for agent security. Most agents use OAuth tokens with excessive scopes. Until a robust agent security framework emerges (e.g., Google's Project Mariner's sandboxed browser), enterprise adoption will be limited.

The Open Question: Will Agents Become a Protocol, Not a Product?

The most interesting possibility is that 'agent' becomes a protocol layer—like HTTP for the web—rather than a product. Standards like the Model Context Protocol (MCP) from Anthropic and Agent-to-Agent (A2A) from Google are attempts to create interoperable agent communication. If these protocols become ubiquitous, the value shifts to the tools and data that agents access, not the agents themselves. This would further commoditize the agent layer.

AINews Verdict & Predictions

Prediction 1: 80% of Agent Startups Will Fail by 2027

The math is simple: no technical moat + foundation model improvement + price competition = extinction. The only survivors will be those with one of three defensible assets:

1. Proprietary Data: Agents that train on or access unique datasets (e.g., legal documents, medical records, industrial sensor data) that general models cannot replicate.
2. Hardware Integration: Agents embedded in devices (robots, AR glasses, smart home hubs) where the interface is the moat.
3. Regulatory Lock-in: Agents in regulated industries (healthcare HIPAA compliance, financial audits) where certification and compliance create barriers to entry.

Prediction 2: The Winner Will Be a Vertical Agent, Not a Horizontal One

Just as the SaaS market fragmented into vertical solutions (Salesforce for CRM, Workday for HR, Veeva for pharma), the agent market will fragment. The most valuable agent companies will be those that deeply understand a single industry's workflows, data, and regulations. Examples: Harvey (legal), Cursor (coding), Sierra (customer support). These companies have data flywheels—every interaction improves their models for that domain.

Prediction 3: Model Vendors Will Acquire, Not Build

OpenAI, Google, and Anthropic will eventually acquire the most successful vertical agents to fill gaps in their enterprise offerings. The acquisition price will be based on the data and customer relationships, not the technology. This is the only realistic exit for agent startups.

What to Watch

- MCP and A2A adoption rates: If these protocols become standard, agent startups lose their last shred of differentiation.
- GPT-5's agentic capabilities: If GPT-5 can natively perform multi-step tasks with tool use, the agent market collapses overnight.
- Enterprise adoption of vertical agents: Watch for large contracts in legal, healthcare, and manufacturing—the real test of defensibility.

The agent war is not a battle of technology; it's a battle of positioning. The winners will not be the best engineers. They will be the ones who carve out a niche that the foundation models cannot easily reach. Everyone else is building a feature, not a company.

Related topics

AI agent110 related articles

Archive

May 20261261 published articles

Further Reading

ما وراء الذكاء العام: لماذا سيهيمن المتخصصون الرأسيون في الذكاء الاصطناعي على الموجة القادمةتوصلت صناعة الذكاء الاصطناعي إلى إجماع خطير حول ما يجب تدريبه: نماذج أكبر، بيانات أكثر، قدرة حاسوبية أكبر. تجادل AINews الاكتتاب العام لشركة Ledong Robot: هل يمكن لأكبر بائع في العالم الهروب من فخ الربحية؟شركة Ledong Robot، المتوجة كقائد عالمي في شحنات المكانس الكهربائية الروبوتية، تقدمت بطلب للاكتتاب العام في هونغ كونغ. لكتبلور المعرفة: الخندق في عصر وكلاء الذكاء الاصطناعي المستقليندفع انفجار تكنولوجيا الوكلاء الذكاء الاصطناعي من 'القدرة على العمل' إلى 'معرفة كيفية العمل'، لكن عنق زجاجة خفي يظهر: يتفهاتف OpenAI الذكي السري: وعد ألتمان المكسور ومعركة الهيمنة على الذكاء الاصطناعيتطور OpenAI سرًا هاتفًا ذكيًا خاصًا بها، مما يتناقض بشكل مباشر مع النفي العلني السابق من الرئيس التنفيذي سام ألتمان. تمث

常见问题

这次模型发布“Agent Wars: Why Most AI Assistants Are Doomed by the Next Model Update”的核心内容是什么?

The AI Agent landscape is experiencing a severe case of commoditization. Hundreds of companies—from OpenAI and Google to ByteDance and Notion—have launched agents that perform the…

从“AI agent market commoditization”看,这个模型发布为什么重要?

At its core, the vast majority of today's AI agents follow the same architectural pattern: a user-facing chat interface → an LLM API call (GPT-4o, Claude 3.5, Gemini) → a function-calling layer → external tool APIs (web…

围绕“vertical agent vs horizontal agent”,这次模型更新对开发者和企业有什么影响?

开发者通常会重点关注能力提升、API 兼容性、成本变化和新场景机会,企业则会更关心可替代性、接入门槛和商业化落地空间。