Vercel's $0.01/M Token Cache Pricing: Cost Play or Ecosystem Trap for AI Developers?

Source: Hacker News | Topic: AI infrastructure | Archive: April 2026
Vercel AI Gateway has cut the cached-read price for DeepSeek-v4 to just $0.01 per million tokens, 64% below the official rate. This aggressive move signals a deliberate platform strategy: capture developer mindshare and reshape the economics of AI inference.

Vercel AI Gateway's new pricing for DeepSeek-v4 flash cache reads at $0.01 per million tokens is not a simple price war. It is a calculated infrastructure play. By offering a 64% discount compared to DeepSeek's official $0.028 and OpenRouter's matching rate, Vercel is either absorbing losses or leveraging its own edge caching layer to bypass upstream costs. The stark contrast with their professional tier, which offers only a 3% discount, reveals a two-tier strategy: hook price-sensitive experimental developers with ultra-low flash cache pricing, then monetize them through higher-margin services like edge compute, deployments, and observability. This move threatens model providers like DeepSeek and aggregators like OpenRouter by inserting Vercel as a gatekeeper. The endgame is not just cost savings for developers, but a fundamental shift of power from model creators to the platform layer that controls access and data flow.

Technical Deep Dive

Vercel's ability to offer DeepSeek-v4 flash cache reads at $0.01 per million tokens—a 64% discount—hinges on two possible architectural strategies. The first, and more plausible, is that Vercel has built its own distributed cache layer, likely leveraging its global edge network (powered by Cloudflare or similar CDN infrastructure). Instead of routing every cache hit back to DeepSeek's servers, Vercel can serve cached responses from nodes closer to the user, drastically reducing latency and bandwidth costs. This is analogous to how a CDN caches static assets, but applied to LLM prompt-completion pairs. The cache key would be a hash of the prompt (or a semantic embedding), and the value would be the generated response. For flash cache, this likely uses a high-speed in-memory store like Redis or Memcached deployed at edge locations.

The second possibility is a deliberate loss-leader strategy, where Vercel subsidizes the cost to acquire users. However, given Vercel's history of optimizing infrastructure costs (e.g., their Edge Functions and ISR caching), the self-built cache hypothesis is more credible. A relevant open-source project is the `vllm` repository (over 40,000 stars on GitHub), which implements PagedAttention for efficient KV-cache management. While vllm is typically used for self-hosted inference, Vercel could be applying similar caching principles at the gateway level. Another project is `GPTCache` (over 7,000 stars), a library for creating semantic caches for LLM queries. Vercel's implementation likely goes further, integrating with their own edge functions and analytics.

The technical trade-off is cache hit rate versus accuracy. A semantic cache can serve similar prompts without re-running the model, but it risks serving stale or slightly incorrect responses. Vercel's flash tier likely caches aggressively (e.g., loose semantic matching or long TTLs) to maximize hit rate and cost savings, while the professional tier likely caches conservatively (exact-match keys, short TTLs) to preserve accuracy. This explains the pricing delta: flash cache is cheap because it accepts lower reliability, while the professional cache commands a premium for consistency.

Data Takeaway: The 64% price gap between flash and professional cache tiers is not arbitrary. It reflects a deliberate technical decision to offer a low-cost, lower-reliability tier for experimentation, while reserving high-fidelity caching for production workloads.

Key Players & Case Studies

| Player | DeepSeek-v4 Cache Read Price (per 1M tokens) | Discount vs Official | Key Strategy |
|---|---|---|---|
| DeepSeek Official | $0.028 | Baseline | Direct model access, full control |
| OpenRouter | $0.028 | 0% | Aggregator, multi-model routing |
| Vercel AI Gateway (Flash) | $0.01 | 64% | Loss leader / self-built cache, ecosystem lock-in |
| Vercel AI Gateway (Professional) | $0.027 | 3% | Higher reliability, premium service |

Data Takeaway: Vercel's flash tier is an outlier. The 64% discount is unsustainable for a pure reseller, confirming that Vercel is either building its own infrastructure or accepting short-term losses for long-term ecosystem gains. OpenRouter's matching of official pricing shows they lack the scale or incentive to compete on cost alone.
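The "unsustainable for a pure reseller" claim can be made precise with back-of-envelope arithmetic. Assuming Vercel serves hits from its own cache at near-zero marginal cost and pays DeepSeek's official rate only on misses (an illustrative cost model, not Vercel's actual economics):

```python
# Discount check plus the break-even cache hit rate at which Vercel's $0.01
# flash price covers a reseller paying the official rate on every miss.
OFFICIAL = 0.028   # $ per 1M cached tokens, DeepSeek official
FLASH = 0.010      # Vercel flash tier
PRO = 0.027        # Vercel professional tier

flash_discount = 1 - FLASH / OFFICIAL   # ~64%, matching the table
pro_discount = 1 - PRO / OFFICIAL       # ~3.6%, roughly the "3%" cited

# Revenue covers upstream cost when: FLASH >= (1 - hit_rate) * OFFICIAL
breakeven_hit_rate = 1 - FLASH / OFFICIAL

print(f"flash discount:      {flash_discount:.1%}")
print(f"pro discount:        {pro_discount:.1%}")
print(f"break-even hit rate: {breakeven_hit_rate:.1%}")
```

Notably, the break-even hit rate equals the discount itself: Vercel breaks even as a pass-through reseller only if roughly 64% of flash-tier reads are served from its own cache, which is exactly why a self-built cache layer is the more credible explanation.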

Vercel's playbook mirrors AWS's early strategy with S3: offer cheap storage to get developers in the door, then upsell compute, analytics, and other services. For AI, Vercel is positioning the gateway as the entry point, with cache pricing as the hook. The real revenue comes from Vercel's core offerings: Edge Functions, Serverless Functions, and observability tools. A developer using Vercel's AI Gateway is more likely to deploy their entire application on Vercel, creating a sticky ecosystem.

DeepSeek faces a dilemma. If they lower their own cache prices to compete, they erode their revenue. If they don't, they risk losing developer mindshare to Vercel's gateway. OpenRouter, as a pure aggregator, has even less leverage—they depend on model provider margins and cannot easily absorb losses.

Industry Impact & Market Dynamics

This pricing move signals a broader trend: the commoditization of AI model access. As models become more capable and numerous, the value shifts from the model itself to the infrastructure that serves it. Vercel is betting that developers will choose convenience and ecosystem integration over raw model cost. This is reminiscent of how cloud providers (AWS, Azure, GCP) commoditized compute and storage, then captured value through higher-level services.

The market for AI inference is projected to grow from $6 billion in 2024 to over $40 billion by 2030 (source: internal AINews estimates based on industry reports). The gateway layer—which handles routing, caching, logging, and rate limiting—is a critical chokepoint. By offering ultra-low cache pricing, Vercel is essentially buying market share in this layer. If they capture a significant portion of developer traffic, they can dictate terms to model providers, much like how Apple's App Store controls app distribution.
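The gateway's chokepoint role comes from sitting on every request: rate limiting, cache lookup, upstream routing, and logging all happen in one place. A minimal sketch of that request path, with all names illustrative (a real gateway would run this at the edge with distributed state):

```python
# Sketch of a gateway request path: rate limit -> cache -> route -> log.
class Gateway:
    def __init__(self, backends: dict, cache: dict, rate_limit: int = 100):
        self.backends = backends          # model name -> callable(prompt) -> str
        self.cache = cache                # cache key -> response
        self.rate_limit = rate_limit      # requests allowed per client
        self.counters: dict[str, int] = {}
        self.log: list[tuple[str, str, bool]] = []  # (client, model, cache_hit)

    def handle(self, client: str, model: str, prompt: str) -> str:
        # 1. Rate limiting per client
        self.counters[client] = self.counters.get(client, 0) + 1
        if self.counters[client] > self.rate_limit:
            raise RuntimeError("rate limit exceeded")
        # 2. Cache lookup
        key = f"{model}:{prompt}"
        hit = key in self.cache
        if not hit:
            # 3. Route the miss to the upstream model provider
            self.cache[key] = self.backends[model](prompt)
        # 4. Logging / observability
        self.log.append((client, model, hit))
        return self.cache[key]


gw = Gateway(backends={"deepseek-v4": lambda p: f"echo:{p}"}, cache={})
gw.handle("alice", "deepseek-v4", "hi")   # miss -> routed upstream
gw.handle("alice", "deepseek-v4", "hi")   # hit -> served from cache
assert gw.log[-1][2] is True
```

Whoever operates this layer sees every prompt, every hit rate, and every client, which is the data-flow leverage the article describes.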

| Metric | 2024 | 2030 (Projected) | CAGR |
|---|---|---|---|
| Global AI Inference Market ($B) | 6 | 40 | 37% |
| Share controlled by gateways/aggregators | 15% | 45% | 20% |
| Average margin for model providers | 60% | 35% | -9% |

Data Takeaway: The gateway layer is growing faster than the overall inference market. Vercel's aggressive pricing is a bet that controlling this layer is worth short-term losses. Model providers' margins are projected to shrink as gateways commoditize access.
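The implied compound annual growth rates behind the $6B-to-$40B projection (2024 to 2030, six years) can be sanity-checked directly; these are the article's own estimates, and the arithmetic only verifies what they imply:

```python
# Compound annual growth rate check for the 2024 -> 2030 projections above.
def cagr(start: float, end: float, years: int) -> float:
    return (end / start) ** (1 / years) - 1


print(f"inference market ($6B -> $40B): {cagr(6, 40, 6):.1%}")    # ~37%
print(f"gateway share (15% -> 45%):     {cagr(15, 45, 6):.1%}")   # ~20%
print(f"provider margin (60% -> 35%):   {cagr(60, 35, 6):.1%}")   # ~ -9%
```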

Risks, Limitations & Open Questions

The biggest risk is cache quality. Flash cache may serve stale or incorrect responses, damaging user trust. If developers experience hallucinations or outdated information due to aggressive caching, they may abandon Vercel's gateway entirely. Vercel must balance cost savings with reliability, and the flash tier's low price may attract users who then complain about quality.

Another risk is vendor lock-in. Developers who build on Vercel's gateway may find it difficult to switch to another provider without rewriting their caching logic or losing cached data. This is a classic platform risk: the initial low cost is a trap, and switching costs rise over time.

There is also the question of sustainability. If Vercel is indeed subsidizing flash cache, how long can they afford it? Vercel is a private company with significant venture capital backing (over $300 million raised, valuation around $3 billion), but they are not yet profitable. A prolonged price war could burn through cash reserves.

Finally, DeepSeek and other model providers may retaliate by restricting cache access or changing their API terms to prevent third-party caching. For example, they could require that cache hits still incur a minimal fee, or they could block requests from known gateway IPs. This would force Vercel to either negotiate or build even more sophisticated workarounds.

AINews Verdict & Predictions

Vercel's $0.01 flash cache pricing is a brilliant but risky ecosystem play. It is not about being a cheaper reseller; it is about becoming the default entry point for AI development. By offering an irresistible price for experimental workloads, Vercel captures developers early in their journey, then monetizes them through production-grade services.

Prediction 1: Within 12 months, Vercel will raise flash cache prices to $0.015–$0.02 per million tokens, still below official rates, but no longer a loss leader. The hook will have served its purpose.

Prediction 2: DeepSeek will launch its own gateway or partner with a competitor (e.g., Cloudflare Workers AI) to offer competitive cache pricing, directly challenging Vercel.

Prediction 3: OpenRouter will struggle to differentiate and may be acquired by a larger platform (e.g., DigitalOcean or a cloud provider) seeking to enter the AI gateway market.

What to watch: The next move is from model providers. If they start offering exclusive caching deals or restrict third-party caching, the gateway layer's power diminishes. But if they remain passive, Vercel will consolidate its position as the AI gateway of choice for frontend developers.

The ultimate winner is the developer, who gets cheaper access to powerful models. But the long-term cost may be a less open, more platform-controlled AI ecosystem. Vercel is building a walled garden, and the entrance fee is just $0.01 per million tokens.
