Vercel's $0.01/M Token Cache Pricing: Cost Play or Ecosystem Trap for AI Developers?

Hacker News April 2026
Source: Hacker News · Topics: DeepSeek V4, AI infrastructure · Archive: April 2026
Vercel AI Gateway has cut the price of DeepSeek-v4 flash cache reads to $0.01 per million tokens, 64% below the official price. This aggressive move suggests a deliberate platform strategy aimed at capturing developer attention and reshaping the economics of AI inference.

Vercel AI Gateway's new pricing for DeepSeek-v4 flash cache reads at $0.01 per million tokens is not a simple price war. It is a calculated infrastructure play. By offering a 64% discount compared to DeepSeek's official $0.028 and OpenRouter's matching rate, Vercel is either absorbing losses or leveraging its own edge caching layer to bypass upstream costs. The stark contrast with their professional tier, which offers only a 3% discount, reveals a two-tier strategy: hook price-sensitive experimental developers with ultra-low flash cache pricing, then monetize them through higher-margin services like edge compute, deployments, and observability. This move threatens model providers like DeepSeek and aggregators like OpenRouter by inserting Vercel as a gatekeeper. The endgame is not just cost savings for developers, but a fundamental shift of power from model creators to the platform layer that controls access and data flow.

Technical Deep Dive

Vercel's ability to offer DeepSeek-v4 flash cache reads at $0.01 per million tokens—a 64% discount—hinges on two possible architectural strategies. The first, and more plausible, is that Vercel has built its own distributed cache layer, likely leveraging its global edge network (powered by Cloudflare or similar CDN infrastructure). Instead of routing every cache hit back to DeepSeek's servers, Vercel can serve cached responses from nodes closer to the user, drastically reducing latency and bandwidth costs. This is analogous to how a CDN caches static assets, but applied to LLM prompt-completion pairs. The cache key would be a hash of the prompt (or a semantic embedding), and the value would be the generated response. For flash cache, this likely uses a high-speed in-memory store like Redis or Memcached deployed at edge locations.

The second possibility is a deliberate loss-leader strategy, where Vercel subsidizes the cost to acquire users. However, given Vercel's history of optimizing infrastructure costs (e.g., their Edge Functions and ISR caching), the self-built cache hypothesis is more credible. A relevant open-source project is the `vllm` repository (over 40,000 stars on GitHub), which implements PagedAttention for efficient KV-cache management. While vllm is typically used for self-hosted inference, Vercel could be applying similar caching principles at the gateway level. Another project is `GPTCache` (over 7,000 stars), a library for creating semantic caches for LLM queries. Vercel's implementation likely goes further, integrating with their own edge functions and analytics.

The technical trade-off is cache hit rate versus accuracy. A semantic cache can serve similar prompts without re-running the model, but it risks serving stale or slightly incorrect responses. Vercel's flash cache tier likely uses a more aggressive, lower-fidelity cache (e.g., exact match or short TTL) to maximize cost savings, while the professional tier uses a more conservative cache with higher accuracy guarantees. This explains the pricing delta: flash cache is cheap because it accepts lower reliability, while professional cache commands a premium for consistency.
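One way to picture the two tiers is as different settings of the same cache policy. The knobs and values below are hypothetical, chosen only to illustrate the aggressive-versus-conservative split; Vercel's actual tier parameters are not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    # Hypothetical knobs; the real tier parameters are not public.
    match_mode: str     # "exact" hashes the prompt; "semantic" allows near-matches
    ttl_seconds: int    # how long a cached completion may be served
    serve_stale: bool   # serve an expired entry rather than call upstream


# Flash: aggressive, accepts lower fidelity in exchange for cost.
FLASH = CachePolicy(match_mode="exact", ttl_seconds=3600, serve_stale=True)

# Professional: conservative, prefers a fresh upstream call over a risky hit.
PROFESSIONAL = CachePolicy(match_mode="exact", ttl_seconds=60, serve_stale=False)


def should_serve_from_cache(policy: CachePolicy, age_seconds: int) -> bool:
    if age_seconds <= policy.ttl_seconds:
        return True
    return policy.serve_stale  # stale hit allowed only on the aggressive tier


print(should_serve_from_cache(FLASH, 7200))        # stale, served anyway -> True
print(should_serve_from_cache(PROFESSIONAL, 120))  # expired, go upstream -> False
```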

Data Takeaway: The 64% price gap between flash and professional cache tiers is not arbitrary. It reflects a deliberate technical decision to offer a low-cost, lower-reliability tier for experimentation, while reserving high-fidelity caching for production workloads.

Key Players & Case Studies

| Player | DeepSeek-v4 Cache Read Price (per 1M tokens) | Discount vs Official | Key Strategy |
|---|---|---|---|
| DeepSeek Official | $0.028 | Baseline | Direct model access, full control |
| OpenRouter | $0.028 | 0% | Aggregator, multi-model routing |
| Vercel AI Gateway (Flash) | $0.01 | 64% | Loss leader / self-built cache, ecosystem lock-in |
| Vercel AI Gateway (Professional) | $0.027 | 3% | Higher reliability, premium service |

Data Takeaway: Vercel's flash tier is an outlier. The 64% discount is unsustainable for a pure reseller, confirming that Vercel is either building its own infrastructure or accepting short-term losses for long-term ecosystem gains. OpenRouter's matching of official pricing shows they lack the scale or incentive to compete on cost alone.
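The discounts in the table above can be reproduced directly from the listed prices. Note that the professional figure works out to roughly 3.6%, which the article rounds to 3%:

```python
# Cache-read prices in USD per 1M tokens, from the comparison table.
official = 0.028  # DeepSeek official (OpenRouter matches this)
flash = 0.010     # Vercel AI Gateway, flash tier
pro = 0.027       # Vercel AI Gateway, professional tier


def discount_pct(price: float, baseline: float) -> float:
    """Discount relative to the baseline price, in percent."""
    return (baseline - price) / baseline * 100


print(f"flash discount:        {discount_pct(flash, official):.0f}%")  # 64%
print(f"professional discount: {discount_pct(pro, official):.1f}%")    # 3.6%
```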

Vercel's playbook mirrors AWS's early strategy with S3: offer cheap storage to get developers in the door, then upsell compute, analytics, and other services. For AI, Vercel is positioning the gateway as the entry point, with cache pricing as the hook. The real revenue comes from Vercel's core offerings: Edge Functions, Serverless Functions, and observability tools. A developer using Vercel's AI Gateway is more likely to deploy their entire application on Vercel, creating a sticky ecosystem.

DeepSeek faces a dilemma. If they lower their own cache prices to compete, they erode their revenue. If they don't, they risk losing developer mindshare to Vercel's gateway. OpenRouter, as a pure aggregator, has even less leverage—they depend on model provider margins and cannot easily absorb losses.

Industry Impact & Market Dynamics

This pricing move signals a broader trend: the commoditization of AI model access. As models become more capable and numerous, the value shifts from the model itself to the infrastructure that serves it. Vercel is betting that developers will choose convenience and ecosystem integration over raw model cost. This is reminiscent of how cloud providers (AWS, Azure, GCP) commoditized compute and storage, then captured value through higher-level services.

The market for AI inference is projected to grow from $6 billion in 2024 to over $40 billion by 2030 (source: internal AINews estimates based on industry reports). The gateway layer—which handles routing, caching, logging, and rate limiting—is a critical chokepoint. By offering ultra-low cache pricing, Vercel is essentially buying market share in this layer. If they capture a significant portion of developer traffic, they can dictate terms to model providers, much like how Apple's App Store controls app distribution.

| Metric | 2024 | 2030 (Projected) | CAGR |
|---|---|---|---|
| Global AI Inference Market ($B) | 6 | 40 | 37% |
| Share controlled by gateways/aggregators | 15% | 45% | 20% |
| Average margin for model providers | 60% | 35% | -9% |

Data Takeaway: The gateway layer is growing faster than the overall inference market. Vercel's aggressive pricing is a bet that controlling this layer is worth short-term losses. Model providers' margins are projected to shrink as gateways commoditize access.
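The headline growth rate is consistent with the $6B-to-$40B projection over the six years from 2024 to 2030:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, in percent."""
    return ((end / start) ** (1 / years) - 1) * 100


# Global AI inference market: $6B (2024) -> $40B (2030), six years of growth.
print(f"{cagr(6, 40, 6):.0f}%")  # ~37%
```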

Risks, Limitations & Open Questions

The biggest risk is cache quality. Flash cache may serve stale or incorrect responses, damaging user trust. If developers experience hallucinations or outdated information due to aggressive caching, they may abandon Vercel's gateway entirely. Vercel must balance cost savings with reliability, and the flash tier's low price may attract users who then complain about quality.

Another risk is vendor lock-in. Developers who build on Vercel's gateway may find it difficult to switch to another provider without rewriting their caching logic or losing cached data. This is a classic platform risk: the initial low cost is a trap, and switching costs rise over time.

There is also the question of sustainability. If Vercel is indeed subsidizing flash cache, how long can they afford it? Vercel is a private company with significant venture capital backing (over $300 million raised, valuation around $3 billion), but they are not yet profitable. A prolonged price war could burn through cash reserves.

Finally, DeepSeek and other model providers may retaliate by restricting cache access or changing their API terms to prevent third-party caching. For example, they could require that cache hits still incur a minimal fee, or they could block requests from known gateway IPs. This would force Vercel to either negotiate or build even more sophisticated workarounds.

AINews Verdict & Predictions

Vercel's $0.01 flash cache pricing is a brilliant but risky ecosystem play. It is not about being a cheaper reseller; it is about becoming the default entry point for AI development. By offering an irresistible price for experimental workloads, Vercel captures developers early in their journey, then monetizes them through production-grade services.

Prediction 1: Within 12 months, Vercel will raise flash cache prices to $0.015–$0.02 per million tokens, still below official rates, but no longer a loss leader. The hook will have served its purpose.

Prediction 2: DeepSeek will launch its own gateway or partner with a competitor (e.g., Cloudflare Workers AI) to offer competitive cache pricing, directly challenging Vercel.

Prediction 3: OpenRouter will struggle to differentiate and may be acquired by a larger platform (e.g., DigitalOcean or a cloud provider) seeking to enter the AI gateway market.

What to watch: The next move is from model providers. If they start offering exclusive caching deals or restrict third-party caching, the gateway layer's power diminishes. But if they remain passive, Vercel will consolidate its position as the AI gateway of choice for frontend developers.

The ultimate winner is the developer, who gets cheaper access to powerful models. But the long-term cost may be a less open, more platform-controlled AI ecosystem. Vercel is building a walled garden, and the entrance fee is just $0.01 per million tokens.
