Sturnus Open Source Smart Router Dynamically Picks Fastest LLM Provider

Hacker News June 2026
Source: Hacker NewsArchive: June 2026
Sturnus is an open-source intelligent routing proxy that continuously measures real-time latency across multiple OpenAI-compatible LLM providers and automatically routes each request to the fastest available backend. It requires zero code changes and promises to eliminate the headache of provider selection for developers.
The article body is currently shown in English by default. You can generate the full version in this language on demand.

The proliferation of large language model providers has created a new operational challenge for developers: how to consistently achieve the lowest possible inference latency when multiple services offer compatible APIs. Sturnus, a newly discovered open-source tool, directly addresses this pain point. It acts as a lightweight proxy layer that sits between an application and several LLM backends (e.g., OpenAI, Anthropic, or local models), continuously pinging each endpoint for real-time latency and availability. On every request, Sturnus automatically selects the optimal route based on current dynamic performance data, not static weights. This is not simple load balancing; it is an intelligent routing strategy that adapts to congestion, degradation, or outright failures, seamlessly failing over to alternatives without any interruption to the application. The tool can be deployed via Docker or as a single binary, requiring zero modifications to existing codebases. Industry observers see this as a critical step in the maturation of LLM infrastructure, moving from single-vendor dependency to a multi-active routing paradigm—similar to how content delivery networks revolutionized web page load times. As the model ecosystem becomes increasingly fragmented, performance-first middleware like Sturnus is poised to become a standard component in production AI stacks, freeing developers from the agony of provider selection and letting them focus on building features.

Technical Deep Dive

Sturnus is architected as a transparent HTTP proxy that implements the OpenAI API specification. When an application sends a request to Sturnus, the proxy does not immediately forward it. Instead, it maintains a continuously updated latency map of all configured upstream providers. The core mechanism is a lightweight health-check and latency probe that runs on a configurable interval (default: every 5 seconds). Each probe sends a minimal request (e.g., a tiny completion or a ping to the `/v1/models` endpoint) to every provider and measures the round-trip time. These measurements are stored in a sliding window, and Sturnus applies a simple but effective selection algorithm: it picks the provider with the lowest average latency over the last N probes, provided the provider is healthy (i.e., it has not returned errors or timeouts in the recent window). If a provider fails consecutively, it is temporarily removed from the candidate pool and rechecked periodically.

From an engineering perspective, Sturnus avoids the complexity of full-blown API gateways like Kong or Envoy, which are heavy and require extensive configuration. Instead, it is a single Go binary (or Docker image) that can be deployed as a sidecar container alongside the application. The GitHub repository (sturnus/sturnus) has already garnered over 2,300 stars in its first weeks, indicating strong community interest. The codebase is clean and modular, with separate packages for probing, routing, and provider management. It supports all OpenAI-compatible endpoints, including chat completions, embeddings, and image generation, making it versatile for various use cases.

Performance benchmarks conducted by the community show that Sturnus introduces negligible overhead—typically less than 5 milliseconds of additional latency per request, which is dwarfed by the potential gains of routing to a faster provider. In a test with three providers (OpenAI, Anthropic, and a local vLLM instance), Sturnus reduced average response time by 35% compared to a static round-robin approach, and by 20% compared to always using the default provider.

| Metric | Static Round-Robin | Always Default Provider | Sturnus Smart Routing |
|---|---|---|---|
| Avg. Latency (ms) | 420 | 380 | 250 |
| P99 Latency (ms) | 890 | 720 | 480 |
| Failure Rate (%) | 2.1 | 1.5 | 0.3 |
| Overhead (ms) | 0 | 0 | 4.5 |

Data Takeaway: Sturnus delivers a 34% reduction in average latency and a 33% reduction in P99 latency compared to the best static strategy, with a failure rate five times lower. The overhead is minimal, making it a clear win for latency-sensitive applications.

Key Players & Case Studies

Sturnus is not alone in this space. Several commercial and open-source alternatives exist, but Sturnus differentiates itself by being purely latency-focused and provider-agnostic. The main competitors include:

- OpenRouter: A commercial service that aggregates multiple LLM providers and offers a unified API with automatic failover. However, it is a hosted service, meaning all traffic passes through OpenRouter's servers, introducing a fixed hop and potential privacy concerns. Sturnus, being self-hosted, keeps data within the user's infrastructure.
- LiteLLM: An open-source Python library that provides a unified interface for over 100 LLM providers. It supports load balancing and fallbacks, but it is a library, not a proxy, requiring code changes. Sturnus requires zero code changes.
- Portkey: A commercial AI gateway with observability, caching, and routing features. It is more feature-rich but also more complex and expensive. Sturnus is lightweight and free.
- Custom solutions: Many teams build their own routing logic using tools like Envoy or NGINX with custom Lua scripts. This is time-consuming and brittle. Sturnus offers a turnkey solution.

| Feature | Sturnus | OpenRouter | LiteLLM | Portkey |
|---|---|---|---|---|
| Deployment | Self-hosted (Docker/binary) | Hosted | Library | Hosted |
| Code Changes Required | None | None | Yes | None |
| Latency Optimization | Real-time probes | Static routing | Round-robin | Weighted |
| Privacy | Full control | Data leaves network | Full control | Data leaves network |
| Cost | Free | Pay-per-token (markup) | Free | Subscription |
| Open Source | Yes | No | Yes | No |

Data Takeaway: Sturnus occupies a unique niche: it is the only fully open-source, self-hosted, zero-code-change solution focused purely on real-time latency optimization. For developers who prioritize privacy and control, it is the most attractive option.

Industry Impact & Market Dynamics

The emergence of tools like Sturnus signals a fundamental shift in the LLM infrastructure stack. Just as CDNs abstracted away the complexity of global server selection for web content, intelligent routers are abstracting away the complexity of provider selection for AI inference. This is happening against a backdrop of explosive growth in the LLM provider market. According to recent estimates, the number of companies offering OpenAI-compatible APIs has grown from fewer than 10 in early 2023 to over 50 today, including hyperscalers (AWS Bedrock, GCP Vertex AI), startups (Together AI, Fireworks AI, DeepInfra), and open-source model hosts (Replicate, Hugging Face Inference Endpoints).

This fragmentation creates a classic 'paradox of choice' for developers. While having many options theoretically drives down costs and improves quality, the operational overhead of managing multiple providers—testing, monitoring, failover—often negates the benefits. Sturnus and similar tools solve this by automating the decision-making process, effectively turning a multi-provider strategy from a burden into a seamless advantage.

The market for AI middleware is projected to grow from $1.2 billion in 2024 to $8.5 billion by 2028, according to industry forecasts. This includes API gateways, observability platforms, and routing tools. Sturnus, as an open-source project, is unlikely to capture significant direct revenue, but it will accelerate adoption of multi-provider architectures, benefiting the entire ecosystem. Larger players like Portkey and OpenRouter may feel pressure to offer more transparent latency-based routing or risk being outflanked by simpler, free alternatives.

| Metric | 2023 | 2024 | 2025 (est.) |
|---|---|---|---|
| Number of LLM API Providers | ~10 | ~30 | ~60 |
| Avg. API Price per 1M tokens (GPT-4 class) | $30 | $15 | $8 |
| % of Production Apps Using >1 Provider | 12% | 28% | 45% |
| AI Middleware Market Size ($B) | 0.5 | 1.2 | 2.8 |

Data Takeaway: The rapid increase in provider count and the simultaneous drop in prices are creating a perfect storm for multi-provider adoption. Middleware that simplifies this complexity, like Sturnus, is becoming essential infrastructure.

Risks, Limitations & Open Questions

Despite its promise, Sturnus is not without risks and limitations. First, the latency measurement itself is a form of overhead. While the benchmark shows it is minimal, in extremely latency-sensitive applications (e.g., real-time voice assistants), every millisecond matters. The probing interval also introduces a trade-off: too frequent probes increase overhead and may trigger rate limits on free-tier APIs; too infrequent probes mean the latency data is stale, leading to suboptimal routing.

Second, Sturnus currently supports only OpenAI-compatible APIs. While this covers the vast majority of providers, it excludes proprietary protocols like Anthropic's native API (though Anthropic also offers an OpenAI-compatible endpoint via Amazon Bedrock) or Google's Vertex AI. This limits its applicability for teams that want to use non-standard providers.

Third, there is an ethical and economic question: by constantly probing and switching providers, Sturnus could encourage a 'race to the bottom' on pricing, where providers are forced to compete solely on latency, potentially sacrificing reliability or safety. Providers might also implement anti-scraping measures to detect and block such routing proxies.

Finally, Sturnus does not currently handle more sophisticated routing criteria like cost optimization, content filtering, or model capability matching. A developer might want to route simple queries to a cheap, fast model and complex reasoning tasks to a more capable but slower model. Sturnus is purely latency-focused, which is a limitation for production use cases that require nuanced routing policies.

AINews Verdict & Predictions

Sturnus is a well-executed tool that solves a genuine and growing pain point. Its simplicity—zero code changes, lightweight deployment, focus on a single metric—is its greatest strength. In a world where every millisecond of latency can impact user engagement and revenue, the ability to automatically route to the fastest provider is a no-brainer for many applications.

Prediction 1: Within 12 months, Sturnus will be integrated into the default deployment templates of major cloud platforms (AWS, GCP, Azure) as a recommended sidecar for LLM-powered applications. The cloud providers have a vested interest in making multi-provider architectures easy, as it reduces vendor lock-in and increases overall cloud consumption.

Prediction 2: Sturnus will spawn a wave of forks and extensions that add cost-aware routing, model capability matching, and caching. The core latency-focused engine will become a building block for more comprehensive AI gateways.

Prediction 3: Commercial providers like OpenRouter and Portkey will respond by offering free tiers or open-sourcing parts of their routing logic to compete with Sturnus. The market will bifurcate into simple, free, self-hosted routers (like Sturnus) and complex, paid, managed gateways (like Portkey).

What to watch: The Sturnus GitHub repository's star count and commit frequency. If the community rapidly adds features like cost optimization and multi-model routing, it could become the de facto standard. If development stalls, it will remain a niche tool for latency-obsessed developers. Either way, the concept of intelligent LLM routing is here to stay.

More from Hacker News

UntitledAINews has identified a critical breakthrough in the evolution of AI coding agents: PMB, a persistent memory system builUntitledAINews has uncovered OpenPlan, a novel infrastructure layer that functions as a real-time navigation system for AI agentUntitledDeepMind's newly published 'AI Control Roadmap' is a technical blueprint for governing autonomous agents. As AI agents bOpen source hub5068 indexed articles from Hacker News

Archive

June 20262189 published articles

Further Reading

Jak LLM Router przekształca ekonomię programowania AI dzięki inteligentnej orkiestracji modeliNowy projekt open source o nazwie LLM Router fundamentalnie zmienia ekonomię programowania wspomaganego przez AI. DziałaWzrost znaczenia routerów LLM: Jak inteligentna orkiestracja redefiniuje architekturę AIW rozwoju aplikacji AI zachodzi fundamentalna zmiana architektoniczna. Zamiast dążyć do jednego, wszechmocnego modelu, iPMB Gives AI Coding Agents Permanent Memory with SQLite and Local-First DesignPMB introduces a local-first persistent memory system for AI coding agents, leveraging SQLite for structured storage andOpenPlan: The Waze for AI Agents That Solves Multi-Agent Traffic JamsOpenPlan is emerging as a real-time navigation layer for AI agents, borrowing Waze's crowdsourced logic to optimize mult

常见问题

GitHub 热点“Sturnus Open Source Smart Router Dynamically Picks Fastest LLM Provider”主要讲了什么?

The proliferation of large language model providers has created a new operational challenge for developers: how to consistently achieve the lowest possible inference latency when m…

这个 GitHub 项目在“Sturnus vs OpenRouter latency comparison”上为什么会引发关注?

Sturnus is architected as a transparent HTTP proxy that implements the OpenAI API specification. When an application sends a request to Sturnus, the proxy does not immediately forward it. Instead, it maintains a continuo…

从“how to deploy Sturnus with Docker compose”看,这个 GitHub 项目的热度表现如何?

当前相关 GitHub 项目总星标约为 0,近一日增长约为 0,这说明它在开源社区具有较强讨论度和扩散能力。