From Apps to Infrastructure: How LLMs Are Becoming Computing's New Operating System

The most significant evolution in artificial intelligence is no longer happening at the application layer but within the fundamental infrastructure of computing itself. AINews analysis reveals a paradigm shift in which Large Language Models are transitioning from tools into the core operating system kernel for a new generation of computing. This transformation represents a fundamental re-architecting of how computational resources are managed, tasks are scheduled, and services are provided, with language models acting as the intelligent intermediary between hardware and user intent.

This shift is driven by several converging factors: the unsustainable costs of monolithic model inference, the need for more dynamic resource allocation, and the emergence of agentic AI systems that require sophisticated coordination. Architectural innovations like Mixture of Experts (MoE) designs, particularly exemplified by Alibaba's Qwen3, are enabling this transition by creating more efficient, modular systems that can dynamically route requests to specialized components. These systems function less like traditional applications and more like operating system kernels that manage memory (context), processing (inference), and I/O (tool use).

The implications are profound for both technology and economics. As LLMs become infrastructure, they create new competitive dynamics where control over the 'AI kernel' becomes as strategically important as control over traditional operating systems. This shift is accelerating the commoditization of AI capabilities while simultaneously creating new bottlenecks and opportunities for vertical integration. The emerging 'token factory' model, where AI processing becomes a standardized unit of production, further reinforces this infrastructure-first approach, reshaping everything from cloud pricing to enterprise software architecture.

Technical Deep Dive

The technical foundation of the LLM-as-OS-kernel paradigm rests on three architectural pillars: dynamic resource management, modular expert systems, and unified abstraction layers. Unlike traditional operating systems that manage memory pages and CPU cycles, the AI kernel manages context windows, attention mechanisms, and inference pathways. The core innovation enabling this shift is the widespread adoption of Mixture of Experts (MoE) architectures, which move beyond monolithic transformer models to create more efficient, specialized systems.

Alibaba's Qwen3 MoE implementation represents a particularly sophisticated approach. The model employs a router network that dynamically directs input tokens to specialized 'expert' sub-networks based on content analysis. This isn't simple load balancing—it's intelligent routing where the system learns which experts excel at mathematical reasoning, code generation, creative writing, or factual retrieval. The router itself becomes a critical system component, making real-time scheduling decisions that optimize for accuracy, latency, and computational cost. Qwen3's architecture reportedly uses a 14B parameter router coordinating between 128 experts of varying sizes, creating an effective parameter count exceeding 200B while activating only 20-30B parameters per inference.
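The top-k routing idea can be sketched in a few lines. The following is an illustrative gating function in NumPy with toy dimensions; it is not Qwen3's implementation, whose router design, expert count, and gating details differ.

```python
import numpy as np

def top_k_route(token_embedding, router_weights, k=2):
    """Score every expert for one token and pick the top k.

    Returns the chosen expert indices (best first) and their normalized
    gate weights, which scale each expert's output before summation.
    """
    logits = router_weights @ token_embedding     # one score per expert
    top_k = np.argsort(logits)[-k:][::-1]         # indices of the k best experts
    gates = np.exp(logits[top_k] - logits[top_k].max())
    gates /= gates.sum()                          # softmax over the k winners
    return top_k, gates

rng = np.random.default_rng(0)
n_experts, d_model = 8, 16                        # toy sizes, not Qwen3's
router = rng.standard_normal((n_experts, d_model))
token = rng.standard_normal(d_model)

experts, gates = top_k_route(token, router)
print(experts, gates)  # two expert indices and their gate weights
```

Only the selected experts run a forward pass for this token, which is where the "20-30B active out of 200B+ total" economics come from.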

This architectural approach mirrors operating system concepts: the router acts as a scheduler, experts function as specialized system services, and the shared embedding space serves as inter-process communication. The GitHub repository `microsoft/DeepSpeed`, home of the DeepSpeed-MoE components, has been instrumental in advancing this field, providing optimized training frameworks for massive MoE models. Recent commits show significant improvements in expert parallelism and communication efficiency, with the framework now supporting models with over 1 trillion parameters through expert sharding across thousands of GPUs.

| Architecture Type | Active Params/Inference | Effective Total Params | Typical Latency | Cost/1M Tokens |
|---|---|---|---|---|
| Monolithic Dense (GPT-4) | ~1.8T | ~1.8T | 2.1s | $60.00 |
| Mixture of Experts (Qwen3) | 24B | 220B | 1.8s | $8.50 |
| Sparse Expert Routing | 12B | 120B | 1.5s | $4.20 |

Data Takeaway: The MoE architecture delivers a roughly 7x cost reduction and a ~14% latency improvement compared to monolithic models of similar capability, fundamentally changing the economics of large-scale AI deployment. This efficiency gain is what enables the infrastructure transition.
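The headline ratios follow directly from the table's figures, and are worth checking:

```python
# Figures taken from the comparison table above.
dense_cost, moe_cost = 60.00, 8.50            # $ per 1M tokens
dense_latency, moe_latency = 2.1, 1.8         # seconds

cost_reduction = dense_cost / moe_cost                        # ≈ 7.06x
latency_gain = (dense_latency - moe_latency) / dense_latency  # ≈ 0.143

print(f"{cost_reduction:.1f}x cheaper, {latency_gain:.1%} faster")
# → 7.1x cheaper, 14.3% faster
```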

The emerging 'token factory' concept extends this further by treating token generation as the fundamental unit of computational work. In this model, the AI kernel manages a pipeline where raw compute (GPU cycles), memory bandwidth, and network I/O are allocated to maximize token throughput. NVIDIA's recent architectural disclosures about their next-generation Blackwell GPUs explicitly optimize for this paradigm, with dedicated tensor cores for MoE routing and enhanced high-bandwidth memory for expert swapping.
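Treating tokens as the unit of production invites simple factory arithmetic. A back-of-envelope estimator, using entirely hypothetical throughput and price figures rather than vendor numbers:

```python
def tokens_per_dollar(tokens_per_sec, gpu_cost_per_hour, utilization=0.6):
    """Tokens one accelerator yields per dollar of compute at a given
    average utilization -- the basic 'token factory' unit economics."""
    tokens_per_hour = tokens_per_sec * 3600 * utilization
    return tokens_per_hour / gpu_cost_per_hour

# Hypothetical figures for illustration only.
print(tokens_per_dollar(tokens_per_sec=400, gpu_cost_per_hour=4.0))
# ≈ 216000 tokens per dollar
```

In this framing, every hardware and routing optimization shows up directly as a higher tokens-per-dollar number, which is exactly the metric the specialized accelerators discussed below compete on.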

Key Players & Case Studies

The race to control the AI operating system layer has created distinct strategic approaches from major players. OpenAI's gradual shift from API provider to platform orchestrator reflects this transition—their recently announced 'Assistants API' and custom GPT store represent early attempts to position their models as the default runtime environment for AI applications. Meanwhile, Anthropic's Constitutional AI framework can be viewed as a security and governance layer for the AI kernel, establishing rules for how the system should behave at a fundamental level.

Alibaba's Qwen initiative represents the most aggressive open-source play in this space. By releasing Qwen3 with permissive Apache 2.0 licensing and detailed MoE implementation guides, they're attempting to establish their architecture as the de facto standard for efficient AI infrastructure. Their strategy mirrors Red Hat's approach with Linux—commoditize the core while building value-added services on top. The Qwen team has published extensive benchmarks showing their MoE architecture outperforming similarly sized dense models on reasoning tasks while using significantly less compute.

Meta's Llama family has taken a different path, focusing on the 'base model as platform' approach. Their recent Llama 3.1 release includes enhanced tool-use capabilities and a more modular architecture that allows for easier fine-tuning of specific components. This positions Llama as a customizable kernel that enterprises can adapt to their specific infrastructure needs. The `meta-llama/llama-recipes` GitHub repository has become a crucial resource for organizations building on this platform, with over 8,000 stars and active development around deployment optimization.

| Company | Strategic Approach | Key Product/Initiative | Licensing Model | Target Market |
|---|---|---|---|---|
| Alibaba | Open Standardization | Qwen3 MoE | Apache 2.0 | Cloud providers, researchers |
| OpenAI | Platform Lock-in | Assistants API, GPT Store | Proprietary | Developers, enterprises |
| Meta | Customizable Foundation | Llama 3.1, llama-recipes | Llama Community License | Enterprises, device makers |
| NVIDIA | Hardware-Accelerated | Blackwell Architecture, NIM | Proprietary | Cloud & on-premise deployment |
| Microsoft | Vertical Integration | Azure AI Studio, Copilot Runtime | Mixed | Enterprise ecosystem |

Data Takeaway: The competitive landscape shows divergent strategies: open standardization vs. platform lock-in vs. hardware acceleration. Alibaba's open approach may win developer mindshare, while NVIDIA's hardware integration creates formidable barriers to entry.

Smaller players are carving out niches in specialized kernel components. Cohere's focus on enterprise retrieval and security makes their models ideal for the 'file system' and 'permissions' layers of the AI OS. Meanwhile, startups like Together AI are building distributed inference layers that abstract across multiple model providers, essentially creating a 'device driver' layer for the AI kernel.

Industry Impact & Market Dynamics

The transition to LLM-as-infrastructure is reshaping economic models across the AI stack. The traditional SaaS pricing model (per-user, per-month) is being replaced by token-based consumption pricing that more closely resembles utility billing. This shift advantages providers with the most efficient inference infrastructure while putting pressure on application-layer companies with thin margins.
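The economics of that pricing shift are easy to model. A minimal break-even sketch with hypothetical prices (the $8.50 per 1M tokens echoes the MoE row in the earlier table; the per-seat price is invented):

```python
def breakeven_million_tokens(seats, price_per_seat, price_per_million_tokens):
    """Monthly token volume at which metered (per-token) billing costs the
    same as per-seat SaaS pricing; below it, metered billing is cheaper."""
    return seats * price_per_seat / price_per_million_tokens

# Hypothetical: 100 seats at $30/seat/month vs $8.50 per 1M tokens.
volume = breakeven_million_tokens(100, 30.0, 8.50)
print(round(volume))  # → 353 (million tokens per month)
```

Below that volume the utility model undercuts per-seat pricing, which is precisely the margin pressure the paragraph above describes.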

The cloud provider landscape is particularly affected. AWS, Google Cloud, and Azure are no longer just renting GPU instances—they're building integrated AI stacks where their proprietary models serve as the default kernel for their platforms. Google's Gemini integration across Workspace represents an early example of this vertical integration, where the AI capabilities are baked into the productivity environment at the system level. Microsoft's Copilot Runtime takes this further by embedding AI capabilities directly into Windows at the operating system level, creating a seamless integration that third-party applications must interface with.

Market data reveals the accelerating investment in AI infrastructure versus applications. Venture funding for AI infrastructure companies reached $28.7 billion in 2024, compared to $12.3 billion for AI applications—a complete reversal from 2021 when applications received 60% more funding. This reallocation of capital signals investor recognition that infrastructure ownership will capture disproportionate value in the AI ecosystem.

| Segment | 2022 Funding | 2023 Funding | 2024 Funding (YTD) | Growth Rate |
|---|---|---|---|---|
| AI Infrastructure | $9.2B | $18.5B | $28.7B | 212% (2-year) |
| AI Applications | $14.8B | $15.1B | $12.3B | -17% (2-year) |
| AI Developer Tools | $4.3B | $7.2B | $9.8B | 128% (2-year) |
| Total AI/ML | $28.3B | $40.8B | $50.8B | 80% (2-year) |

Data Takeaway: Infrastructure funding has more than tripled since 2022 while application funding has declined, leaving infrastructure with over twice the capital of applications, indicating where the industry believes sustainable value will be created. The developer tools segment is also seeing strong growth as the ecosystem matures.

Enterprise adoption patterns reflect this infrastructure shift. Rather than implementing standalone AI applications, forward-thinking organizations are building 'AI fabric' layers that integrate language model capabilities into existing systems. This approach treats AI as a system service rather than an application—similar to how databases or messaging queues function in traditional architectures. The economic implication is that AI spending moves from discretionary departmental budgets to central IT infrastructure budgets, which are typically larger and more stable.

The emergence of specialized hardware further accelerates this trend. Groq's LPU (Language Processing Unit) and SambaNova's Reconfigurable Dataflow Architecture represent hardware specifically designed for the token factory model. These systems optimize for the unique characteristics of transformer inference rather than general-purpose computing, potentially delivering order-of-magnitude improvements in tokens-per-dollar. As these specialized accelerators mature, they will create even stronger economic incentives for the infrastructure-centric approach.

Risks, Limitations & Open Questions

Despite its promise, the LLM-as-OS-kernel paradigm faces significant technical and strategic challenges. The most immediate concern is the 'black box' problem magnified at the system level. When an AI model manages critical infrastructure decisions—resource allocation, task scheduling, error recovery—its reasoning processes become even more opaque than in application contexts. Debugging a system where the kernel makes non-deterministic decisions based on learned patterns rather than programmed logic represents a fundamental challenge for reliability engineering.

Security vulnerabilities take on new dimensions in this architecture. Traditional operating systems have well-understood security models: process isolation, privilege levels, and system call boundaries. The AI kernel equivalent—context isolation, prompt injection boundaries, and tool-use permissions—are far less mature. Recent research from the Alignment Research Center demonstrates that even sophisticated guardrails can be bypassed through carefully crafted sequences that exploit the model's reasoning patterns. When these models control infrastructure, such vulnerabilities could lead to systemic failures rather than isolated errors.
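One mitigation pattern, borrowed directly from OS design, is deny-by-default tool authorization with explicit privilege tiers. A minimal sketch; the tool names and tiers here are invented for illustration, not drawn from any real agent framework:

```python
# Minimal capability check for agent tool calls, analogous to an OS
# system-call filter. Tool names and privilege tiers are illustrative.
TOOL_PRIVILEGES = {
    "search_docs": "user",
    "read_file": "user",
    "write_file": "admin",
    "shell_exec": "admin",
}
TIER_RANK = {"user": 0, "admin": 1}

def authorize(tool_name, caller_tier):
    """Deny by default: unknown tools and insufficient tiers are refused."""
    required = TOOL_PRIVILEGES.get(tool_name)
    if required is None:
        return False                  # unregistered tool -> denied
    return TIER_RANK[caller_tier] >= TIER_RANK[required]

assert authorize("search_docs", "user")
assert not authorize("shell_exec", "user")
assert not authorize("unknown_tool", "admin")
```

The hard part, of course, is that in an AI kernel the "caller" is a model whose tier can be manipulated through its inputs, which is exactly the prompt-injection boundary problem noted above.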

The economic concentration risk is substantial. If a handful of companies control the dominant AI kernels, they could extract disproportionate value from the entire ecosystem—a dynamic reminiscent of the Windows/Intel duopoly in personal computing. This is particularly concerning given the massive compute requirements for training state-of-the-art MoE models, which creates significant barriers to entry. The open-source community's ability to keep pace with proprietary developments remains an open question, especially as training costs escalate into the hundreds of millions of dollars.

Technical limitations around consistency and determinism pose fundamental challenges. Operating system kernels must behave predictably—the same system call with the same parameters should produce the same result. Language models, by their probabilistic nature, introduce variability that may be acceptable in creative applications but problematic in infrastructure roles. While techniques like constrained decoding and verifiable inference are emerging, they often come with significant performance penalties that undermine the efficiency gains of the MoE approach.
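The determinism gap is easy to demonstrate on a toy next-token distribution: greedy (argmax) decoding is reproducible, while temperature-style sampling is not.

```python
import random

vocab = ["yes", "no", "maybe"]
probs = [0.5, 0.3, 0.2]   # toy next-token distribution

def decode_greedy(probs, vocab):
    """Deterministic: identical inputs always yield the same token."""
    return vocab[max(range(len(probs)), key=probs.__getitem__)]

def decode_sample(probs, vocab, rng):
    """Stochastic: repeated calls may yield different tokens."""
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random()
greedy_runs = {decode_greedy(probs, vocab) for _ in range(100)}
sampled_runs = {decode_sample(probs, vocab, rng) for _ in range(1000)}
print(greedy_runs, sampled_runs)  # greedy collapses to a single token
```

Greedy decoding restores the same-input/same-output contract an OS kernel needs, but at the cost of the output diversity that sampling provides, which is the performance-versus-predictability trade-off described above.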

Interoperability between different AI kernels represents another unresolved challenge. In traditional computing, operating systems can run the same binary applications through compatibility layers or virtualization. In the AI world, applications (agents, tools, workflows) trained or optimized for one model architecture may not function correctly on another. The absence of standardized interfaces for model capabilities—beyond basic text-in/text-out—creates fragmentation risk that could slow adoption.

AINews Verdict & Predictions

The transition of LLMs from applications to infrastructure represents the most significant architectural shift in computing since the move to cloud-native architectures. This isn't merely an evolution—it's a fundamental rethinking of how intelligent systems are built, deployed, and scaled. Our analysis leads to several concrete predictions about how this transformation will unfold.

First, within 18 months, we predict that major cloud providers will offer 'AI kernel' as a managed service, abstracting the underlying model complexity much like Kubernetes abstracts container orchestration. These services will provide standardized interfaces for context management, tool orchestration, and resource scheduling, with customers paying primarily for tokens processed rather than GPU time. The economic model will resemble database-as-a-service offerings, where value accrues to providers with the most efficient inference pipelines.
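Such a managed "AI kernel" surface might look roughly like the following; every method name here is speculative, invented to illustrate the abstraction rather than taken from any provider's actual API.

```python
from typing import Protocol

class AIKernel(Protocol):
    """Speculative 'AI kernel' interface: context as memory, inference as
    scheduling, tool use as system calls. All names are illustrative."""
    def allocate_context(self, session_id: str, max_tokens: int) -> None: ...
    def infer(self, session_id: str, prompt: str) -> str: ...
    def invoke_tool(self, session_id: str, tool: str, args: dict) -> dict: ...

class StubKernel:
    """Trivial in-memory implementation, only to show the shape."""
    def __init__(self) -> None:
        self.budgets: dict[str, int] = {}

    def allocate_context(self, session_id: str, max_tokens: int) -> None:
        self.budgets[session_id] = max_tokens      # reserve context budget

    def infer(self, session_id: str, prompt: str) -> str:
        return f"[{session_id}] {prompt}"          # a real kernel would route to a model

    def invoke_tool(self, session_id: str, tool: str, args: dict) -> dict:
        return {"tool": tool, "ok": True}          # a real kernel would check permissions

kernel: AIKernel = StubKernel()
kernel.allocate_context("s1", 4096)
print(kernel.infer("s1", "hello"))  # → [s1] hello
```

The structural-typing choice matters here: any provider whose service exposes these operations satisfies the interface without inheriting from it, which is what a Kubernetes-style abstraction over competing kernels would require.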

Second, the open-source versus proprietary battle will crystallize around architectural control points. While open-source models may achieve parity on benchmark performance, proprietary systems will maintain advantages in vertical integration—particularly between specialized hardware and optimized kernels. NVIDIA's Blackwell architecture with dedicated MoE routing hardware represents the first major salvo in this hardware-software co-design race. We predict that by 2026, performance differences between open and closed systems will be primarily determined by hardware integration rather than algorithmic innovation.

Third, enterprise adoption will follow a bifurcated path. Large organizations with specialized needs will deploy customizable open-source kernels fine-tuned for their domains, while small and medium businesses will gravitate toward integrated proprietary platforms. This mirrors the historical split between Linux servers in data centers and Windows PCs on desktops. The critical differentiator will be the ecosystem of compatible tools and agents—the 'applications' that run on each kernel.

Our most consequential prediction concerns market structure: the AI infrastructure layer will consolidate faster and more thoroughly than the application layer. While there will be thousands of successful AI applications, we expect only 3-4 dominant AI kernel providers to emerge, controlling 80% of the infrastructure market by 2028. This concentration will create both efficiency benefits through standardization and competitive risks through platform control.

The immediate action for technology leaders is clear: evaluate AI capabilities not as standalone tools but as potential system services. Begin experiments with MoE architectures for cost-sensitive workloads, establish clear governance frameworks for AI-augmented infrastructure, and monitor the emerging standards for agent-to-kernel interfaces. Organizations that treat AI as infrastructure today will be positioned to capture disproportionate value as this paradigm becomes mainstream tomorrow.

The ultimate test of this transition will be whether it delivers on its promise of democratizing advanced AI capabilities. If the LLM-as-OS-kernel paradigm succeeds, it could make sophisticated AI as accessible and reliable as cloud computing has made scalable storage and processing. If it fails, we risk creating a new generation of brittle, opaque systems that centralize control while failing to deliver consistent value. The technical choices being made today—in architecture, interfaces, and governance—will determine which future emerges.
