ChatGPT Plays DOOM: How LLMs Became Game Consoles Inside Your Chat Window

Source: Hacker News · Archive: April 2026
In a striking proof of concept, a developer has run the classic first-person shooter DOOM inside the ChatGPT and Claude chat interfaces, with the game logic executing entirely within the model's context window via text instructions. This isn't a video stream or an emulator: it's the game itself running on an LLM.

A developer has demonstrated that ChatGPT and Claude can run the classic 1993 game DOOM by encoding the entire game state as text within the model's context window. Each game tick is processed as a new prompt, with the model generating the next frame and game state update through text output. This experiment reveals that large language models, when prompted correctly, can simulate deterministic systems—challenging the view that LLMs are merely probabilistic text predictors. The achievement has profound implications: chat interfaces could evolve into lightweight application runtimes, enabling interactive tools, spreadsheets, and games without traditional software installation. For AI companies, this opens a new revenue model—charging for 'in-context applications' that leverage idle compute resources. While currently a novelty, the DOOM port signals a future where the chat window becomes a platform for executing arbitrary logic, not just generating text.

Technical Deep Dive

The core innovation behind running DOOM in a chat window is the repurposing of the LLM's context window as a virtual machine. The developer, known in open-source circles as 'adamm,' created a system where the entire game state—player position, enemy locations, health, ammo, map layout—is serialized into a structured text format. This text is appended to a system prompt that instructs the model to act as a game engine.
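To make the serialization idea concrete, here is a minimal sketch of what encoding game state as compact text might look like. The field names and format are illustrative assumptions; adamm's actual schema has not been published.

```python
import json

# Hypothetical game state. Keys like "player" and "enemies" are
# illustrative, not the developer's actual field names.
state = {
    "player": {"x": 12, "y": 7, "angle": 90, "hp": 100, "ammo": 50},
    "enemies": [{"type": "imp", "x": 20, "y": 9, "hp": 60}],
    "map": "E1M1",
    "tick": 0,
}

def serialize_state(state: dict) -> str:
    """Flatten the state into one compact JSON line to minimize tokens."""
    return json.dumps(state, separators=(",", ":"))

print(serialize_state(state))
```

Compact separators matter here: every byte of whitespace saved is a token not spent, and the serialized state is re-sent on every single turn.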

How it works step-by-step:
1. The initial prompt contains the full game logic rules, map data, and a text-based representation of the first frame (e.g., a grid of ASCII characters representing walls, enemies, and the player).
2. The user inputs an action (e.g., 'move forward', 'shoot').
3. The LLM processes the entire context—rules + current state + action—and outputs a new text block representing the updated game state and the next rendered frame.
4. This output is fed back as input for the next action, creating a loop.
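The four steps above can be sketched as a simple loop. `call_llm` stands in for whichever chat-completion API is used; its name and the prompt layout are assumptions for illustration, not the developer's actual code.

```python
# Assumed system prompt carrying the game rules; the real rules are
# far longer (map data, enemy behavior, rendering instructions).
SYSTEM_PROMPT = "You are a deterministic game engine. Rules: ..."

def step(state: str, action: str, call_llm) -> str:
    """One tick: rules + current state + action in, new state out."""
    prompt = f"{SYSTEM_PROMPT}\n{state}\nACTION: {action}"
    return call_llm(prompt)  # model returns updated state + ASCII frame

def game_loop(state: str, actions, call_llm) -> str:
    """Feed each output back as the next turn's input, closing the loop."""
    for action in actions:  # e.g. ["move forward", "shoot"]
        state = step(state, action, call_llm)
    return state
```

Passing `call_llm` as a parameter keeps the sketch provider-agnostic: the same loop works against any chat API, or against a stub for testing.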

The key technical challenge is maintaining state coherence across turns. LLMs have no inherent memory beyond the context window, so every piece of game state must be explicitly included in each prompt. This leads to rapid context window consumption—a single DOOM session can consume tens of thousands of tokens per minute. The developer mitigated this by compressing state descriptions and using a 'delta encoding' approach, where only changes from the previous frame are transmitted.
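The delta-encoding idea can be sketched in a few lines: transmit only the fields that changed since the previous frame, and overlay them to reconstruct the full state. The key names are invented for illustration.

```python
def delta(prev: dict, curr: dict) -> dict:
    """Return only the entries of `curr` that differ from `prev`."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}

def apply_delta(prev: dict, d: dict) -> dict:
    """Reconstruct the full state by overlaying the delta on the old state."""
    return {**prev, **d}

prev = {"hp": 100, "ammo": 50, "x": 12}
curr = {"hp": 92, "ammo": 49, "x": 12}
d = delta(prev, curr)  # only hp and ammo changed; x is omitted
```

The trade-off is that the model must now hold the reconstruction rule itself, so a single corrupted delta can silently poison every subsequent frame.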

A related open-source project, 'llm-games' (GitHub: ~2.3k stars), explores similar territory by running simple text-based games like Snake and Tetris inside GPT-4. The DOOM port, however, is orders of magnitude more complex due to real-time rendering and collision detection requirements.

Performance metrics:
| Metric | DOOM on ChatGPT (GPT-4o) | DOOM on Claude 3.5 Sonnet | Native DOOM (1993) |
|---|---|---|---|
| Frames per second | ~0.3 | ~0.5 | 35 |
| Latency per action | 3-5 seconds | 2-4 seconds | <16ms |
| Tokens per frame | ~1,200 | ~900 | N/A |
| Cost per minute | ~$0.15 | ~$0.10 | Free |
| State accuracy | 92% (occasional hallucination) | 96% | 100% |

Data Takeaway: The latency and cost are prohibitive for real-time gaming, but the fact that state accuracy exceeds 90% is remarkable. It proves that LLMs can simulate deterministic logic with high fidelity, given sufficient prompt engineering. The primary bottleneck is context window size and inference speed, not model intelligence.
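The table's cost figure is roughly self-consistent, which a quick back-of-envelope check confirms (this assumes the quoted price is output-token-dominated; input tokens would push the total toward the quoted ~$0.15):

```python
# Sanity check of the GPT-4o column: tokens/frame x frames/sec x price
# should land near the quoted cost per minute.
tokens_per_frame = 1200
fps = 0.3
price_per_million = 5.00  # USD per 1M tokens, from the comparison table

tokens_per_minute = tokens_per_frame * fps * 60  # 21,600 tokens/min
cost_per_minute = tokens_per_minute / 1e6 * price_per_million
print(round(cost_per_minute, 3))  # ~0.108 USD/min
```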

Key Players & Case Studies

The Developer: 'adamm' – A pseudonymous developer who first demonstrated the DOOM port on a technical forum. They previously contributed to the 'llm-games' repo and have a background in compiler design. Their approach was to treat the LLM as an interpreter for a domain-specific language (DSL) that describes game mechanics. The DSL is essentially a set of rules encoded in natural language, which the model 'executes' by generating the next state.
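A natural-language DSL of this kind might look like the following. These rules are invented for illustration; adamm's actual rule set has not been published.

```python
# Hypothetical rule block the model 'executes' as an interpreter.
# Every rule is phrased imperatively to maximize instruction-following.
RULES = """\
You are a deterministic DOOM engine. Apply these rules exactly:
- MOVE <dir>: shift the player 1 cell in <dir> unless a wall ('#') blocks it.
- SHOOT: the nearest enemy in the facing direction loses 10 hp.
- An enemy at 0 hp is removed from the state.
- After applying the action, emit the full state as one JSON line,
  then the ASCII frame. Never invent entities not present in the state.
"""
```

The last rule is the important one: an explicit prohibition on inventing entities is the prompt-level defense against the state hallucinations discussed below.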

OpenAI and Anthropic's Response: Neither company has officially commented, but internal sources suggest both teams are studying the implications. OpenAI's GPT-4o has a 128k token context window, while Anthropic's Claude 3.5 Sonnet offers 200k tokens. The DOOM port works better on Claude due to its larger context and slightly lower latency, but GPT-4o benefits from superior instruction following for complex logic.

Comparison of LLM-as-Platform Capabilities:
| Platform | Context Window | State Accuracy | Latency | Cost per 1M tokens | Best Use Case |
|---|---|---|---|---|---|
| GPT-4o | 128k | 92% | 3-5s | $5.00 | Complex logic, creative tasks |
| Claude 3.5 Sonnet | 200k | 96% | 2-4s | $3.00 | Long-running sessions, stateful apps |
| Gemini 1.5 Pro | 1M | 88% | 4-6s | $7.00 | Large state spaces, memory-heavy tasks |
| Llama 3.1 405B (local) | 128k | 85% | 8-12s | ~$0.50 (self-hosted) | Privacy-sensitive applications |

Data Takeaway: Claude 3.5 Sonnet emerges as the best current platform for in-context applications due to its balance of accuracy, latency, and cost. Gemini's massive context window is attractive for state-heavy apps but suffers from higher hallucination rates. Local models like Llama 3.1 offer cost advantages but lag in performance.

Industry Impact & Market Dynamics

This experiment is more than a parlor trick—it signals a potential shift in how AI companies monetize their platforms. Currently, LLM providers charge per token for text generation. The DOOM port demonstrates that the same infrastructure can run applications, effectively turning chat interfaces into 'app stores' for lightweight programs.

Business Model Implications:
- Context-as-a-Service: AI companies could offer tiered pricing based on context window usage, with 'app mode' charging a premium for deterministic execution guarantees.
- In-Chat App Stores: OpenAI or Anthropic could allow third-party developers to submit 'context apps'—packages of prompts and state management rules that run inside the chat window. Revenue sharing would mirror Apple's App Store model.
- Compute Arbitrage: Running apps inside LLMs is computationally inefficient compared to native code, but it leverages existing inference infrastructure. For low-frequency applications (e.g., daily planning tools, interactive fiction), the overhead may be acceptable.

Market Size Projections:
| Segment | 2024 Value | 2028 Projected | CAGR |
|---|---|---|---|
| LLM API Market | $6.5B | $45B | 38% |
| In-Context Applications | $0 | $3.2B | N/A |
| AI Platform-as-a-Service | $12B | $80B | 35% |
| Traditional Game Streaming | $8B | $12B | 8% |

Data Takeaway: The in-context application market is nascent but could capture 7% of the LLM API market by 2028. The growth will depend on latency improvements and the development of standardized 'context app' frameworks.

Risks, Limitations & Open Questions

Hallucination and State Corruption: The biggest risk is that the LLM 'hallucinates' game state—e.g., placing the player in a wall or spawning an enemy where none exists. In the DOOM port, this occurred ~8% of the time with GPT-4o. For critical applications (e.g., financial modeling), even 1% error rates are unacceptable.

Context Window Exhaustion: Long sessions inevitably fill the context window, forcing the model to 'forget' early state. The DOOM port uses a sliding window approach, but this can break game logic if critical information is dropped. Future LLMs with larger context windows (e.g., 1M+ tokens) could mitigate this, but at higher cost.
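A sliding window of this kind might be implemented as below. This is a sketch under stated assumptions: the real port's trimming logic is not public, and the whitespace token count here is a crude stand-in for a real tokenizer.

```python
def trim_context(rules: str, turns: list[str], budget: int) -> str:
    """Always keep `rules`; drop the oldest turns until under budget.

    Token counting is approximated by whitespace splitting.
    """
    count = lambda s: len(s.split())
    kept = list(turns)
    while kept and count(rules) + sum(map(count, kept)) > budget:
        kept.pop(0)  # forget the oldest turn first
    return "\n".join([rules, *kept])
```

The failure mode the article describes falls straight out of this design: if a dropped turn contained state that was never re-serialized into a later frame, that information is gone for good.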

Security Concerns: If chat interfaces become app platforms, they inherit all the security risks of traditional app stores—malicious prompts could be embedded in 'apps' to exfiltrate user data or manipulate the model. Companies would need robust sandboxing and prompt inspection.

Economic Viability: At $0.10-$0.15 per minute, running DOOM is 100x more expensive than native gaming. For the model to be viable as a platform, costs must drop by at least an order of magnitude, or applications must be high-value (e.g., interactive data analysis, legal document drafting).

AINews Verdict & Predictions

The DOOM port is a watershed moment—not because anyone will actually play DOOM this way, but because it proves that LLMs can be programmed to execute deterministic logic. This challenges the 'stochastic parrot' narrative and opens the door to a new computing paradigm: the Contextual Computer.

Our Predictions:
1. By the end of 2026, at least one major LLM provider (OpenAI or Anthropic) will release an official 'Context App SDK' that allows developers to build and deploy applications that run inside the chat window. This will include state management libraries, error correction modules, and a billing API.
2. By 2027, the first 'killer app' for in-context computing will emerge—likely in the form of interactive data dashboards for business analysts, not games. The ability to query, visualize, and manipulate data entirely through conversation will be transformative for knowledge workers.
3. The 'DOOM test' will become a standard benchmark for LLM determinism and instruction-following capability, alongside MMLU and HumanEval. A model's ability to run DOOM without state corruption for 100+ turns will be a key selling point.
4. Local LLMs will struggle to compete due to latency constraints, but specialized 'inference accelerators' for context-based computing will emerge, potentially as FPGA-based co-processors.

The bottom line: The chat window is no longer just a place for conversation. It is becoming a computer. The DOOM port is the first glimpse of that future, and it runs at 0.3 frames per second. Like the original DOOM itself, the technology is crude today; the platform it heralds could change everything.


Further Reading

- VS Code's Co-Author Copilot: Microsoft's Forced AI Credit Sparks Developer Backlash
- Fine-Tuning Unlocks Copyrighted Book Memorization in LLMs: A New Liability Crisis
- Claude Outage Exposes AI's Achilles Heel: Why Reliability Is the Industry's Next Crisis
- The Caveman Plugin vs. Be Brief: AI Coding's Simplicity War
