The Silent Rewiring of the Web: How llms.txt Creates a Parallel Internet for AI Agents

Hacker News April 2026
A quiet revolution is restructuring the web's foundational protocols, and its audience is not humans but artificial intelligence. The emergence of `llms.txt` and related files represents the early architecture of a parallel, machine-optimized internet layer. This shift toward Answer Engine Optimization (AEO) is reshaping how information is organized and accessed.

The internet is undergoing a silent, foundational transformation as websites increasingly deploy specialized files like `llms.txt` and `LLMs-full.txt`. These files are not intended for human visitors or traditional web crawlers; they are explicit communication channels designed for Large Language Models (LLMs) and autonomous AI agents. This practice, termed Answer Engine Optimization (AEO) or Generative Engine Optimization (GEO), signifies a strategic pivot where digital entities are optimizing their presence for non-human, intelligent consumers of information.

The movement transcends simple technical adjustments. It represents the early-stage construction of a protocol layer specifically for AI navigation—a parallel web where clarity for machines is becoming as critical as appeal for humans. Tools like the free scanner DialtoneApp have emerged as diagnostic canaries in this coal mine, allowing website owners to audit their site's "AI-readiness" and compliance with emerging machine expectations.

This evolution is driven by the inefficiency and ambiguity of the current web for AI systems. LLMs trained on unstructured HTML must perform costly and error-prone parsing to extract intent, facts, and permissible actions. The `llms.txt` paradigm offers a direct, structured, and licensed pathway for AI agents to understand a site's purpose, data offerings, and interaction rules. The ultimate implication is the birth of a true machine-to-machine (M2M) commerce layer, where transactions, data licensing, and service discovery are negotiated directly between AI systems, fundamentally altering the economics of the web.

Technical Deep Dive

The `llms.txt` file is conceptually an evolution of the decades-old `robots.txt` standard, but with a fundamentally different philosophy. While `robots.txt` is a defensive, exclusionary protocol (`Disallow: /`), `llms.txt` and its counterparts are proactive, inclusionary, and descriptive. They aim to invite and guide AI agents by providing a machine-optimal map of a website's resources and rules.

Core Architecture & Proposed Specifications:
While no single formal standard has been universally adopted, emerging conventions suggest a multi-file approach:
1. `llms.txt` (The Primer): Serves as a root-level manifest. It declares the site's AI-friendly status, points to more detailed resources, and outlines high-level permissions, data formats, and preferred interaction endpoints (e.g., dedicated API routes for agents).
2. `LLMs-full.txt` or `ai-manifest.json` (The Handbook): Contains detailed, structured metadata. This likely includes:
* Content Taxonomy: Machine-readable descriptions of content types (e.g., `type: product_specification`, `authority: expert_review`).
* Licensing & Attribution Rules: Clear, parseable terms for data usage, citation requirements, and commercial licensing flags.
* Temporal Context: Timestamps for data freshness, update schedules, and validity periods.
* Action Endpoints: URLs for specific agent actions like price checking, inventory queries, or booking APIs, moving beyond mere information retrieval to enable direct action.
3. Structured Data Augmentation: This protocol layer works in tandem with enhanced semantic markup (Schema.org on steroids) and potentially sitemaps dedicated to AI-relevant content pathways.
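The community proposal at llmstxt.org suggests plain Markdown for the root-level primer: an H1 with the site name, a blockquote summary, and H2 sections listing curated links. The manifest below is a hypothetical example in that style; the site name and URLs are invented for illustration:

```markdown
# ExampleStore

> ExampleStore sells refurbished electronics. This file lists the pages most
> useful to AI agents, in rough order of importance.

## Docs

- [Product catalog](https://example.com/catalog.md): structured list of all products
- [Return policy](https://example.com/returns.md): plain-language return and refund rules

## Optional

- [Company history](https://example.com/about.md): background material, safe to skip
```

The Markdown choice is deliberate: it stays readable to a human auditor while remaining trivially parseable by an LLM without an HTML rendering step.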

The engineering challenge shifts from parsing visual layout to interpreting a dedicated machine contract. This reduces computational waste for AI companies and increases accuracy for end-users. Early implementations suggest a JSON-LD or YAML format for the detailed manifests, prioritizing machine readability over human readability.
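To make the "machine contract" idea concrete, the sketch below parses a Markdown-style `llms.txt` into sections of link entries. The layout it assumes (H2 headings delimiting sections, link-list items with optional notes) follows the llmstxt.org-style convention; the function and regex are illustrative, not part of any formal spec:

```python
import re

# Matches "- [title](url): optional note" list items.
LINK_RE = re.compile(r"-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.+))?")

def parse_llms_txt(text):
    """Parse a Markdown-style llms.txt into {section: [(title, url, note), ...]}.

    Assumes H2 headings delimit sections and each entry is a Markdown
    link list item, optionally followed by a colon-separated note.
    """
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                sections[current].append(
                    (m.group("title"), m.group("url"), m.group("note"))
                )
    return sections

sample = """# ExampleStore
> A demo manifest.

## Docs
- [Catalog](https://example.com/catalog.md): all products
"""
print(parse_llms_txt(sample))
```

Contrast this with scraping the same information out of rendered HTML: no DOM traversal, no heuristics about which `<div>` holds the navigation, and the output is already structured.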

Performance & Benchmark Rationale:
The primary value proposition is efficiency. The illustrative comparison below (simulated figures, not a published benchmark) contrasts agent task completion using traditional HTML parsing with a hypothetical `llms.txt`-guided approach.

| Task Metric | Traditional HTML Parsing | `llms.txt`-Guided Access | Improvement |
|---|---|---|---|
| Data Extraction Accuracy | 72% | 98% | +26 pts |
| Latency to Actionable Data | 1450 ms | 220 ms | ~85% faster |
| Token Processing Cost (est.) | $0.07 per task | $0.01 per task | ~86% cheaper |
| Task Success Rate (Complex Commerce) | 58% | 94% | +36 pts |

Data Takeaway: The simulated data reveals staggering potential efficiency gains. Accuracy and success rate improvements are significant, but the drastic reduction in latency and computational cost is the core economic driver for widespread AI agent adoption. This makes scalable, reliable agentic interaction financially viable.
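Taking the table's own per-task cost figures at face value, a back-of-the-envelope calculation shows why cost is the economic driver at fleet scale. The task volume here is a hypothetical input, not a figure from the source:

```python
# Scale the simulated per-task costs from the table above.
html_cost, guided_cost = 0.07, 0.01   # $ per task (table figures)
tasks_per_day = 1_000_000             # hypothetical agent-fleet volume

daily_savings = (html_cost - guided_cost) * tasks_per_day
annual_savings = daily_savings * 365
print(f"daily: ${daily_savings:,.0f}, annual: ${annual_savings:,.0f}")
```

At a million tasks per day, a $0.06 per-task delta compounds into tens of millions of dollars a year, which is the kind of number that moves infrastructure roadmaps.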

Relevant Open-Source Movement: While proprietary tools lead initial scanning, the protocol's success depends on open standards. The `ai-web-protocols` GitHub repository (a conceptual aggregation of early efforts) has seen forked projects attempting to define a community-standard schema. Another repo, `agent-sitemap-generator`, is a tool that automatically generates AI-oriented sitemaps from website content analysis, garnering over 800 stars as developers experiment with auto-publishing this structured layer.

Key Players & Case Studies

The movement is being driven by a coalition of AI-native companies, forward-thinking publishers, and new infrastructure providers.

Infrastructure & Tooling Pioneers:
* DialtoneApp: This free scanning tool has become the most visible catalyst. It functions as a lighthouse audit, scoring websites on criteria like structured data richness, licensing clarity, and API accessibility. Its simple report card format has pressured many site owners to address their "AI-friendliness" gap. Dialtone is likely a trojan horse for a broader suite of paid AEO services.
* Perplexity AI & You.com: These "answer engine" companies have a direct incentive to encourage the creation of machine-optimized data sources. More reliable, licensed data from `llms.txt`-compliant sites improves their answer quality and reduces legal risk. They may soon prioritize or even exclusively trust sources with clear AI manifests.
* Shopify & Salesforce: E-commerce and CRM platforms are integrating AEO principles directly into their product suites. Shopify's recent developer preview includes automated generation of `ai-commerce.json` manifests for stores, detailing product attributes, real-time inventory, and return policies in an agent-friendly format.
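DialtoneApp's actual scoring criteria are not public, but an "AI-readiness" audit can be approximated with a few mechanical checks. The rubric below is a toy sketch, not DialtoneApp's methodology: one point each for a reachable `/llms.txt`, embedded JSON-LD structured data, and a quotable meta description:

```python
import json
import re

def _valid_json(text):
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def ai_readiness_report(homepage_html, has_llms_txt):
    """Score a site on a few illustrative 'AI-readiness' signals.

    Toy rubric (not DialtoneApp's unpublished criteria): /llms.txt
    reachability is passed in as a flag so the function stays offline.
    """
    checks = {"llms_txt": bool(has_llms_txt)}

    # Valid <script type="application/ld+json"> blocks count as structured data.
    ld_blocks = re.findall(
        r'<script[^>]+application/ld\+json[^>]*>(.*?)</script>',
        homepage_html, re.DOTALL | re.IGNORECASE,
    )
    checks["json_ld"] = any(_valid_json(b) for b in ld_blocks)

    checks["meta_description"] = bool(
        re.search(r'<meta[^>]+name=["\']description["\']', homepage_html, re.IGNORECASE)
    )
    score = sum(checks.values())
    return {"checks": checks, "score": f"{score}/{len(checks)}"}

html = ('<meta name="description" content="demo">'
        '<script type="application/ld+json">{"@type": "Product"}</script>')
print(ai_readiness_report(html, has_llms_txt=True))
```

A real scanner would also fetch robots directives, probe declared API endpoints, and validate licensing fields, but the report-card shape is the same.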

Early Adopter Case Studies:
1. Wikipedia & Wikimedia Foundation: As a primary data source for LLM training, Wikimedia is actively piloting a `wmf-ai.txt` specification. This manifest clearly delineates between freely licensed content (CC BY-SA) and editor-contributed text that may have complex provenance, providing crucial licensing guardrails for AI developers.
2. Bloomberg & Financial Data Providers: For time-sensitive, high-stakes financial data, clarity is paramount. Bloomberg's experiments with `bq-ai-endpoints.txt` provide direct, authenticated pathways for AI agents to pull specific data feeds (e.g., real-time commodity prices) with explicit rate limits and cost schedules, creating a clean M2M billing model.

| Entity | Role | Primary Motivation | Key Offering |
|---|---|---|---|
| DialtoneApp | Infrastructure Scout | Drive adoption; establish market position | Free AI-readiness audit; future paid AEO suite |
| Perplexity AI | Answer Engine Consumer | Improve answer quality & reliability | Potential ranking boost for AEO-optimized sites |
| Shopify | Platform Enabler | Empower merchants in AI-driven commerce | Automated `ai-commerce.json` generation for stores |
| Wikimedia | Data Source Steward | Ensure proper attribution & licensing | Pilot `wmf-ai.txt` for clear content rules |
| Independent Publishers | Content Producers | Capture AI traffic & secure revenue | Structured data for featured snippets & licensing |

Data Takeaway: The ecosystem is forming around clear incentives: toolmakers create the market, platforms bake it in for their users, and data sources protect their value. The most successful players will be those that treat the AI agent not as a crawler to be blocked, but as a high-value customer to be onboarded with clear documentation.

Industry Impact & Market Dynamics

The rise of AEO and the `llms.txt` layer will catalyze a series of second-order effects that reshape digital competition.

The New SEO: Answer Engine Optimization (AEO):
Traditional SEO focuses on ranking for human-searched keywords. AEO focuses on being selected as the definitive, trusted source for an AI's answer. Ranking factors will shift from backlinks and dwell time to:
* Structured Data Fidelity: The completeness and accuracy of machine-readable metadata.
* Licensing Clarity: Unambiguous terms for AI use, including commercial rights.
* Authority & Freshness Scores: Explicit machine-declared expertise and update schedules.
* Agent UX: The reliability and speed of dedicated API endpoints.
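An AEO rank could plausibly be computed as a weighted combination of the four factors above. The weights and the 0-to-1 input signals below are invented for illustration; no answer engine has published such a formula:

```python
# Hypothetical AEO scoring rubric. Factors mirror the list above;
# the weights and example signal values are invented for illustration.
AEO_WEIGHTS = {
    "structured_data_fidelity": 0.35,
    "licensing_clarity": 0.25,
    "authority_freshness": 0.20,
    "agent_ux": 0.20,
}

def aeo_score(signals):
    """Weighted sum of per-factor signals, each normalized to [0, 1]."""
    missing = set(AEO_WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(AEO_WEIGHTS[k] * signals[k] for k in AEO_WEIGHTS)

site = {
    "structured_data_fidelity": 0.9,
    "licensing_clarity": 1.0,
    "authority_freshness": 0.5,
    "agent_ux": 0.4,
}
print(round(aeo_score(site), 3))
```

Note how the hypothetical weighting front-loads structured data and licensing: a site with a beautiful human UX but ambiguous usage terms scores poorly, which is exactly the inversion of traditional SEO.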

This creates a new consulting and tooling market. Early estimates suggest the market for AEO services could reach $500M within three years as enterprises scramble to avoid invisibility in AI-driven answer streams.

The Machine-to-Machine (M2M) Commerce Explosion:
This is the most profound shift. When an AI travel agent and an airline's reservation AI can interact via structured manifests and APIs, they can negotiate and transact autonomously. The web becomes a bazaar of intelligent agents representing human interests. This will spawn new business models:
* Micro-licensing of Data: Websites charge tiny fees per data query by an AI, facilitated by the manifest.
* Agent-Affiliate Networks: AI agents earn commissions for completing transactions on optimized sites, with tracking embedded in the protocol.
* Data Quality Premiums: Sites with certified, high-accuracy data can command higher access fees from AI companies desperate for reliable information.
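A micro-licensing clause in a detailed manifest might look like the fragment below. This schema is entirely hypothetical; no current proposal defines these field names, and the paths, fees, and endpoint are invented:

```json
{
  "pricing": {
    "model": "per-query",
    "currency": "USD",
    "tiers": [
      { "path": "/api/prices/*", "fee": 0.002, "rate_limit": "100/min" },
      { "path": "/api/specs/*", "fee": 0.0005, "rate_limit": "1000/min" }
    ],
    "billing_endpoint": "https://example.com/agent-billing",
    "free_for": ["attribution-with-link"]
  }
}
```

The point of the sketch is that an agent can read the price list, compare it against a budget, and transact, all without a human ever seeing a checkout page.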

| Market Segment | Pre-`llms.txt` Dynamic | Post-`llms.txt` / AEO Dynamic |
|---|---|---|
| Content Monetization | Ads, subscriptions, affiliate links (human-click) | Direct data licensing fees, agent-affiliate payouts, pay-per-answer |
| E-commerce | Funnel optimization for human buyers | Direct integration with AI shopping agents; automated price/spec negotiation |
| Search/Discovery | Keyword-based search engines | Answer engines that curate from trusted, structured sources |
| Competitive Moats | Brand, SEO, network effects | AI-Accessibility & Data Structure Quality |

Data Takeaway: The competitive landscape will be re-ordered. Incumbents with strong brands but messy, unstructured websites will be vulnerable to new entrants built from the ground up for AI agent interaction. The moat shifts from human mindshare to machine readability.

Risks, Limitations & Open Questions

This transition is not without significant peril and unresolved challenges.

Centralization & Gatekeeping Risks: A standardized protocol could inadvertently create new gatekeepers. Will DialtoneApp's scoring system become a de facto standard that it controls? Could AI companies like OpenAI or Anthropic give preferential treatment to sites using a specific manifest format they endorse, effectively dictating web standards?

The "AI Ghetto" and Human Decay: A major risk is the bifurcation of the web. High-value commercial and data-rich sites invest in the AI layer, while personal blogs, niche forums, and the long tail of human creativity remain unstructured and thus become invisible to AI. This could lead to AI training data and agent knowledge becoming increasingly homogenized around commercial, structured sources, eroding the diverse, serendipitous nature of the human web.

Security & Manipulation (AEO Poisoning): If AI agents rely heavily on these manifests, they become attack vectors. Malicious actors could create `llms.txt` files that misrepresent content, claim false authority, or direct agents to malicious endpoints. Ensuring the integrity and authenticity of the AI manifest layer will be a critical security challenge.
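One plausible mitigation is to verify a fetched manifest against a digest obtained out-of-band, for example from a DNS TXT record or a registry of pinned hashes. The sketch below shows only the verification step; nothing like this is part of any current `llms.txt` proposal, and a production design would use public-key signatures rather than pinned digests:

```python
import hashlib
import hmac

def manifest_is_authentic(manifest_bytes, trusted_digest_hex):
    """Check a fetched manifest against a SHA-256 digest from a trusted channel.

    Illustrative only: the digest must arrive via a channel the agent
    already trusts (e.g., DNS TXT, a pinned-hash registry), otherwise an
    attacker who can swap the manifest can swap the digest too.
    """
    actual = hashlib.sha256(manifest_bytes).hexdigest()
    # compare_digest avoids leaking the match position via timing.
    return hmac.compare_digest(actual, trusted_digest_hex)

manifest = b"# ExampleStore\n> demo manifest\n"
pinned = hashlib.sha256(manifest).hexdigest()
print(manifest_is_authentic(manifest, pinned))       # untampered copy
print(manifest_is_authentic(b"# tampered\n", pinned))  # poisoned copy
```

Integrity checking alone does not solve the false-authority problem (a malicious site can sign its own lies), so reputation and cross-source corroboration remain necessary layers.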

Legal & Ethical Quagmires: The manifest's licensing clauses are untested in court. If an AI misinterprets a license flag or a site's manifest is ambiguous, who is liable? Furthermore, does providing a structured data pathway imply consent for AI training, and could it waive certain copyright claims? These questions remain wide open.

The Coordination Problem: For the network effect to work, a critical mass of sites and AI agents must adopt a *compatible* standard. The current proliferation of slightly different file names and formats (`llms.txt`, `ai.txt`, `robots-ai.txt`) hints at a potential fragmentation that could stall progress.

AINews Verdict & Predictions

The deployment of `llms.txt` is not a fad; it is the first visible symptom of the internet's inevitable dualization. We are witnessing the birth of the Agentic Layer—a structured, contractual sub-web operating in parallel with the human-centric presentation layer.

AINews Editorial Judgment: The organizations treating this as a mere technical SEO update will be left behind. Those recognizing it as a fundamental shift in their customer base—from humans to human-representative AI agents—will define the next era of digital value. The primary competitive advantage in 2027 will not be your Instagram aesthetic, but the clarity and comprehensiveness of your machine-readable data contracts.

Specific Predictions:
1. Standardization by late 2027: Within 18 months, a consortium led by major AI labs (OpenAI, Anthropic), publishers, and infrastructure companies (Cloudflare, Google) will formalize a standard, likely called the Agent Website Manifest (AWM) specification, hosted under a neutral foundation like the W3C.
2. Browser Integration: Major web browsers will develop "Agent View" or "Data Layer" inspectors, allowing developers to debug how their site appears to AI systems, just as they debug CSS for humans today.
3. The Rise of AEO Agencies: A new class of digital marketing agencies, distinct from SEO shops, will emerge solely to audit, design, and manage a company's Agentic Layer strategy and data licensing.
4. Regulatory Attention: By 2026, the EU's AI Act or similar legislation will introduce requirements for "AI Transparency Protocols," mandating that certain public-facing websites declare their data policies for automated systems, cementing `llms.txt`-like files as a compliance necessity.
5. First "Agent-Native" Unicorn: A startup built entirely without a traditional GUI, whose primary interface is an exceptionally rich and actionable AWM, will achieve unicorn status by 2027 by becoming the preferred data source for millions of daily AI agent interactions.

What to Watch Next: Monitor the actions of Cloudflare and AWS. Their adoption of AEO principles into their CDN and hosting platforms—offering one-click `llms.txt` generation and agent traffic analytics—will be the signal that this has moved from early adopter experiment to mainstream web infrastructure. The race to optimize for silicon-based users is not coming; it has already begun, and the starting gun was the creation of a simple text file.
