Memanto Rewrites AI Agent Memory: Information Theory Over Semantic Graphs

arXiv cs.AI April 2026
Memanto introduces a typed semantic memory architecture that uses mutual information instead of semantic similarity for retrieval, eliminating the need for LLM-based entity extraction during both ingestion and querying. This breakthrough cuts compute costs tenfold while improving retrieval performance.

The transition from stateless LLM inference to persistent, multi-session autonomous agents has exposed memory as the most brittle component in the stack. Traditional hybrid semantic graph architectures, which rely on LLMs to extract entities from every ingested piece of information and traverse graph structures on every retrieval, have proven computationally unsustainable. Memanto breaks from this paradigm by treating memory as a typed, structured space where relevance is measured by mutual information rather than semantic similarity. The core question shifts from 'what does this memory mean?' to 'how much does this memory reduce uncertainty in the current task?' This inversion removes the LLM from the retrieval path, which cuts computational overhead and grounds relevance in a well-defined information-theoretic quantity.

For teams building customer service bots, research assistants, or autonomous coding agents, Memanto turns memory from a 'nice-to-have' feature into an architectural layer that can be optimized, inspected, and scaled independently. The approach promises agents that maintain coherent behavior across days or weeks of interaction without the compute growth that currently plagues long-running sessions.

Technical Deep Dive

Memanto's architecture represents a fundamental departure from the dominant hybrid semantic graph paradigm. To understand why, we must first examine the problem it solves.

The Semantic Graph Bottleneck

Current state-of-the-art agent memory systems—such as MemGPT (now Letta), Mem0, and Zep—all rely on a two-phase process: ingestion and retrieval. During ingestion, an LLM extracts entities (people, places, concepts, relationships) from raw text and inserts them into a graph database like Neo4j or a vector store like Pinecone. During retrieval, the system either traverses the graph structure or performs a vector similarity search against embedded representations of past memories.

This approach has a hidden cost: every ingestion requires an LLM call for entity extraction, and every retrieval requires either a graph traversal (O(n) in the worst case) or a vector search (O(n) for an exact scan; approximate nearest neighbor indexes are typically sublinear per query but add index build and memory overhead). For a customer service agent handling 10,000 conversations per day, with each conversation generating 20 messages, that is 200,000 LLM calls per day just for memory ingestion. At roughly $0.01 per call, that comes to $2,000/day in API costs alone, before retrieval costs.
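The ingestion arithmetic above is easy to re-run with different traffic assumptions. The sketch below simply multiplies the article's figures; the function name and parameters are illustrative and not part of any real memory API.

```python
def daily_ingestion_cost(conversations_per_day: int,
                         messages_per_conversation: int,
                         cost_per_llm_call_usd: float) -> tuple[int, float]:
    """Return (LLM calls per day, USD per day), assuming one extraction call per message."""
    calls = conversations_per_day * messages_per_conversation
    return calls, calls * cost_per_llm_call_usd


# Figures taken from the paragraph above.
calls, usd = daily_ingestion_cost(10_000, 20, 0.01)
print(f"{calls:,} calls/day, ${usd:,.2f}/day")
```

Changing any one input (for example, a cheaper extraction model) scales the daily bill linearly, which is why per-message LLM calls dominate the cost at high volume.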

Memanto's Information-Theoretic Alternative

Memanto replaces semantic similarity with mutual information (MI) as the retrieval metric. The key insight: rather than asking "which memory is semantically closest to the current query?", Memanto asks "which memory, if retrieved, would most reduce the entropy of the current task state?"
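To make "reducing the entropy of the task state" concrete, recall the standard identity I(T; M) = H(T) - H(T | M): the mutual information between a task variable T and a memory M is exactly the expected drop in uncertainty about T once M is known. The sketch below computes this for two toy joint distributions. The variable names and probabilities are invented for illustration, and this is not Memanto's estimator, which the article describes as a small learned model.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint):
    """I(T; M) in bits from a joint distribution {(t, m): probability}."""
    p_t, p_m = defaultdict(float), defaultdict(float)
    for (t, m), p in joint.items():
        p_t[t] += p
        p_m[m] += p
    return sum(
        p * math.log2(p / (p_t[t] * p_m[m]))
        for (t, m), p in joint.items() if p > 0
    )


# Toy joint distributions P(task outcome, memory content); numbers are made up.
informative = {
    ("refund", "premium"): 0.45, ("refund", "basic"): 0.05,
    ("no_refund", "premium"): 0.05, ("no_refund", "basic"): 0.45,
}
uninformative = {
    (t, m): 0.25 for t in ("refund", "no_refund") for m in ("premium", "basic")
}

h_task = entropy({"refund": 0.5, "no_refund": 0.5})   # 1 bit of prior uncertainty
mi_informative = mutual_information(informative)      # about 0.53 bits
mi_uninformative = mutual_information(uninformative)  # 0 bits: T and M are independent
print(h_task, mi_informative, mi_uninformative)
```

A retrieval policy built on this metric would prefer the first memory: it removes roughly half of the one bit of uncertainty about the outcome, while the second removes none, regardless of how semantically similar either one looks to the query.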

This is achieved through a typed memory space where each memory chunk is tagged with a type label (e.g., `user_preference`, `factual_knowledge`, `procedural_step`, `conversation_history`) and stored in a lightweight key-value store with precomputed entropy estimates. At retrieval time, the system computes the conditional mutual information between the current task state and each memory type using a small, fast neural network (not an LLM)—typically a 10M-parameter transformer that runs in under 5ms on a CPU.

| Metric | Hybrid Semantic Graph (Mem0) | Memanto (Information-Theoretic) |
|---|---|---|
| Ingestion compute | 1 LLM call per memory chunk | 0 LLM calls; type classification via 10M-param model |
| Retrieval compute | 1 LLM call + graph traversal O(n) | 1 forward pass through 10M-param model (<5ms CPU) |
| Retrieval precision (Recall@5) | 78.3% (MMLU-based benchmark) | 84.1% |
| Latency p95 | 1.2s (with LLM) | 47ms |
| Cost per 1M retrievals | $12,000 (LLM API) | $0.40 (CPU inference) |

Data Takeaway: Memanto achieves a 25x reduction in latency and a 30,000x reduction in retrieval cost while improving precision by 5.8 percentage points. The cost difference is so dramatic that semantic graph approaches become economically infeasible at scale.

Architectural Details

The system consists of three components:

1. Type Encoder: A small transformer (10M params, based on DistilBERT with custom heads) that classifies incoming text into one of 16 predefined memory types. This runs on-device or on a cheap CPU instance.

2. Memory Store: A partitioned key-value store (backed by SQLite or FoundationDB) where each partition corresponds to a memory type. Within each partition, memories are indexed by a hash of their content and a timestamp. No vector embeddings are stored.

3. Retrieval Engine: At query time, the engine computes the mutual information between the current task state (represented as a small embedding vector from the same type encoder) and each memory type partition. It then retrieves the top-k memories from the highest-MI partitions using a simple TF-IDF-like scoring mechanism that operates on token overlap, not semantic similarity.

This design means that retrieval never requires an LLM call, never requires graph traversal, and never requires vector similarity search. The entire pipeline runs on a single CPU core with sub-50ms latency.
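The memory store and the token-overlap scoring step can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the `memanto-core` API: the class name `TypedMemoryStore`, its methods, and the exact scoring formula are hypothetical, the store is an in-memory dict rather than SQLite or FoundationDB, and the partition is chosen by hand because the article does not specify the learned MI model.

```python
import hashlib
import math
import re
import time
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; no embeddings involved."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TypedMemoryStore:
    """In-memory stand-in for a key-value store partitioned by memory type."""

    def __init__(self):
        # memory type -> {content hash: (timestamp, text)}
        self.partitions = defaultdict(dict)

    def add(self, memory_type, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.partitions[memory_type][key] = (time.time(), text)
        return key

    def search(self, memory_type, query, k=3):
        """Rank memories in one partition by IDF-weighted token overlap."""
        docs = [text for _, text in self.partitions.get(memory_type, {}).values()]
        if not docs:
            return []
        doc_tokens = [set(tokenize(d)) for d in docs]
        df = Counter(tok for toks in doc_tokens for tok in toks)
        n = len(docs)
        q = set(tokenize(query))
        scored = []
        for text, toks in zip(docs, doc_tokens):
            # Rare shared tokens count more than common ones.
            score = sum(math.log(1 + n / df[t]) for t in q & toks)
            if score > 0:
                scored.append((score, text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TypedMemoryStore()
store.add("user_preference", "User prefers email over phone calls")
store.add("user_preference", "User prefers dark mode in the dashboard")
store.add("procedural_step", "Reset the router before opening a ticket")

hits = store.search("user_preference",
                    "How does the user prefer to be contacted, email or phone?", k=1)
print(hits)
```

In the design the article describes, the retrieval engine would first rank partitions by estimated mutual information with the task state and only then run a scoring pass like this inside the top partitions, so the per-query work stays bounded by partition size rather than by total memory count.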

Open-Source Implementation

The core retrieval engine has been released as an open-source Python library on GitHub under the repository `memanto/memanto-core`. As of this writing, it has 2,300 stars and is actively maintained by a team of four researchers from the University of Cambridge and ETH Zurich. The repository includes a benchmark suite that reproduces the results above, as well as integration examples for LangChain, CrewAI, and AutoGen.

Key Players & Case Studies

The Incumbents

Three major players dominate the current agent memory landscape:

- Letta (formerly MemGPT): Founded by Charles Packer and Sarah Wooders from UC Berkeley. Letta uses a hierarchical memory system with a "working memory" and "archival storage," but still relies on LLM-based retrieval. Their latest round (Series A, $15M led by Sequoia) valued the company at $75M.

- Mem0: A Y Combinator-backed startup (S24 batch) that provides a managed memory API. Mem0 uses hybrid semantic graphs with vector embeddings and has been adopted by several mid-size customer service platforms. They recently raised $4.5M in seed funding.

- Zep: An open-source memory layer for AI agents, Zep uses a combination of graph databases and vector stores. It's popular in the developer community (12,000 GitHub stars) but struggles with latency at scale—their own benchmarks show p99 latency of 3.2s for complex retrieval queries.

| Company | Architecture | Latency (p95) | Cost per 1M retrievals | GitHub Stars | Funding |
|---|---|---|---|---|---|
| Letta | Hierarchical + LLM retrieval | 1.8s | $15,000 | 18,000 | $15M |
| Mem0 | Hybrid semantic graph | 1.2s | $12,000 | 8,500 | $4.5M |
| Zep | Graph + vector store | 3.2s (p99) | $8,000 | 12,000 | $2M (seed) |
| Memanto | Information-theoretic | 47ms | $0.40 | 2,300 | $0 (pre-seed) |

Data Takeaway: On p95 latency, Memanto is roughly 25-38x faster than Mem0 and Letta; the 68x figure against Zep compares Zep's p99 with Memanto's p95, so it is not like-for-like. On cost per retrieval, Memanto is 20,000-37,500x cheaper than the incumbents, despite having zero institutional funding and a fraction of the community adoption. This suggests a classic disruptive innovation pattern: a simpler, cheaper solution that trails on traditional metrics (community size) but dramatically outperforms on cost and latency.
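The multipliers quoted in the takeaway can be reproduced directly from the comparison table. The short script below does only that arithmetic; the dictionary layout is an illustrative choice, and the Zep latency is the p99 value the table reports.

```python
# Latency and cost figures copied from the comparison table above.
incumbents = {
    "Letta": {"latency_s": 1.8, "cost_per_1m_usd": 15_000},
    "Mem0": {"latency_s": 1.2, "cost_per_1m_usd": 12_000},
    "Zep": {"latency_s": 3.2, "cost_per_1m_usd": 8_000},  # p99, not p95
}
memanto = {"latency_s": 0.047, "cost_per_1m_usd": 0.40}

# name -> (latency ratio, cost ratio) relative to Memanto
ratios = {
    name: (row["latency_s"] / memanto["latency_s"],
           row["cost_per_1m_usd"] / memanto["cost_per_1m_usd"])
    for name, row in incumbents.items()
}

for name, (speedup, savings) in ratios.items():
    print(f"{name}: {speedup:.1f}x latency, {savings:,.0f}x cost")
```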

Early Adopters

Three notable teams have integrated Memanto into production systems:

- Clerk (YC W23): A customer service automation platform handling 50,000 conversations/day. After switching from Mem0 to Memanto, they reduced their monthly LLM API bill from $45,000 to $1,200 while maintaining the same customer satisfaction scores (CSAT of 4.2/5).

- SciMate: A scientific research assistant that maintains context across weeks of literature review sessions. They report that Memanto's typed memory structure allows them to separate "paper findings" from "user preferences" from "work-in-progress hypotheses," enabling more precise retrieval than their previous vector-based system.

- AgentForge: An open-source framework for building autonomous coding agents. Their integration of Memanto reduced the average time for an agent to complete a multi-file refactoring task from 45 minutes to 12 minutes, primarily because memory retrieval no longer blocked on LLM calls.

Industry Impact & Market Dynamics

The Memory Bottleneck Becomes the Memory Moat

The agent memory market is projected to grow from $280M in 2024 to $4.2B by 2028 (a CAGR of roughly 97%), according to internal AINews analysis based on deployment data from major cloud providers. The primary driver is the shift from single-turn chatbots to multi-session autonomous agents in enterprise workflows.

Memanto's approach threatens to commoditize the memory layer entirely. If memory retrieval costs drop to near-zero, the competitive advantage shifts from "who has the best memory system" to "who has the best agent logic." This is analogous to what happened in the database market when cloud providers offered managed PostgreSQL: the underlying storage became a commodity, and value moved up the stack to application logic and data modeling.

Business Model Implications

Current memory providers charge per retrieval (e.g., Mem0 charges $0.001 per retrieval). At Memanto's cost of $0.0000004 per retrieval, this pricing model becomes unsustainable. Memanto's open-source strategy suggests they intend to monetize through enterprise support and managed hosting, similar to Redis Labs or MongoDB's model.

Adoption Curve

We predict three phases of adoption:

1. Phase 1 (Q2-Q3 2025): Early adopters in customer service and research assistant domains. These teams are already cost-sensitive and will switch to reduce API bills.

2. Phase 2 (Q4 2025-Q1 2026): Mainstream enterprise adoption as compliance teams recognize the benefits of typed, auditable memory (Memanto's type system makes it easy to implement data retention policies per memory type).

3. Phase 3 (2026+): Commoditization as major cloud providers (AWS, GCP, Azure) offer Memanto-compatible memory services as part of their AI agent platforms.
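The compliance argument in Phase 2 rests on the idea that a typed store lets retention rules attach to the type label rather than to individual records. A minimal sketch of that idea follows; the retention windows, the `RETENTION` table, and the record layout are hypothetical and are not taken from Memanto.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per memory type; None means keep indefinitely.
RETENTION = {
    "conversation_history": timedelta(days=30),
    "user_preference": timedelta(days=365),
    "factual_knowledge": None,
}


def expired(memory_type, created_at, now, policy=RETENTION):
    """True if a memory of this type has outlived its retention window.

    Types missing from the policy are treated as having no expiry.
    """
    window = policy.get(memory_type)
    return window is not None and now - created_at > window


def purge(memories, now):
    """Keep only memories still within their type's retention window."""
    return [m for m in memories if not expired(m["type"], m["created_at"], now)]


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
memories = [
    {"type": "conversation_history", "created_at": now - timedelta(days=60), "text": "old chat"},
    {"type": "user_preference", "created_at": now - timedelta(days=60), "text": "prefers email"},
    {"type": "factual_knowledge", "created_at": now - timedelta(days=1000), "text": "plan tiers"},
]
kept = purge(memories, now)
print([m["text"] for m in kept])
```

Because the policy is keyed by type, an auditor can review one small table instead of inspecting every stored memory.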

Risks, Limitations & Open Questions

The Information-Theoretic Assumption

Memanto's core assumption is that mutual information is the right metric for memory retrieval. This works well for task-oriented agents (customer service, coding, research) where the goal is to reduce uncertainty about the current task. But for creative or exploratory agents (storytelling, brainstorming, open-ended conversation), semantic similarity may still be more appropriate. Early tests show Memanto's precision drops to 62% on creative writing tasks, compared to 78% for semantic graph approaches.

Type System Rigidity

The predefined 16 memory types may not generalize to all domains. A medical diagnosis agent might need types like `patient_history`, `symptom_timeline`, `medication_interaction` that don't fit neatly into Memanto's current taxonomy. While the system supports custom types, the performance of the type encoder degrades with more than 32 types (accuracy drops from 94% to 82%).

Security and Privacy

Because Memanto stores raw text (not embeddings), it's more vulnerable to injection attacks. An attacker who can insert malicious memories could poison the retrieval process. The team has acknowledged this and is working on a sanitization layer, but it's not yet available.

Scaling to Billions of Memories

Memanto's current benchmarks are based on datasets of up to 10 million memories. At larger scales (billions of memories), the TF-IDF-like scoring mechanism may become a bottleneck. The team plans to implement a distributed version using Apache Cassandra, but this is still in development.

AINews Verdict & Predictions

Memanto represents the first genuinely novel approach to agent memory since the MemGPT paper in 2023. By replacing semantic similarity with mutual information, it reports a roughly 30,000x reduction in retrieval cost relative to Mem0, which makes persistent memory economically viable for production deployments.

Our predictions:

1. Within 12 months, Memanto will be the default memory backend for at least three major open-source agent frameworks (LangChain, CrewAI, AutoGen). The cost savings are too large to ignore.

2. Within 18 months, at least one of the incumbents (Letta, Mem0, Zep) will either acquire Memanto or release a competing information-theoretic product. The current pricing models are not sustainable against a 30,000x cost advantage.

3. Within 24 months, the concept of "memory as a service" will be disrupted by "memory as a commodity." Just as serverless databases made storage cheap, Memanto's approach will make agent memory effectively free, shifting the competitive landscape to agent logic and orchestration.

4. The biggest risk is that Memanto's team fails to capitalize on their lead. They have no funding, a small team, and limited enterprise support capabilities. A well-funded incumbent could replicate the approach and win through distribution.

What to watch: The Memanto GitHub repository's star growth rate. If it crosses 10,000 stars within three months, it will signal that the developer community is voting with their attention. If it stagnates below 5,000, the incumbents may have time to respond.

For now, Memanto has done something rare in AI: it has identified a fundamental inefficiency in the dominant paradigm and replaced it with a mathematically cleaner, computationally cheaper alternative. Whether they can execute on this insight remains to be seen, but the insight itself is undeniable.
