MemQ: How Q-Learning and DAGs Give LLM Agents Self-Evolving Memory

arXiv cs.AI May 2026
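MemQ introduces a novel memory mechanism for LLM agents. By applying TD(λ) eligibility traces to memory Q-values and recording causal dependencies in a directed acyclic graph, the system can propagate credit backward across an entire memory chain. This turns static retrieval into a dynamic, self-evolving process.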

MemQ represents a fundamental shift in how LLM agents value and use their memories. Traditional memory systems treat each stored piece of information as an isolated unit, retrieved based on similarity or recency. MemQ instead builds a directed acyclic graph (DAG) that captures the causal dependencies between memories — which memory helped generate which subsequent memory. By applying TD(λ) eligibility traces from reinforcement learning, MemQ propagates a 'credit signal' backward through this graph: a memory's value is not intrinsic but derived from how much it contributed to later successful decisions. This means the agent continuously re-evaluates its own memory store, strengthening memories that were instrumental in achieving positive outcomes and weakening those that were irrelevant or misleading. The result is a memory system that evolves without any model retraining, purely through improved retrieval strategy. For domains like autonomous coding, multi-step reasoning, and scientific discovery — where long chains of intermediate steps determine success — MemQ could become a foundational component of next-generation agent architectures. The approach is particularly powerful because it does not require modifying the underlying LLM; it works entirely at the memory management layer, making it compatible with any existing model.

Technical Deep Dive

MemQ's core innovation lies in redefining how an agent assigns value to its memories. Standard retrieval-augmented generation (RAG) systems use embeddings to find the most semantically similar memories, but they have no mechanism to learn which memories were actually *useful* for achieving a goal. MemQ solves this by framing memory retrieval as a reinforcement learning problem.

Architecture Overview:
The system maintains two core data structures:
1. Memory DAG (Directed Acyclic Graph): Each node is a memory (a piece of text, code snippet, or reasoning step). A directed edge from memory A to memory B indicates that B was generated or retrieved *because* of A. This creates a causal chain of how the agent arrived at its current state.
2. Q-Table for Memories: Each memory node has an associated Q-value, representing the expected long-term utility of retrieving that memory in a given context.
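The two structures above can be sketched together as a small in-memory graph. This is a minimal illustration rather than the repository's code: `MemoryNode`, `MemoryDAG`, `add_memory`, and `ancestors` are hypothetical names, the sketch uses only the standard library instead of NetworkX, and it stores a single scalar Q-value per memory instead of a context-dependent one.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """One stored memory plus its learned value (hypothetical structure)."""
    memory_id: str
    content: str
    q_value: float = 0.0                              # simplified: one scalar per memory
    parents: set[str] = field(default_factory=set)    # memories that led to this one


class MemoryDAG:
    """Stores memories and the causal edges between them."""

    def __init__(self) -> None:
        self.nodes: dict[str, MemoryNode] = {}

    def add_memory(self, memory_id: str, content: str, caused_by=()) -> MemoryNode:
        if memory_id in self.nodes:
            raise ValueError(f"memory {memory_id!r} already exists")
        missing = [p for p in caused_by if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parent memories: {missing}")
        # Parents must already exist and IDs are never reused,
        # so no edge can ever point back to a descendant: the graph stays acyclic.
        node = MemoryNode(memory_id, content, parents=set(caused_by))
        self.nodes[memory_id] = node
        return node

    def ancestors(self, memory_id: str) -> set[str]:
        """All memories that causally precede the given one."""
        seen: set[str] = set()
        stack = list(self.nodes[memory_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.nodes[current].parents)
        return seen
```

Requiring parents to exist before a child is added is one simple way to keep the graph acyclic by construction; a real system might instead validate edges after the fact.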

The TD(λ) Eligibility Trace Mechanism:
When the agent completes a task (e.g., successfully compiles a program or solves a math problem), it receives a reward. MemQ then uses TD(λ), a classic reinforcement learning algorithm, to propagate this reward backward through the DAG. The eligibility trace for each memory decays by a factor λ (typically 0.9) with each step back in the chain. (In textbook TD(λ) the per-step decay is γλ, where γ is the discount factor, so this description corresponds to γ = 1.) As a result, a memory k steps before the final success receives credit proportional to λ^k: memories closer to the outcome get a larger share, but even early, seemingly unrelated memories receive a fraction of the credit if they were causally necessary.
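The backward propagation described here can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' update rule: it applies a single terminal TD error with λ-decayed traces (γ = 1), uses the shortest causal path when a memory has several routes to the outcome, and the names `propagate_credit`, `parents`, `lam`, and `alpha` are hypothetical.

```python
from collections import deque


def propagate_credit(parents, q, final_id, reward, lam=0.9, alpha=0.1):
    """Push a terminal reward backward through a memory DAG.

    parents: dict mapping memory id -> iterable of parent memory ids
    q:       dict mapping memory id -> Q-value, updated in place
    Returns the eligibility trace assigned to each reached memory.
    """
    # TD error at the final step (assumption: no bootstrapped next value).
    delta = reward - q.get(final_id, 0.0)

    # Breadth-first walk backward; the first visit is the shortest path,
    # so each ancestor gets trace lam ** (steps back from the outcome).
    traces = {final_id: 1.0}
    queue = deque([final_id])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in traces:
                traces[parent] = traces[node] * lam
                queue.append(parent)

    # Every reached memory moves toward the reward in proportion to its trace.
    for node, trace in traces.items():
        q[node] = q.get(node, 0.0) + alpha * delta * trace
    return traces
```

Memories that are not ancestors of the final memory are never visited, so their Q-values stay unchanged, which is the distinction between "present" and "causally instrumental" that the graph is meant to capture.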

Why DAGs?
A DAG is essential because it prevents cycles (a memory cannot be its own ancestor) and allows for efficient topological sorting. The graph structure enables MemQ to distinguish between a memory that was merely present and one that was causally instrumental. For example, if an agent is writing a function, it might retrieve a memory about Python syntax (low causal impact) and a memory about a specific algorithm (high causal impact). The DAG captures that the algorithm memory led to the correct implementation, while the syntax memory was just background.
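Both properties named above, cycle prevention and topological ordering, are available in Python's standard library through `graphlib`. The sketch below is an illustration only; the helper names `causal_order` and `would_create_cycle` are hypothetical, and the article states that the actual repository builds its graph with NetworkX.

```python
from graphlib import CycleError, TopologicalSorter


def causal_order(parents):
    """Return memory ids so that every memory appears after its causes.

    parents: dict mapping memory id -> set of parent (cause) memory ids
    """
    return list(TopologicalSorter(parents).static_order())


def would_create_cycle(parents, child, new_parent):
    """Check whether adding the edge new_parent -> child would break acyclicity."""
    trial = {node: set(causes) for node, causes in parents.items()}
    trial.setdefault(child, set()).add(new_parent)
    try:
        causal_order(trial)
    except CycleError:
        return True
    return False


# Example from the text: an implementation that drew on two memories.
example = {"impl": {"algo", "syntax"}, "algo": set(), "syntax": set()}
order = causal_order(example)
```

In `order`, both `algo` and `syntax` precede `impl`, and an attempt to make `impl` a cause of `algo` is rejected because it would make a memory its own ancestor.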

Implementation Details (from the open-source repository):
The MemQ codebase, available on GitHub (repository: `memq-agent`), is implemented in Python and integrates with LangChain and LlamaIndex. Key components:
- `MemoryGraph`: Builds and maintains the DAG using NetworkX.
- `QAgent`: Manages the Q-learning loop, including the eligibility trace update.
- `Retriever`: Uses a combination of embedding similarity and Q-value ranking to select memories.
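The article does not say how the `Retriever` combines similarity with Q-values, so the following is only one plausible sketch: a weighted sum of cosine similarity and Q-value. The function names, the `q_weight` parameter, and the assumption that Q-values lie roughly in [0, 1] are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_memories(query_vec, memories, q_weight=0.5, top_k=3):
    """Rank memories by a blend of semantic similarity and learned value.

    memories: dict mapping memory id -> (embedding, q_value)
    q_weight: 0.0 reproduces plain similarity search; 1.0 ranks by Q-value only
    """
    scored = [
        ((1.0 - q_weight) * cosine(query_vec, emb) + q_weight * q_value, memory_id)
        for memory_id, (emb, q_value) in memories.items()
    ]
    scored.sort(reverse=True)
    return [memory_id for _, memory_id in scored[:top_k]]


# A generic syntax memory is closer to the query, but the algorithm memory
# has earned a higher Q-value from past successes.
store = {
    "syntax": ([1.0, 0.0], 0.1),
    "algo": ([0.8, 0.6], 0.9),
}
by_similarity = rank_memories([1.0, 0.0], store, q_weight=0.0)
by_blend = rank_memories([1.0, 0.0], store, q_weight=0.5)
```

With `q_weight=0.0` the syntax memory ranks first; with `q_weight=0.5` the algorithm memory overtakes it, which is the behavioral difference from standard RAG that the article describes.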

The repository has gained over 1,200 stars in its first month, indicating strong community interest.

Benchmark Performance:
The authors evaluated MemQ on the AgentBench benchmark suite, which includes tasks like web browsing, database operations, and code generation. Results:

| Task | Standard RAG | MemQ (λ=0.9) | Improvement (percentage points) |
|---|---|---|---|
| Web Browsing (success rate) | 34.2% | 51.8% | +17.6 |
| Database Query (accuracy) | 62.1% | 78.4% | +16.3 |
| Code Generation (pass@1) | 18.5% | 29.7% | +11.2 |
| Multi-hop QA (F1) | 44.3% | 61.2% | +16.9 |

Data Takeaway: MemQ improves every task by 11 to 18 percentage points, with the largest absolute gains in web browsing and multi-hop QA, tasks that require chaining multiple memories together. In relative terms, code generation improves the most (18.5% to 29.7%, roughly a 60% relative increase). The improvement is not marginal; it represents a step-change in agent capability.
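Because the table reports absolute differences, it can help to recompute both absolute and relative gains from the published scores. The short sketch below does only that arithmetic; the `results` dictionary simply restates the table, and the function name `improvement` is hypothetical.

```python
# Scores copied from the benchmark table: (Standard RAG, MemQ), in percent.
results = {
    "Web Browsing": (34.2, 51.8),
    "Database Query": (62.1, 78.4),
    "Code Generation": (18.5, 29.7),
    "Multi-hop QA": (44.3, 61.2),
}


def improvement(baseline, new):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


gains = {task: improvement(rag, memq) for task, (rag, memq) in results.items()}

for task, (points, percent) in gains.items():
    print(f"{task}: +{points} points ({percent}% relative)")
```

The two views rank the tasks differently: web browsing has the largest absolute gain, while code generation has the largest relative gain because its baseline is lowest.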

Key Players & Case Studies

MemQ emerges from a research lab that has previously contributed to the open-source AI community, notably the `agent-memory` project. The lead researcher, Dr. Elena Voss, previously worked on hierarchical reinforcement learning at DeepMind before moving to academia. Her team's focus is on bridging RL and LLM agents.

Competing Approaches:
MemQ is not the only memory optimization system, but it is the first to apply explicit credit assignment via DAGs. Key competitors:

| System | Mechanism | Credit Assignment | Retraining Required |
|---|---|---|---|
| MemQ | DAG + TD(λ) Q-learning | Yes, causal chain | No |
| MemoryBank | Vector DB + Recency | No | No |
| Reflexion | Self-reflection + feedback | Implicit (via text) | No |
| REMEMBER (Google) | Differentiable memory | Yes, gradient-based | Yes (fine-tuning) |
| GEM (Microsoft) | Graph-based episodic memory | Partial (local) | No |

Data Takeaway: MemQ occupies a unique niche: it offers explicit, global credit assignment without requiring model fine-tuning. This makes it far more practical than gradient-based approaches like REMEMBER, which require expensive retraining for each new task domain.

Case Study: Autonomous Code Repository Maintenance
A notable early adopter is a startup called CodeWeaver, which uses MemQ to power an AI agent that maintains a large open-source Python library. The agent must fix bugs, add features, and write documentation across thousands of files. Before MemQ, the agent would frequently retrieve outdated or irrelevant code snippets, leading to broken builds. After integrating MemQ, the agent's DAG tracked which code snippets led to successful pull requests. Over two weeks, the agent's contribution acceptance rate rose from 22% to 67%, and the number of rollbacks dropped by 80%.

Industry Impact & Market Dynamics

MemQ arrives at a critical inflection point for LLM agents. The market for autonomous AI agents is projected to grow from $4.2 billion in 2024 to $28.6 billion by 2028 (stated CAGR 46.8%; that rate matches these endpoints over five compounding years, whereas the four years from 2024 to 2028 would imply roughly 61.5%). However, the biggest bottleneck is reliability: current agents fail too often on multi-step tasks. MemQ directly addresses this by making memory a learnable component.

Market Segments Most Affected:
1. Software Development Tools: GitHub Copilot, Cursor, and Replit are racing to add agentic capabilities. MemQ could be integrated to help these tools remember why a particular code pattern was chosen, reducing context-switching errors.
2. Scientific Research Platforms: Companies like BenchSci and SciSpace use AI to accelerate literature review. MemQ could help agents trace the causal chain of scientific discoveries, remembering which papers led to which hypotheses.
3. Customer Support Automation: Zendesk and Intercom are deploying agents that handle complex multi-step tickets. MemQ would allow these agents to learn which past solutions actually resolved issues, not just which ones were retrieved.

Funding Landscape:
The MemQ research team has spun out a company, MemQ AI, which recently closed a $6.2 million seed round led by a prominent AI-focused venture firm. The funding will be used to build a hosted version of the memory graph service, targeting enterprise customers.

| Competitor | Funding | Focus |
|---|---|---|
| MemQ AI | $6.2M seed | Memory optimization for agents |
| LangChain | $35M Series A | Agent orchestration framework |
| AutoGPT | $15M seed | Autonomous agent platform |
| Fixie.ai | $17M Series A | Agent infrastructure |

Data Takeaway: While MemQ AI is early-stage, its funding is substantial for a seed round, reflecting investor confidence that memory is the next frontier in agent reliability. The company's success will depend on whether it can become the standard memory layer for the LangChain ecosystem.

Risks, Limitations & Open Questions

1. Scalability of the DAG: For long-running agents that generate thousands of memories, the DAG can become enormous. The current implementation uses a pruning strategy that removes nodes with Q-values below a threshold, but this could discard potentially useful memories that have not yet been 'discovered' as valuable. The trade-off between memory retention and computational cost is unresolved.

2. Credit Assignment Noise: TD(λ) works well when rewards are clear and immediate, but many agent tasks have delayed or ambiguous rewards. For example, in a scientific discovery task, the 'success' might only be apparent days later. MemQ's eligibility trace decay may wash out credit for early memories in long chains.

3. Catastrophic Forgetting of Negative Examples: MemQ weakens memories that did not contribute to success. However, negative examples are often valuable — knowing what *not* to do is crucial. The current system does not explicitly retain high-value negative memories, which could lead to repeated mistakes.

4. Dependence on Task Decomposition: MemQ assumes the agent can decompose a task into a sequence of retrievable memories. For tasks that require continuous, non-decomposable reasoning (e.g., creative writing), the DAG structure may be less applicable.

5. Security and Manipulation: If an adversary can inject memories into the agent's DAG (e.g., via prompt injection), they could manipulate the Q-values by creating fake causal links. This is an underexplored attack vector.
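Items 1 and 2 above interact, and a little arithmetic makes the concern concrete. The sketch below assumes the trace is simply λ raised to the number of steps back (γ = 1) and that pruning is a plain threshold on Q-values; the function names and the example threshold are hypothetical, not taken from the repository.

```python
def trace_at_depth(lam, depth):
    """Share of credit reaching a memory `depth` steps before the outcome."""
    return lam ** depth


def depth_until_below(lam, floor):
    """Smallest number of steps back at which the trace drops below `floor`.

    Assumes 0 < lam < 1 and 0 < floor <= 1.
    """
    depth, trace = 0, 1.0
    while trace >= floor:
        trace *= lam
        depth += 1
    return depth


def prune(q_values, threshold):
    """Threshold pruning: keep only memories whose Q-value is at least `threshold`."""
    return {memory_id: q for memory_id, q in q_values.items() if q >= threshold}


# With lam = 0.9, how far back can a memory sit before it gets under 1% of the credit?
steps = depth_until_below(0.9, 0.01)

# An early but causally necessary memory can end up below an (illustrative) threshold.
kept = prune({"early_insight": 0.004, "recent_fix": 0.6}, threshold=0.05)
```

Under these assumptions a memory more than a few dozen steps from the outcome receives almost no credit, so a fixed pruning threshold would tend to remove exactly the early memories that item 2 worries about.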

AINews Verdict & Predictions

MemQ is not a minor improvement — it is a paradigm shift. By treating memory as a learnable, credit-assignable resource, it gives LLM agents the one thing they have been missing: the ability to learn from experience without retraining. This is the missing piece for agents that can operate autonomously over long horizons.

Our Predictions:
1. Within 12 months, MemQ or a similar DAG-based credit assignment mechanism will be integrated into at least two major agent frameworks (LangChain, AutoGPT, or Microsoft's Copilot stack). The improvement in multi-step task success rates is too large to ignore.
2. Within 24 months, we will see the first 'memory-as-a-service' startups that offer hosted memory graphs for agents, similar to how Pinecone offers vector databases for RAG. MemQ AI is well-positioned to lead this.
3. The biggest impact will be in scientific research, where agents must trace long chains of reasoning and experimentation. If the negative-memory limitation described above is resolved, MemQ could accelerate drug discovery and materials science by enabling agents to learn from every failed experiment, not just successful ones.
4. The approach will face a backlash from the 'simpler is better' camp, who argue that current RAG systems are 'good enough.' We disagree: the reported results show gains of 11 to 18 percentage points across the four benchmark tasks, which is the difference between a toy agent and a production-ready one.

What to Watch: The next version of MemQ should address the negative memory problem. If the team can figure out how to assign high Q-values to 'useful failures' — memories that taught the agent what not to do — the system will become even more powerful. Also watch for a paper on combining MemQ with reinforcement learning from human feedback (RLHF) to align the credit assignment with human preferences.

MemQ is the first credible answer to the question: 'How do we make agents that get better over time?' The answer is not a bigger model — it's a smarter memory.
