arXiv cs.AI AI News
AINews has published 405 articles sourced from arXiv cs.AI, tracking recurring developments and primary announcements. Key themes include calibrated interactive RL, distribution shift, and LLM agents.
Overview
Published articles: 405
Latest update: May 27, 2026
Quality score: 9
Topic diversity: 1657
Primary host: arxiv.org
Latest coverage from arXiv cs.AI
For years, training multi-turn dialogue agents has been haunted by a silent killer: distribution shift. Whether using static logs or prompt-based interactive RL, the dialogue histo…
A new preprint on arXiv has drawn a sharp line in the sand for artificial intelligence. Researchers have introduced a benchmark called 'Creative Physical Intelligence' (CPI), desig…
Hierarchical reinforcement learning (HRL) has long promised to solve long-horizon decision problems by discovering and reusing temporally extended skills. Yet in practice, most ski…
For years, the AI industry has approached hallucination detection by analyzing a model's final output layer, assuming that the most truthful representation emerges at the end of th…
The core tension in using large language models for science education has always been reliability: LLMs produce plausible step sequences but cannot guarantee the deterministic prec…
For years, the AI agent evaluation landscape has been dominated by a single, blunt metric: how much human labor can this AI replace? Benchmarks like SWE-bench and GAIA measured tas…
The OmniToM benchmark, developed by a consortium of researchers from leading AI labs and universities, systematically evaluates whether LLMs possess true theory of mind—the ability…
As AI agents evolve from simple chatbots to autonomous systems handling enterprise workflows—data entry, supply chain coordination, customer service escalation—the benchmarks used …
AINews has independently analyzed two newly released scientific AI agent frameworks—DeepTS/DeepCollector and DeepScribe—that are poised to fundamentally alter the daily workflow of…
The AI industry has long treated deployed agents as immutable models, testing them against static benchmarks on day one and assuming performance remains constant. AINews has uncove…
For years, embodied AI agents—robots and virtual assistants that perceive and act in the physical world—have excelled at executing explicit commands like 'pick up the red cup.' But…
For years, the prevailing approach to building long-term memory for AI agents has been to treat it as a problem of storage. The logic is seductive: give the agent a large database,…
The AI community has celebrated large language models for exhibiting what appears to be self-awareness, from expressing uncertainty to reflecting on their own knowledge boundaries.…
The core innovation of BrickAnything is a fundamental rethinking of how 3D geometry is represented for physical construction. Traditional methods treat brick generation as a post-p…
MEMOR-E, a four-legged mobile robot developed by a team of researchers from the University of Tokyo and the National Institute of Advanced Industrial Science and Technology (AIST),…
A new research paper has exposed a fundamental vulnerability in large language model (LLM)-driven ubiquitous systems: when sensor readings conflict with a user's verbal statement, …
For years, knowledge graph embeddings have treated concepts as single points in high-dimensional space. This works well for learning patterns from facts but fails catastrophically …
In a development that redefines the usability of quantum computing, a team has demonstrated the first seamless coupling of a femtosecond laser-pumped Coherent Ising Machine (CIM) w…
The Med-Stress framework, developed by a consortium of AI safety researchers, puts nine frontier large language models through a gauntlet of multi-turn clinical dialogues. In singl…
Formal verification of operating system kernels has long been the domain of a tiny elite. The seL4 kernel, for instance, took over a decade to verify and involved a team of world-c…
AINews has independently analyzed the 'Quantum Frog' cooperative game, a seemingly simple two-player title that hides a profound innovation. Its core mechanism—'time quantization'—…
The AI industry's obsession with ever-larger models is giving way to a more sobering engineering reality: the performance ceiling of production AI systems is defined not by any sin…
The AI industry has been locked in an arms race to build models that 'think' longer and harder. OpenAI's o1, DeepSeek-R1, and Anthropic's Claude Opus have all demonstrated that ext…
A pre-registered study has laid bare a troubling truth about the current generation of large language models: they suffer from a systemic 'difficulty effect' in confidence calibrat…