arXiv cs.AI AI News

AINews has published 405 articles sourced from arXiv cs.AI, tracking recurring developments and primary announcements. Key themes include calibrated interactive RL, distribution shift, and LLM agents.

Overview

Published articles: 405
Latest update: May 27, 2026
Quality score: 9
Topic diversity: 1657
Primary host: arxiv.org

Latest coverage from arXiv cs.AI

For years, training multi-turn dialogue agents has been haunted by a silent killer: distribution shift. Whether using static logs or prompt-based interactive RL, the dialogue histo…
A new preprint on arXiv has drawn a sharp line in the sand for artificial intelligence. Researchers have introduced a benchmark called 'Creative Physical Intelligence' (CPI), desig…
Hierarchical reinforcement learning (HRL) has long promised to solve long-horizon decision problems by discovering and reusing temporally extended skills. Yet in practice, most ski…
For years, the AI industry has approached hallucination detection by analyzing a model's final output layer, assuming that the most truthful representation emerges at the end of th…
The core tension in using large language models for science education has always been reliability: LLMs produce plausible step sequences but cannot guarantee the deterministic prec…
For years, the AI agent evaluation landscape has been dominated by a single, blunt metric: how much human labor can this AI replace? Benchmarks like SWE-bench and GAIA measured tas…
The OmniToM benchmark, developed by a consortium of researchers from leading AI labs and universities, systematically evaluates whether LLMs possess true theory of mind—the ability…
As AI agents evolve from simple chatbots to autonomous systems handling enterprise workflows—data entry, supply chain coordination, customer service escalation—the benchmarks used …
AINews has independently analyzed two newly released scientific AI agent frameworks—DeepTS/DeepCollector and DeepScribe—that are poised to fundamentally alter the daily workflow of…
The AI industry has long treated deployed agents as immutable models, testing them against static benchmarks on day one and assuming performance remains constant. AINews has uncove…
For years, embodied AI agents—robots and virtual assistants that perceive and act in the physical world—have excelled at executing explicit commands like 'pick up the red cup.' But…
For years, the prevailing approach to building long-term memory for AI agents has been to treat it as a problem of storage. The logic is seductive: give the agent a large database,…
The AI community has celebrated large language models for exhibiting what appears to be self-awareness, from expressing uncertainty to reflecting on their own knowledge boundaries.…
The core innovation of BrickAnything is a fundamental rethinking of how 3D geometry is represented for physical construction. Traditional methods treat brick generation as a post-p…
MEMOR-E, a four-legged mobile robot developed by a team of researchers from the University of Tokyo and the National Institute of Advanced Industrial Science and Technology (AIST),…
A new research paper has exposed a fundamental vulnerability in large language model (LLM)-driven ubiquitous systems: when sensor readings conflict with a user's verbal statement, …
For years, knowledge graph embeddings have treated concepts as single points in high-dimensional space. This works well for learning patterns from facts but fails catastrophically …
In a development that redefines the usability of quantum computing, a team has demonstrated the first seamless coupling of a femtosecond laser-pumped Coherent Ising Machine (CIM) w…
The Med-Stress framework, developed by a consortium of AI safety researchers, puts nine frontier large language models through a gauntlet of multi-turn clinical dialogues. In singl…
Formal verification of operating system kernels has long been the domain of a tiny elite. The seL4 kernel, for instance, took over a decade to verify and involved a team of world-c…
AINews has independently analyzed the 'Quantum Frog' cooperative game, a seemingly simple two-player title that hides a profound innovation. Its core mechanism—'time quantization'—…
The AI industry's obsession with ever-larger models is giving way to a more sobering engineering reality: the performance ceiling of production AI systems is defined not by any sin…
The AI industry has been locked in an arms race to build models that 'think' longer and harder. OpenAI's o1, DeepSeek-R1, and Anthropic's Claude Opus have all demonstrated that ext…
A pre-registered study has laid bare a troubling truth about the current generation of large language models: they suffer from a systemic 'difficulty effect' in confidence calibrat…