arXiv cs.AI AI News

AINews has published 405 articles sourced from arXiv cs.AI, tracking recurring developments and primary announcements. Key themes include calibrated interactive RL, distribution shift, and LLM agents.


Published articles: 405

Latest update: May 27, 2026

Topic diversity: 1657

Primary host: arxiv.org

Latest coverage from arXiv cs.AI

Untitled

arXiv cs.AI 05/28, 07:08 AM

For years, training multi-turn dialogue agents has been haunted by a silent killer: distribution shift. Whether using static logs or prompt-based interactive RL, the dialogue histo…

Source page LLM agents May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

A new preprint on arXiv has drawn a sharp line in the sand for artificial intelligence. Researchers have introduced a benchmark called 'Creative Physical Intelligence' (CPI), desig…

Source page embodied intelligence May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

Hierarchical reinforcement learning (HRL) has long promised to solve long-horizon decision problems by discovering and reusing temporally extended skills. Yet in practice, most ski…

Source page robotics May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

For years, the AI industry has approached hallucination detection by analyzing a model's final output layer, assuming that the most truthful representation emerges at the end of th…

Source page large language models May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The core tension in using large language models for science education has always been reliability: LLMs produce plausible step sequences but cannot guarantee the deterministic prec…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

For years, the AI agent evaluation landscape has been dominated by a single, blunt metric: how much human labor can this AI replace? Benchmarks like SWE-bench and GAIA measured tas…

Source page human-AI collaboration May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The OmniToM benchmark, developed by a consortium of researchers from leading AI labs and universities, systematically evaluates whether LLMs possess true theory of mind—the ability…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

As AI agents evolve from simple chatbots to autonomous systems handling enterprise workflows—data entry, supply chain coordination, customer service escalation—the benchmarks used …

Source page enterprise AI May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

AINews has independently analyzed two newly released scientific AI agent frameworks—DeepTS/DeepCollector and DeepScribe—that are poised to fundamentally alter the daily workflow of…

Source page AI Agents May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The AI industry has long treated deployed agents as immutable models, testing them against static benchmarks on day one and assuming performance remains constant. AINews has uncove…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

For years, embodied AI agents—robots and virtual assistants that perceive and act in the physical world—have excelled at executing explicit commands like 'pick up the red cup.' But…

Source page embodied AI May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

For years, the prevailing approach to building long-term memory for AI agents has been to treat it as a problem of storage. The logic is seductive: give the agent a large database,…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The AI community has celebrated large language models for exhibiting what appears to be self-awareness, from expressing uncertainty to reflecting on their own knowledge boundaries.…

Source page AI alignment May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The core innovation of BrickAnything is a fundamental rethinking of how 3D geometry is represented for physical construction. Traditional methods treat brick generation as a post-p…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

MEMOR-E, a four-legged mobile robot developed by a team of researchers from the University of Tokyo and the National Institute of Advanced Industrial Science and Technology (AIST),…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

A new research paper has exposed a fundamental vulnerability in large language model (LLM)-driven ubiquitous systems: when sensor readings conflict with a user's verbal statement, …

Source page LLM May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

For years, knowledge graph embeddings have treated concepts as single points in high-dimensional space. This works well for learning patterns from facts but fails catastrophically …

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

In a development that redefines the usability of quantum computing, a team has demonstrated the first seamless coupling of a femtosecond laser-pumped Coherent Ising Machine (CIM) w…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The Med-Stress framework, developed by a consortium of AI safety researchers, puts nine frontier large language models through a gauntlet of multi-turn clinical dialogues. In singl…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

Formal verification of operating system kernels has long been the domain of a tiny elite. The seL4 kernel, for instance, took over a decade to verify and involved a team of world-c…

Source page formal verification May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

AINews has independently analyzed the 'Quantum Frog' cooperative game, a seemingly simple two-player title that hides a profound innovation. Its core mechanism—'time quantization'—…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The AI industry's obsession with ever-larger models is giving way to a more sobering engineering reality: the performance ceiling of production AI systems is defined not by any sin…

Source page multi-agent systems May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

The AI industry has been locked in an arms race to build models that 'think' longer and harder. OpenAI's o1, DeepSeek-R1, and Anthropic's Claude Opus have all demonstrated that ext…

Source page May 2026

Untitled

arXiv cs.AI 05/28, 07:08 AM

A pre-registered study has laid bare a troubling truth about the current generation of large language models: they suffer from a systemic 'difficulty effect' in confidence calibrat…

Source page AI safety May 2026