Experiential Reinforcement Learning
15 Feb 2026

Experiential Reinforcement Learning (ERL) from Shi et al. introduces a training paradigm that augments standard reinforcement learning with an explicit experience reflection-and-consolidation loop. The approach enables large language models to self-improve by transforming environmental feedback into structured behavioral revisions and internalizing those lessons for efficient, durable performance gains at inference time, demonstrating up to an 81% reward increase on complex sparse-reward tasks such as Sokoban.
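
As a rough illustration, here is a minimal, self-contained sketch of a reflect-and-consolidate loop on a toy guess-the-number environment. All names and interfaces are hypothetical stand-ins; the paper's actual loop operates on LLM trajectories and natural-language lessons.

```python
import random

# Toy sketch of an experience reflection-consolidation loop (hypothetical
# interfaces; ERL's actual components are LLM-based, not tabular).
random.seed(0)
GOAL = 7  # hidden target in a guess-the-number environment

def rollout(notes):
    """One episode: act within the range suggested by consolidated notes."""
    lo, hi = notes["range"]
    guess = random.randint(lo, hi)
    reward = 1.0 if guess == GOAL else 0.0
    feedback = "too high" if guess > GOAL else "too low" if guess < GOAL else "correct"
    return guess, reward, feedback

def reflect(guess, feedback):
    """Turn raw environmental feedback into a structured behavioral revision."""
    if feedback == "too high":
        return lambda lo, hi: (lo, guess - 1)
    if feedback == "too low":
        return lambda lo, hi: (guess + 1, hi)
    return None

notes = {"range": (0, 100)}  # consolidated experience, reused across episodes
for episode in range(100):
    guess, reward, feedback = rollout(notes)
    revision = reflect(guess, feedback)
    if revision is not None:  # consolidate the lesson for later episodes
        notes["range"] = revision(*notes["range"])
    if reward > 0:
        print(f"solved at episode {episode}: guessed {guess}")
        break
```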

GLM-5: from Vibe Coding to Agentic Engineering
17 Feb 2026

GLM-5, a foundation model from Zhipu AI and Tsinghua University, is built to move AI-assisted development from human-guided "vibe coding" to autonomous "agentic engineering." It delivers state-of-the-art results on agentic, reasoning, and coding benchmarks, often matching or exceeding leading proprietary models, driven by innovations such as DeepSeek Sparse Attention and advanced reinforcement learning.

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
13 Feb 2026

SkillsBench introduces the first benchmark to systematically evaluate "Agent Skills," structured procedural-knowledge packages designed to augment large language model agents. The study finds that human-curated Skills improve agent performance by an average of 16.2 percentage points, that benefits peak at 2-3 concise Skills, and that self-generated Skills offer little to no gain.

Image Generation with a Sphere Encoder
16 Feb 2026

The Sphere Encoder framework, developed by researchers at Meta and the University of Maryland, enables high-fidelity image generation with minimal inference steps by encoding images onto a uniformly distributed spherical latent space. The method bypasses the multi-step sampling of diffusion models and resolves the "posterior hole" issue in VAEs, delivering competitive generation fidelity in just a few forward passes.
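
A minimal sketch of why a spherical latent space sidesteps the posterior hole, under our reading of the abstract (not the authors' code): normalized Gaussian vectors are exactly uniform on the unit sphere, so prior samples drawn at generation time have the same support as the training latents.

```python
import numpy as np

# Sketch of the core idea as we read it: constrain encoder outputs to the
# unit sphere, and sampling a prior for generation becomes trivial --
# normalized Gaussians are uniform on the sphere, so there is no gap
# ("posterior hole") between encoded latents and sampled latents.
rng = np.random.default_rng(0)
d = 512

def to_sphere(z):
    """Project arbitrary feature vectors onto the unit hypersphere."""
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# encoder side: raw features for a batch of images, projected to the sphere
train_latents = to_sphere(rng.normal(size=(4, d)))

# generation side: a uniform sphere sample, consumable by a one-step decoder
prior_sample = to_sphere(rng.normal(size=(1, d)))
print(np.linalg.norm(train_latents, axis=-1), np.linalg.norm(prior_sample))
```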

Intrinsic Credit Assignment for Long Horizon Interaction
12 Feb 2026

Researchers at the University of Tübingen developed ΔBelief-RL, an intrinsic credit-assignment framework for long-horizon interaction that trains agents by using their own internal belief changes as a dense, per-turn reward signal. The approach allowed smaller models to significantly outperform larger general-purpose LLMs and state-of-the-art baselines on active information-seeking tasks, demonstrating improved efficiency and strong generalization across diverse interactive environments.
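
One plausible way to instantiate "belief change as dense reward" is information gain, i.e. the per-turn entropy drop of the agent's belief. The sketch below is our illustration on a toy Bayesian belief; the paper's exact formulation may differ.

```python
import numpy as np

# Hedged sketch: reward each turn by how much the observation sharpened the
# agent's belief over hypotheses (information gain). The reward form and
# names are illustrative, not taken from the paper.
def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def bayes_update(belief, likelihoods):
    posterior = belief * likelihoods
    return posterior / posterior.sum()

belief = np.full(4, 0.25)  # uniform prior over four hypotheses
observations = [np.array([0.9, 0.4, 0.4, 0.1]),  # per-hypothesis likelihoods
                np.array([0.8, 0.2, 0.6, 0.2])]

for t, lik in enumerate(observations):
    new_belief = bayes_update(belief, lik)
    reward = entropy(belief) - entropy(new_belief)  # dense per-turn signal
    print(f"turn {t}: information-gain reward = {reward:.3f}")
    belief = new_belief
```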

Categorical Flow Maps
12 Feb 2026

Categorical Flow Maps (CFMs) introduce a self-distillable flow-matching method that adapts continuous-domain acceleration techniques to generate discrete data, achieving high-quality results in one to a few steps. The approach consistently outperforms prior few-step baselines across molecular graph synthesis, binarized image generation, and text modeling.

BitDance: Scaling Autoregressive Generative Models with Binary Tokens
15 Feb 2026

Researchers from ByteDance and collaborating institutions developed BitDance, an autoregressive generative model that leverages high-entropy binary tokens and a binary diffusion head to enhance visual expressiveness and sampling efficiency. This approach enables over 30x faster inference for high-resolution image generation and achieves strong performance in class-conditional and text-to-image tasks.
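
A forward-only sketch of binary tokenization as we understand it (a simplification, not the BitDance codebase); training such a quantizer end-to-end would typically add a straight-through gradient estimator, omitted here.

```python
import numpy as np

# Illustrative simplification: quantize continuous patch features to +/-1
# bit vectors, yielding high-entropy discrete tokens for an autoregressive
# model to predict. (Not the BitDance code; gradients via straight-through
# estimation are omitted in this forward-only demo.)
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 64))  # 16 patches, 64-dim features each

binary_tokens = np.where(features >= 0, 1, -1).astype(np.int8)

# empirical per-bit entropy: near 1 bit/dim means the code is high-entropy
p_one = (binary_tokens == 1).mean(axis=0).clip(1e-6, 1 - 1e-6)
bit_entropy = -(p_one * np.log2(p_one) + (1 - p_one) * np.log2(1 - p_one))
print(f"mean per-bit entropy: {bit_entropy.mean():.3f} bits")
```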

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

An investigation into large-scale AI agent societies on the Moltbook platform reveals that while systems can scale extensively, they do not spontaneously develop human-like socialization dynamics. The study found high individual agent inertia, ineffective content adaptation to community feedback, and a failure to establish stable influence hierarchies or shared social memory.

Symmetry in language statistics shapes the geometry of model representations
16 Feb 2026

This paper identifies translation symmetry in low-order token co-occurrence statistics as the organizing principle behind the emergence of geometric structures in language model representations. It develops a mathematical theory predicting circles for cyclic concepts and 1D manifolds for continuous sequences, validating these predictions across various shallow and deep language models.
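
The cyclic case can be checked in a few lines: translation-symmetric co-occurrence statistics give a circulant matrix, whose leading non-constant eigenvectors are Fourier modes, so a 2-D spectral embedding of, say, the days of the week is forced onto a circle. A toy verification (our construction, not the paper's code):

```python
import numpy as np

# Build a translation-symmetric (circulant) co-occurrence matrix for a
# 7-element cycle: co-occurrence decays with cyclic distance between days.
n = 7
i = np.arange(n)
d = np.abs(i[:, None] - i[None, :])
d = np.minimum(d, n - d)              # cyclic distance
C = np.exp(-d.astype(float))          # circulant by construction

# Spectral embedding: the top non-constant eigenvector pair of a circulant
# matrix spans the first Fourier mode (cos, sin), so every point has the
# same radius -- a perfect circle.
vals, vecs = np.linalg.eigh(C)
emb = vecs[:, np.argsort(vals)[::-1][1:3]]
radii = np.linalg.norm(emb, axis=1)
print("radii of the 7 day embeddings:", np.round(radii, 4))  # all equal
```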

Revisiting the Platonic Representation Hypothesis: An Aristotelian View
16 Feb 2026

Researchers from EPFL and affiliated institutions developed a robust null-calibration framework for representational similarity analysis, correcting for biases related to model dimension and comparison depth. Their findings indicate that while global representation convergence trends are often artifactual, local neighborhood relationships consistently emerge across neural networks and modalities, leading to the proposed Aristotelian Representation Hypothesis.
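
A hedged sketch of what null calibration can look like (the paper's exact null model and similarity measure may differ): compare an observed linear-CKA score against a null distribution built by permuting stimulus order, which destroys stimulus correspondence while preserving each model's internal geometry.

```python
import numpy as np

# Null-calibrated representational similarity, schematically: report a
# z-score of CKA against a permutation null rather than the raw CKA value.
def linear_cka(X, Y):
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
stimuli = rng.normal(size=(200, 10))               # shared inputs
X = stimuli @ rng.normal(size=(10, 64))            # "model A" features
Y = np.tanh(stimuli @ rng.normal(size=(10, 32)))   # "model B" features

observed = linear_cka(X, Y)
null = [linear_cka(X, Y[rng.permutation(len(Y))]) for _ in range(200)]
z = (observed - np.mean(null)) / np.std(null)
print(f"CKA = {observed:.3f}, null mean = {np.mean(null):.3f}, z = {z:.1f}")
```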

Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
13 Feb 2026

Researchers from the University of Virginia and Google introduced the Deep-Thinking Ratio (DTR), a metric that measures LLM reasoning effort by tracking how stable internal token predictions become across model layers. This metric enabled Think@n, a test-time scaling strategy that reduces inference costs by approximately 50% while maintaining or improving accuracy compared to standard self-consistency.
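
A toy, logit-lens-style version of the idea (the paper's precise DTR definition may differ): track the earliest layer after which the per-layer top-1 prediction stops changing, so a token that is decided only deep in the network scores higher.

```python
import numpy as np

# Schematic deep-thinking measurement on fake per-layer logits. In practice
# the per-layer next-token distributions would come from a logit-lens
# readout of a real transformer, not from random data.
def stabilization_depth(layer_logits):
    """layer_logits: (num_layers, vocab) next-token logits per layer."""
    preds = layer_logits.argmax(axis=-1)
    stable_from = len(preds) - 1
    while stable_from > 0 and preds[stable_from - 1] == preds[-1]:
        stable_from -= 1
    # 0.0 = prediction settled at the first layer, ~1.0 = settled at the last
    return stable_from / (len(preds) - 1)

rng = np.random.default_rng(0)
num_layers, vocab = 32, 100
logits = rng.normal(size=(num_layers, vocab)).cumsum(axis=0)  # fake trajectory
print(f"deep-thinking depth for this token: {stabilization_depth(logits):.2f}")
```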

Scalable Clifford-Based Classical Initialization for the Quantum Approximate Optimization Algorithm
15 Feb 2026

Variational Quantum Algorithms (VQAs), such as the Quantum Approximate Optimization Algorithm (QAOA), offer a promising route to tackling combinatorial optimization problems on near-term and intermediate-term quantum devices. However, their performance depends critically on the choice of initial parameters, and the limited expressiveness of the QAOA ansatz makes identifying effective initializations both difficult and unscalable. To address this, the authors propose Scalable Parameter Initialization for QAOA (SPIQ), a framework that employs a relaxed QAOA ansatz to enable classical search over a set of Clifford-preparable quantum states that yield high-quality solutions. These states serve as superior QAOA initializations, driving rapid convergence while significantly reducing the quantum circuit evaluations needed to reach high-quality solutions, and consequently lowering quantum-device cost.

The resulting initialization framework is scalable and application-agnostic, achieving an absolute accuracy improvement of up to 80% over state-of-the-art initialization and reducing initial-state diversity by up to 10,000x across QUBO, PUBO, and PCBO problems spanning tens to hundreds of qubits. Benchmarks on a wide range of problem formulations and instances derived from real-world datasets demonstrate consistent and scalable improvements. The authors also introduce two complementary strategies for selecting high-quality Clifford points identified by the search procedure and using them to seed multi-start optimization, thereby enhancing exploration and improving solution quality.
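
A heavily simplified sketch of the initialization idea on a toy MaxCut instance (our illustration: the real framework searches Clifford-preparable states with stabilizer simulation at scale, whereas this demo brute-forces a dense statevector over a coarse angle grid, and whether a given grid point is exactly Clifford depends on gate conventions):

```python
import itertools
import numpy as np

# Toy stand-in for a classical Clifford-point search: evaluate depth-1 QAOA
# for MaxCut on a grid of candidate angles (multiples of pi/4) and keep the
# best point as the initialization for a continuous optimizer.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

# cut value of every computational basis state
cut = np.zeros(2 ** n)
for idx in range(2 ** n):
    bits = [(idx >> q) & 1 for q in range(n)]
    cut[idx] = sum(bits[a] != bits[b] for a, b in edges)

def qaoa_expectation(gamma, beta):
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # |+>^n
    psi = psi * np.exp(-1j * gamma * cut)                 # diagonal cost layer
    c, s = np.cos(beta), np.sin(beta)
    for q in range(n):                                    # mixer e^{-i beta X_q}
        psi = psi.reshape(2 ** (n - 1 - q), 2, 2 ** q)
        psi = np.stack([c * psi[:, 0] - 1j * s * psi[:, 1],
                        c * psi[:, 1] - 1j * s * psi[:, 0]], axis=1).reshape(-1)
    return float(np.sum(np.abs(psi) ** 2 * cut))

grid = [k * np.pi / 4 for k in range(8)]
best = max(itertools.product(grid, grid), key=lambda gb: qaoa_expectation(*gb))
print(f"best grid init: gamma={best[0]:.3f}, beta={best[1]:.3f}, "
      f"<cut>={qaoa_expectation(*best):.3f}")
```
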
ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System
14 Feb 2026

ThunderAgent presents a program-aware inference system designed for high-throughput serving and reinforcement learning rollout of complex LLM agent workflows. It achieves 1.48–3.58x throughput gains over vLLM and 1.79–3.92x improvements in distributed RL rollouts by unifying the management of LLM inference and external tool execution.

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?
12 Feb 2026

An empirical evaluation finds that LLM-generated repository-level context files generally decrease coding agents' task success rates by 0.5-2% while raising operational costs by over 20%, and that human-written context files offer only marginal gains (a 4% success increase) at up to 19% higher cost.

BEACONS: Bounded-Error, Algebraically-Composable Neural Solvers for Partial Differential Equations
16 Feb 2026

The BEACONS framework from Princeton University and PPPL introduces formally verified neural solvers for partial differential equations (PDEs), providing rigorous, provable L-infinity error bounds that hold even in extrapolation regimes. This neurosymbolic approach, which combines approximation theory with algebraic compositionality, demonstrated superior accuracy, stability, and conservation properties across diverse linear and nonlinear PDE problems compared to conventional neural networks.

Intelligent AI Delegation
12 Feb 2026

Google DeepMind researchers present a comprehensive framework for intelligent AI delegation, integrating insights from human organizational theory with advanced AI protocols and cryptography. This approach establishes a robust system for task distribution, accountability, and trust within hybrid AI-human networks, addressing the complexities of future agentic environments.

Boundary Point Jailbreaking of Black-Box LLMs
16 Feb 2026

Researchers from the UK AI Security Institute and the University of Oxford developed Boundary Point Jailbreaking (BPJ), the first fully automated black-box attack to bypass state-of-the-art safeguards such as Anthropic's Constitutional Classifiers and OpenAI's GPT-5 input classifier. The method raised average rubric scores on unseen harmful queries from 0% to 25.5-68% against Constitutional Classifiers and to 75.6% against the GPT-5 input classifier.

AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

AnchorWeave presents a framework for long-horizon, world-consistent video generation by utilizing multiple local geometric memories and a multi-anchor weaving controller. This method outperforms existing approaches in maintaining spatial consistency and visual quality across extensive camera movements and generalizes to diverse scenarios.

CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
13 Feb 2026

Researchers from Stanford University, Microsoft Spatial AI Lab, and ETH Zurich developed CoPE-VideoLM, a framework that leverages codec primitives like motion vectors and residuals for efficient video language models. This approach reduces the time-to-first-token by 86.2% and enables processing of videos up to 8 hours long within a fixed context, while also improving accuracy on temporal reasoning tasks.
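
For intuition about the codec primitives involved, here is a self-contained block-matching sketch of where motion vectors come from; CoPE-VideoLM itself reads such vectors directly from the compressed bitstream rather than recomputing them.

```python
import numpy as np

# Naive exhaustive block matching, the primitive that video codecs use to
# encode a frame as motion vectors plus residuals relative to a reference.
def motion_vectors(ref, cur, block=8, search=4):
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block]
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        err = np.abs(ref[y:y + block, x:x + block] - target).sum()
                        if err < best:                     # SAD criterion
                            best, best_mv = err, (dy, dx)
            mvs[by // block, bx // block] = best_mv
    return mvs

rng = np.random.default_rng(0)
ref = rng.random((32, 32))
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))  # frame content shifted
# prints [-2  1]: this block's content came from 2 rows up, 1 column right
print(motion_vectors(ref, cur)[2, 2])
```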

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
15 Feb 2026

Researchers from Harbin Institute of Technology, Xiaohongshu Inc., and Shanghai Jiao Tong University introduced REDSearcher, a scalable and cost-efficient framework for training long-horizon deep-search agents in both text and multimodal environments. The framework achieves state-of-the-art performance among open-source agents, demonstrating strong results on complex benchmarks while reducing tool calls by 10.4% through efficient training and data synthesis.
