The Pulse of AI Research

Arxiv Trends — Jan 2025 to Mar 2026

A data-driven look at ~106k arxiv papers to see what AI researchers are actually working on right now, and where the field is headed.

Overview

The Numbers

High-level stats from our dataset. We scraped four of the most active AI-related categories on arxiv and ran topic modeling on every abstract.

105,983 Papers Analyzed
15 Months
4 Categories
20 Topics Found

cs.AI · cs.LG · cs.CL · stat.ML

Scale

Paper Volume Over Time

Monthly paper counts across the four arxiv categories in the dataset (cs.AI, cs.LG, cs.CL, stat.ML). The overall trend is up: AI research output kept growing across the window.

Topics

Topic Landscape

We used NMF (a topic modeling algorithm) to automatically discover 20 topic clusters from paper abstracts. This chart shows all of them and how they rank by volume.

Topic Heatmap

Rising

What's Rising

These topic clusters are growing their share of total papers over the window. The chart on the right shows the trend lines. Agents and reasoning are having a moment: each roughly doubled its share.

agent, agents, multi agent: 4.2% → 7.9%
attention, memory, inference: 6.0% → 7.8%
reasoning, cot, thought: 2.2% → 4.3%
rl, policy, reinforcement: 4.9% → 6.2%
diffusion, diffusion models, generation: 3.8% → 4.8%

Declining

What's Cooling

These topics are losing their share of total papers. Not dead -- just less dominant than they were at the start of 2025. The percentages show each topic's share of all papers.

llms, language, llm: 8.9% → 7.2%
data, training, synthetic: 9.0% → 6.9%
learning, machine, machine learning: 7.7% → 4.8%
detection, anomaly, anomaly detection: 5.3% → 3.8%
ai, human, systems: 5.9% → 5.2%

Bursts

Keyword Bursts

Some keywords suddenly spike in usage -- these bursts often signal a breakthrough paper or new trend. We detect them statistically: if a keyword's frequency rises more than two standard deviations above its historical average (z > 2.0), it shows up here.

Burst Timeline

Network

Research Map

This network shows which keywords appear together in papers. Clusters of connected words reveal research communities. Bigger nodes are more common terms.

Research Network

Topics

Word Clouds

Each cloud shows the most common words within a discovered topic cluster. Bigger word = more frequent in that cluster.

Takeaways

Key Takeaways

The big-picture patterns. If you remember one slide, make it this one.

1. The agent, agents, multi agent cluster is the fastest-growing research theme, nearly doubling in share (4.2% → 7.9%) over 15 months.
2. The llms, language, llm cluster is losing relative share (8.9% → 7.2%): not dying, but being absorbed into more specific sub-fields.
3. 50 keyword bursts were detected, each signaling a sudden spike of community attention.
4. Watchlist terms gaining momentum: world model (2.1x), agent (1.8x), reasoning (1.8x).
5. AI research output continues to grow: monthly volume rose steadily across the window.

Method

Methodology

How we built this: we pulled papers from the arxiv API, ran NLP topic modeling, and tracked how each topic's share changed month over month.

Data

Arxiv API — cs.AI, cs.LG, cs.CL, stat.ML
Jan 2025 – Mar 2026
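
For readers who want to reproduce the pull, here is a minimal sketch of how a paged query against the public arxiv API could be built. The function name, page size, and sort order are illustrative assumptions, not the exact request used for this deck. It only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
CATEGORIES = ["cs.AI", "cs.LG", "cs.CL", "stat.ML"]


def build_query_url(categories, start="202501010000", end="202603312359",
                    offset=0, page_size=100):
    """Build one page of an arxiv API query (hypothetical helper).

    start/end use the API's YYYYMMDDHHMM (GMT) submittedDate format.
    offset and page_size are assumptions for paging through results.
    """
    cat_clause = " OR ".join(f"cat:{c}" for c in categories)
    search = f"({cat_clause}) AND submittedDate:[{start} TO {end}]"
    params = {
        "search_query": search,
        "start": offset,            # index of the first result on this page
        "max_results": page_size,   # results per page
        "sortBy": "submittedDate",
        "sortOrder": "ascending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_query_url(CATEGORIES)
print(url)
```

Stepping offset by page_size walks through the full result set. The response is an Atom feed, so each page still has to be parsed for the abstract and submission date.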

Processing

TF-IDF vectorization, 12k features,
(1,2)-grams, min_df=5

Topics

NMF with 20 components,
linear regression on monthly proportions
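
A minimal sketch of the topic and trend step. It assumes each paper is assigned to its single strongest NMF topic, and that a topic's monthly proportion is its share of that month's papers; the deck does not say how proportions were derived, so treat both as assumptions. The toy run uses 2 topics instead of 20.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer


def topic_trends(abstracts, months, n_topics=20, seed=0):
    """Return (month labels, share matrix, per-topic slope). Hypothetical helper."""
    tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(abstracts)
    nmf = NMF(n_components=n_topics, init="nndsvda",
              random_state=seed, max_iter=500)
    doc_topic = nmf.fit_transform(tfidf)    # papers x topics weights
    dominant = doc_topic.argmax(axis=1)     # assumption: hard assignment

    month_labels = sorted(set(months))
    position = {m: i for i, m in enumerate(month_labels)}
    month_idx = np.array([position[m] for m in months])

    shares = np.zeros((len(month_labels), n_topics))
    for i in range(len(month_labels)):
        in_month = dominant[month_idx == i]
        shares[i] = np.bincount(in_month, minlength=n_topics) / len(in_month)

    # One least-squares line per topic; row 0 of the result holds the slopes.
    x = np.arange(len(month_labels))
    slopes = np.polyfit(x, shares, deg=1)[0]
    return month_labels, shares, slopes


# Toy data for illustration only.
months = ["2025-01", "2025-01", "2025-02", "2025-02", "2025-03", "2025-03"]
abstracts = [
    "language model pretraining data",
    "language model synthetic data training",
    "language model training data",
    "agent planning with tool use",
    "multi agent tool use planning",
    "agent planning benchmark tool",
]
labels, shares, slopes = topic_trends(abstracts, months, n_topics=2)
print(labels, shares.round(2), slopes.round(3))
```

A positive slope marks a rising topic and a negative slope a cooling one. Because shares sum to 1 in every month, the slopes across all topics sum to zero: one topic's gain is always another's loss.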

Bursts

Z-score detection (z > 2.0),
growth ratio last 3 vs first 3 months
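
One way the two burst measures could be computed, using only the standard library. The baseline (all earlier months), the population standard deviation, and the three-month minimum history are assumptions; the deck only states the z > 2.0 threshold and the last-3 vs first-3 ratio.

```python
from statistics import mean, pstdev


def detect_bursts(series, threshold=2.0, min_history=3):
    """Flag months whose value sits more than `threshold` standard deviations
    above the mean of all earlier months. Returns (month_index, z) pairs."""
    bursts = []
    for t in range(min_history, len(series)):
        history = series[:t]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, skip
        z = (series[t] - mu) / sigma
        if z > threshold:
            bursts.append((t, round(z, 2)))
    return bursts


def growth_ratio(series, window=3):
    """Mean of the last `window` months divided by the mean of the first."""
    first = mean(series[:window])
    last = mean(series[-window:])
    return last / first if first > 0 else float("inf")


# Toy monthly keyword counts for illustration only.
spiky = [10, 12, 8, 10, 30, 10, 12, 8]
rising = [2, 2, 2, 3, 4, 4, 4]
bursts = detect_bursts(spiky)
ratio = growth_ratio(rising)
print(bursts, ratio)
```

The two measures answer different questions. The z-score catches a single sharp spike, while the growth ratio captures sustained momentum of the kind reported for the watchlist terms.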

Network

PMI co-occurrence, top 80 terms,
greedy modularity communities
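
A sketch of the network step, assuming PMI is computed from document-level co-occurrence and that only pairs with positive PMI become edges. The edge threshold and helper name are assumptions. Community detection uses networkx's greedy_modularity_communities.

```python
import math
from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def pmi_graph(docs_terms, top_n=80, min_pmi=0.0):
    """Build a term graph weighted by PMI (hypothetical helper).

    docs_terms: one set of terms per paper.
    PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), with probabilities
    estimated as document frequencies over the number of papers.
    """
    n_docs = len(docs_terms)
    term_df = Counter(t for terms in docs_terms for t in set(terms))
    top_terms = {t for t, _ in term_df.most_common(top_n)}

    pair_df = Counter()
    for terms in docs_terms:
        kept = sorted(set(terms) & top_terms)
        pair_df.update(combinations(kept, 2))

    graph = nx.Graph()
    for term in top_terms:
        graph.add_node(term, size=term_df[term])  # node size = document frequency
    for (a, b), n_ab in pair_df.items():
        pmi = math.log2(n_ab * n_docs / (term_df[a] * term_df[b]))
        if pmi > min_pmi:  # assumption: keep only positively associated pairs
            graph.add_edge(a, b, weight=pmi)
    return graph


# Toy term sets for illustration only.
docs = [
    {"agent", "planning", "tool"},
    {"agent", "tool"},
    {"agent", "planning"},
    {"diffusion", "image", "generation"},
    {"diffusion", "image"},
    {"diffusion", "generation"},
]
graph = pmi_graph(docs)
communities = greedy_modularity_communities(graph, weight="weight")
print([sorted(c) for c in communities])
```

On the toy data the two term groups never co-occur, so they fall into separate communities.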

Tools

Python, scikit-learn, pandas, plotly,
matplotlib, networkx, wordcloud

Generated by Summer · Data from arxiv.org