Arxiv Trends — Jan 2025 to Mar 2026
by Summer
A data-driven look at ~106k arxiv papers to see what AI researchers are actually working on right now, and where the field is headed.
Overview
High-level stats from our dataset. We scraped four of the most active AI-related categories on arxiv and ran topic modeling on every abstract.
cs.AI · cs.LG · cs.CL · stat.ML
Scale
Monthly paper counts across eight AI/ML arxiv categories (cs.AI, cs.LG, cs.CL, cs.CV, cs.NE, cs.MA, cs.IR, stat.ML). The overall trend is upward: AI research output keeps accelerating.
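A minimal sketch of the monthly counting step. It assumes a pandas DataFrame with one row per (paper, category) listing. Cross-listed papers appear under several categories, so this version counts each paper id only once. The source does not say whether the chart deduplicates this way, and the column names and toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy listings: one row per (paper, category).
# Paper 2501.00001 is cross-listed in cs.LG and cs.AI.
records = pd.DataFrame({
    "paper_id": ["2501.00001", "2501.00001", "2501.00002", "2502.00003", "2502.00004"],
    "category": ["cs.LG", "cs.AI", "cs.CL", "stat.ML", "cs.CV"],
    "published": pd.to_datetime(
        ["2025-01-03", "2025-01-03", "2025-01-20", "2025-02-07", "2025-02-11"]
    ),
})

def monthly_counts(df: pd.DataFrame) -> pd.Series:
    """Count unique papers per calendar month (assumption: dedupe cross-lists)."""
    unique = df.drop_duplicates(subset="paper_id")
    return unique.groupby(unique["published"].dt.to_period("M")).size()

counts = monthly_counts(records)
print(counts)
```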
Topics
We used NMF (non-negative matrix factorization, a topic modeling algorithm) to discover 20 topic clusters automatically from paper abstracts. This chart shows all 20, ranked by volume.
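A minimal scikit-learn sketch of this step. It uses the settings listed under Method: TF-IDF with 12k features, (1,2)-grams and min_df=5, followed by NMF with 20 components. Several choices are assumptions: the English stop-word list, `init="nndsvd"`, and ranking volume by counting each paper toward its highest-weight topic. The toy corpus is far too small for the real settings, so the demo shrinks them.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

def fit_topics(abstracts, n_topics=20, max_features=12_000, min_df=5, random_state=0):
    """TF-IDF + NMF; defaults mirror the Method settings."""
    vectorizer = TfidfVectorizer(
        max_features=max_features,
        ngram_range=(1, 2),      # unigrams and bigrams
        min_df=min_df,
        stop_words="english",    # assumption: stop-word handling is not stated
    )
    X = vectorizer.fit_transform(abstracts)   # docs x terms
    nmf = NMF(n_components=n_topics, init="nndsvd",
              random_state=random_state, max_iter=500)
    W = nmf.fit_transform(X)                  # docs x topics (non-negative)
    H = nmf.components_                       # topics x terms (non-negative)
    return vectorizer, W, H

# Hypothetical toy corpus; real runs use the full abstract set.
abstracts = [
    "llm agents plan tool use for multi step tasks",
    "agents use tools and planning with language models",
    "chain of thought reasoning improves math problems",
    "reasoning models solve math with chain of thought",
    "diffusion models generate images from text prompts",
    "image generation with diffusion and text guidance",
]
vec, W, H = fit_topics(abstracts, n_topics=3, min_df=1)
dominant = W.argmax(axis=1)                    # highest-weight topic per paper
volume = np.bincount(dominant, minlength=3)    # papers per topic, for ranking
print(volume)
```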
Rising
These topic clusters are gaining share of total papers over time: their monthly share trends upward across the period. The chart on the right shows the trend lines. Agents and reasoning are having a moment.
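One way to put a number on "growing share," following the Method note on linear regression over monthly proportions, is a two-step calculation. First divide each month's per-topic counts by that month's total. Then fit a least-squares line per topic and read off the slope. This sketch uses hypothetical counts and assumes every paper is assigned to exactly one topic, so each month's shares sum to 1.

```python
import numpy as np

def share_slopes(monthly_counts: np.ndarray) -> np.ndarray:
    """monthly_counts: (months, topics) array of papers assigned to each topic.
    Returns the least-squares slope of each topic's monthly share."""
    shares = monthly_counts / monthly_counts.sum(axis=1, keepdims=True)
    months = np.arange(shares.shape[0])
    # With a 2-D y, np.polyfit fits one line per column; row 0 holds the slopes.
    slopes, _intercepts = np.polyfit(months, shares, deg=1)
    return slopes

# Hypothetical counts: 4 months x 3 topics (agents, reasoning, other).
counts = np.array([[10, 10, 80],
                   [15, 12, 73],
                   [20, 14, 66],
                   [25, 16, 59]])
slopes = share_slopes(counts)
rising = np.flatnonzero(slopes > 0)   # indices of topics whose share trends upward
print(slopes.round(3), rising)
```

A slope of 0.05 means the topic gains about five percentage points of share per month. Regressing on shares rather than raw counts keeps overall growth in output from making every topic look like it is rising.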
Declining
These topics are losing share of total papers. They are not dead, just less dominant than they were at the start of 2025. The percentages show each topic's share of all papers.
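The Method notes also list a growth ratio that compares the last 3 months with the first 3. The source does not say which chart uses it. This sketch assumes it is applied to monthly topic shares. Under that reading, a ratio below 1 marks a topic that is less dominant now than at the start of the window. The `eps` guard and the toy shares are hypothetical.

```python
import numpy as np

def growth_ratio(shares: np.ndarray, window: int = 3, eps: float = 1e-9) -> np.ndarray:
    """shares: (months, topics) array of monthly topic shares.
    Returns mean share over the last `window` months divided by the mean
    over the first `window` months; values below 1 indicate decline."""
    first = shares[:window].mean(axis=0)
    last = shares[-window:].mean(axis=0)
    return last / (first + eps)   # eps avoids division by zero for new topics

# Hypothetical shares for 6 months x 2 topics.
shares = np.array([[0.20, 0.05],
                   [0.20, 0.05],
                   [0.20, 0.05],
                   [0.10, 0.10],
                   [0.10, 0.10],
                   [0.10, 0.10]])
ratios = growth_ratio(shares)
print(ratios.round(2))
```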
Bursts
Some keywords suddenly spike in usage, and these bursts often signal a breakthrough paper or an emerging trend. We detect them statistically: if a keyword's monthly frequency rises more than two standard deviations above its historical average (z > 2.0), it shows up here.
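This sketch implements the z-score rule (z > 2.0) under two assumptions the source does not state. First, the "historical average" is taken as the mean of all earlier months. Second, the input is a keyword's frequency normalized by that month's paper count, so that overall growth in output does not register as a burst. Months with a perfectly flat history are skipped to avoid dividing by zero. The `min_history` value and the toy series are hypothetical.

```python
import numpy as np

def detect_bursts(freq: np.ndarray, threshold: float = 2.0, min_history: int = 3) -> list[int]:
    """Return indices of months where freq[t] is more than `threshold`
    standard deviations above the mean of all earlier months."""
    bursts = []
    for t in range(min_history, len(freq)):
        history = freq[:t]
        mu, sigma = history.mean(), history.std()
        if sigma == 0:
            continue   # flat history: z-score undefined, skip this month
        z = (freq[t] - mu) / sigma
        if z > threshold:
            bursts.append(t)
    return bursts

# Hypothetical keyword frequency per 1k papers, one value per month.
freq = np.array([3, 4, 3, 4, 4, 3, 4, 15, 6], dtype=float)
print(detect_bursts(freq))
```

Because the spike at month 7 enters the history for month 8, the following month is judged against an inflated mean and standard deviation. That is one reason a burst tends to show up as a single flagged month.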
Network
This network shows which keywords tend to appear together in the same papers, with links weighted by pointwise mutual information (PMI). Clusters of connected words reveal research communities. Bigger nodes are more common terms.
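This sketch builds the co-occurrence network following the Method notes: PMI co-occurrence over the top 80 terms, then greedy modularity communities. PMI is computed at the document level as PMI(a, b) = log(n_ab * N / (n_a * n_b)). Here N is the number of papers, n_a and n_b count papers containing each term, and n_ab counts papers containing both. Two choices are assumptions: the source does not state the document-level counting, and keeping only positive-PMI pairs seen in at least `min_pair` papers is also ours. The toy keyword sets are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def pmi_graph(docs: list[set[str]], top_n: int = 80, min_pair: int = 2) -> nx.Graph:
    """Keyword graph over the top_n most frequent terms, edges weighted by positive PMI."""
    n_docs = len(docs)
    term_df = Counter(t for d in docs for t in d)          # papers containing each term
    vocab = {t for t, _ in term_df.most_common(top_n)}
    pair_df = Counter()
    for d in docs:
        pair_df.update(combinations(sorted(d & vocab), 2))  # papers containing each pair
    G = nx.Graph()
    for t in vocab:
        G.add_node(t, size=term_df[t])                      # node size ~ term frequency
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_pair:
            continue
        pmi = math.log(n_ab * n_docs / (term_df[a] * term_df[b]))
        if pmi > 0:
            G.add_edge(a, b, weight=pmi)
    return G

# Hypothetical keyword sets, one per paper.
docs = [
    {"agent", "tool", "planning"}, {"agent", "tool", "planning"}, {"agent", "tool"},
    {"reasoning", "math", "cot"}, {"reasoning", "math", "cot"}, {"reasoning", "math"},
]
G = pmi_graph(docs)
communities = greedy_modularity_communities(G, weight="weight")
groups = sorted(sorted(c) for c in communities)
print(groups)
```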
Topics
Each cloud shows the most common words within a discovered topic cluster. Bigger word = more frequent in that cluster.
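Each cloud can be fed from the NMF topic-term matrix H, where row k holds the term weights for topic k. This sketch assumes the clouds were built that way; the source does not say. The resulting dicts are the kind of input the wordcloud package's `WordCloud.generate_from_frequencies` accepts. The toy matrix, term list, and `top_k` value are hypothetical.

```python
import numpy as np

def topic_term_weights(H: np.ndarray, terms: list[str], top_k: int = 30) -> list[dict[str, float]]:
    """For each topic row of H, map its top_k terms to their NMF weights,
    dropping zero-weight terms so they never appear in a cloud."""
    clouds = []
    for row in H:
        top = np.argsort(row)[::-1][:top_k]   # highest weights first
        clouds.append({terms[i]: float(row[i]) for i in top if row[i] > 0})
    return clouds

# Hypothetical 2-topic x 4-term matrix.
H = np.array([[0.9, 0.1, 0.0, 0.5],
              [0.0, 0.7, 0.8, 0.0]])
terms = ["agent", "benchmark", "reasoning", "tool"]
clouds = topic_term_weights(H, terms, top_k=3)
print(clouds)
```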
Takeaways
The big-picture patterns. If you remember one slide, make it this one.
Method
How we built this: we pulled papers from the arxiv API, ran NLP topic modeling, and tracked how each topic's share changed month over month.
Arxiv API — cs.AI, cs.LG, cs.CL, stat.ML
Jan 2025 – Mar 2026
TF-IDF vectorization, 12k features, (1,2)-grams, min_df=5
NMF with 20 components, linear regression on monthly proportions
Z-score detection (z > 2.0), growth ratio of the last 3 vs the first 3 months
PMI co-occurrence, top 80 terms, greedy modularity communities
Python, scikit-learn, pandas, plotly, matplotlib, networkx, wordcloud
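This sketch builds a query URL for the arXiv API endpoint (http://export.arxiv.org/api/query). The endpoint takes `search_query`, `start`, `max_results`, `sortBy` and `sortOrder` parameters and returns Atom XML. Combining the categories with OR and filtering with a `submittedDate` range is our assumption about how the pull was done. The helper name `build_query_url` is hypothetical. Paging through results with `start` and pausing a few seconds between requests, as arXiv asks, is left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ARXIV_API = "http://export.arxiv.org/api/query"

def build_query_url(categories, start_date, end_date, start=0, max_results=100):
    """Hypothetical helper: query papers in any of `categories` submitted
    between start_date and end_date (YYYYMMDDHHMM strings)."""
    cats = " OR ".join(f"cat:{c}" for c in categories)
    search = f"({cats}) AND submittedDate:[{start_date} TO {end_date}]"
    params = {
        "search_query": search,
        "start": start,                 # offset for paging
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "ascending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_query_url(["cs.AI", "cs.LG"], "202501010000", "202603312359")
query = parse_qs(urlparse(url).query)   # decode back to check the search string
print(query["search_query"][0])
```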
Generated by Summer · Data from arxiv.org