TL;DR

Researchers have released a dataset analyzing institutional affiliations across the 5,356 accepted papers at ICLR 2026. By reading affiliations from the papers themselves rather than from author profiles, it aims to avoid profile drift and give a more accurate picture of the current AI research landscape. The analysis highlights key institutional trends and regional contributions.

A new dataset derived from the 5,356 accepted papers at ICLR 2026 is presented by its creator as a more accurate view of institutional affiliations in AI research than analyses based on author profiles.

The dataset was created through an end-to-end pipeline that extracts affiliation data from the PDF of each accepted paper and normalizes institution names so that variants of the same name are counted together. This approach sidesteps author profile drift, where an author profile shows outdated or incorrect affiliations that are then attached to past publications. The resulting data covers institutions, countries, and regions, with ranking metrics based on the number of papers affiliated with each institution.
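The normalization step can be illustrated with a minimal sketch. The article does not describe the pipeline's actual rules or tooling, so everything here is an assumption: the alias table, the `normalize_institution` helper, and the choice to fold accents and whitespace are hypothetical, and the PDF extraction step is left out entirely.

```python
import re
import unicodedata

# Hypothetical alias table mapping cleaned-up keys to one canonical name.
# A real pipeline would need a far larger, curated mapping.
ALIASES = {
    "eth zurich": "ETH Zurich",
    "swiss federal institute of technology zurich": "ETH Zurich",
    "mit": "Massachusetts Institute of Technology",
    "massachusetts institute of technology": "Massachusetts Institute of Technology",
}


def _lookup_key(raw: str) -> str:
    """Fold accents, case, punctuation, and extra whitespace into a lookup key."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def normalize_institution(raw: str) -> str:
    """Return the canonical name if the variant is known, else the trimmed input."""
    return ALIASES.get(_lookup_key(raw), raw.strip())


if __name__ == "__main__":
    for variant in ["ETH Zürich", "  eth   zurich ", "MIT", "Unknown Lab"]:
        print(repr(variant), "->", normalize_institution(variant))
```

Unknown names pass through unchanged rather than being guessed, which keeps unmatched variants visible for manual review.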

The dataset includes multiple ranking methods: counting each institution once per paper, counting only the first author’s affiliation, and assigning fractional credit based on the number of institutions per paper. The top institutions span both academia and industry, with clear regional distinctions. Visualizations, such as treemaps, illustrate the distribution of research output, highlighting dominant institutions and geographic trends.
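The three counting schemes can be sketched as follows. This is an illustration under stated assumptions, not the dataset's own code: the record layout, the `rank_institutions` function, and the placeholder institution names are hypothetical, and fractional credit is assumed to be split equally among the distinct institutions on a paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: normalized institutions listed in author order.
papers = [
    {"id": "p1", "author_institutions": ["Univ A", "Lab B", "Univ A"]},
    {"id": "p2", "author_institutions": ["Lab B"]},
    {"id": "p3", "author_institutions": ["Univ C", "Univ A"]},
]


def rank_institutions(papers):
    """Compute per-paper, first-author, and fractional counts per institution."""
    per_paper = Counter()            # each institution counted once per paper
    first_author = Counter()         # only the first author's institution
    fractional = defaultdict(float)  # 1 / (distinct institutions) per paper

    for paper in papers:
        # Deduplicate while keeping author order.
        distinct = list(dict.fromkeys(paper["author_institutions"]))
        if not distinct:
            continue  # skip papers with no extracted affiliation
        first_author[distinct[0]] += 1
        for inst in distinct:
            per_paper[inst] += 1
            fractional[inst] += 1 / len(distinct)

    return per_paper, first_author, dict(fractional)


if __name__ == "__main__":
    per_paper, first_author, fractional = rank_institutions(papers)
    print("once per paper:", per_paper.most_common())
    print("first author:  ", first_author.most_common())
    print("fractional:    ", sorted(fractional.items(), key=lambda kv: -kv[1]))
```

Under the equal-split assumption, fractional credits sum to the number of papers, so that scheme does not inflate totals for heavily collaborative papers the way per-paper counting does.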

Why It Matters

The dataset gives a more reliable and detailed picture of who is shaping AI research today. With institutional contributions mapped accurately, policymakers, researchers, and industry leaders can better assess research trends, collaboration patterns, and regional strengths. It also offers a model for how conference affiliation data can be collected and analyzed without relying on author profiles, which often contain outdated or inconsistent information.

Background

Previous analyses of AI research affiliations often relied on author profiles from platforms like OpenReview, which are prone to profile drift and outdated information. The ICLR 2026 dataset instead extracts affiliations directly from the paper PDFs, which record each author's affiliation as printed on the paper. This approach follows similar efforts at other conferences but is notable for its scale and automation, covering all accepted papers at ICLR 2026. The initiative aligns with ongoing efforts to improve transparency and accuracy in research metrics and institutional rankings.

“This pipeline provides a more accurate and current picture of institutional contributions in AI research, avoiding the common pitfalls of profile drift.”

— Dmytro Lopushanskyy, dataset creator

“The dataset offers valuable insights into research trends and institutional influence at this year’s conference.”

— ICLR organizers

What Remains Unclear

It is still unclear how affiliations will shift at future conferences or how easily this pipeline can be adapted for other venues. The impact of PDF parsing errors and ambiguous institution names is also still being assessed, although the current error rate is reported to be low.

What’s Next

Next steps include applying this pipeline to upcoming conferences, refining the affiliation normalization process, and integrating the data into broader research trend analyses. Further, the creators plan to publish interactive visualizations and make the dataset publicly available for external analysis.

Key Questions

How does this dataset improve upon previous affiliation data?

It extracts affiliation data directly from PDF papers, avoiding outdated or incorrect author profile information, resulting in more accurate and current institutional mappings.

Can this pipeline be used for other conferences?

In principle, yes. The methodology is adaptable, but it requires customization for different paper formats and submission systems, and how easily that can be done is not yet clear. The creators plan to extend it to other major AI conferences.

What are the limitations of this dataset?

The dataset relies on PDF parsing, which can occasionally misinterpret complex layouts. Institutional name normalization, although robust, may also miss some name variants.

Will the dataset be publicly available?

Yes, the creators intend to publish the dataset and visualizations to support further research and transparency in institutional analysis.
