TL;DR
Researchers have released a dataset covering the institutional affiliations of 5,356 accepted papers at ICLR 2026. Because affiliations are read from the papers themselves rather than from author profiles, the dataset avoids profile drift, where outdated or incorrect affiliations become attached to past publications. The analysis highlights key institutional trends and regional contributions.
The dataset, derived from the 5,356 accepted papers, is presented by its creator as a more accurate and current view of institutional affiliations in AI research than analyses based on author profiles.
The dataset was built with an end-to-end pipeline that extracts affiliations from the PDF of each accepted paper and normalizes institution names so that variants of the same name are counted together. This approach sidesteps author profile drift, where outdated or incorrect affiliations are linked to past publications. The resulting data covers institutions, countries, and regions, with rankings based on the number of papers affiliated with each institution.
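To make the normalization step concrete, here is a minimal sketch of how name variants could be mapped to one canonical institution. It is not the dataset's actual pipeline, which the article does not describe in detail: the `ALIASES` table, the `normalize_key` and `normalize_institution` helpers, and the sample names are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical alias table: normalized key -> canonical name.
# A real pipeline would need a far larger, curated mapping.
ALIASES = {
    "mit": "Massachusetts Institute of Technology",
    "massachusetts institute of technology": "Massachusetts Institute of Technology",
    "eth zurich": "ETH Zurich",
}


def normalize_key(raw: str) -> str:
    """Reduce a raw affiliation string to a lookup key."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize_institution(raw: str) -> str:
    """Return the canonical name if known, else the trimmed input."""
    return ALIASES.get(normalize_key(raw), raw.strip())


if __name__ == "__main__":
    for name in ["ETH Zürich", "  MIT ", "Some New Lab"]:
        print(repr(name), "->", normalize_institution(name))
```

Names missing from the alias table pass through unchanged, which is one way unhandled variants could end up counted as separate institutions.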
The dataset ranks institutions in three ways: crediting each institution once per paper, crediting only the first author’s affiliation, and assigning fractional credit based on the number of institutions on the paper. The top institutions span both academia and industry, with clear regional distinctions. Visualizations such as treemaps illustrate the distribution of research output, highlighting dominant institutions and geographic trends.
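The three counting schemes can be sketched as follows. This is an illustration under stated assumptions, not the dataset's own code: each paper is modeled as an ordered list of authors, each author as a list of institution names, and fractional credit is assumed to mean an equal 1/k share for each of the k distinct institutions on a paper. The function names and sample data are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def distinct_institutions(paper):
    """Distinct institutions on a paper, in first-seen order."""
    seen = []
    for author_affiliations in paper:
        for inst in author_affiliations:
            if inst not in seen:
                seen.append(inst)
    return seen


def count_per_paper(papers):
    """Each institution gets 1 per paper it appears on."""
    counts = Counter()
    for paper in papers:
        for inst in distinct_institutions(paper):
            counts[inst] += 1
    return counts


def count_first_author(papers):
    """Only the first author's institutions get credit (assumed: 1 each)."""
    counts = Counter()
    for paper in papers:
        if paper:
            for inst in dict.fromkeys(paper[0]):
                counts[inst] += 1
    return counts


def count_fractional(papers):
    """Each paper's single unit of credit is split equally (assumed)."""
    counts = Counter()
    for paper in papers:
        insts = distinct_institutions(paper)
        for inst in insts:
            counts[inst] += Fraction(1, len(insts))
    return counts


# Made-up sample: three papers, authors listed in order.
papers = [
    [["Univ A"], ["Univ A", "Lab B"]],
    [["Lab B"], ["Univ C"], ["Univ A"]],
    [["Univ C"]],
]

if __name__ == "__main__":
    print(count_per_paper(papers).most_common())
    print(count_first_author(papers).most_common())
    print(count_fractional(papers).most_common())
```

With fractional counting the totals sum to the number of papers, so heavily collaborative papers do not inflate the overall count the way once-per-paper counting does.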
Why It Matters
The dataset gives a more reliable and detailed picture of who is shaping AI research today. With institutional contributions mapped accurately, policymakers, researchers, and industry leaders can better assess research trends, collaboration patterns, and regional strengths. It also offers a model for how conference affiliation data can be collected and analyzed, moving beyond reliance on author profiles that often contain outdated or inconsistent information.
Background
Previous analyses of AI research affiliations often relied on author profiles from platforms like OpenReview, which are prone to profile drift and outdated information. The ICLR 2026 dataset instead extracts affiliations directly from the paper PDFs, which record the affiliations the authors listed on the paper itself. The approach follows similar efforts at other conferences but is notable for its scale and automation, covering all accepted papers at ICLR 2026. It aligns with ongoing efforts to improve transparency and accuracy in research metrics and institutional rankings.
“This pipeline provides a more accurate and current picture of institutional contributions in AI research, avoiding the common pitfalls of profile drift.”
— Dmytro Lopushanskyy, dataset creator
“The dataset offers valuable insights into research trends and institutional influence at this year’s conference.”
— ICLR organizers
What Remains Unclear
It is still unclear how affiliations will shift at future conferences and how easily the pipeline can be adapted to other venues. The impact of PDF parsing errors and ambiguous institution names is also still being assessed, although the current error rate is described as low.
What’s Next
Next steps include applying the pipeline to upcoming conferences, refining the affiliation normalization process, and integrating the data into broader research trend analyses. The creators also plan to publish interactive visualizations and make the dataset publicly available for external analysis.

Key Questions
How does this dataset improve upon previous affiliation data?
It extracts affiliation data directly from PDF papers, avoiding outdated or incorrect author profile information, resulting in more accurate and current institutional mappings.
Can this pipeline be used for other conferences?
In principle, yes. The methodology is adaptable, but it requires customization for different paper formats and submission systems, and how much effort that takes is not yet clear. Plans are underway to extend its use to other major AI conferences.
What are the limitations of this dataset?
The dataset relies on PDF parsing, which can occasionally misinterpret complex layouts. Institutional name normalization, although robust, may also miss some name variants.
Will the dataset be publicly available?
Yes, the creators intend to publish the dataset and visualizations to support further research and transparency in institutional analysis.