How can we quantify research interest similarity?

Title: Measuring research interest similarity with transition probabilities

Abstract: We introduce a family of paper and author similarity measures based on the concept that papers are more similar if they are more likely to be retrieved during a literature search following backward and forward citations. As this browsing process resembles a walk in a citation network, we operationalize the concept using the transition probability (TP) of random walkers. The proposed measures are continuous and symmetric, and can be implemented on any citation network. We conduct validation tests of the TP concept and other extant alternatives to gauge which metric can classify papers and predict future coauthors most consistently across different scales of analysis (coauthorships, journals, and disciplines). Our results show that the proposed basic TP measure outperforms alternative metrics such as personalized PageRank and the node2vec machine-learning technique in classification tasks at various scales. Additionally, we discuss how publication-level data can be leveraged to approximate the research interest similarity of individual scientists. This paper is accompanied by a Python package that implements all the tested metrics.
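To make the core idea in the abstract concrete, here is a minimal sketch of a transition-probability similarity on a toy citation network. It is not the accompanying package or the paper's exact definition. The edge list, the function names (`transition_matrix`, `matmul`, `tp_similarity`), the two-step walk length, and the symmetrization by averaging both directions are all assumptions made for illustration.

```python
# Hypothetical toy citation network: (citing paper, cited paper) pairs.
CITATIONS = [(1, 0), (2, 0), (2, 1), (3, 2), (4, 3), (5, 3), (5, 4)]
N_PAPERS = 6


def transition_matrix(edges, n):
    """Row-stochastic matrix for a walker that follows citations in both directions."""
    adj = [[0.0] * n for _ in range(n)]
    for citing, cited in edges:
        adj[citing][cited] = 1.0  # backward step: from a paper to its reference
        adj[cited][citing] = 1.0  # forward step: from a paper to one citing it
    matrix = []
    for row in adj:
        degree = sum(row)
        # Isolated papers get an all-zero row (the walker has nowhere to go).
        matrix.append([x / degree if degree else 0.0 for x in row])
    return matrix


def matmul(a, b):
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def tp_similarity(edges, n, steps=2):
    """Symmetrized probability of moving between two papers in `steps` >= 1 steps.

    Averaging both directions is an assumption of this sketch, used only to
    obtain a symmetric score.
    """
    p = transition_matrix(edges, n)
    power = p
    for _ in range(steps - 1):
        power = matmul(power, p)
    return [[(power[i][j] + power[j][i]) / 2.0 for j in range(n)]
            for i in range(n)]


similarity = tp_similarity(CITATIONS, N_PAPERS, steps=2)
# Papers 0 and 1 share a neighbor (paper 2); papers 0 and 5 are farther apart.
print(similarity[0][1] > similarity[0][5])
```

In this toy network, papers 0 and 1 sit in the same tightly linked cluster, so a walker starting at one is more likely to reach the other within two steps than to reach paper 5, which sits in a different cluster.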

Published in Quantitative Science Studies, co-authored by Attila Varga (Corvinus University), Sadamori Kojaku (Binghamton University), and Filipi Nascimento Silva (Indiana University).