Predicting the age of researchers using bibliometric data
Article [Accepted Manuscript]
Is part ofJournal of informetrics ; vol. 11, no. 3, pp. 713-729.
The age of researchers is a critical factor necessary to study the bibliometric characteristics of the scholars that produce new knowledge. In bibliometric studies, the age of scientific authors is generally missing; however, the year of the first publication is frequently considered as a proxy of the age of researchers. In this article, we investigate what are the most important bibibliometric factors that can be used to predict the age of researchers (birth and PhD age). Using a dataset of 3574 researchers from Québec for whom their Web of Science publications, year of birth and year of their PhD are known, our analysis falls under the linear regression setting and focuses on investigating the predictive power of various regression models rather than data fitting, considering also a breakdown by fields. The year of first publication proves to be the best linear predictor for the age of researchers. When using simple linear regression models, predicting birth and PhD years result in an error of about 3.7 years and 3.9 years, respectively. Including other bibliometric data marginally improves the predictive power of the regression models. A validation analysis for the field breakdown shows that the average length of the prediction intervals vary from 2.5 years for Basic Medical Sciences (for birth years) up to almost 10 years for Education (for PhD years). The average models perform significantly better than the models using individual observations. Nonetheless, the high variability of data and the uncertainty inherited by the models advice to caution when using linear regression models for predicting the age of researchers.