Predicting the age of researchers using bibliometric data
Article [Accepted version]
Abstract
The age of researchers is a critical factor in studying the bibliometric characteristics of the
scholars who produce new knowledge. In bibliometric studies, the age of scientific authors is
generally missing; the year of first publication is therefore frequently used as a proxy for the
age of researchers. In this article, we investigate which bibliometric factors are most important
for predicting the age of researchers (birth year and PhD year). Using a dataset of 3574
researchers from Québec whose Web of Science publications, year of birth and year of PhD are
known, we work in the linear regression setting and focus on the predictive power of various
regression models rather than on data fitting, and we also consider a breakdown by field. The
year of first publication proves to be the best linear predictor of the age of
researchers. With simple linear regression models, predicting birth and PhD years results in an
error of about 3.7 years and 3.9 years, respectively. Including other bibliometric data only marginally
improves the predictive power of the regression models. A validation analysis for the field
breakdown shows that the average length of the prediction intervals varies from 2.5 years for Basic
Medical Sciences (for birth years) up to almost 10 years for Education (for PhD years). The average
models perform significantly better than the models using individual observations. Nonetheless, the
high variability of the data and the uncertainty inherited by the models call for caution when using
linear regression models to predict the age of researchers.
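
As a rough illustration of the setting described above, the following sketch fits a simple linear regression of birth year on year of first publication and measures the error on held-out observations, i.e. predictive power rather than goodness of fit. It is not the authors' code or data: the sample is synthetic, the 29-year gap between birth and first publication and the 4-year noise level are arbitrary assumptions, the names `fit_simple_ols` and `rmse` are hypothetical helpers, and root-mean-square error is assumed as the error measure since the abstract does not specify one.

```python
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


# Synthetic sample (assumption): NOT the Québec dataset used in the article.
rng = random.Random(0)
first_pub = [rng.randint(1970, 2010) for _ in range(500)]
# Assumed relation: birth year is about 29 years before first publication,
# with Gaussian noise of standard deviation 4 years.
birth = [fp - 29 + rng.gauss(0, 4) for fp in first_pub]

# Fit on the first 400 researchers, evaluate on the remaining 100.
train_x, test_x = first_pub[:400], first_pub[400:]
train_y, test_y = birth[:400], birth[400:]

intercept, slope = fit_simple_ols(train_x, train_y)
predicted = [intercept + slope * xi for xi in test_x]
holdout_rmse = rmse(test_y, predicted)

print(f"slope = {slope:.3f}, intercept = {intercept:.1f}")
print(f"held-out RMSE = {holdout_rmse:.2f} years")
```

Evaluating on observations that were not used for fitting is what separates a prediction error from a residual error; on real data the held-out error would also reflect differences between fields, which this sketch ignores.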