Modeling a century of citation distributions
Article [Accepted Manuscript]
Is part ofJournal of informetrics ; vol. 3, no. 4, pp. 296-303.
The prevalence of uncited papers or of highly cited papers, with respect to the bulk of publications, provides important clues as to the dynamics of scientific research. Using 25 million papers and 600 million references from the Web of Science over the 1900–2006 period, this paper proposes a simple model based on a random selection process to explain the “uncitedness” phenomenon and its decline over the years. We show that the proportion of cited papers is a function of (1) the number of articles available (the competing papers), (2) the number of citing papers and (3) the number of references they contain. Using uncitedness as a departure point, we demonstrate the utility of the stretched-exponential function and a form of the Tsallis q-exponential function to fit complete citation distributions over the 20th century. As opposed to simple power-law fits, for instance, both these approaches are shown to be empirically well-grounded and robust enough to better understand citation dynamics at the aggregate level. On the basis of these models, we provide quantitative evidence and provisional explanations for an important shift in citation practices around 1960. We also propose a revision of the “citation classic” category as a set of articles which is clearly distinguishable from the rest of the field.