Words by the tail : assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
Article [Version of Record]
Abstract(s)
This research assesses the evolution of lexical diversity in scholarly titles using a new indicator based on zipfian frequency-rank distribution tail fits. At the operational level, while both
head and tail fits of zipfian word distributions are more independent of corpus size than
other lexical diversity indicators, tail fits clearly outperform head fits in that
regard. This benchmark-setting performance of zipfian distribution tails proves extremely
handy in distinguishing actual patterns in lexical diversity from the statistical noise generated
by other indicators due to corpus size fluctuations. From an empirical perspective, analysis
of Web of Science (WoS) article titles from 1975 to 2014 shows that the lexical concentration
of scholarly titles in Natural Sciences & Engineering (NSE) and Social Sciences & Humanities (SSH) articles increases by a little less than 8% over the whole period. With the exception of the lexically concentrated Mathematics, Earth & Space, and Physics, NSE article
titles all increased in lexical concentration, suggesting a probable convergence of concentration levels in the near future. As regards SSH disciplines, aggregation effects observed
at the disciplinary group level suggest that, behind the stable concentration levels of SSH
disciplines, a cross-disciplinary homogenization of the highest word frequency ranks may
be at work. Overall, these trends suggest a progressive standardization of wording in
scientific article titles, which are written using an increasingly restricted and cross-disciplinary set of words.
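
For readers unfamiliar with frequency-rank fits, the sketch below illustrates the general idea of fitting the tail of a word frequency-rank distribution. It is not the article's procedure: the abstract does not specify the estimator, the tokenization, or the head/tail cutoff, so the least-squares fit in log-log space, the whitespace tokenizer, the cutoff at rank 20, and the function names `rank_frequency` and `loglog_slope` are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_frequency(titles):
    """Return word frequencies sorted in descending order (rank 1 first).

    Assumption: lowercase whitespace tokenization; the article's own
    tokenization is not described in the abstract.
    """
    counts = Counter(word for title in titles for word in title.lower().split())
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs, first_rank, last_rank):
    """Least-squares slope of log(frequency) against log(rank) over
    ranks first_rank..last_rank (1-indexed, inclusive).

    Assumption: ordinary least squares in log-log space; the article
    may use a different estimator.
    """
    ranks = range(first_rank, last_rank + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(freqs[r - 1]) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check on an ideal Zipf distribution (frequency proportional to 1/rank).
ideal = [1000.0 / r for r in range(1, 201)]
head_slope = loglog_slope(ideal, 1, 20)    # hypothetical head: ranks 1..20
tail_slope = loglog_slope(ideal, 21, 200)  # hypothetical tail: ranks 21..200
```

On this synthetic input both slopes are approximately -1, as expected for an ideal Zipf distribution. On real title corpora the head and tail slopes would generally differ, and how the article converts a tail fit into its lexical concentration indicator is not stated in this abstract.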