Dean's Blog: The Curse of the h-index

March 3, 2022

As my personal citation count and h-index inch upwards, I find myself in something of a dilemma. On one hand, I can’t help being proud of those numbers, as they represent a life’s work. On the other hand, they are, to use a term applied to Steve Jobs’ view of the world, a “reality distortion field” and, worse, they do not truly reflect what we aspire to as scholarly value within our School of Data Science (SDS) at the University of Virginia (UVA). Let me explain.

First, the distortion. Consider four examples from my own experience:

  1. I have a research paper that has over 37,000 citations that no one, well, no one in the last 20 years, has read! It’s a paper about a database we developed that is heavily used and hence cited. While we have credible ways of citing data using Digital Object Identifiers (DOIs) and resources like Zenodo, Figshare and Dryad, there is still no culture of doing so, nor a sense that these resources will persist. Worse still, I have not worked on that resource in the past eight years, yet it still gets cited and attributed to me, ignoring those who actually work on the resource today. This is a perfect example of citation creep: subsequent papers cite the reference used in a previous paper without seeking updated, more pertinent references.
  2. There is work that I feel was academically the most challenging, yet it has relatively few citations because each body of work sits in a niche area. Occasionally, a subset of such papers, known as sleeping beauties, gets the kiss of life, either because of unforeseen events, such as a pandemic, or because they were ahead of their time and are rediscovered, read and cited. In short, the number of citations of a paper does not reflect the value of the work, at least in the eyes of this author.
  3. There are papers I have written in the area of professional development that have many views but relatively few citations. For example, a paper titled “Ten Simple Rules for Getting Published” has over 146,000 views but only 50 citations. How do we truly measure its value to the scholarly enterprise?
  4. There is work that I have contributed to that relates to standards and policies, for example, the FAIR Principles, which have many citations, but also many authors. How should we assign credit appropriately?
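The h-index itself is simple to compute: a scholar has an h-index of h when h of their papers have each been cited at least h times. As a minimal sketch (the function name and both citation profiles are hypothetical, invented for illustration rather than taken from any real Google Scholar profile), the Python below computes it and contrasts it with a raw citation total when one paper, like the database paper in the first example, is very heavily cited.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    # The paper at rank r (1-based) supports h = r only if it has >= r citations.
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical profiles, identical except for the most cited paper.
with_database_paper = [37000, 50, 12, 9, 4, 2]
without_database_paper = [40, 50, 12, 9, 4, 2]

print(h_index(with_database_paper), sum(with_database_paper))        # 4 37077
print(h_index(without_database_paper), sum(without_database_paper))  # 4 117
```

Both hypothetical profiles have an h-index of 4, while their citation totals differ by a factor of more than 300, so the two numbers can tell very different stories about the same body of work.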

I could go on, but you get the idea. Citation counts and, by association, h-indices distort one’s scholarly contribution, yet they are widely used. Virtually all CVs that come across my desk (and there are many as we build out the School of Data Science) include these bibliometrics, and I am guilty of reading too much into them. Why? They are simple, free, easily calculable and easily referenced comparative metrics, thanks to Google Scholar Profiles. In fact, at SDS, we insist our faculty have a Google Scholar Profile. So what is the answer?

I am not sure there is an immediate one when using the h-index alone as a metric. Perhaps a variety of pertinent indices? Certainly, data scientists should weigh in on establishing the means to measure levels of collaboration, levels of translation, levels of mentorship and other elements we value in our scholars. Doing so means understanding the limitations. The data needed to make such measurements are incomplete, not freely available, or both. Further, such analyses must be kept current to have value, which requires a sustained effort and hence funding. That Microsoft Academic gave up trying to do this speaks to the difficulty. Google Scholar succeeds because it has the resources and data to make it work, but it shows no apparent desire to create APIs and make the data accessible to enable a broader set of applications.

Of course, all of this does a terrible disservice to those individuals and organizations who devote their time, energy and intellect to developing bibliometric tools attempting to give credit where it is due and to recognize the true contributions of individuals, groups of individuals and organizations. I applaud your efforts. 

In 2008, in a tongue-in-cheek editorial, “I am Not a Scientist, I am a Number,” I outlined the notion of a Scholar Factor (SF) that took into account a broader base of scholarly activity, though not all the activity needed to provide a true profile of a scholar. Fundamental to the SF was the idea that the scholar and their output could be found and analyzed through the use of unique identifiers. At that time, DOIs were being assigned and used to uniquely identify published work. Since then, ORCID iDs have come into increasing use to disambiguate individuals, and DOIs are now used for data, software and other research products.

There is hope, but a lot of work remains for data scientists, accompanied by the need for a cultural shift (the hardest part). In the meantime, we will continue to rely on human intuition and the qualitative measures that go with it. While I would argue that there must always be a human in the loop, that human could be better equipped to evaluate true scholarly value according to criteria that really matter in today’s world. Until then, we will remain in a “reality distortion field.”


Stephenson Dean