The Secret Half-Lives of Scientific Papers

19 December 2013 4:00 pm

Scholarly papers can have relatively long “half-lives,” finds a survey released yesterday by a U.S.-based association of publishers. More than one-half of the total downloads of the articles covered by the survey took place more than 2 years after publication, while in some fields it took more than 4 years for a paper to hit its half-life.

The findings come as the U.S. government, and other governments around the world, attempt to establish policies and deadlines for making government-funded research published in private journals freely available to the public. U.S. officials have suggested that allowing publishers to keep taxpayer-funded papers behind paywalls for a year should be adequate to protect the business model of journals that charge fees to access papers. Some publishers have generally agreed, but others have pushed back, saying that’s not enough time. The new survey, sponsored by the Professional/Scholarly Publishing (PSP) Division of the Association of American Publishers in Washington, D.C., injects some hard data into the debate.

“There has been extensive dialogue surrounding public access and embargo periods but assumptions, opinions and ideas have never been grounded in actual data about usage of journal literature,” said John Tagler, PSP’s executive director, in a statement. “Rigorous, scientifically sound studies such as this are critical to setting rational and effective policy.” The results, Tagler added, support the view that “a one-size-fits-all embargo period for scholarly works will not fairly address disparities in journal usage.”

The study is the latest effort to answer a long-standing question: What happens to a research article after it is published? When it comes to citations, the data are obsessively measured, although their significance is hotly debated. And what many researchers would love to know is: “How many times is my work actually read?” Publishers, meanwhile, are interested in the business implications of a 1-year open-access deadline.

To explore such issues, Philip Davis, a publishing industry consultant in Ithaca, New York, took a look at download statistics, a potentially good proxy for determining the reading patterns for an article. As a metric, he calculated article half-lives, or how long it took to reach one-half of a paper’s total downloads.
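As a rough illustration of the metric, the sketch below computes a half-life from a list of yearly download counts. It is not Davis's actual method, which the study summary here does not spell out: the function name `download_half_life`, the yearly granularity, the sample numbers, and the linear interpolation within a year are all assumptions made for the example.

```python
from itertools import accumulate


def download_half_life(downloads_per_year):
    """Return the years needed to reach half of an article's total downloads.

    downloads_per_year[i] is a hypothetical count of downloads in year i + 1
    after publication. Downloads are assumed to be spread evenly within each
    year, so the result is linearly interpolated (an assumption, not a
    detail taken from the study).
    """
    total = sum(downloads_per_year)
    if total <= 0:
        raise ValueError("need at least one download to compute a half-life")

    half = total / 2
    previous = 0  # cumulative downloads before the current year
    for year, cumulative in enumerate(accumulate(downloads_per_year), start=1):
        if cumulative >= half:
            in_year = cumulative - previous  # downloads during this year
            # Whole years already elapsed, plus the fraction of this year
            # needed to reach the halfway mark.
            return (year - 1) + (half - previous) / in_year
        previous = cumulative


# Made-up counts for one article over its first five years.
example = [400, 300, 150, 100, 50]
print(f"half-life: {download_half_life(example):.2f} years")
```

An article whose downloads fall off quickly gets a short half-life under this definition, while one that keeps attracting readers for years gets a long one.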

Download data are rarely made public, so Davis reached out to a wide range of academic publishers and asked them to share. “I started the study [in] late summer,” Davis tells ScienceInsider in an e-mail:

Some of the publishers had to do a lot of work; many had to write programs to extract and count usage events from millions and millions of lines of transaction logs. In some cases, they provided me with access to their usage reports and I calculated the half-lives. Everyone I worked with saw the value in doing this kind of research and were very supportive with getting me the data that I needed to do the study.

Publishers are very competitive by nature and none of them wanted to go alone with the study, so there needed to be sufficient participation by enough publishers across the subject disciplines where I could show the data but not reveal the details of any one publisher.

In the end, Davis got data from 13 publishers that use various business models with their journals (the largest were Elsevier, Wiley, and Springer). The data set included download information from 2812 journals covering 10 disciplines, from science and engineering to the social sciences and humanities.

The surprise was that unlike many blog posts (including, no doubt, this one), scholarly articles continue to be read years after publication. The median half-life across all publishers was between 2 and 4 years. Papers in the health sciences were on the lower end, at 2 to 3 years, and the longest-lived fields were humanities, physics, and mathematics, with article half-lives of between 4 and 5 years.

The findings come with some caveats, Davis notes. He couldn’t account for papers shared by duplicating PDF versions, for instance, and he had to use statistical sampling techniques to fill in some data gaps. He also was not able to identify the funding sources for papers, so he could not see whether government-funded papers had different download patterns from those funded by other sources.

Still, “this is the first comprehensive study of such data,” says H. Frederick Dylla, executive director of the American Institute of Physics in College Park, Maryland. “In 2012 more than 28,000 [journals] were published by more than 2500 scholarly publishers.” Each of those publishers has access to its own download data, but evidence on the wider trends has been anecdotal.

“Most people in publishing would have assumed that humanities or mathematics journal articles have longer article lives than medical journal articles,” says David Crotty, senior editor at Oxford University Press, “but I'm not sure they could have defined those patterns accurately. A lot of previous efforts have gone into studying citation half-lives, which is really interesting, but doesn't necessarily have as direct a correlation with subscription as does usage.” Crotty says that publishers are likely to release more data after this. “We're under increasing pressure to provide evidence to help set policy in these matters, so the more we can accurately gauge the ways readers use articles, the better.”