Davis et al.’s 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion

Davis, PN, Lewenstein, BV, Simon, DH, Booth, JG, & Connolly, MJL (2008) Open access publishing, article downloads, and citations: randomised controlled trial. British Medical Journal 337: a568

Overview (by SH):

Davis et al.’s study was designed to test whether the “Open Access (OA) Advantage” (i.e., more citations to OA articles than to non-OA articles in the same journal and year) is an artifact of a “self-selection bias” (i.e., better authors are more likely to self-archive or better articles are more likely to be self-archived by their authors).

The control for self-selection bias was to select randomly which articles were made OA, rather than having the author choose. The result was that a year after publication the OA articles were not cited significantly more than the non-OA articles (although they were downloaded more).

The authors write:

“To control for self selection we carried out a randomised controlled experiment in which articles from a journal publisher’s websites were assigned to open access status or subscription access only”

The authors conclude:

“No evidence was found of a citation advantage for open access articles in the first year after publication. The citation advantage from open access reported widely in the literature may be an artefact of other causes.”


To show that the OA advantage is an artefact of self-selection bias (or of any other factor), you first have to produce the OA advantage and then show that it is eliminated by eliminating self-selection bias (or any other artefact).

This is not what Davis et al. did. They simply showed that they could detect no OA advantage one year after publication in their sample. This is not surprising, since most other studies, some based on hundreds of thousands of articles, don’t detect an OA advantage one year after publication either. It is too early.

To draw any conclusions at all from such a 1-year study, the authors would have had to do a control condition, in which they managed to find a sufficient number of self-selected, self-archived OA articles (from the same journals, for the same year) that do show the OA advantage, whereas their randomized OA articles do not. In the absence of that control condition, the finding that no OA advantage is detected in the first year for this particular sample of 247 out of 1619 articles in 11 physiological journals is completely uninformative.
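
The logic of the missing control can be sketched in a few lines of Python. This is only an illustration: the function names, the 1.1 ratio threshold (a crude stand-in for a proper significance test) and the citation counts in the usage example are all invented for the sketch and are not data from Davis et al. or from any other study.

```python
from statistics import mean

def oa_advantage(oa_citations, non_oa_citations):
    """Ratio of mean citations, OA over non-OA (1.0 means no advantage)."""
    return mean(oa_citations) / mean(non_oa_citations)

def interpret(self_selected_ratio, randomized_ratio, threshold=1.1):
    """Classify the outcome of a randomized-OA study that includes the
    self-selected control. `threshold` is a hypothetical cut-off standing
    in for a real significance test."""
    if self_selected_ratio < threshold:
        # No advantage even for self-archived articles: too early to tell.
        return "uninformative"
    if randomized_ratio < threshold:
        # An advantage exists but vanishes under randomization.
        return "consistent with self-selection bias"
    return "advantage survives randomization"

# Made-up first-year citation counts, for illustration only.
non_oa = [1, 0, 2, 1, 1]
randomized_oa = [1, 1, 1, 1, 1]
self_selected_oa = [1, 2, 0, 1, 1]

print(interpret(oa_advantage(self_selected_oa, non_oa),
                oa_advantage(randomized_oa, non_oa)))
```

With these toy numbers neither OA group shows an advantage, so the sketch reports "uninformative": without a demonstrated advantage in the self-selected group, a null result for the randomized group says nothing about self-selection bias.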

The authors did find a download advantage within the first year, as other studies have found. This early download advantage for OA articles has also been found to be correlated with a citation advantage 18 months or more later. The authors try to argue that this correlation would not hold in their case, but they give no evidence (because they hurried to publish their study, originally intended to run four years, three years too early).

(1) The Davis study was originally proposed (in December 2006) as intended to cover 4 years:

Davis, PN (2006) Randomized controlled study of OA publishing (see comment)

It has instead been released after a year.

(2) The Open Access (OA) Advantage (i.e., significantly more citations for OA articles, always comparing OA and non-OA articles in the same journal and year) has been reported in all fields tested so far, for example:

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

(3) There is always the logical possibility that the OA advantage is not a causal one, but merely an effect of self-selection: The better authors may be more likely to self-archive their articles and/or the better articles may be more likely to be self-archived; those better articles would be the ones that get more cited anyway.

(4) So it is a very good idea to try to control methodologically for this self-selection bias: The way to control it is exactly as Davis et al. have done, which is to select articles at random for being made OA, rather than having the authors self-select.

(5) Then, if it turns out that the citation advantage for randomized OA articles is significantly smaller than the citation advantage for self-selected-OA articles, the hypothesis that the OA advantage is all or mostly just a self-selection bias is supported.

(6) But that is not at all what Davis et al. did.

(7) All Davis et al. did was to find that their randomized OA articles had significantly higher downloads than non-OA articles, but no significant difference in citations.

(8) This was based on the first year after publication, when most of the prior studies on the OA advantage likewise find no significant OA advantage, because it is simply too early: the early results are too noisy! The OA advantage shows up in later years (1-4).

(9) If Davis et al. had been more self-critical, seeking to test and perhaps falsify their own hypothesis, rather than just to confirm it, they would have done the obvious control study, which is to test whether articles that were made OA through self-selected self-archiving by their authors (in the very same year, in the very same journals) show an OA advantage in that same interval. For if they do not, then of course the interval was too short, the results were released prematurely, and the study so far shows nothing at all: It is not until you have actually demonstrated an OA advantage that you can estimate how much of that might be due to a self-selection artefact!

(10) The study shows almost nothing at all, but not quite nothing, because one would expect (based on our own previous study, which showed that early downloads, at 6 months, predict enhanced citations at a year and a half or later) that Davis’s increased downloads too would translate into increased citations, once given enough time.

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8) pp. 1060-1072.

(11) The findings of Michael Kurtz and collaborators are also relevant in this regard. They looked only at astrophysics, which is special, in that (a) it is a field with only about a dozen journals, to which every research astronomer has subscription access (these days they also have free online access via ADS) and (b) it is a field in which most authors self-archive their preprints very early in arXiv, much earlier than the date of publication.

Kurtz, M. J. and Henneken, E. A. (2007) Open Access does not increase citations for research articles from The Astrophysical Journal. Preprint deposited in arXiv September 6, 2007.

(12) Kurtz & Henneken, too, found the usual self-archiving advantage in astrophysics (i.e., about twice as many citations for OA papers as for non-OA papers), but when they analyzed its cause, they found that most of the cause was the Early Advantage of access to the preprint, as much as a year before publication of the (OA) postprint. In addition, they found a self-selection bias (for preprints, which is all that were involved here, because, as noted, in astrophysics, as of publication, everything is OA): The better articles by the better authors were more likely to have been self-archived as preprints.

(13) Kurtz’s results do not generalize to all fields, because in other fields it is true neither that (a) they already have 100% OA for their published postprints, nor that (b) many authors tend to self-archive preprints before publication.

(14) However, the fact that early preprint self-archiving (in a field that is 100% OA as of postprint publication) is sufficient to double citations is very likely to translate into a similar effect, in a no-OA, no-preprint field, if one reckons on the basis of the one-year access embargo that many publishers are imposing on the postprint. (The yearlong “No-Embargo” advantage provided by postprint OA in other fields might not turn out to be so big as to double citations, as the preprint Early Advantage in astrophysics did, because at least there is some subscription access to the postprint; but the counterpart of the Early Advantage for the postprint is likely to be there too.)

(15) Moreover, the preprint OA advantage is primarily Early Advantage, and only secondarily Self-Selection.

(16) The size of the postprint self-selection bias would have been what Davis et al. tested — if they had done the proper control, and waited long enough to get an actual OA effect to compare against.

(17) We had reported in an unpublished 2007 pilot study that there was no statistically significant difference between the size of the OA advantage for mandated (i.e., obligatory) and unmandated (i.e., self-selected) self-archiving:

Hajjem, C. & Harnad, S. (2007) The Open Access Citation Advantage: Quality Advantage Or Quality Bias? Preprint deposited in arXiv January 22, 2007.

(18) We will soon be reporting the results of a 4-year study on the OA advantage in mandated and unmandated self-archiving that confirms these earlier findings: Mandated self-archiving is like Davis et al.’s randomized OA, but we find that it does not reduce the OA advantage at all, once enough time has elapsed for there to be an OA Advantage at all.

Stevan Harnad
50th Green OA Self-Archiving Mandate Worldwide: France’s ANR/SHS

The Humanities and Social Sciences branch of France’s Agence Nationale de la Recherche has just announced its Green OA self-archiving mandate: France’s first funder mandate (France’s second mandate overall, and the world’s 50th). See ROARMAP.

Note that the situation in France with central repositories is very different from the case of NIH’s PMC repository: France’s HAL is a national central repository where (in principle) (1) all French research output (from every field and every institution) can be deposited and (again, in principle) (2) every French institution (or department or funder) can have its own interface and “look” in HAL, a “virtual” Institutional Repository (IR), saving it the necessity of creating an IR of its own if it does not feel it needs to.

The crucial underlying question — and several OA advocates in France are raising the question, notably, Hélène Bosc, in a forthcoming article (meanwhile, see this) — is whether the probability of adopting institutional OA mandates in France is increased or decreased by the HAL option: Are universities more inclined to adopt a window on HAL, and to mandate central deposit of all their institutional research output, or would they be more inclined to mandate deposit in their own autonomous university IRs, which they manage and control?

Again, the SWORD protocol for automatic import and export between IRs and CRs is pertinent, because then it doesn’t matter which way institutions prefer to do it.

Agence Nationale de la Recherche (ANR) (Humanities and Social Sciences Branch) (FRANCE)

Institution’s/Department’s OA Eprint Archives

Institution’s/Department’s OA Self-Archiving Policy

[Paraphrase by T. Chanier]

“The Humanities and Social Sciences branch of the French National Research Agency (ANR) mandates [requires] researchers involved in projects that it funds to deposit their scientifically validated (refereed postprint) publications in the HAL-SHS open archive, without any delay.

“HAL is a nation-wide open archive supported by all public French Research Institutions. HAL-SHS is a sub-part dedicated to Humanities and Social Sciences.

“In November 2007, the general ANR agency had merely invited its researchers to deposit in HAL.

“This time (July 2008), the ANR’s SHS branch mandates that its researchers deposit (and requests project leaders to confirm that the deposit is done by everyone).”

** text extracted (July 2008) from the ANR SHS text **

“In a communiqué dated 14 November 2007, the ANR encourages researchers who lead or are partners in projects it funds to deposit their publications in the HAL open archive system, with which it collaborates.

“The SHS Department of the ANR and the support unit at ENS LSH wish to give particular emphasis to this directive, which they consider essential for increasing the visibility of French research in the humanities and social sciences.

“The community of leaders and partners of ANR projects in SHS must therefore mobilise around a common objective: the systematic deposit of their scientific output in HAL-SHS, the SHS interface of the HAL archive.

“Leaders and managers of ANR projects are asked to ensure, within their teams (researchers, academics, postdocs, doctoral students, whether French or foreign), that all publications (articles, conference papers, contributions to collective works, or other eligible output) produced within the project are deposited in HAL, and that this is done progressively as they are produced (for example, upon submission to a journal and then again at the time of actual publication).”

Registered by: Thierry Chanier (Professor, leader of ANR SHS-funded project) on 29 Jul 2008

Open access repositories begin to reap benefits for South African science as CSIR research goes global

There are interesting signs of an increase in the momentum of change in research communications in South Africa. There are equally interesting reflections to be made on who is not in this game: for example, where are UCT and Wits in all this?

The latest move has been the announcement in Seoul, Korea of the creation of a global science gateway, WorldWideScience.org. (Thanks to Peter Suber's Blog and Denise Nicholson's Newsletter for alerting me to this news.) The good news is that this time there is a good South African presence through the participation of the CSIR's ResearchSpace repository and the African journals from 24 countries that appear as a result of African Journals Online (AJOL).

WorldWideScience is, according to its website, 'a global science gateway connecting you to national and international scientific databases'. It hopes to 'accelerate scientific discovery and progress by providing one-stop searching of global science sources'. This project is managed by the WorldWideScience Alliance, backed by a bilateral agreement between the US Department of Energy's Office of Scientific and Technical Information (OSTI) and the British Library, and run through the Paris-based International Council for Scientific and Technical Information (ICSTI). A blog on the OSTI site provides some background:

The dilemma is that no single scientist can be expected to be aware of the hundreds of high-quality STI sources on the web. Moreover, even if a person were aware of all of these sources, he or she simply wouldn’t have the time to search them one-by-one to find the scientific knowledge that will help accelerate his or her own efforts. And, finally, this scientist will not be able to find the large majority of these resources through typical search engines (such as Google, Yahoo!, MSN, etc.) because most scientific databases are only accessible in the “deep web.”

The answer proved to be the creation of federated searching and precision relevance ranking technology to provide a single gateway to a number of national science databases.

The CSIR puts South Africa on the map with its participation and its presence on the Executive Board of the Alliance, while the 24 African countries that have journals in the AJOL service give Africa a much stronger presence than it would have otherwise. Although up until recently AJOL has provided abstracts from its member journals, there are now 39 open access journals available (including the South African Journal of Medicine) and AJOL is in the process of upgrading its website to provide full text to all journals. It is to be hoped that there will be more open journals to come.

The story of the CSIR's establishment of its repository is an interesting one, described in some detail in an article in Ariadne by Martie van Deventer and Heila Pienaar in April 2008. As Martie and Heila describe it, the process of creating institutional repositories at the University of Pretoria and the CSIR was an uphill slog, but one that has proved very worthwhile. The story is telling: the initiatives originally started out with a 2002 national strategy for a framework for e-research, which resulted in 2004 in the plan for a framework, SARIS. As it was planned, it would have provided a national portal, Open Access standards and OA institutional repositories, and a digital curation service, all this linked to the national innovation plan. However, as the authors put it, 'it soon became evident that there would be no national co-ordination of these efforts in the near future, and that individual institutions would have to start their own initiatives. Fortunately organisations such as eIFL and the Mellon Foundation have been playing an important role in the development of the South African information industry and with their assistance several initiatives were kick-started.'

After a fairly fragmented start, things came together in 2007 and there is now a more collaborative approach to creating institutional repositories in South Africa, the article reports. There are now 10 South African repositories listed in OpenDOAR. (UCT, by the way, does not have an institutional repository, although there are departmental repositories in Computer Science and UCT Lawspace in the Faculty of Law, which is not listed in OpenDOAR.)

As for the CSIR's ResearchSpace, which is now getting worldwide exposure (which can only be good for the institution and its reputation), the story is a familiar one of personal commitment by a group of dedicated advocates, helped by collaboration and information-sharing with the University of Pretoria (UP) and its team. UP, with support from a strategic commitment by senior management, in the wake of SARIS, created first an institutional thesis and dissertations repository, UPeTD (with mandated deposit), and then a research repository, UPSpace. With growing support from academic staff, as the benefits of increased exposure became clear, and top-level commitment to the value of open access repositories, UP is considering a mandate for deposit of academic articles.

At the CSIR, although there was support for the idea of bringing the science council's body of research online in open access, barriers were created when the organisation centralised its ICT management, so that the repository had to queue for services. The situation was salvaged by collaboration with the University of Pretoria and a more gradual approach. From there the open access effect took over, as Google searches started to find the content that was being uploaded:

CSIRIS staff members were still in the process of uploading documents when the IT department became aware of additional activity on their server. By the end of April 2007 just fewer than 6,000 copies of documents had been downloaded… By the end of June, this figure had become more than 28,000 documents. After several presentations and discussions it was as if the organisation suddenly saw the potential of the initiative and a formal decision was taken to make the repository part of the integral design of the organisation’s new Internet site… Obviously the key stakeholders, government departments, are also pleased because, in support of the CSIR’s core mandate (to improve the quality of lives of ordinary South Africans), publicly funded research has become more accessible to a wider community.

The moral of the story: championship at institutional level is a necessary component if institutional repositories are to really fly, but this would go nowhere without dedication and commitment from the people driving these initiatives, from library and information services. The benefits become clear very quickly and the added exposure for institutional (and national) research then becomes hard to ignore. A core problem in the institutions that are not following this path would appear to be a failure, all too common in South Africa, to recognise the strategic importance of taking advantage of the opportunities offered by digital technologies and the internet, not only for repositories, but for their publishing activities more broadly. Most universities in South Africa do not differ from their US colleagues, as the Ithaka Report into University Publishing in a Digital Age describes it:

Publishing generally receives little attention from senior leadership at universities and the result has been a scholarly publishing industry that many in the university community find to be increasingly out of step with the important values of the academy. As information transforms the landscape of scholarly publishing, it is critical that universities deploy the full range of their resources – faculty research and teaching activity, library collections, information technology capacity, and publishing expertise – in ways that best serve both local interests and the broader public interest. We will argue that a renewed commitment to publishing in its broadest sense can enable universities to more fully realize the potential global impact of their academic programs, enhance the reputations of their specific institutions, maintain a strong voice in determining what constitutes important scholarship and which scholars deserve recognition, and in some cases reduce costs. There seems to us to be a pressing and urgent need to revitalize the university’s publishing role and capabilities in this digital age.

It is telling that both UCT and Wits, which claim the top research spot in South Africa, do not appear to be taking this on board at senior level. Why is this?


Hybrid-Gold Discount From Publishers That Embargo Green OA: No Deal

I am not at all sure that Kudos are in order for Oxford University Press (OUP), just because they offer authors at subscribing institutions a discount on their hybrid Gold OA fee:

Unlike the American Psychological Association (yes, the much maligned APA!), the American Physical Society, Elsevier, Cambridge University Press and all the other 232 publishers (57%) of the 6457 journals (63%) that are on the side of the angels (fully Green on immediate post-print self-archiving), OUP is among the Pale-Green minority of 48 publishers (12%) of 3228 journals (32%) (such as Nature, which back-slid to a postprint embargo in 2005).

OUP’s post-print policy is:

12 month embargo on science, technology, medicine articles
24 month embargo on arts and humanities articles
Pre-print can only be posted prior to acceptance
Pre-print must not be replaced with post-print, instead a link to published [toll] version
Articles in some journals can be made Open Access on payment of additional charge

Should we really be singing the praises of each publisher’s discount on their hybrid Gold OA fee for the double-payment they are exacting (from the subscribers as well as the authors)?

I would stop applauding as progress for OA every self-interested step taken by those publishers who do not first take the one essential OA-friendly step: going (fully) Green.

Yes, OUP are lowering fees annually in proportion to hybrid Gold OA uptake, but they are meanwhile continuing to hold the post-print hostage for 12-24 months.

In reality, all the fee reduction means is an adjustment for double-dipping — plus a lock-in on the price of Gold OA, and a lockout of Green OA.

Stevan Harnad
Kudos to Oxford: transitioning to open access

Kudos to Oxford, which continues to provide a great role model for transitioning to open access, with an announcement reminding subscribers about discounts for their authors on open access charges, a new program to extend these discounts to consortial subscribers, and price adjustments to take open access fees revenue into account for the third year in a row. For NIH-funded authors, the OA charges include not only full OA, but also deposit in PMC.

Thanks to Peter Suber on Open Access News.

This post is part of the Transitioning to Open Access and Resources and Tips: Publishers series.

Update and comment: while OUP is a role model in this one aspect, OUP’s “green” policies need some significant shaping up to be truly exemplary. Currently, OUP is pale green, allowing author self-archiving but with significant restrictions, such as lengthy embargoes. To be truly exemplary, OUP should adopt a full green policy, permitting (or better yet, like Nature, encouraging) author self-archiving of postprints immediately on acceptance for publication. This is not only good OA policy; it is good publishing policy, because it helps to keep authors, who increasingly need to provide OA to fulfill OA mandate policies or to take advantage of the OA impact advantage as more and more of them become aware of it. [Thanks to Stevan Harnad for the tip on the OUP pale green policy].

Noteworthy Dramatic Growth July 2008: PMC & RoMEO

PubMedCentral Submissions Jump Sharply Under New NIH Policy, Library Journal Academic Newswire July 24, 2008. According to NIH’s David Lipman, “It’s still too early to compute compliance rates, but the early returns suggest a stunning turnaround”. The figures reported in the June 2008 Dramatic Growth of Open Access support this viewpoint.

JISC recently announced that the RoMEO service has now exceeded 400 publisher copyright policies on self-archiving. This is excellent news – with all the funders, universities, and researchers and librarians wanting and looking for opportunities to self-archive, every publisher should have a policy, and have it posted at RoMEO!

National Research Council OA Mandate begins January 2009

Update July 27: the NRC announcement can be found here. Thanks to Alison Ball and Peter Suber on Open Access News.

Richard Akerman on Science Library Pad has posted news of an open access mandate policy at the National Research Council, to take effect January 2009.

Alma Swan on “Where researchers should deposit their articles”

Alma Swan has just posted an excellent overview of “Where researchers should deposit their articles”.

This clear, solid, sensible essay converges on the essence of a rather divergent series of discussion threads currently ongoing in the American Scientist Open Access Forum.

It is followed up with the preliminary posting of some results from a survey of Institutional Repository (IR) managers which indicate that

(1) The IRs with mandated deposit have the least difficulty collecting content (compared to IRs with no institutional deposit policy at all or merely a policy encouraging deposit).

(2) The IRs with author-only deposit have the least difficulty collecting content (compared to IRs with librarian-only deposit or both author- and librarian-deposit).

(3) The IRs with author deposit have the least difficulty collecting metadata (compared to IRs with librarian-only deposit or both author- and librarian-deposit).

Excerpts from Alma Swan’s essay:

“The issue of which model for Open Access self-archiving is best – asking researchers to deposit their work in centralised, subject-based repositories or in their own institutional repository – is again being discussed at length….

“…Chris Awre and I argued three years ago, in our study on ‘Linking UK Repositories‘ (and in a short paper from that study here) that distributed deposit was the best model to aim for, [but] we were arguing from a theoretical standpoint. Only a handful of universities in the UK had at the time shown any sign of understanding what opportunities lay ahead in the way universities disseminate the results of their efforts, and of the responsibilities they have towards society.

[Since then] subject-based collections have been making the running and… until recently most institutions have seemed to be disinterested in supporting the efforts to make research more widely available and used…

“The universities continued to snore but while they did so at least the funders were out of bed, showered and breakfasted. Unfortunately, instead of nudging awake the universities – their partners in research endeavour and the employers of the people to whom they hand out funds – some big funders let them lie, circumventing them in the mechanics of the Open Access process. I would suggest that in doing this they were failing to take the whole research community’s interests into account…

“Now there are stirrings in the academy… universities finally ‘get it’, which is great for them, for research and for society. Unfortunately, they are getting it later than would have been ideal [because] …many [funder] mandates stipulate [a central repository] as the deposit locus (not so good for the employers of the fundees – the universities).

“[W]e shouldn’t get too wound up about this… but it is a shame that we have arrived at a point where universities, the mainstays of our societies’ research endeavours, have to develop more complex policies than would otherwise have been the case had funders simply directed their grantees to deposit their work in their institutional collections and harvested from there. The funders know where their grantees are, the repository software has a metadata field for funder, so the mechanics are simple. The benefit of such a move would have been to help the universities see the overall plan (earlier than they have done), ensure they put the right infrastructure in place and encouraged them to apply similar policies to cover all the research their employees do. The whole research community would thus be included and benefiting by this time, not just the… communities covered by big funder mandates. I would say that the research funders have rather let down their partners, the universities, in this sense.

“Deposit rates for [funder-mandated repositories] are not yet all they should be…. [P]eople are taking steps to remedy this, but how much easier it is for universities to attain a high level of compliance: they say, quite simply, that the repository is where they will be looking for material to be included in research assessment (and for staff appraisals, promotions boards, tenure committees …)…. [T]here is one thing more important to a researcher than a hypothetical risk of not getting future funding, and that is a non-hypothetical risk of not being employed for too much longer. It sharpens the focus just a tad.

“…So subject-specific collections… should be harvesting from the university repositories all the material that is relevant to that subject. They can provide all manner of nice services on that collection, tailored to the needs of that particular subject community.

Distributed, local deposit works with human nature, researcher preferences and the structure of the international research system, which remains institutionally-based; and the universities – those large, expensive edifices we all pay for and wish to see operate at maximum efficiency – get to collect their own research together and use the collection to manage their research effort so much better than ever before.”