API Implications of the recent SHERPA/RoMEO Upgrade

A new version of SHERPA/RoMEO was released on the 15th August. The main change was the introduction of a new database that improves the coverage and accuracy of journal information. Full details of the new and improved features can be found at:

This blog post describes how the upgrade affects the RoMEO API.

New API Version 2.9

Firstly, we have released a new version of the API – version 2.9 – that uses the new RoMEO Journals database. The base URL is:

This version also has additional features and optional query arguments, of which the main ones are:

  • &la= – Language. This lets you return results in one of the supported languages: Spanish, Portuguese, and (for the API only) German.
  • &versions=all – Returns data in an enhanced XML schema that has a separate section for the publisher’s version/PDF. Previously, this information was included within the post-print data.
  • &pdate= – Lets you search for publishers by RoMEO update date.
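As a minimal sketch of how these optional arguments combine in a query string, the Python function below appends them to a caller-supplied set of base parameters. The base URL is a hypothetical placeholder (the real V.2.9 URL is given in the documentation), and the `issn` parameter name, the language code `de` and the date format for `pdate` are assumptions for illustration only; check the V.2.9 documentation for the actual values.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real V.2.9 base URL from the documentation.
BASE_URL = "https://romeo.example.invalid/api29"


def build_query(base_params, language=None, all_versions=False, pdate=None):
    """Return a V.2.9 query URL with the optional arguments appended.

    base_params holds the ordinary query (e.g. {"issn": ...}; the parameter
    name is an assumption). Value formats for la and pdate are also assumed.
    """
    params = dict(base_params)
    if language is not None:
        params["la"] = language      # one of the supported language codes
    if all_versions:
        params["versions"] = "all"   # enhanced schema with a publisher's version/PDF section
    if pdate is not None:
        params["pdate"] = pdate      # search publishers by RoMEO update date
    return BASE_URL + "?" + urlencode(params)
```

For example, `build_query({"issn": "1234-5679"}, language="de", all_versions=True)` yields a URL ending in `?issn=1234-5679&la=de&versions=all`. Keeping the base URL in a single constant also means a future version change touches only one line.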

Further information and documentation for Version 2.9 is available at:

REST-style URLs

Please note also that we now have REST-style URLs for specific human-readable RoMEO pages, e.g.:

  • For a journal, using an ISSN or ESSN


Optionally add a language code

e.g. – for German output

  • For a publisher, using its persistent RoMEO ID


Again, optionally with a language code
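Before building a journal URL from an ISSN or ESSN, an application may want to reject malformed identifiers. The sketch below checks the standard ISSN mod-11 check digit (weights 8 down to 2, with a check value of 10 written as "X") and then assembles page URLs. The host and the `/issn/` and `/publisher/` path layouts are hypothetical placeholders, not the actual RoMEO URL patterns, which are not reproduced here.

```python
import re

# Hypothetical placeholder host and path layout; use the patterns from the RoMEO documentation.
REST_BASE = "https://romeo.example.invalid"


def issn_is_valid(issn):
    """Check the format and mod-11 check digit of an ISSN or ESSN, e.g. "1234-5679"."""
    m = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn)
    if m is None:
        return False
    digits = m.group(1) + m.group(2)
    # Weights 8, 7, ..., 2 applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return m.group(3).upper() == expected


def journal_url(issn, language=None):
    """Build a journal page URL from an ISSN/ESSN (path layout assumed)."""
    if not issn_is_valid(issn):
        raise ValueError(f"invalid ISSN: {issn}")
    url = f"{REST_BASE}/issn/{issn}"
    if language is not None:
        url += f"/{language}"  # optional language code
    return url


def publisher_url(romeo_id, language=None):
    """Build a publisher page URL from its persistent RoMEO ID (path layout assumed)."""
    url = f"{REST_BASE}/publisher/{int(romeo_id)}"
    if language is not None:
        url += f"/{language}"  # optional language code
    return url
```

Validating locally avoids spending requests on identifiers that cannot match any journal.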

Migrating Applications

The new API works exactly as before, so you may only need to change the base URL in your applications. However, you may also wish to upgrade your applications to exploit the new API features.

Please note that V.2.9 has a usage cap of 500 requests per day per IP address. This should be sufficient for most repository applications, but could affect large applications such as CRIS systems. With efficient practices, the cap should not be a problem even for these systems. For more advice on efficient use of the API, please see the poster we presented at OAI7:
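As a minimal sketch of one such efficient practice, the client below caches responses so repeated lookups cost nothing, and counts requests so a single IP address stops before the 500-per-day cap. The `fetch` function is a hypothetical caller-supplied HTTP call. The local day counter only approximates the server's own accounting, and a production client would also expire cached entries, since RoMEO data is updated over time.

```python
import datetime

DAILY_CAP = 500  # V.2.9 cap: requests per day per IP address


class RomeoClient:
    """Cache responses and count requests so one IP stays under the daily cap.

    fetch is a hypothetical caller-supplied function that performs the HTTP
    request for a query URL and returns the response body.
    """

    def __init__(self, fetch, cap=DAILY_CAP, today=datetime.date.today):
        self._fetch = fetch
        self._cap = cap
        self._today = today
        self._day = today()
        self._used = 0
        self._cache = {}

    def get(self, url):
        if url in self._cache:            # repeated lookups do not touch the API
            return self._cache[url]
        if self._today() != self._day:    # new local day: reset the counter
            self._day = self._today()
            self._used = 0
        if self._used >= self._cap:
            raise RuntimeError("daily RoMEO request cap reached; retry tomorrow")
        self._used += 1                   # count the request before sending it
        body = self._fetch(url)
        self._cache[url] = body
        return body
```

Sharing one client instance across an application keeps the count and the cache in one place.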

Older API Versions

Version 2.4 of the API and later prototype versions have been converted to use the new Journals database, and will continue to be supported for the foreseeable future. The main differences you may notice are fewer failed journal queries and fewer journals returning multiple publishers. However, you will need to upgrade to V.2.9 if you wish to exploit the new features.

Earlier 2.x versions (V.2.1 to 2.3) still use the original RoMEO database tables, which will no longer be updated. You must therefore upgrade your applications if you are using these versions. In September 2011, we will start redirecting requests for these versions to V.2.9. This will probably not affect applications adversely, but there is a risk that it might. We plan to delete these early versions completely by the end of November 2011.
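The support policy above can be encoded as a small check, for example to audit which version each of your applications is configured for. This is a sketch that assumes the version is stored as a plain "major.minor" string such as "2.3"; the status labels are illustrative, not terms used by the API.

```python
def migration_status(version):
    """Classify a RoMEO API version string such as "2.3" under the upgrade policy."""
    major, minor = (int(part) for part in version.split("."))
    if (major, minor) == (2, 9):
        return "current"
    if major == 2 and 1 <= minor <= 3:
        # Original database tables, no longer updated; redirected to V.2.9
        # from September 2011 and due for deletion by the end of November 2011.
        return "must upgrade"
    if major == 2 and 4 <= minor < 9:
        # Converted to the new Journals database; upgrade only for new features.
        return "supported"
    return "unknown"
```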

Future Registration of API Users

We plan to introduce a registration system in the near future for regular and heavy API users. This will permit registered users to exceed the daily usage cap, which is needed mainly for initial processing of large bibliographies or for research purposes. It will also ensure that registered users receive advance notice of future developments, and timely notification of service issues.

Please contact us at if you have any queries or concerns about the upgrade or future planned changes, or are interested in registering.


Getting Excited About Getting Cited: No Need To Pay For OA

Gaulé, Patrick & Maystre, Nicolas (2011) Getting cited: Does open access help? Research Policy (in press)

G & M: Cross-sectional studies typically find positive correlations between free availability of scientific articles (“open access”) and citations… Using instrumental variables, we find no evidence for a causal effect of open access on citations. We provide theory and evidence suggesting that authors of higher quality papers are more likely to choose open access in hybrid journals which offer an open access option. Self-selection mechanisms may thus explain the discrepancy between the positive correlation found in Eysenbach (2006) and other cross-sectional studies and the absence of such correlation in the field experiment of Davis et al. (2008)… Our results may not apply to other forms of open access beyond journals that offer an open access option. Authors increasingly self-archive either on their website or through institutional repositories. Studying the effect of that type of open access is a potentially important topic for future research…

What the Gaulé & Maystre (G&M) (2011) article shows — convincingly, in my opinion — is that in the case of paid hybrid gold OA, most of the observed citation increase is better explained by the fact that the authors of articles that are more likely to be cited are also more likely to pay for hybrid gold OA. (The effect is even stronger when one takes into account the phase in the annual funding cycle when there is more money available to spend.)

But whether or not to pay money for the OA is definitely not a consideration in the case of Green OA (self-archiving), which costs the author nothing. (The exceedingly low infrastructure costs of hosting Green OA repositories per article are borne by the institution, not the author: like the incomparably higher journal subscription costs, likewise borne by the institution, they are invisible to the author.) 

I rather doubt that G & M’s economic model translates into the economics of doing a few extra author keystrokes — on top of the vast number of keystrokes already invested in keying in the article itself and in submitting and revising it for publication. 

It is likely, however, and we have been noting this from the very outset, that one of the multiple factors contributing to the OA citation advantage (alongside the article quality factor, the article accessibility factor, the early accessibility factor, the competitive [OA vs non-OA] factor and the download factor) is indeed an author self-selection factor.

What G & M have shown, convincingly, is that in the special case of having to pay for OA in a hybrid Gold Journal (PNAS: a high-quality journal that makes all articles OA on its website 6 months after publication), the article quality and author self-selection factors alone (plus the availability of funds in the annual funding cycle) account for virtually all the significant variance in the OA citation advantage: Paying extra to provide hybrid Gold OA during those first 6 months does not buy authors significantly more citations.

G & M correctly acknowledge, however, that neither their data nor their economic model apply to Green OA self-archiving, which costs the author nothing and can be provided for any article, in any journal (most of which are not made OA on the publisher’s website 6 months after publication, as in the case of PNAS). Yet it is on Green OA self-archiving that most of the studies of the OA citation advantage (and the ones with the largest and most cross-disciplinary samples) are based.

I also think that, because both citation counts and the OA citation advantage are correlated with article quality, there is a potential artifact in using estimates of article or author quality as indicators of author self-selection effects: Higher quality articles are cited more, and the size of their OA advantage is also greater. Hence a test of the self-selection advantage for Green OA would need to estimate article/author quality [but not from their citation counts, of course!] for a large sample and then, comparing like with like, to show that among articles/authors estimated to be at the same quality level, there is no significant difference in citation counts between individual articles (published in the same journal and year) that are and are not self-archived by their authors.

No one has done such a study yet, though we have weakly approximated it (Gargouri et al. 2010) using journal impact-factor quartiles. In our approximation, there remains a significant OA advantage even when comparing OA (self-archived) and non-OA articles (same journal/year) within the same quality quartile. There is still room for a self-selection effect between and within journals within a quartile, however (a journal’s impact factor is an average across its individual articles; PNAS, for example, is in the top quartile, but its individual articles still vary in their citation counts). So a more rigorous study would have to control for quality much more tightly. But my bet is that a significant OA advantage will be observed even when comparing like with like.
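To make the like-with-like comparison concrete, here is a deliberately simplified sketch: articles are grouped by journal, year and an independent quality estimate, and the mean citation difference between self-archived and non-self-archived articles is computed within each group. It is an illustration of the design described above, not the method used in Gargouri et al. (2010); the record keys are hypothetical, and a real study would also need a significance test across strata.

```python
from collections import defaultdict
from statistics import mean


def within_stratum_oa_difference(articles):
    """Mean citations of OA minus non-OA articles within each stratum.

    Each article is a dict with hypothetical keys: "journal", "year",
    "quality" (an estimate NOT derived from citation counts, e.g. a
    quartile), "oa" (True if self-archived) and "citations".
    Strata lacking either OA or non-OA articles are skipped.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for a in articles:
        key = (a["journal"], a["year"], a["quality"])
        strata[key][bool(a["oa"])].append(a["citations"])
    return {
        key: mean(groups[True]) - mean(groups[False])
        for key, groups in strata.items()
        if groups[True] and groups[False]  # compare like with like only
    }
```

A consistently positive difference across strata of equal estimated quality would be the pattern expected if a significant OA advantage persists after controlling for quality.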

Stevan Harnad

Identifier and Metadata Standards for e-Commerce—Responding to Reality in 2011

This paper looks at the reality of implementation of e-commerce standards in the book and journal supply chains, and at where the barriers are to more widespread implementation. It compares this with the situation in other media, and looks at some of the challenges of convergence and divergence. Although the challenges identified are considerable, it finishes by discussing why there may be reasons for optimism about the future.

Why Create a Customization of a Standard? An ACS Case Study

The NLM DTDs have emerged over the past several years as a de facto standard for content interchange among STM publishers. Recently, the ACS Publishing Division has used customized forms of these DTDs, made public by the NLM, to implement XML-based publishing processes for our chemistry-related journals, books, and magazine publications. In this paper, we look at the drivers behind our decisions on whether customizations should be made and, if so, how much customization is needed to meet the needs of our publication processes. To frame the discussion of the various customizations, we also offer the concepts of a customization level, a customization implementation method, and a customization profile. Finally, we share some of the successes and lessons learned from our experience.

Summit or Abyss

Andrew Pettegree’s The Book in the Renaissance and John B. Thompson’s Merchants of Culture: The Publishing Business in the Twenty-First Century share a number of striking similarities. Both are ambitious and accomplished works of scholarship, handsomely bound, competently designed and edited, a pleasure to hold and read. Hefty in intellectual vigor yet eloquent and accessible to an audience beyond a narrow field of research, they are what Thompson describes as “high-quality books with a scholarly content, often (but not always) written by scholars, [that] have the capacity to sell into a general trade market if they are developed and marketed properly” (page 182). They represent the apogee of the types of scholarly works prized by collectors in the early era of print, collectors such as Fernando Colón, the son of Christopher Columbus, whom Pettegree describes fondly, with praise for his remarkable catalogues and annotations about his book purchases, now nearly as invaluable as the works themselves. While Thompson’s previous book, Books in the Digital Age, focused on university presses and other academic publishers exclusively, in Merchants of Culture they are discussed at the margins of the field of trade publishing, the locus of Thompson’s lens in his present volume. In the early days of print, these works of scholarship were the trade.