Open Access and the Prisoner’s Dilemma

Unilateral Gold OA instead of Green
is the losing choice
in a non-forced-choice Prisoner’s Dilemma
(think about it!)

                 UniGreen (World)   UniGold (World)
UniGreen (UK):   win/win            win/lose
UniGold (UK):    lose/win           win/win
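
For readers who prefer to "think about it" in code, here is a minimal sketch of the table. It assumes each cell reads as UK outcome / World outcome, which is my reading of the layout rather than something the table states. The names `PAYOFFS`, `uk_outcomes` and `safe_choices` are illustrative only.

```python
# Payoffs copied from the table, keyed by (UK choice, World choice).
# Assumption: each cell is read as (UK outcome, World outcome).
PAYOFFS = {
    ("UniGreen", "UniGreen"): ("win", "win"),
    ("UniGreen", "UniGold"): ("win", "lose"),
    ("UniGold", "UniGreen"): ("lose", "win"),
    ("UniGold", "UniGold"): ("win", "win"),
}
STRATEGIES = ("UniGreen", "UniGold")


def uk_outcomes(uk_choice):
    """Return the UK's outcome for each possible choice by the rest of the world."""
    return {world: PAYOFFS[(uk_choice, world)][0] for world in STRATEGIES}


def safe_choices():
    """UK strategies that never leave the UK losing, whatever the world does."""
    return [s for s in STRATEGIES if "lose" not in uk_outcomes(s).values()]


print(safe_choices())  # ['UniGreen']
```

Under that reading, UniGreen is the only UK choice that cannot lose, whatever the rest of the world does.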

Houghton & Swan 2012:

“If OA were adopted worldwide, the net benefits of Gold OA would exceed those of Green OA.

However, we are not in an OA world…

“At the institutional level, during a transitional period when subscriptions are maintained, the cost of unilaterally adopting Green OA is much lower than the cost of Gold OA – with Green OA self-archiving costing average institutions sampled around one-fifth the amount that Gold OA might cost, and as little as one-tenth as much for the most research intensive university.

“Hence, we conclude that the most affordable and cost-effective means of moving towards OA is through Green OA, which can be adopted unilaterally at the funder, institutional, sectoral and national levels at relatively little cost.” [emphasis added]

Houghton, John W. & Swan, Alma (2012) Planting the green seeds for a golden harvest: Comments and clarifications on “Going for Gold”

Gross Domestic Clean Water

 Alternative text: why not measure Gross Domestic Clean Water, since clean water is more essential than Gross Domestic Product? If you’re not convinced, try going a couple of days without clean water in any form. This word picture is dedicated to the public domain.

This is an alternative metric.

Altmetrics – thoughts about the purpose

Should altmetrics take a step back and reconsider what the main purpose / research question is? I would suggest that what we need is an alternative to the current power of the impact factor in assessing the work of scholars. This may or may not involve metrics of any kind. My suggestion for starters is that we need a system that is not as reliant on metrics of any kind.

Having said that, some metrics studies that might actually be useful:
– Does an emphasis on quantity of publication increase duplication of content and/or reduce quality? On the latter, this is what I have heard from senior experts in scholarly publishing, and I think both Brown and Harley touch on this in their reports. At least with respect to books, pushing scholars to publish two books rather than one to get tenure means pressure to publish in less time than it takes to write a good book. So pushing for quantity seems likely to correlate with reduced quality (a hypothesis worth testing?).

One advantage to studying the disadvantages of pushing for quantity is that if the hypothesis (quantity correlates negatively with quality) is correct, then that is evidence that can reduce the workload of scholars – something I expect that scholars are likely to support.

Other possibilities:
– scholars might want to know about journals:
  – average and range of time from submission to decision
  – level of “peer” doing the peer review (grad student? senior professor?)
  – extent and quality of contents (this has to be qualitative analysis; sampling makes sense)
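
The first of these journal measures is easy to compute if a journal publishes submission and decision dates. Here is a minimal sketch. The dates in `SAMPLE` are made up for illustration and are not real journal data, and the function name `decision_time_summary` is mine.

```python
from datetime import date
from statistics import mean

# Hypothetical (submission date, decision date) pairs for one journal.
# These are invented for illustration, not real data.
SAMPLE = [
    (date(2012, 1, 10), date(2012, 3, 1)),
    (date(2012, 2, 5), date(2012, 2, 25)),
    (date(2012, 3, 20), date(2012, 7, 18)),
]


def decision_time_summary(records):
    """Return the mean and range (in days) of time from submission to decision."""
    days = [(decided - submitted).days for submitted, decided in records]
    return {
        "mean_days": mean(days),
        "min_days": min(days),
        "max_days": max(days),
    }


summary = decision_time_summary(SAMPLE)
print(summary)
```

Reporting the range alongside the average matters here: one very slow decision can hide behind a reasonable-looking mean.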

Shifting from a print-based scholarly communication system to an open access knowledge commons, while retaining or increasing quality and reducing costs, is possible – but it’s not easy. It is worth taking the time to think things through and get at least some stuff right.

CC-BY reflects a small subset of open access. Claims of "emerging consensus" on CC-BY are premature

The Open Access Scholarly Publishers’ Association’s “Why CC-BY” page refers to an “emerging consensus on the adoption of CC-BY”. My comment:

Re: CC-BY – emerging consensus. OASPA refers to an “emerging consensus” that CC-BY is the best license for open access. I argue that the evidence suggests that CC-BY is a peripheral phenomenon and very far from consensus.

From Peter Suber’s SPARC Open Access Newsletter, June 2012: in brief, fewer than 12% of the journals listed in DOAJ use CC-BY, and outside of full gold OA publishing, as illustrated by the journals in DOAJ, the proportion of OA that is CC-BY is lower still.

“Libre OA through repositories has been rare because most repositories are not in a position to demand it or even to authorize it. Hence, you might think that libre OA through journals would be common because all journals are in a position to do both. But unfortunately that would be wrong. The power of journals to demand and authorize libre OA means that libre gold could be common, and should be common. But scandalously, it doesn’t mean that libre gold is already common…Only 917 journals in the DOAJ have the SPARC Europe Seal of Approval, which requires CC-BY. That’s only 11.8% of the full set”.

Suber, Peter. SPARC Open Access Newsletter, June 2012
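
As a quick sanity check on the quoted figures, the size of the full DOAJ set can be backed out from the two numbers Suber gives. This is only arithmetic on the quotation; the variable names are mine, and the result is approximate because 11.8% is itself rounded.

```python
journals_with_seal = 917   # from the quoted passage
share_of_doaj = 0.118      # 11.8%, from the quoted passage

# Implied size of the full DOAJ set at the time (approximate).
implied_total = journals_with_seal / share_of_doaj
print(round(implied_total))  # 7771
```

In other words, on these figures roughly 6,850 of about 7,770 DOAJ journals did not carry the seal that requires CC-BY.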

Why open access does not need CC-BY: the Human Genome Project example

The Open Access Scholarly Publishers’ Association’s explanation of “Why CC-BY” presents the Human Genome Project as an illustration of why CC-BY is needed for open access. Following is my comment (not yet appearing on the OASPA site, no doubt awaiting moderation).

It is interesting that OASPA’s explanation of “why CC-BY” points to the Human Genome Project as an example of why CC-BY is needed. The HGP ran for 13 years, ending in 2003. Creative Commons is looking forward to its 10th birthday in December. In other words, HGP was completed shortly after CC began. This means that HGP is an awesome example of how science can advance rapidly and in the spirit of libre open access, without any need for Creative Commons licensing at all. [emphasis added]

For HGP details see:

Delightful irony: students for free culture adamantly opposed to license used for Lessig’s Free Culture

Larry Lessig’s book Free Culture, the inspiration for the Free Culture movement, was released as a paperback as well as a free online book under the CC-BY-NC 1.0 license.

How ironic that Students for Free Culture consider the Noncommercial license to be “proprietary” and incompatible with free culture!

I wonder how many of them read the free online version – noncommercial license and all? 

Thanks very much to Creative Commons for keeping up the fight for free culture; note that after extensive discussion, CC has elected to retain the noncommercial license in version 4.0, with no change in definition.

Some Quaint Elsevier Tergiversation on Rights Retention

Preamble: If you wish to sample some of the most absurd, incoherent, pseudo-legal gibberish on the subject of “rights” retention, “systematicity” and free will, please have a look at what follows under “Elsevier Article Posting Policies” below. (And bear in mind that an institution only provides a tiny fraction of any journal’s content.)

Any author foolish enough to be intimidated by this kind of garbled double-talk deserves everything that’s coming to him.

My Advice to Authors: Ignore this embarrassing, self-contradictory nonsense completely and exercise your retained “right” to post your final refereed draft (“AAM”) in your institutional repository immediately upon acceptance, whether or not it is mandatory, secure in the knowledge that from a logical contradiction anything and everything (and its opposite) follows! (And be prepared to declare, with hand on heart, that as an adult, every right you exercise with your striate musculature is exercised “voluntarily.”)

[By the way, as long as Elsevier states that its authors retain the right to post “voluntarily”, Elsevier, too, remains on the Side of the Angels insofar as immediate, unembargoed Green OA self-archiving is concerned. It’s just that the Angels are a bit glossolalic…]

Elsevier Article Posting Policies

Accepted author manuscripts (AAMs)

Definition: An accepted author manuscript (AAM) is the author’s version of the manuscript of an article that has been accepted for publication and which may include any author-incorporated changes suggested through the processes of submission processing, peer review, and editor-author communications. AAMs do not include other publisher value-added contributions such as copy-editing, formatting, technical enhancements and (if relevant) pagination.

Policy: Authors retain the right to use the accepted author manuscript for personal use, internal institutional use and for permitted scholarly posting provided that these are not for purposes of commercial use or systematic distribution…

Permitted scholarly posting: Voluntary [emphasis added] posting by an author on open websites operated by the author or the author’s institution for scholarly purposes, as determined by the author, or (in connection with preprints) on preprint servers…

…Elsevier believes that individual authors should be able to distribute their AAMs for their personal voluntary [emphasis added] needs and interests, e.g. posting to their websites or their institution’s repository, e-mailing to colleagues. However, our policies differ regarding the systematic aggregation or distribution of AAMs to ensure the sustainability of the journals to which AAMs are submitted [emphasis added]. Therefore, deposit in, or posting to, subject-oriented or centralized repositories (such as PubMed Central), or institutional repositories with systematic posting mandates [emphasis added] is permitted only under specific agreements between Elsevier and the repository, agency or institution, and only consistent with the publisher’s policies concerning such repositories. Voluntary [emphasis added] posting of AAMs in the arXiv subject repository is permitted.

Systematic distribution means: policies or other mechanisms designed to aggregate and openly disseminate, or to substitute for journal-provided services, including:

— The systematic distribution to others via e-mail lists or list servers (to parties other than known colleagues), whether for a fee or for free…

— Institutional, funding body or government manuscript posting policies or mandates that aim to aggregate and openly distribute the work by its researchers or funded researchers…

#ami2 #opencontentmining: AMI reports progress on #pdf2svg and #svgplus: the “standard” of STM publishing

AMI has been making steady progress on two parts of AMI2:

  • PDF2SVG. A converter of PDF to SVG, eliminating all PDF-specific information. This has gone smoothly – AMI does not understand “good”, so “steady” means a monotonically increasing number of non-failing JUnit tests. AMI has also distributed the code, first on Bitbucket at:

    and then on the Jenkins continuous integration tool on the PMR group machine in Cambridge: – see

    [Note: Hudson was open Source but it became closed so the community forked it and Jenkins is the new Open branch]. Jenkins is very demanding. AMI starts by developing tests on Eclipse, then runs these on maven, and then on Jenkins. Things that work on Eclipse often fail on maven, and things that work on maven can fail on Jenkins.

    AMI has also created an Issue Tracker: Here humans write issues which matter to them – bugs, ideas, etc. PMR tells AMI what the issues are and translates them into AMI-tasks, often called TODO. PMR tells AMI he is pleased that there is feedback from outside the immediate group.

  • SVGPlus. This takes the raw output of PDF2SVG and turns it into domain-agnostic semantic content. Most of this has already been done so it is a question of refactoring. AMI requires JUnit tests to drive the development. SVGPlus has undergone a lot of refactoring (AMI notes changes of package structure, deletion of large chunks and addition of smaller bits). The number of tests increases so AMI regards that as “steady progress”.

AMI now has a lot of experience with PDFs from STM publishers and elsewhere. AMI works fastest when there is a clear specification against which she can write tests. AMI works much more slowly when there are no standards. PMR has to tell her how to guess (“heuristics”). Here’s their conversation over the last few weeks.
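
AMI's notion of “steady” can be written down directly. This is a minimal sketch, not AMI's actual code (which is Java and JUnit). It assumes “monotonically increasing” allows the count to stay level between builds, and the function name `is_steady` is made up for illustration.

```python
def is_steady(passing_counts):
    """True if the number of non-failing tests never drops between builds.

    Assumption: a build with the same count as the previous one still
    counts as steady (non-decreasing rather than strictly increasing).
    """
    return all(later >= earlier
               for earlier, later in zip(passing_counts, passing_counts[1:]))


print(is_steady([10, 12, 12, 30]))  # True
print(is_steady([10, 9, 30]))       # False
```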

AMI: Please write me some tests for PDF2SVG.

PMR: I can’t.

AMI: Please find the standard for PDF documents and create documents that conform.

PMR. I could do that but it’s no use. Hardly any of the STM publishers conform to any PDF standards.

AMI. If the deviations from the standard are small we can add some small exceptions.

PMR. The deviation from the standard is enormous.

AMI. If you read some of the documents we can create a de facto standard and code against that. It will be several times slower.

PMR. That won’t be useful. Every publisher does things differently.

AMI. How many publishers are there?

PMR. Perhaps 100.

AMI. Then it will take 100 times longer to write PDF2SVG. Please supply me with the documentation for each of the publishers’ PDFs.

PMR. There is no documentation for any of them.

AMI. Then there is no systematic quality way that I can write code.

PMR. Agreed. Any conversion is likely to have errors.

AMI. We may be able to tabulate the error frequency.

PMR. We don’t know what the correct output is.

AMI. Then we cannot estimate errors properly.

PMR. Agreed. Maybe we can get help from crowdsourcing.

AMI. I do not understand.

PMR. More people, creating more exams and tests.

AMI. I understand.

PMR. I will have to make it easy for them.

AMI. In which case we may be able to work faster. We may also be able to output partial solutions. Can we identify how the STM publishers deviate from the standard?

PMR. Let’s try.

AMI. Wikipedia has . Is that what we want?

PMR. Yes

AMI. Is the standard Open?

PMR. Yes, it’s ISO 32000-1:2008.

AMI. [reads]

ISO 32000-1:2008 specifies a digital form for representing electronic documents to enable users to exchange and view electronic documents independent of the environment they were created in or the environment they are viewed or printed in. It is intended for the developer of software that creates PDF files (conforming writers), software that reads existing PDF files and interprets their contents for display and interaction (conforming readers) and PDF products that read and/or write PDF files for a variety of other purposes (conforming products).

AMI. Does it make it clear how to conform?

PMR. Yes. It’s well written.

AMI. Is it free to download?

PMR. Yes (Adobe provide a copy on their website).

AMI. Are there any legal restrictions to implementing it? [AMI understands that some things can’t be done for legal reasons like patents and copyright.]

PMR. Not that we need to worry about.

AMI. Do the publishers have enough money to read it? [AMI knows that money may matter.]

PMR. It is free.

AMI. So we can assume the publishers and their typesetters have read it and tried to implement it?

PMR. We can assume nothing. Publishers don’t communicate anything.

AMI. I will follow the overview in Wikipedia:

File structure

A PDF file consists primarily of objects, of which there are eight types:

  • Boolean values, representing true or false
  • Numbers
  • Strings
  • Names
  • Arrays, ordered collections of objects
  • Dictionaries, collections of objects indexed by Names
  • Streams, usually containing large amounts of data
  • The null object

Do the PDFs conform to that?

PMR: They seem to since PDFBox generally reads them
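
The eight object types in that overview map fairly naturally onto ordinary programming types. The following is a minimal sketch of that mapping. It is not how PDFBox or PDF2SVG represent objects: the `Name` and `Stream` classes and the function `pdf_object_type` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Hypothetical stand-in for a PDF Name object, e.g. /Type."""
    value: str


@dataclass
class Stream:
    """Hypothetical stand-in for a PDF Stream: a dictionary plus raw bytes."""
    dictionary: dict
    data: bytes


def pdf_object_type(obj):
    """Classify a Python value as one of the eight PDF object types."""
    if obj is None:
        return "Null"
    if isinstance(obj, bool):  # must come before numbers: bool is a subclass of int
        return "Boolean"
    if isinstance(obj, (int, float)):
        return "Number"
    if isinstance(obj, (str, bytes)):
        return "String"
    if isinstance(obj, Name):
        return "Name"
    if isinstance(obj, list):
        return "Array"
    if isinstance(obj, dict):
        return "Dictionary"
    if isinstance(obj, Stream):
        return "Stream"
    raise TypeError(f"not a PDF object: {obj!r}")


# A dictionary indexed by Names, as in the overview.
font_dict = {Name("Type"): Name("Font"), Name("Subtype"): Name("Type1")}
print(pdf_object_type(font_dict))  # Dictionary
```

Conforming at this level only says the file can be parsed; it says nothing about whether the fonts inside can be understood.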

AMI. Fonts are important:

Standard Type 1 Fonts (Standard 14 Fonts)

Fourteen typefaces—known as the standard 14 fonts—have a special significance in PDF documents:

These fonts are sometimes called the base fourteen fonts. These fonts, or suitable substitute fonts with the same metrics, must always be available in all PDF readers and so need not be embedded in a PDF. PDF viewers must know about the metrics of these fonts. Other fonts may be substituted if they are not embedded in a PDF.

AMI: If a PDF uses the 14 base fonts, then any PDF software must understand them, OK?

PMR. Yes. But the STM publishers don’t use the 14 base fonts.

AMI. What fonts do they use?

PMR. There are zillions. We don’t know anything about most of them.

AMI. Then how do I read them? Do they use Type1 Fonts?

PMR. Sometimes yes and sometimes no.

AMI. A Type1Font must have a FontDescriptor. The FontDescriptor will tell us the FontFamily, whether the font is italic, bold, symbol etc. That will solve many problems.

PMR. Many publishers don’t use FontDescriptors.

AMI. Then they are not adhering to standard PDF.

PMR. Yes.

AMI. Then I can’t help.

PMR. Maybe we can guess. Sometimes the FontName can be interpreted. For example “Helvetica-Bold” is a bold Helvetica font.

AMI. Is there a naming convention for Fonts? Can we write a regular expression?

PMR. No. Publishers do not use systematic names.
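
Where a publisher does happen to follow the “Helvetica-Bold” pattern, the guess PMR describes can be sketched as below. This is a heuristic under stated assumptions, not the PDF2SVG code. It assumes a hyphen separates family from style, and that a prefix of six capital letters and a plus sign marks a font subset and can be dropped. The function name `guess_font_style` and the opaque name `F42` are made up for illustration.

```python
import re

# Assumption: a subset font name starts with six capital letters and "+".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def guess_font_style(font_name):
    """Guess family, bold and italic from a FontName. May well be wrong."""
    base = SUBSET_PREFIX.sub("", font_name)
    family, _, style = base.partition("-")
    style = style.lower()
    return {
        "family": family,
        "bold": "bold" in style,
        "italic": "italic" in style or "oblique" in style,
    }


print(guess_font_style("Helvetica-Bold"))
print(guess_font_style("F42"))  # opaque name: nothing can be inferred
```

For an opaque name the function returns the whole name as the family with both flags false, which means “unknown” rather than “known to be plain”.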

AMI. I have just found some publishers use some fonts without FontNames. I can’t understand them.

PMR. Nor can anyone.

AMI. So the PDF renderer has to draw the glyph as there is no other information.

PMR. That’s right.

AMI. Is there a table of glyphs in these fonts?

PMR. No. We have to guess.

AMI. It will take me about 100 times longer to develop and write a correct PDF2SVG for all the publishers.

PMR. No, you can never do it because you cannot predict what new non-standard features will be added.

AMI. I will do what you tell me.

PMR. We will guess that most fonts use a Unicode character set. We’ll guess that there are a number of non-standard, non-documented character sets for the others – perhaps 50. We’ll fill them in as we read documents.
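
PMR's plan (assume Unicode, and fill in the non-standard character sets as documents are read) might look roughly like this. It is a minimal sketch, not the actual PDF2SVG design. The class `GlyphGuesser`, its methods and the font name `PubSym` are all hypothetical.

```python
class GlyphGuesser:
    """Decode character codes, guessing Unicode unless a font is known to differ."""

    def __init__(self):
        self.custom_tables = {}  # font name -> {code: character}, filled in over time
        self.unresolved = set()  # (font name, code) pairs still needing a human guess

    def register(self, font_name, code, character):
        """Record what a code means in a non-standard font."""
        self.custom_tables.setdefault(font_name, {})[code] = character
        self.unresolved.discard((font_name, code))

    def decode(self, font_name, code):
        table = self.custom_tables.get(font_name)
        if table is None:
            return chr(code)  # guess: this font uses Unicode code points
        if code in table:
            return table[code]
        # Known non-standard font, unknown code: flag it rather than guess silently.
        self.unresolved.add((font_name, code))
        return "\ufffd"  # Unicode replacement character


guesser = GlyphGuesser()
guesser.register("PubSym", 65, "\u03b1")  # hypothetical font where code 65 is alpha
print(guesser.decode("SomeUnicodeFont", 65))  # A
print(guesser.decode("PubSym", 65))
print(sorted(guesser.unresolved))
```

The `unresolved` set is the part that could be handed to the crowd: each entry is a small, concrete question about one glyph.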

AMI. I cannot guarantee the results.

PMR. You have already done a useful job. We have had some positive comments from the community.

AMI. I don’t understand words like “cool” and “great job”.

PMR. They mean “steady progress”.

AMI. OK. Now I am moving to SVGPlus.

PMR. We’ll have a new blog post for that.