We’re coming to the Hackathon

The JISC/SWAT4LS/OKF Hackathon starts on Tuesday in London (http://www.swat4ls.org/workshops/london2011/); see also Jenny Molloy’s blog post: http://science.okfn.org/2011/11/09/open-research-reports-trailer/

Here are some of the stars:

On the left is KosherFrog (alias @gfry, alias Gilles Frydman). (We’ll create some better glasses for him.) Here’s his Twitter avatar:

In the middle is a patient – or rather the mother of a patient. She is carrying Roo’s salbutamol inhaler. She needs access to medical information.

And on the right is McDawg. He’s Graham Steel. Here’s HIS Twitter avatar:

We’ll be creating semantic resources for disease.

Be there.

Cambridge Crystallographic Data Centre disputes non-re-usability of primary data (Am. Chem. Soc. charges > 100 USD to view this discussion)

I have been alerted to a discussion in the letter pages of J. Chem. Inf. Modeling (an ACS journal). I normally read the literature through a paywall window: my home machine has no privileges, so I get a “citizen-enhanced” view of the primary literature. The enhancement is of course massively negative – I can’t read most of it. For most things, if I can’t read them they don’t exist – an increasingly common approach. Occasionally I switch on access to the University VPN, which allows me to read the full text – thereby requiring the University to continue its subscription (in dollars) to this journal. Unless they use the paywall filter, academics in rich universities (which are the only real market for scholarly journals) have no idea how impoverished the world is. But many of my readers will appreciate it – they are the Scholarly Poor. And what follows can be understood by anyone – you don’t have to be a chemist. Note that many research institutions do not subscribe to JCIM, so I expect most readers will have a “scholarly poor lens” on what follows.

  • Earlier this year a paper was published: http://pubs.acs.org/doi/abs/10.1021/ci100223t

    Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress

    Alessio Andronico, Arlo Randall, Ryan W. Benz, and Pierre Baldi*

    School of Information and Computer Sciences, Institute for Genomics and Bioinformatics and Department of Biological Chemistry, University of California, Irvine, Irvine, California 92697-3435, United States

    J. Chem. Inf. Model., 2011, 51 (4), pp 760–776 DOI: 10.1021/ci100223t Publication Date (Web): March 18, 2011 Copyright © 2011 American Chemical Society

I can’t reproduce the abstract because, although it was written by the authors, they have signed over its ownership/copyright to ACS. (ACS in their generosity allow you to read it at the end of the link above.) Note that the system is mounted at http://cosmos.igb.uci.edu/. It contains the rubric:

Note: In as much as this Service uses data from the CSD [Cambridge Structural Database], it has been given express permission from the CCDC [Cambridge Crystallographic Data Centre]. At the request of the CCDC, no more than 100 molecules can be uploaded to the Service at a time, and the Service ought to be used for scientific purposes only, and not for commercial benefit or gain.

Well – that was a pretty challenging paper, wasn’t it? (Sorry scholarly poor, I can’t tell you what it said – but trust me – or pay 35 USD).

This elicited a response from the director of the CCDC. If you read the abstract you will see their involvement. (BTW I have no relation to them except geographical proximity; the University has declared that they don’t belong to the University for FOI purposes, although they are listed as a department.) Here is his 1-page response:

  • http://pubs.acs.org/doi/pdfplus/10.1021/ci2002523 Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response from The Cambridge Crystallographic Data Centre,

    Colin R Groom* The Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.

He clearly disagrees with their contention. (Scholarly Poor, you will have to fork out another 35 USD to read this single page.) [2]

And the original authors responded:

  • http://pubs.acs.org/doi/abs/10.1021/ci200460z Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response to the Letter by the Cambridge Crystallographic Data Center

    Pierre Baldi

    J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/ci200460z • Publication Date (Web): 22 Nov 2011

Wow! Some strong disagreement on matters of fact. (Stop whining, Scholarly Poor, and pay another 35 USD to read this letter – it’s nearly 2 pages!) I’ll reveal that it contains phrases like “simply false”. And you can read the abstract, which contains the phrase “significant impediments to scientific research posed by the CCDC.”

So that is a pretty damning indictment. Of the CCDC? Maybe, if you can read the letters. But certainly of the ACS. An important discussion about the freedom to re-use the scholarly literature is hidden behind a paywall. The letters were written by scientists and presumably reproduced verbatim by the ACS. What possible justification is there for charging 35 USD? There is no peer review involved. But then the ACS charges 35 USD for everything, including an 8-WORD retraction notice. (It’s sort of easier just to charge vast amounts of money than to think about what you are doing to science.)

So I am in a dilemma. How do I bring this discussion to public view? Because that is what a Scholarly Society SHOULD wish for. I can’t expect everyone to pay 105 USD. (The part of the first paper that is involved is only two sentences.) I have the following options:

  • Do nothing – this will perpetuate the injustices
  • Write summaries of the letters (absurd because it will distort the meaning)
  • Extract paragraphs and publish them under fair use. (There is no doctrine of fair use in the UK and I could be sued for any phrase extracted – I have already laid myself open to this with the phrase “simply false”.)
  • Urge the authors of the letters to publish them Openly. In doing so they will break the conditions of publication and lay themselves open to legal action or having subscriptions to JCIM cut off
  • Write to the editor of the journal suggesting it would be in the public interest to publish the letters Openly. In general editors don’t reply – but I know this one. In any case I doubt they would do it, and it makes the situation worse
  • Or follow a reader’s suggestion I haven’t thought of

This matters because I am going to continue to challenge the CCDC. I have been turned down on FOI grounds on a technicality (that the CCDC, although listed as a department of the University, isn’t part of it for FOI purposes). BTW it took the University FOI office 19.8 days to work that out.

If you read the last paper (shut up and pay!) you will see that the authors quote our work on CrystalEye and suggest that it, together with the Crystallography Open Database (COD), could and now should replace the CCDC. They say (I have removed all the letter “O”s [1] to avoid direct quoting; 35 USD will tell you where the O’s are meant to be):

As histry shws, thse wh stand in the way f demcracy and scientific prgress end up lsing ver the lng-run. The reactinary attitude f the CCDC staff has started t backfire by energizing academic labratries arund the wrld t find alternative slutins arund the CCDC.

I agree with the sentiments expressed. The only problem is that the authors chose to do it behind a paywall.

I shall continue my campaign to liberate “our” data from the CCDC+Wiley/Elsevier/Springer monopoly. Sancho Panza (http://en.wikipedia.org/wiki/Sancho_Panza ) is welcome to join me.

[1] James Thurber, The Wonderful O: http://en.wikipedia.org/wiki/The_Wonderful_O

[2] UPDATE: I managed to get it for free but maybe I have a cached copy?

UPDATE: It now seems that most people can get the first letter (“Editorial”) for free, but I still have to pay for the UCI response.

Scientists should NEVER use CC-NC. This explains why.

There is a really important article (Hagedorn G et al.) at http://www.pensoft.net/journals/zookeys/article/2189/creative-commons-licenses-and-the-non-commercial-condition-implications-for-the-re-use-of-biodiversity-information

[NOTE: the OKF has a clear statement of the problems of CC-NC. They should add a link to Hagedorn. See my earlier blog post http://blogs.ch.cam.ac.uk/pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/ ].

So, you aren’t interested in biodiversity journals? Never read ZooKeys? (I didn’t know it existed.) But in 1 day about 1200 people have accessed this article. Yet another proof that WHAT you publish matters, not WHERE. And hopefully this blog will send a few more that way.

I can’t summarise all of it. The authors give a very detailed and, I assume, competent analysis of copyright applied to scientific content (data, articles, software) and its licensability under Creative Commons. Note that the phrase “This work is published under a Creative Commons Licence”, which so many people glibly use, is almost useless. It really means “This work is copyrighted [unless it’s CC0] and to find out whether you have any rights you will have to look at the licence”. So please, always, specify WHAT CC licence you use.

The one you choose matters, because it applies the rule of LAW to your documents. If someone does something with them that is incompatible with the licence, they have broken copyright law. For example, combining material under a CC-BY-NC-SA licence with material under a CC-BY-SA licence is impossible without breaking the law.
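To see why that last combination fails, here is a deliberately simplified model in Python. It is a sketch under stated assumptions, not legal advice and not any Creative Commons tool: each licence is reduced to its NC and SA flags, a ShareAlike input is assumed to require that the combined work carry exactly that same licence, NC input is assumed to rule out any output licence that permits commercial use, and ND licences (which forbid adaptation altogether) are left out. The class and function names are hypothetical. The real rules, including CC’s own compatibility list, are more nuanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical, simplified CC licence: only the conditions that matter here."""
    name: str
    non_commercial: bool = False  # NC
    share_alike: bool = False     # SA


CC_BY = Licence("CC-BY")
CC_BY_SA = Licence("CC-BY-SA", share_alike=True)
CC_BY_NC = Licence("CC-BY-NC", non_commercial=True)
CC_BY_NC_SA = Licence("CC-BY-NC-SA", non_commercial=True, share_alike=True)
ALL = [CC_BY, CC_BY_SA, CC_BY_NC, CC_BY_NC_SA]


def can_release_under(output: Licence, inputs: list[Licence]) -> bool:
    """Could an adaptation mixing `inputs` be released under `output`?"""
    for lic in inputs:
        # Simplified SA rule: the adaptation must carry exactly the same licence.
        if lic.share_alike and output != lic:
            return False
        # NC material must not end up under a licence that allows commercial use.
        if lic.non_commercial and not output.non_commercial:
            return False
    return True


def compatible_outputs(inputs: list[Licence]) -> list[str]:
    """Names of every licence (in this toy model) the combined work could carry."""
    return [out.name for out in ALL if can_release_under(out, inputs)]


print(compatible_outputs([CC_BY_NC_SA, CC_BY_SA]))  # no licence satisfies both
print(compatible_outputs([CC_BY, CC_BY_SA]))        # CC-BY content can go into SA work
```

Under this model the first call returns an empty list: each ShareAlike input demands its own licence on the result, so no single licence can satisfy both.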

There are so many misconceptions about NC. Many people think it’s about showing that you want people to share your motivation. Motivation is irrelevant. The only thing that matters is whether a court decides that the licensee’s use breaks the formal non-commercial condition of the licence. There’s little case law, but the Hagedorn paper argues that being a non-profit doesn’t mean that a use is non-commercial. Recovering costs can be seen as commercial. And so on.

We came across this when we wished to distribute a corpus of 42 papers used in training OSCAR3. The corpus was made available by the Royal Society of Chemistry. It was used (with contributions from elsewhere) to tune the performance of OSCAR3 to chemistry journals. Because training with a corpus is a key part of computational linguistics, we wished to distribute the corpus (it’s probably less than 0.1% of the RSC’s published material – it would hardly affect their sales). After several years they agreed, on the basis that the corpus would be licensed as CC-NC. I pointed out very clearly that CC-NC would mean we couldn’t redistribute the corpus as a training resource (and that this was essential since others would wish to recalibrate OSCAR). Yes, they understood the implications. No, they wouldn’t change. They realised the problems it would cause downstream. So we cannot redistribute the corpus with OSCAR3. The science of textmining suffers again.

Why? If I understood correctly (and they can correct me if I have got it wrong) it was to prevent their competitors using the corpus. (The competitors include other learned societies.)

I thought that learned societies existed to promote their discipline. To work to increase quality. To help generate communal resources for the better understanding and practice of the science. And chemistry really badly needs communal resources – it’s fifteen years behind bioscience because of its restrictive practices. But I’m wrong. Competition against other learned societies is more important than promoting the quality of science.

Meanwhile Creative Commons is rethinking NC. They realise that it causes major problems. There are several plans (see the Hagedorn paper):

Creative Commons is aware of the problems with NC licenses. Within the context of the upcoming version 4.0 of Creative Commons licenses (Peters 2011), it considers various options of reform (Linksvayer 2011b; Dobusch 2011):

• hiding the NC option from the license chooser in the future, thus formally retiring the NC condition

• dropping the BY-NC-SA and BY-NC-ND variant, leaving BY-NC the only non-commercial option

• rebranding NC licenses as something other than CC; perhaps moving to a “non-creativecommons.org” domain as a bold statement

• clarifying the definition of NC

I’d support some of these (in combination) but not the last, because while NC is still available many people will use it on the basis that it’s the honourable thing to do (I made this mistake on this blog). And others will use it deliberately to stop the full dissemination of content.

Harvard Open Access Policy Benchmark Needed

It is important to calculate what percentage of the total annual refereed journal article output of Harvard (participating Faculties) is represented by the c. 6457 deposits made to date in Harvard’s DASH Repository since the adoption of Harvard’s OA Policy.

That is the objective measure of the success of an OA policy, and hence of whether it provides a model ready for other universities to emulate — or whether it still needs some tweaks (e.g., to make it more like the U. Liege ID/OA policy, which (1) requires immediate deposit with no waiver, (2) only requests (but does not require) that the deposit be made immediately OA, (3) designates repository deposit as the sole means of submitting journal articles for research performance review, and has generated 67,631 deposits to date).

The global baseline rate of making articles OA (without any OA policy) is about 20% (varying by discipline). The target is of course 100%. And about 60% is a benchmark, because that is the percentage of journals that already endorse immediate OA deposit (hence do not require Harvard-style rights retention in order to make deposits OA immediately).

It is extremely important to get a clear idea of exactly how well Harvard’s policy is doing after nearly 4 years: If the deposit rate is near 100%, it is doing as well as or better than all other kinds of OA mandates. If it is close to 60%, that’s still good, but it’s not clear whether its rights-retention clause is the cause, or its deposit clause.

And if it’s closer to 20%, then Harvard’s deposit clause is not working and needs upgrading to ID/OA.

This is all the more important since it is the Harvard model that other universities are likely to follow, come what may.
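The benchmark itself is simple arithmetic: divide the deposits by the eligible output and see which reference level the result sits nearest. A minimal Python sketch, assuming that the denominator covers the same period as the deposits and that “near” or “close to” means the nearest of the three levels (20%, 60%, 100%) named above; the function names are hypothetical, and the total output used in the example is a made-up number, since the real figure is exactly what needs to be found.

```python
# Hypothetical helper; the three benchmark levels are the ones given in the text.
BENCHMARKS = {
    20.0: "baseline (no policy): deposit clause not working, upgrade to ID/OA",
    60.0: "good, but unclear whether rights retention or deposit is the cause",
    100.0: "as good as or better than all other kinds of OA mandate",
}


def deposit_rate(deposits: int, total_output: int) -> float:
    """Deposits as a percentage of refereed output over the same period."""
    if total_output <= 0:
        raise ValueError("total_output must be positive")
    return 100.0 * deposits / total_output


def assess(deposits: int, total_output: int) -> tuple[float, str]:
    """Return the deposit rate and the verdict for the nearest benchmark."""
    rate = deposit_rate(deposits, total_output)
    # Assumption: "near"/"close to" means the nearest of the three levels.
    nearest = min(BENCHMARKS, key=lambda level: abs(rate - level))
    return rate, BENCHMARKS[nearest]


# 6457 deposits is from the text; 20000 is a MADE-UP total, for illustration only.
rate, verdict = assess(6457, 20000)
print(f"{rate:.1f}% -> {verdict}")
```

With the invented denominator the rate comes out at about 32%, nearest the 20% baseline; the point is only that the verdict is meaningless until Harvard’s real output figure is plugged in.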

Stevan Harnad
EnablingOpenScholarship (EOS)

Holiday Service Update

With the end-of-year holiday season upon us, we wanted to let our authors know in advance that they may experience a slight delay in the peer review process of their manuscript if they submit anytime between now and the end of the year. This is because many of our academic editors and external referees will be out of the office at some point during the holiday season. We will endeavor to ensure that all manuscripts submitted to PLoS ONE are evaluated as quickly as possible, but please accept our advance apologies for any delays you experience.

Despite many people being on vacation, the work of the journal continues and so we will continue to receive a large number of emails from authors, academic editors, reviewers and readers throughout this period. Between our offices in the UK and the US, we will have some level of staff coverage every day except for Christmas Day (December 25), but with some team members being out of the office, we may not be able to respond to emails sent to the PLoS ONE inbox (plosone@plos.org) as quickly as usual. We will respond to your message as soon as we can, but in the meantime, you may wish to visit some of the following pages on our websites, which may help to answer your question:

Call to action: 2011 White House RFI on public access (deadline Jan. 2)

The opportunity

As part of the process of fulfilling Section 103 of the 2010 America COMPETES Act, the White House Office of Science and Technology Policy (OSTP) has issued a Request for Information (RFI), asking individuals and organizations to provide recommendations on approaches for broad public access and long-term stewardship to peer-reviewed scholarly publications that result from federally funded scientific research. The RFI poses eight multi-part questions, which can be found at the link below. 

The Right to Research Coalition strongly encourages student organizations, student governments, and individual students to submit responses supporting public access – your comments will be crucial in both showing the need for public access and ensuring the policy is maximally beneficial for students. This is a real opportunity to greatly expand students’ access to academic research, so please take a few minutes to submit a comment. Each response will be important in demonstrating students’ need for access to federally funded research.

The full text of the RFI may be found at: http://www.gpo.gov/fdsys/pkg/FR-2011-11-04/html/2011-28623.htm  

Who should respond?

It is urgent that as many individuals and organizations as possible respond. We strongly encourage you to write in both individually and on behalf of any student organizations that you are a member of. You’re also encouraged to share this call to action with any friends, colleagues, professors, or others in your network who would be willing to submit a carefully thought-out response.

For reference, the RFI specifically calls for comments from “non-Federal stakeholders, including the public, universities, nonprofit and for-profit publishers, libraries, federally funded and non-federally funded research scientists, and other organizations and institutions with a stake in long-term preservation and access to the results of federally funded research.”

If you can’t answer all of the questions, answer as many as possible – and respond to questions as directly as possible.  Responses that reference the questions directly will have more impact than those that are supportive of public access more generally.

How the results will be used

The input provided through this RFI will inform the National Science and Technology Council’s Task Force on Public Access to Scholarly Publications, convened by OSTP.

OSTP will issue a report to Congress describing: 

1. Priorities for the development of agency policies for ensuring broad public access to the results of federally funded, unclassified research;
2. The status of agency policies for public access to publications resulting from federally funded research; 
3. Public input collected.

Taxpayers paid for the research.
We deserve to be able to access the results.

The main point to emphasize is that taxpayers are entitled to access the results of the research our tax dollars fund, especially given how crucial this research is for a complete, up-to-date education. Taxpayers should be allowed to immediately access and fully reuse the results of publicly funded research. 

To discuss talking points in further detail, don’t hesitate to contact us. 

How to respond

The deadline for submissions is January 2, 2012. Submissions should be sent via email to publicaccess@ostp.gov. Please note: OSTP will publicly post all submissions after the deadline (along with names of submitters and their institutions) so please make sure not to include any confidential or proprietary information in your submission. Attachments may be included. 

As ever, thanks for your commitment to public access and the advancement of these crucial policies.

If you have any questions or comments, don’t hesitate to contact:

Nick Shockey
Director, Right to Research Coalition
nick [at] arl [dot] org

What is the basis of the NaCTeM-Elsevier agreement? FOI should give the answer

In the previous posts (http://blogs.ch.cam.ac.uk/pmr/2011/11/25/textmining-nactem-and-elsevier-team-up-i-am-worried/ and http://blogs.ch.cam.ac.uk/pmr/2011/11/27/textmining-my-years-negotiating-with-elsevier/) I highlighted concerns (not just mine) about the publicly announced collaboration between NaCTeM (the National Centre for Text Mining at the University of Manchester) and Elsevier (henceforth N+E). I am now going to find out the precise details of this collaboration and, when I have the answers, will be in a position to answer the following questions:

  • What is NaCTeM’s mission for the nation? (NaCTeM formally has a responsibility to the Nation.)
  • What public finance has NaCTeM had and what is planned in the future?
  • What public money has gone into N+E?
  • What are the planned benefits to Elsevier?
  • What are the planned benefits of N+E to NaCTeM?
  • Are there plans to pass any of these benefits to the wider national community?

In particular my concerns are:

  • Will the benefits of this work be available only through Elsevier’s Sciverse platform?
  • Are we getting value for money?

It may seem strange – and potentially confrontational – to use FOI to get this information rather than simply asking the University or NaCTeM. But the power of FOI is that the University has specialist staff to give clear, unemotional answers. In particular it will show precisely whether there are hidden confidential aspects. If so, it will be especially important to assess whether these are in the Nation’s interest. And since this may reveal material that is useful to the Hargreaves process and to the UK government (through my MP), it is important that my facts are correct.

For those who aren’t familiar with the FOI process, each public institution has a nominated office/r who must, within 20 working days, give answers to all questions (or show why s/he should not). I shall use http://whatdotheyknow.com – a superb site set up for this purpose, which means that everyone can follow the process and read the answers. FOI officers are required to respond promptly, and I hope that Manchester will do so – and be quicker than Oxbridge, who ritually take 19.8 days to respond. Note that I am not expected to give my motivation. I shall request information in existing documents or known facts – this is not a place for future hypotheticals or good intentions.


Dear FOI University of Manchester,

I am requesting information under FOI about the National Centre for Text Mining (NaCTeM) and the University’s recently announced collaboration of NaCTeM with Elsevier (http://www.manchester.ac.uk/aboutus/news/display/?id=7627). The information should be supported by existing documents (minutes, policy statements, etc.). I am particularly concerned with the availability of resource material to the UK in general (i.e. beyond papers and articles). I use the word “Open” (capitalised) to mean information or services which are available for free use, re-use and redistribution without further permission (see http://opendefinition.org/). In general this means OSI-compliant Open Source for code and CC-BY or CC0 for content (CC-NC and “for academics only” are not Open).


  • What is the current mission statement of NaCTeM?
  • Does NaCTeM have governing or advisory bodies or processes? If so please list membership, dates of previous meetings and provide minutes and support papers.
  • List the public funding (amounts and funders) for NaCTeM over the last three years and the expected public funding in the foreseeable future.
  • What products, content and services does NaCTeM currently provide to the UK community (academic and non-academic) outside NaCTeM itself?
  • What proportion of papers published by NaCTeM are fully Open?
  • What proportion and amount of software, content (such as corpora) and services provided by NaCTeM is fully Open?

Elsevier collaboration

  • Has the contract with Elsevier been formally discussed with (a) funders (b) bodies of the University of Manchester (e.g. senates, councils)? Please provide documentation.
  • Is there an advisory board for the collaboration?
  • Has any third party outside NaCTeM formally discussed the advantages and disadvantages of the Elsevier collaboration?
  • Please provide a copy of the contract between the University and Elsevier. Please also include relevant planning documents, MoIs, etc.
  • Please state the duration and the financial resources provided by (a) the University (b) Elsevier. Please indicate what percentage of Full Economic Costs (FEC) will be recovered from Elsevier. (I shall assume that a figure of less than 100% indicates that the University is “subsidising Elsevier” and one greater than 100% means the University gains.)
  • Please indicate what contributions in kind (software, content, services, etc.) are made by either party and what they are valued at.
  • Please outline the expected deliverables. Please indicate whether any of the deliverables are made exclusively available to either or both parties and over what planned timescale.
  • Are any of the deliverables Open?
  • What is the IP for the deliverables in the collaboration?
  • Are any of the deliverables planned to be resold as software, services or content beyond the parties?
  • Has NaCTeM or the University or any involved third party raised the concern that contributing to Sciverse may be detrimental to the UK community?
  • Please indicate clearly what the planned benefit of the collaboration is to the UK.


I shall post this tomorrow so please comment now if you wish to.