ChemistryOpen – Issue 1/2013 is now available online!

ChemistryOpen’s first issue in 2013 again reflects the great diversity of this open-access general chemistry journal. Thematically the contributions range from betulin-based polyurethanes for CO2 adsorption to Cu2O-decorated TiO2 nanotubes. In addition to Full Papers, Communications and a Thesis Summary, this issue presents an Editorial by the Co-Editors-in-Chief and a newly featured Cover Profile.

In their Editorial entitled Show me the Money – How, as a Chemist, Can I Find Funding for Open-Access Publishing? the Editors of ChemistryOpen answer some questions about open-access funding and provide authors with ideas for potential funding sources. Fuelled by new policies from funders, more and more chemists are looking to publish their research results in an open-access forum. Of course, high-quality publishing costs money, and as a consequence, the so-called “gold road” or “author-pays” model has emerged, where an Article Publication Charge is payable by the author. So the question now arises of how authors can meet the associated cost.

ChemistryOpen’s cover receives a new look in 2013 and features the Full Paper by Knut Rurack and co-workers (BAM Federal Institute for Materials Research and Testing, Berlin, Germany) on fluorinated BODIPY dyes for dual-method surface analysis. The associated Cover Profile lets readers around the world take a closer look at his group in Berlin and offers a glimpse at the motivation behind their work: “When one tries to assess the functionalization degree of a support quantitatively, that is, the concentration of chemical groups across the entire support, one realizes that there are no reliable methods available today.”

All articles published in ChemistryOpen are open-access and free to all readers. Click here to access the current issue now!

Heart Health Awareness Month

Before this month comes to a close, let us not forget to honor February as American Heart Month.

According to the CDC, heart disease, also known as coronary artery disease or cardiovascular disease, claims 600,000 lives in the U.S. each year. Heart disease refers to the plaque buildup in the walls of the arteries, which can result in a heart attack or stroke. Other heart conditions include arrhythmia, congenital defects, heart failure and hypertension (high blood pressure).

Researchers continue to study the best ways to properly care for and treat the beating organ within us. Last September, we discussed cardiovascular health among women, and highlighted related published articles. Today, in honor of American Heart Month, we bring you recently published research that increases awareness of and insight into heart health.

Did you ever feel there was a connection between your heartbeat and self-image? PLOS ONE authors have attempted to answer this question by investigating the relationship between self-objectification and the beating heart in a recent article. Using a heartbeat perception task and questionnaire, researchers found that women who were able to perceive their own heartbeat were less likely to objectify themselves, suggesting yet another link between heart health and overall wellbeing.

In another recently published study, researchers explored the connection between white blood cell count and heart disease risk in young adults. The authors tested the white blood cell counts for over 29,000 healthy young men over an average of seven and a half years and also screened the participants for signs of coronary artery disease. Their investigation found that a higher white blood cell count correlated with coronary artery disease risk in young men. They concluded that white blood cell count may help in identifying young men with low or high risk for heart disease progression.

In a third article published by PLOS ONE, researchers from the University of Granada investigated heart rate variability and cognitive performance. Participants were divided into a high-fit group and a low-fit group, and the authors measured the effects of three cognitive tasks on the participants’ heart rate variability. The researchers found that cognitive processing has an effect on heart rate variability, and the main benefit of fitness level was associated with processes involving sustained attention.

These articles are just a taste of the PLOS ONE research into cardiovascular health and the prevention of heart disease. As American Heart Month comes to an end, explore more research on the topic here.


Ainley V, Tsakiris M (2013) Body Conscious? Interoceptive Awareness, Measured by Heartbeat Perception, Is Negatively Correlated with Self-Objectification. PLoS ONE 8(2): e55568. doi:10.1371/journal.pone.0055568

 Twig G, Afek A, Shamiss A, Derazne E, Tzur D, et al. (2012) White Blood Cell Count and the Risk for Coronary Artery Disease in Young Adults. PLoS ONE 7(10): e47183. doi:10.1371/journal.pone.0047183

 Luque-Casado A, Zabala M, Morales E, Mateo-March M, Sanabria D (2013) Cognitive Performance and Heart Rate Variability: The Influence of Fitness Level. PLoS ONE 8(2): e56935. doi:10.1371/journal.pone.0056935

Image Credit: natalie419 on Flickr

#rds2013 Managing Research Data

SPOILER ALERT talk outline follows – please wait till I give it on 2013-02-27

Peter Murray-Rust, University of Cambridge and Open Knowledge Foundation

[1][2] [links mainly to PMR blog]

[note; this talk may upset some and enthuse others].

Neelie Kroes. (Vice President European Commission). “This is personal for me. I am 71; I don’t have to do this job. But I want to. I want to because I am inspired by this new generation.” [PMR: I’m exactly the same]



Note: I concentrate on the LONG-TAIL of scholarship.

Where are we at and who are we? (Scale, “market”)

  1. Values matter; then community; technology and protocols then follow

  2. Our current problems are people problems not technology

  3. Communities and ideas that have worked – demos

  4. What can and should we do?


[#opendataday PMR]

We should demand a global knowledge commons

Midsummer Common (Cambridge) – traditional grazing

also [hackathons]

The City of Palo Alto teams with Stanford University to complete the City’s first hack-a-thon. The challenge, build an application in twenty-four hours to utilize geographical information system data provided by the City.


30-40 people at #opendataday


Values and Principles



We must remember Aaron

Closed data mean people die. (Jack Andraka, 14, invented a new diagnosis for pancreatic cancer)

“This was the [paywall to the] article I smuggled into class the day my teacher was explaining antibodies and how they worked. I was not able to access very many more articles directly. I was 14 and didn’t drive and it seemed impossible to go to a University and request access to journals”.

We are in the middle of a digital revolution. We are fighting for our digital commons against digital enclosure.

“Bliss was it in that dawn to be alive,

But to be young was very heaven!” (Wordsworth)


[Bastille: Wikipedia]

Ideas from Ranganathan in the data age.

16 principles for managing research data. (PMR)




Current problems in managing research data. (Vested interests, academic apathy, intrinsic difficulty, finance)

Karen Yacobucci, Karen Hanson NYU Health Sciences Libraries.

Plot: Panda wants to use bear’s data. There are so many problems (discovery, location, format, metadata) that after months Panda gives up.

Walled gardens


Animal Garden video at Serpentine Gallery. Plot: Some animals grow flowers and give them away. Other animals build walls and sell them back. Happy ending? Who knows.

A garden is walled if you cannot fork, or download the whole contents. Unwalled gardens MUST have clear governance.


Solutions and communities that work

Wikipedia (piezoelectricity)


(building a world map for social good)

Melbourne Bicycle Map

Linked Data

Bitbucket commits


We (community) build our own Tools

Crystaleye (a chemistry/solid state repository)

Quixote (compchem repo)

Avogadro. + NWChem


Chemical Tagger (tags chemistry and geo)


Content Mining

Ross Mounce has got an AMI-award



Blue Obelisk

Semantic web for materials science



The Stilettoed Mathematician

Open Science Atop 5 Inch Heels…



Europe must legitimize Text+DataMining


We are creating

Collaborate with us!


My fellow citizens of the world: ask not what [knowledge] will do for you, but what together we can do for the freedom of [knowledge]. (adapted from J F Kennedy)


[1] [Power corrupts, Powerpoint corrupts absolutely]

[2] disclaimer: I asked to be freed from Elsevier sponsorship so I could speak my mind. [4 years wasted]

Many thanks to Columbia, to our group, both at Cambridge and throughout the world

#rds2013 Managing Data and Liberation Software; we must remember Aaron Swartz

Something is seriously wrong with our current values in academia.


The world is changing and the tensions between digital openness and digital possession-for-power-and-gain get stronger daily. We cannot manage research data unless we manage our values first. In January this year the broken values were made very public with the death of Aaron Swartz.

[picture and text from EFF ]


Aaron did more than almost anyone to make the Internet a thriving ecosystem for open knowledge, and to keep it that way. His contributions were numerous, and some of them were indispensable. When we asked him in late 2010 for help in stopping COICA, the predecessor to the SOPA and PIPA Internet blacklist bills, he founded an organization called Demand Progress, which mobilized over a million online activists and proved to be an invaluable ally in winning that campaign.


I’ve blogged before on Aaron. I never met him though I know people who have. I can’t write with authority, so I’ll quote


Tim Berners-Lee, who tweeted: “Aaron dead. World wanderers, we have lost a wise elder. Hackers for right, we are one down. Parents all, we have lost a child. Let us weep.”


And I had the privilege of hearing Tim 2 weeks ago and the central, passionate, coherent part of his speech was about Aaron.

Why does this matter to #rds2013? Because if we make the simple step to recognize that knowledge must be Open and fight daily to make it happen – politically, socially, technically, financially, then the tools, the protocols, the repositories, the ontologies follow.

I’ve said that access to public knowledge is a fundamental human right. And now I find myself remarkably in strange company – with Darrell Issa [the proposer of the anti-NIH RWA act] [I don’t understand US politics]

“[Aaron] and I probably would have found ourselves at odds with lots of decisions, but never with the question of whether information was in fact a human right … Ultimately knowledge belongs to all the people of the world — unless there’s a really valid reason to restrict it.” [PMR emphasis]

If we take that axiom, then we have to build the global knowledge commons. It’s an imperative. And tomorrow I shall announce my own, initially very small, tribute to Aaron. I’ll finish with [parts of] his guerrilla manifesto [2008]. Ultimately this is not about technology, it’s about fairness and justice. [my emphases]

Information is power. But like all power, there are those who want to keep it for themselves. The world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations. Want to read the papers featuring the most famous results of the sciences? You’ll need to send enormous amounts to publishers like Reed Elsevier.


Scanning entire libraries but only allowing the folks at Google to read them? Providing scientific articles to those at elite universities in the First World, but not to children in the Global South? It's outrageous and unacceptable. 


"but what can we do? The companies hold the copyrights, they make enormous amounts of money by charging for access, and it's perfectly legal — there's nothing we can do to stop them." But there is something we can, something that's already being done: we can fight back. 


Those with access to these resources — students, librarians, scientists — you have been given a privilege. You get to feed at this banquet of knowledge while the rest of the world is locked out.


But sharing isn't immoral — it's a moral imperative. Only those blinded by greed would refuse to let a friend make a copy. 


Large corporations, of course, are blinded by greed. The laws under which they operate require it — their shareholders would revolt at anything less. And the politicians they have bought off back them, passing laws giving them the exclusive power to decide who can make copies. 


There is no justice in following unjust laws. It's time to come into the light and, in the grand tradition of civil disobedience, declare our opposition to this private theft of public culture. 


We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that's out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks. We need to fight for Guerilla Open Access. 


With enough of us, around the world, we'll not just send a strong message opposing the privatization of knowledge — we'll make it a thing of the past. Will you join us? 


PMR: I join. #ami2 is developed as Liberation Software. We need software just as revolutions need arms and transport. Liberation software is designed, at least in part, to make knowledge free.

And “free” means completely free – free-as-in-speech and free throughout the world.

*** And completely coincidentally the following news broke just after I had written this

It appears that, only by researching the Manifesto, a First Amendment protected publication that largely espoused legal information sharing, did the government even get around to treating [the JSTOR action] as a crime.


Universal Green is the Path From Fool’s Gold to Fair Gold

The price of Gold OA today is absurdly, arbitrarily high.

Most journals (and almost all the top journals) today are subscription journals. That means that whether you pay for hybrid Gold to a subscription journal or for “pure Gold” to a pure-Gold journal, double-payment is going on: subscriptions plus Gold. Institutions have to keep subscribing to the subscription journals their users need over and above whatever is spent for Gold.

In contrast, Green OA self-archiving costs nothing. The publication is already paid for by subscriptions.

So it is foolish and counterproductive to pay for Gold pre-emptively, without first having (effectively) mandated and provided Green.

(That done, people are free to spend their spare cash as they see fit!)

So what RCUK should have done (and I hope still will) is to require that all articles, wherever published, be immediately deposited in their authors’ institutional repository — no exceptions. (If it were up to me, I’d allow no OA embargo; but I can live with embargoes for now — as long as deposit itself is immediate and the email-eprint-request Button is there, working, during any embargo: Universal immediate-deposit mandates will soon usher in the natural and well-deserved demise of OA embargoes.)

(That done, whether or not authors choose to publish or pay for Gold is left entirely to their free choice.)

Paying instead for Gold, pre-emptively, for the sake of CC-BY re-use rights, today, is worth neither the product paid for (Gold CC-BY) nor, far more importantly, all the Green OA thereby foregone (for the UK as well as for the rest of the world) whilst the UK’s ill-fated Gold preference policy marches through the next few years to its inevitable failure.

So it’s not about the price of the Gold. It’s about the price of failing to grasp the Green that’s within immediate reach today — the Green that will not only pave the way to Gold (and as much CC-BY as users need and authors want to provide), but the same Green whose competitive pressure will — (here comes my unheeded mantra again) — drive the price of Gold down to a fair, affordable, sustainable one, by making subscriptions unsustainable, forcing publishers to cut costs by downsizing, jettisoning the print and online editions, offloading all access-provision and archiving onto the Green OA institutional repositories, and converting to Fair-Gold in exchange for the peer review service alone, paid for out of a fraction of the institutional subscription cancelation savings windfall.

The difference between paying for Gold then, post-Green OA — and hence post-subscriptions and double-payment — and double-paying for it now, pre-emptively, is the difference between Fair Gold and Fool’s-Gold.

#rds2013: Why academia must look outward; “closed data means people die”

@McDawg (Graham Steel, indefatigable fighter for openness and patients’ rights) has just blogged a powerful story of how a teenager has made a medical breakthrough despite the publishing industry’s paywalls. From Jack Thomas Andraka:


“After a close family friend died from pancreatic cancer, I turned to the Internet to help me understand more about this disease that had killed him so quickly. I was 14 and didn’t even know I had a pancreas but I soon educated myself about what it was and started learning about how it was diagnosed. I was shocked to discover that the current way of detecting pancreatic cancer was older than my dad and wasn’t very sensitive or accurate. I figured there had to be a better way!”

He began to think of various ways of detecting and preventing cancer growth and terminating the growth before the cancer cells become pervasive. Andraka’s breakthrough nearly didn’t happen. He asked around 200 scientists for help with his research and was turned down every time. (PMR: Of course lots of professional scientists get turned down all the time, and I admire Jack’s perseverance.)

Luckily he eventually established contact with a Dr. Anirban Maitra, a Professor of Pathology and Oncology at Johns Hopkins University, who provided him with lab space and served as a mentor during the test’s development.

“This was the [paywall to the] article I smuggled into class the day my teacher was explaining antibodies and how they worked. I was not able to access very many more articles directly. I was 14 and didn’t drive and it seemed impossible to go to a University and request access to journals”.

In an interview with the BBC, Andraka said the idea for his pancreatic cancer test came to him while he was in biology class at North County High School, drawing on the class lesson about antibodies and the article on analytical methods using carbon nanotubes he was surreptitiously reading at the time. Afterwards, he followed up with more research using Google Search on nanotubes and cancer biochemistry, aided by online Open Access scientific journals.

Earlier this month, Andraka had a guest post published on the PLOS Student Blog entitled Why Science Journal Paywalls Have to Go.

Can you read the next sentence without a feeling of anger?

“I soon learned that many of the papers I was interested in reading were hidden behind expensive paywalls. I convinced my mom to use her credit card for a few but was discouraged when some of them turned out to be expensive but not useful to me. She became much less willing to pay when she found some in the recycle bin!”

An encapsulation of why we are suffering. No-one should have to buy articles to find out they aren’t any use (and if you have just come into this area many papers cost over 50 USD).

Also earlier this month, Andraka was invited to the State of the Union Address where he met and spoke with President and Michelle Obama about his work. See ‘Mr. Speaker, The President of the United States…and Jack Andraka!‘ for more details.

“Open access would be an important first step. I would love to see research that is publicly funded by taxes to be publicly available. It would make it so much easier for people like me to find the information they need. If I can create a sensor to detect cancer using the Internet, imagine what you can do”.

And meanwhile academics and publishers collude on paying huge amounts of money into the Academic-STMPublisher complex which generates reputations for academics by preventing access to publicly funded information. I’ve written earlier that “Closed Access means people die”. There was a howl of protest from some academics – I didn’t have evidence. Well Jenny Molloy did. But the statement is self-evidently true. One story like this should be enough.

Closed data from public research is immoral, unethical and unacceptable. Don’t argue. Just fix it. (and I shall do my best).


#rds2013 Current Problems in Managing Research Data

I am going through the various sections in my presentation. I’ve got to “Problems in Managing Research Data”. Warning: This section is uncomfortable for some. In rough order (I might swap 1 and 2):


  • Vested Commercial interests. There are at least these problems:
    • STM publishers. I’ll concentrate on this because until we have an Open Data Commons we can’t work out how to manage it. STM publishers not only stop me and others getting the data, they stifle innovation. That leaves STM about 15 years behind commerce, civics, and the Open movement in terms of technologies, ontologies, innovation.
    • Instrument manufacturers. Many instruments produce encrypted or proprietary output which cannot be properly managed. In many cases this is deliberate to create lockin.
    • Domain software.
      Some manufacturers (e.g. of computational chemistry software) legally forbid the publication of results, probably to prevent benchmarking for performance and correctness of science.
    • Materials.
      Many suppliers will not say what is in a chemical, what the properties of a material are, etc. You cannot build ontologies on guesswork nor create reliable metadata
  • Academic apathy and misplaced values. I continue to be appalled by the self-centeredness of academia. The debate on data is “how can my data be given metrics”, not “how can I make data available for the good of humanity”. Yes, I’m an idealist, but it hasn’t always been this way. It’s possible to do good scholarship that is useful and that is recognised. But academia is devising systems based on self-glorification. With different values, the publisher problem would disappear. The Super Happy Block Party Hackathon (Palo Alto) shows how academia should be getting out and working for the community.
  • Intrinsic difficulty. Some research data is hard. A lot of bioscience. But they solve that by having meetings all the time on how to describe the data. You can’t manage data that you can’t describe. I’ve been working with Dave M-R on integrating the computational declarative semantics of chemistry and mathematics. That’s completely new ground and it’s hard. It’s essential for reproducible computational chemistry (a billionUSD+ activity). Creating chemical ontologies (ChemAxiom, Nico Adams) is hard. Computational ontologies (OWL) stretch my brain to its limits. Materials science is hard. To understand piezoelectricity you have to understand a 6*6 tensor.

    But that’s what the crystallographers have been doing for 30 years. And they have built the knowledge engines.

  • Finance. The least problem. If we want to do it, then the cost is a very small proportion of the total research funding. And a minuscule amount of what we pay the STM publishers. OpenStreetMap was built without a bank balance.

It’s simple. You have to WANT TO DO IT. The rest follows.

With New REF Mandate Proposal, UK Rejoins OA Vanguard

HEFCE’s post REF-2014 Open Access proposal looks very promising, if I have understood it correctly.

The proposal is to mandate that in order to be eligible,
all peer-reviewed journal articles submitted to REF after 2014
must be deposited in the author’s institutional repository
upon acceptance for publication,
regardless of whether the article is published in a subscription journal or in a Gold OA journal
(no preference, and no restriction on author’s journal choice),
and regardless of whether the publisher embargoes Open Access to the deposit
(for an allowable embargo period that remains to be decided.)

The proposed HEFCE REF OA policy looks much better than the current RCUK OA policy. Let us hope that the RCUK policy will now be brought into line with the proposed HEFCE REF policy.

It is also very reassuring to hear that the policy will be based on collaboration and consultation.

This may help the UK regain its former worldwide leadership position in OA. The new US policy developments (following, a decade later, in the UK’s pioneering footsteps) are extremely welcome and timely, but they still have many rough edges. Let’s hope it will be the UK that again shows how to smooth them out and propel us all unstoppably to global OA.

Readers may find it amusing to look at this 2003 proposal to RAE:

“Being the only country with a national research assessment exercise [1], the UK is today in a unique position to make a small change that will confer some large benefits. The Funding Councils should mandate that in order to be eligible for Research Assessment and funding, all UK research-active university staff must maintain (I) a standardised online RAE-CV, including all designated RAE performance indicators, chief among them being (II) the full text of every refereed research paper, publicly self-archived in the university’s online Eprint Archive and linked to the CV for online harvesting, scientometric analysis and assessment. This will (i) give the UK Research Assessment Exercise (RAE) far richer, more sensitive and more predictive measures of research productivity and impact, for far less cost and effort (both to the RAE and to the universities preparing their RAE submissions), (ii) increase the uptake and impact of UK research output, by increasing its visibility, accessibility and usage, and (iii) set an example for the rest of the world that will almost certainly be emulated, in both respects: research assessment and research access.”

Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35.

#rds2013: My reply from Elsevier on publishing supplemental data

Two weeks ago I wrote to Elsevier’s Director of Universal Access about making research data Openly available; the title is fairly self-explanatory. I have just got her reply, which I publish in full below. She says my request was “rather long and involved” and that “its description of various events do not always align with ours”. (The latter statement is meaningless as there were no events involved.) To save readers referring back, the essence of my mail was that:

Many closed-access publishers (ACS, RSC, Nature) publish authors’ supplemental data under an apparently licence-free/PD/CC0 approach (though not always explicit). Elsevier either puts data behind a paywall or sends it to the closed database of CCDC based on a subscription model. I asked (I thought clearly):

I am therefore asking you do the following:

·         Announce that all supplemental data accompanying Elsevier papers IS licensed as CC0.

·         Require the CCDC to make all primary CIF data from Elsevier publications CC0. (The author’s raw deposition, not CCDC’s derivative works)

·         Extend this policy to all other experimental data published in Elsevier journals (in chemistry this would be records of synthesis, spectra, analytical data, computational chemistry, etc.). When you agree to this I can give public advice as to the best way to achieve this.

I leave you to judge whether Elsevier has answered any of my requests or (as I read it) sidestepped them and added a list of platitudes. But I am probably very slightly biased as I have tried for 4 years to get any straight answers out of Elsevier. I might just fail to hide my bias when I speak tomorrow.

Dear Peter,

Thank you for your message.  It is rather long and involved, and its description of various events do not always align with ours, but it is an important issue that you raise and I am very happy to respond on behalf of Elsevier. Datasets are sometimes published as supplementary information to journal articles. Authors provide Elsevier with only a non-exclusive license to publish/promote these supplementary datasets and so only the authors can decide to use a CC0 license for these datasets.   

This having been said Elsevier shares your vision for open data and a future in which data are much more broadly managed, preserved, and reused for the advancement of science.  Professional curation and preservation of data is, like professional publishing, neither easy nor inexpensive.  The grand challenge is to develop approaches that maximise access to data in ways that are sustained over time, ensure the quality of the scientific record, and stimulate innovation.

Here at Elsevier we:

  • believe rich interconnections between publications and scientific data are important to support our customers to advance science and health
  • work with others to identify, if needed develop, and deploy standard approaches for linking publications and data.
  • encourage authors to document their data and to deposit their data with an appropriate open data centre or service and to make their data available for reuse by others, ideally prior to publication of articles based on analysis of these data, and with a permanent standard identifier to link from the publication to the dataset.
  • recognise that scientists’ invest substantially in creating and interpreting data, and their intellectual and financial contributions need to be recognised and valued
  • believe data should be accompanied by the appropriate metadata to enable it to be understood and reused. 
  • help to communicate the benefits of data curation and reuse for different stakeholders in the scholarly communication landscape including authors, funders, publishers, researchers, and university administrators.
  • encourage authors to cite datasets that have been used in their research and that are available for reuse via a data curation center or service.
  • deploy our expertise in certification, indexing, semantics, and linking to add value to data
  • champion the importance of long term preservation of data, and accreditation systems/standards for digital curation services. 

You and your readers might find this short video by my colleague, IJsbrand Jan Aalbersberg, of interest.  It is a 5-minute flash presentation from a recent STM Innovation seminar on this topic: .

Last but not least, our policies in this space are similar to those of other publishers.  There are two industry position statements that many of us adhere to, and which your readers may find of interest.  They are: and

In closing, we at Elsevier welcome your thoughts and are committed to working with researchers to realize our shared vision for open data.  I will post this response to your blog comment stream as well.

With very kind wishes,



Dr Alicia Wise

Director of Universal Access

Elsevier I The Boulevard I Langford Lane I Kidlington I Oxford I OX5 1GB

M: +44 (0) 7823 536 826 I E:

Twitter: @wisealic


#rds2013: #okfn Content-mining: Europe MUST legitimize it.

I’m on an EC committee looking at how to make content available for mining. (At least I thought that was the point – it seems it isn’t).

“Licences for Europe – A Stakeholder Dialogue”

Working Group 4: Text and Data Mining

Unfortunately I haven’t been able to attend the first meeting as I have been in Australia, but @rmounce has stood in and done a truly exceptional job. The WGs are looking at licences, and WG4 is on content mining. Ross reported back on Saturday and was disappointed. It seems that WG4 has been told it has no course of action other than to accept that licences are the way forward.

This is unacceptable in a democratic system. It is difficult enough for us volunteers to compete against the rich media and publisher community. If I go to Brussels I have to find the money. These WGs are monthly. That’s a huge personal cost in time and money. The asymmetry of fighting for digital rights is a huge burden. Note also that there is a huge opportunity cost: rather than writing innovative code we have to write letters to Brussels. And that’s what we have done (I’m not on the list of signatories, but I would have been). Here’s our letter.


We write to express our serious and deep-felt concerns in regards to Working Group 4 on text and data mining (TDM). Despite the title, it appears the research and technology communities have been presented not with a stakeholder dialogue, but a process with an already predetermined outcome – namely that additional licensing is the only solution to the problems being faced by those wishing to undertake TDM of content to which they already have lawful access. Such an outcome places European researchers and technology companies at a serious disadvantage compared to those located in the United States and Asia.


The potential of TDM technology is enormous. If encouraged, we believe TDM will within a small number of years be an everyday tool used for the discovery of knowledge, and will create significant benefits for industry, citizens and governments. McKinsey Global Institute reported in 2011[1] that effective use of ‘big data’ in the US healthcare sector could be worth more than US$300 billion a year, two-thirds of which would be in the form of a reduction in national health care expenditure of about 8%. In Europe, the same report estimated that government expenditure could be reduced by €100 billion a year. TDM has already enabled new medical discoveries through linking existing drugs with new medical applications, and uncovering previously unsuspected linkages between proteins, genes, pathways and diseases[2]. A JISC study on TDM found it could reduce “human reading time” by 80%, and could increase efficiencies in managing both small and big data by 50%[3]. However at present, European researchers and technology companies are mining the web at legal and financial risk, unlike their competitors based in the US, Japan, Israel, Taiwan and South Korea who enjoy a legal limitation and exception for such activities.

Given the life-changing potential of this technology, it is very important that the EU institutions, member state governments, researchers, citizens, publishers and the technology sector are able to discuss freely how Europe can derive the best and most extensive results from TDM technologies. We believe that all parties must agree on a shared priority, with no other preconditions – namely how to create a research environment in Europe with as few barriers as possible, in order to maximise the ability of European research to improve wealth creation and quality of life. Regrettably, the meeting on TDM on 4th February 2013 had not been designed with such a priority in mind. Instead it was made clear that additional relicensing was the only solution under consideration, with all other options deemed to be out of scope. We are of the opinion that this will only raise barriers to the adoption of this technology and make computer-based research in many instances impossible.

We believe that without assurance from the Commission that the following points will be reflected in the proceedings of Working Group 4, there is a strong likelihood that representatives of the European research and technology sectors will not be able to participate in any future meetings:

  1. All evidence, opinions and solutions to facilitate the widest adoption of TDM are given equal weighting, and no solution is ruled to be out of scope from the outset;
  2. All the proceedings and discussions are documented and are made publicly available;
  3. DG Research and Innovation becomes an equal partner in Working Group 4, alongside DGs Connect, Education and Culture, and MARKT – reflecting the importance of the needs of research and the strong overlap with Horizon 2020.

The annex to this letter sets out five important areas (international competitiveness, the value of research to the EU economy, conflict with Horizon 2020, the open web, and the extension of copyright law to cover data and facts) which were raised at the meeting but were effectively dismissed as out of scope. We believe these issues are central to any evidence-based policy formation in this area and must, as outlined above, be discussed and documented.

We would be grateful for your response to the issues raised in this letter at the earliest opportunity and have asked LIBER (Ligue des Bibliothèques Européennes de Recherche) to act as a coordinator on behalf of the signatories outlined below.



Sara Kelly, Executive Director, The Coalition for a Digital Economy

Jonathan Gray, Director of Policy and Ideas, The Open Knowledge Foundation

John McNaught, National Centre for Text Mining, University of Manchester

Aleks Tarkowski, Communia

Klaus-Peter Böttger, President, European Bureau of Library Information and Documentation Associations (EBLIDA)

Paul Ayris, President, The Association of European Research Libraries (LIBER)

Brian Hole, CEO, Ubiquity Press Ltd.

David Hammerstein, Trans-Atlantic Consumer Dialogue 


PMR: Colleagues and I are now technically able to mine the scientific literature in vast amounts. #ami2 takes about 2 seconds per page on my laptop. Given 1 year * 10 million papers * 10 pages, that’s 2.0E+8 (200 million) seconds. A year has about 31.5 million seconds, so a handful of CPUs (about 7, a trivial amount) can mine and index this data at the rate it appears, and we get machine-readable tables, graphs, trees, chemistry, maps and masses else. It’s a revolution.
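
For anyone who wants to check the back-of-envelope estimate, here is a minimal sketch in Python. The three input figures are the ones quoted above; the 365-day year and the idea that each CPU runs non-stop on one page at a time are assumptions made for illustration, and the function name is hypothetical, not part of #ami2.

```python
import math

# Figures quoted in the post.
SECONDS_PER_PAGE = 2            # approximate #ami2 time per page on a laptop
PAPERS_PER_YEAR = 10_000_000    # papers appearing per year
PAGES_PER_PAPER = 10            # pages per paper

# Assumption: a 365-day year with every CPU running continuously.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def mining_budget(papers=PAPERS_PER_YEAR,
                  pages=PAGES_PER_PAPER,
                  seconds_per_page=SECONDS_PER_PAGE):
    """Return (total CPU-seconds per year, CPUs needed to keep pace)."""
    total_seconds = papers * pages * seconds_per_page
    cpus = total_seconds / SECONDS_PER_YEAR
    return total_seconds, cpus


total_seconds, cpus = mining_budget()
print(f"CPU-seconds per year: {total_seconds:.1e}")
print(f"CPUs needed to keep pace: {cpus:.2f} (round up to {math.ceil(cpus)})")
```

Under these assumptions the total is 200 million CPU-seconds and the requirement comes out between 6 and 7 CPUs, so the conclusion stands: keeping pace with the literature needs only a handful of ordinary processors.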

I am legally allowed to read these papers.

But if I try to mine them I will be sued.

The planet and humanity desperately need this data. It does not belong to “publishers”. It’s the world’s right to mine this.