#rds2013: Managing Research Data: What I said

This is a list of tweets from my talk. It’s a very good summary – thanks everyone. I have removed duplicates so that each tweet is a separate topic. They aren’t in the true order of my presentation or in time order. I’ve removed tweeters’ comments and also most of the tags (e.g. no #rds2013). In some cases this is a retweet so the original tweeter is removed (sorry). Most are either direct or indirect PMR speech.



Wikipedia best knowledge development of 21st century. @AfrolatinProjec

When [Wikipedia] opened, acad[emia] scoffed but [PMR] said “I believe in Wikipedia, the bits I wrote are right.” @anitawaard

Need proper structure & communities for data. We are building walled gardens ie FB. Public governance that is believable needed @jvinopal

“The challenge is whether @figshare can be not a walled garden” – have public governance @rmounce

[PMR] celebrates this excellent youtube video [NYU HSL on lost data]. Applause for its creators http://t.co/GAIq7StTK6 #opendata @DataAtCU

Data gets lost for lack of proper social structure @planarrowspace

Young people aren’t accepting what we have given them and are changing the world. If you aren’t with them, you’ll be left behind @gailst

[PMR] adapts Ranganathan’s principles for data (“data is for use,” more). Consider applying Ranganathan’s 5 laws http://t.co/fKM6RZUJlF @moncia

Encouraging us to read Aaron Swartz’s guerrilla manifesto // find it here: http://t.co/VdM3jOS2gC @rmounce

[PMR] sure that Ranganathan would agree “Data belongs to the world.” #sixthlaw @yasmeen_azadi

[PMR] gives a shout out to @McDawg’s recent post on Open Access here: http://t.co/4vNHRawZE2 @rmounce

[PMR] Cites the excellent work of @jackandraka – this wouldn’t have been possible w/o #openaccess @mew687

The lack of access to scholarly information means people die @rmounce

Access to public knowledge is a *fundamental* human right @lwillm

Journalists, doctors, everyday citizens like #opendata Science should also provide this @moncia

Global Knowledge Commons needed @carolynthelib

“values matter, the community; technology and protocols then follow” @jvinopal

Publishers in Europe trying to limit text mining initiatives #openaccess @jvinopal

AMI is about taking the literature into semantic form…if the publishers and their lawyers will let it happen. @elotroalex

Open Research Data Handbook http://t.co/IPWeETNvVr. @AndyDrewCreamer

We [scientists] have to work for the benefit of humanity. Please work with us, join us @jvinopal

Will be legal to text mine scientific articles in the UK soon. @ResearchAtCU

Current Problems in Managing Research Data. If we want to do it, we will solve them http://t.co/JW5fzi6JFI @jvinopal

in the UK the Hargreaves Report copyright reforms will be implemented so it will be legal to mine papers :) @greentea166

The only major barrier to getting data out of papers is LEGAL – the lawyers. We have the tools e.g. #AMI2 @ResearchAtCU

[PMR] salute to Wikipedia, Open Street Map…not worried about impact factor @DataAtCU

Target Research data management training at graduate students @anitawaard

[PMR] shoutouts for PhD students @rmounce & @StilettoFiend , @OKFN Panton Fellows for #openscience @lisarnorberg

[PMR] give shoutout to linkedopendata @rmounce

Blue Obelisk open source community for chemistry providing open data & tools, costs just $20 p/a to run @rrkennison

[CrystalEye tool live demo!] Fantastic tool that demonstrates the utility of #opendata @moncia

[PMR] gives examples of open data sharing: http://t.co/JnRU3zg9pq, http://t.co/KVpMC6m8y9 (code hosting). @Wilderbach

We must build or make tools, not buy or rent them @planarrowspace

[paraphrase PMR] a repository should provide benefits to the data originator, it’s like a bee and a flower – symbiosis @robincamille

We the community must BUILD, not buy/rent, our tools. @yasmeen_azadi

[PMR] “I made a video. If you’re bored you can watch it…It will break your heart”. [You can find it on his blog.] @anitawaard

[PMR] shoutout to https://t.co/QoIsFXzr7h @ashleyrjester

I put my software openly on bitbucket not because I’m mandated to but because it’s helpful, better than repos @rmounce

To help humanity we scientists need to release the grip on our data and let people USE IT! @kcrews

[PMR] singing the praises of #Wikipedia. @DataAtCU

Examples of #opendata success: Wikipedia, Open Street Map (not worried about impact factor) @rmounce

[PMR] Cites OpenStreetMap as an excellent example of excellence that doesn’t need cash, just willpower @ResearchAtCU


#rds2013 Managing Research Data: How I put talks together and thanks to CDRS

The Center for Digital Research and Scholarship (CDRS) at Columbia have done a truly magnificent job of capturing the Managing Research Data event. As a result we have access to videos, aggregated tweets etc. http://www.ustream.tv/recorded/29603442 contains my presentation (mins 31-50). Others have also contributed tweet analysis (see graph), tagExplorer (awesome), aggregation etc.

The reason it matters is that in my presentations I never know what I am going to say in detail. I work hard beforehand to get the most likely material into my head and turn it over and over. Normally I have probably 1,000++ “slides” to choose from, organised in directories, and make a list of the most likely directories. I often blog my thoughts beforehand and this helps in several ways: listing the items, possibly getting comments from the world and refining, and also something to fall back on if I can’t present from my own machine. (This happened on my farewell to CSIRO – because there were remote attendees the connection had to be central and I couldn’t use my laptop. But all of what I wanted to say was already on the web or the blog).

I am often nervous before talking. This is a good thing. It means I am taking it seriously. Indeed a very good touchstone is that if I take a talk casually (“I’ve done that before so I don’t have to prepare”) I may give a bad one. Audiences deserve commitment. I have – on 2-3 occasions – been truly terrified (one just a month ago). What matters is getting the right story for the right audience. And because I need to have a feel for the audience it can be very difficult to get it right until very close to the meeting. The “right” talk for the wrong audience can be a poor talk.

Since I use my own laptop (and insist on it) and because I agree that “Power corrupts; Powerpoint corrupts absolutely” (Edward Tufte) I use HTML. HTML has many virtues – it scales to the window, it wraps, and when I download a web resource I can just use it (although font size can be a problem). The disadvantage is that it is difficult to add multimedia without significant editing and it’s almost impossible to distribute the presentation (Powerpoint is a good container format and I don’t understand why the W3C has failed to generate good container approaches for multiple pages – which would then spawn editing tools). BTW I sometimes do PPT when I am forced to as part of a larger presentation – they want my “bit”.

I also edit my presentation constantly and – if I have the chance – I may be editing it while the speaker before me is talking. This isn’t lack of preparation so much as adding little details that reflect the makeup of the audience. If I don’t have this chance I find that I am always working after midnight the night before.

So when I give a presentation I know roughly what I want to say but I have far too much material to cover in the time. Because the slides are organised through links I don’t have to “skip over some” as so many Powerpoint presenters have to do. I note those slides which are essential and mark them so that I make sure I don’t forget them. I then ask the host to signal when I have 3 minutes left and make sure that I have wrapped up OK – I try never to overrun. I usually put the thanks first since I might forget them.

This works well for 30+ min presentations. The experience is slightly like my skiing – just out of control and having to think ahead while talking. That’s not the same as being rushed – it’s that I have to constantly make decisions about directions. (Linear Powerpoint is just click-click-click).

However one problem is that I can’t easily “mount my slides”. That’s also because there are interactive demos. So whenever I get the chance I ask to be recorded. And yesterday has turned out wonderfully – thanks again CDRS.

Yesterday, however, I had to fit a plenary lecture into 15 minutes. That’s tight, especially as I was going to be controversial. I agonized about how to do this. I knew that if I followed my normal process I would seriously overrun. I therefore thought hard about using Powerpoint with timed transitions. But I just couldn’t feel happy about that. I had reserved a whole day beforehand to prepare. I was still exhausted from travelling back from Australia and not sleeping very well. So I spent the day writing blog posts. I wrote 8 posts in a day-and-a-half. They have the added advantage that people not at the symposium can read them. (Many slide presentations don’t contain explanatory detail). At this stage I was very worried that the presentation would be woolly and unfocused. But while writing the blog posts I discovered the story to hang the presentation on: the messages that I wanted people to take away (see next post).

I created a linear list of topics. In 15 minutes you have to be close to linear but there were linkouts. I certainly needed to show some of those (e.g. Aaron Swartz). But a list of 20-odd links isn’t exciting as the main stream. So I interspersed those with images from the linkouts. (Generally my images are meaningful – If I show flowers, then the flowers have a clear message. This time I showed a cow on a common – because I was talking about commons).

So the mainstream is a mixture of images, links and short phrases or sentences that I scroll through. I haven’t done this before but feedback was positive. I start at the top and scroll down manually – there is no “slide” but often a concept fits on one screen. I don’t know how meaningful the final result is to people who weren’t there, but at least it links to the blog posts.

The next post shows the tweet analysis of what I said.


Celebrating India’s National Science Day

Today is National Science Day in India, celebrated in honor of Indian physicist Chandrasekhara Venkata Raman’s discovery on February 28, 1928 of the eponymous Raman effect, which relates to the way that light is scattered when it passes through different materials. Raman earned a Nobel Prize for his work in 1930.

In just the past two months, PLOS ONE has published over 100 papers with authors from India, in subjects as varied as molecular biology, ecology, and medicine. For example, various Indian research groups are working with the wildlife in their country, determining non-invasive methods to photographically identify and “tag” Indian gliding lizards based on their blotch patterns and studying the feasibility of human-lion coexistence in the Indian forest.

On the other end of the spectrum, “Systems Biology Approach Reveals Genome to Phenome Correlation in Type 2 Diabetes” illustrates how coordinated analysis of data from various sources, including patient genetic material and databases about known drug interactions and genetic interactions, can be more powerful than considering each individually.  The results provide further evidence that some previously identified genes are involved in the disease, and also help refine the understanding of how these factors are involved.

The theme of this year’s celebration is “Genetically Modified Crops and Food Security,” so the final article I’d like to highlight is a genetic analysis of the apple scab pathogen, a fungus that can wreak havoc on orchards. This study provides primary information about the pathogen that will be crucial for future research investigating how farmers can overcome it.

These four papers are just a tiny sample of the rich and varied research coming out of India. Happy National Science Day, and feel free to add your own favorite Indian research in the comments.


Jain P, Vig S, Datta M, Jindel D, Mathur AK, et al. (2013) Systems Biology Approach Reveals Genome to Phenome Correlation in Type 2 Diabetes. PLoS ONE 8(1): e53522. doi:10.1371/journal.pone.0053522

Sreekar R, Purushotham CB, Saini K, Rao SN, Pelletier S, et al. (2013) Photographic Capture-Recapture Sampling for Assessing Populations of the Indian Gliding Lizard Draco dussumieri. PLoS ONE 8(2): e55935. doi:10.1371/journal.pone.0055935

Banerjee K, Jhala YV, Chauhan KS, Dave CV (2013) Living with Lions: The Economics of Coexistence in the Gir Forests, India. PLoS ONE 8(1): e49457. doi:10.1371/journal.pone.0049457

Thakur K, Chawla V, Bhatti S, Swarnkar MK, Kaur J, et al. (2013) De Novo Transcriptome Sequencing and Analysis for Venturia inaequalis, the Devastating Apple Scab Pathogen. PLoS ONE 8(1): e53937. doi:10.1371/journal.pone.0053937

The UK’s New HEFCE/REF OA Mandate Proposal

David Sweeney’s new HEFCE/REF OA mandate proposal for consultation comes very close to providing the optimal OA mandate model:

(1) It separates the date on which deposit must be made (immediately upon acceptance for publication, with no differences across disciplines) from the date on which the deposit must be made OA (preferably immediately, but, at the latest, within an allowable embargo whose length will be adapted to the needs of each discipline).

(2) It specifies that the deposit must be made in the author’s institutional repository (whence the metadata can be exported or harvested to institution-external discipline repositories immediately — and the full-text once any embargo has elapsed).

(3) It makes immediate-deposit (but not immediate-OA) an eligibility precondition for submission to research evaluation (REF), thereby (very sensibly) recruiting institutions in monitoring and ensuring timely compliance with the mandate.

(4) It expresses no preference for gold OA publishing, leaving authors free to publish in whatever journal they choose.

(5) It expresses a preference for licensing certain re-use rights, but again leaves this to author choice.

I have been a strident critic of the Willetts/Finch/RCUK policy’s preference for gold over green and its constraints on authors’ freedom of journal choice. This new HEFCE mandate proposal would remedy all that and would make the UK’s OA mandate once again compatible with green OA mandates the world over — indeed, with (3) and (4) it provides the all-important compliance-verification mechanism that most OA mandates still lack.

I hope that once they have seriously reflected upon and understood this new mandate proposal, researchers and their institutions will see that it moots all the objections that have been raised to the Finch/RCUK mandate. And I profoundly hope that David Willetts will realize and understand that too.

I also hope that those who are impatient for immediate, embargo-free OA, CC-BY licenses and Gold OA will allow this HEFCE compromise mandate to be adopted and succeed, rather than trying to force their less urgent, less universal, and much more divisive conditions into the policy yet again.

The price of Green OA (per paper deposited) is negligibly small, compared to Gold OA. And institutional repositories are already created and paid up (for a variety of purposes) but they remain near-empty of their target OA content — unless deposit is mandated.

Green deposit mandates have to have carrots and sticks to be effective. Funder mandates provide the carrot/stick for institutions (funding eligibility, and enhanced impact, if you deposit; ineligibility if you don’t).

Double-paying publishers pre-emptively for gold now is fine, provided you have first effectively mandated green deposit for all articles (and you have the extra cash to double-pay publishers for subscriptions and gold).

But if you have not first effectively mandated green deposit for all articles, double-paying publishers pre-emptively for gold instead is not only a gratuitous waste of scarce research money, but a counterproductive retardant on OA growth, both in the UK and worldwide (in encouraging subscription publishers to offer hybrid gold and to increase their embargo lengths on green in order to ensure that UK authors must pick and pay for gold).

(Where gold [or a fee waiver] is offered for free to authors (& their institutions) by a journal they freely choose as suitable, authors are of course welcome to choose it — as long as they also deposit their article in their Green OA institutional repository, just as everyone else is mandated to do.)

Global green OA grows anarchically, not journal by journal. If and when competition from green starts causing journal cancellations, journals will be forced to start cutting costs by downsizing, phasing out the obsolete print and online edition and offloading all access-provision and archiving onto the global network of green OA institutional repositories. The institutional cancellation savings will then (single-) pay for post-Green Fair Gold at an affordable, sustainable price (for peer review alone).

To instead double-pay publishers pre-emptively for gold now (in the name of “cushioning” the transition) while publishers promise to “plough back” all Gold OA double-payment into subscription savings (all publishers? all subscribers?) is simply to give publishers a license to keep charging as much as they like and never bother to do the cost-cutting and downsizing that universal mandatory green would force them to do.

If the UK double-pays for Gold pre-emptively rather than first effectively mandating Green for all UK research output, it has chosen the losing option in an unforced Prisoner’s Dilemma: the UK loses and the rest of the world gains. Less an admirable moral stance or idealism or a “front-mover” advantage than an unreflective and somewhat stubborn rush for Fool’s Gold.

The Downside of Open-Access Publishing

Over the past couple of years, many people involved in scientific research and publishing have received increasing numbers of emails with invitations to submit papers to newly established journals, join their editorial boards, or even apply to serve as their editors-in-chief. Personally, I have been alternately amused and annoyed by these messages. A glance at the journal’s name or the associated website has told me that these simply are not serious publications. But the establishment of new journals and publishers at a rapidly increasing pace should be taken seriously, since it affects the scientific record as a whole.

The Internet has profoundly and permanently changed the ways in which information can be disseminated and discussed. And since scientific publishing is precisely about getting new findings out to researchers and readers for discussion, the Internet has changed scientific publishing considerably, mostly for the better — and will continue to do so. Distribution costs can be very low if a journal chooses to publish only online, for instance, but there are still high costs involved for proper peer review and editorial quality control. The introduction, a decade ago, of an open-access model in which authors pay to have their work published offered an alternative way of financing this quality control. But it also opened up opportunities to charge authors a fee to publish their papers with little or no quality control.
Jeffrey Beall, an academic librarian at the University of Colorado, Denver, who is interested in scholarly open-access publishing, calls its more questionable incarnations “predatory.” [1] “Predatory, open-access publishers,” he writes on his blog, Scholarly Open Access (http://scholarlyoa.com), “are those that unprofessionally exploit the author-pays model of open-access publishing (Gold OA) for their own profit. Typically, these publishers spam professional email lists, broadly soliciting article submissions for the clear purpose of gaining additional income. Operating essentially as vanity presses, these publishers typically have a low article acceptance threshold, with a false-front or non-existent peer review process. Unlike professional publishing operations, whether subscription-based or ethically-sound open access, these predatory publishers add little value to scholarship, pay little attention to digital preservation, and operate using fly-by-night, unsustainable business models.”
Beall is not the first person to ask whether the author-pays model can be exploited. Ever since it was introduced, questions have been raised about the possibility that publishers would be tempted to lower their editorial standards to attract authors who would be happy to see their work published quickly and without too much scrutiny. But Beall has now compiled a list of publishers and journals that he finds questionable and is encouraging discussion in the scientific community about these entities and the criteria that one might use to identify them. [2]
Whether it’s fair to classify all these journals and publishers as “predatory” is an open question — several shades of gray may be distinguishable. Some of the publishers are intentionally misleading, naming nonexistent people as their editors and editorial board members and claiming ownership of articles that they have plagiarized from other publications. Other journals and publishers on Beall’s list may be real, though it’s obvious that the people running them are not very professional, and some of the publications may have been created simply because it seemed like a clever business scheme to collect author fees of several hundred dollars apiece to post papers in a journal-like layout at a fraction of the traditional price. Viewed in some lights, such enterprises may not be unethical: thousands of researchers worldwide need to publish, and not all of them can do so in the highest-ranked journals. But it is surely problematic for journals and publishers to pretend to be something they aren’t, misleading authors, readers, and the scientific community at large.
Most of the new open-access journals state that they are international, scientific, or scholarly peer-reviewed journals and offer quick turnaround times. Some of them also cover very broad subject areas — for example, the Academic Research Publishing Agency publishes the International Journal of Research and Reviews in Applied Sciences (www.arpapress.com) and encourages submissions from a wide range of scientific fields. It is difficult to imagine how a single journal could manage to properly validate papers that are so varied.
Until recently, “international, scientific, peer-reviewed journal” has had a fairly specific meaning to the scientific community and society at large: it has meant a journal that checks submitted papers for scientific quality, but also for relevance and interest to its readers, and also ensures that it contains new findings that may advance science. These features render a journal trustworthy and worthy of readers’ time and money. Many observers were therefore understandably disturbed when the journal publisher Elsevier admitted in 2009 that it had published six “fake journals” funded by pharmaceutical companies — in Elsevier’s own words, “sponsored article compilation publications . . . that were made to look like journals and lacked the proper disclosures.” The company had intentionally exploited the word “journal” to give the impression that these publications were honest and reliable.
Of course, the terms “international,” “scientific,” “peer-reviewed,” “journal,” “article,” “editor,” and “publisher” do not have copyrighted or patented definitions and can have varied meanings, especially in the Internet age. Must an article be different from a submitted paper? Isn’t everything published online automatically international? Is there anything wrong with a situation in which the editor and publisher are just one person who has set up a website where researchers can submit their papers and pay a fee to have them laid out in a professional way and made available to all interested parties? Isn’t it a good thing that this vast number of new publishers and journals will make it possible to get all research — whatever its quality level — into the public domain? Perhaps. But describing a simple online-posting service as “an international, scientific, peer-reviewed journal” leads authors and readers to believe that they are submitting to or reading something they aren’t.
We must recognize that no publication or financing model is, in itself, morally superior to others or can guarantee high quality. Various models can produce high-quality content, and all are vulnerable to exploitation. It might make the most sense to concern ourselves less with the publication or financing model used and more with ensuring transparency about a publication’s content and editorial processes. And perhaps we should insist that not all these enterprises can be called “scientific journals.” As a reader, I do not want to spend my time reading vast quantities of low-quality research and would be willing to pay for someone to do the sort of filtering for quality, relevance, and novelty that journal editors have traditionally done. As a researcher, by contrast, I might see it as a waste of time to seek a journal that would publish my research and might be willing to spend money to make it available to other researchers and the public. It would be fair to everyone, though, to be explicit about the fact that these are very different types of publications. With greater transparency, the questionable or predatory publishers who are using either author-pays or subscription models would also be easier to spot — and avoid.

Creative Commons and the Openness of Open Access

The Internet has inspired multiple movements toward greater openness — most prominently, open access, open data, open science, and open educational resources. None of these is based on the belief that there should be such a thing as a free lunch, but each recognizes that the Internet changes the economics of publication and digital-resource sharing so that changes can feasibly be made to traditional practices that are in some ways “closed,” requiring payment for access to information or prohibiting myriad reuses of accessible information. The quality of “openness” applies to both the terms of access and the terms of use. Advocates in each movement — and I am one, serving on the boards of directors of two organizations promoting open access, Creative Commons and the Public Library of Science (PLOS) — share an understanding that an open resource is freely accessible over the Internet. Opinions vary about the terms of use necessary for a resource to be open.

Copyright law supplies the baseline terms of use for almost all information on the Internet. These terms can be altered if the copyright owner grants a license or permission to do something that would otherwise infringe copyright. Traditionally, copyright owners granted licenses to specific persons or entities. More recently, copyright owners seeking to grant permission to everyone have issued public licenses broadening the range of permitted uses, subject to certain conditions. Creative Commons licenses are the most widely used of these public licenses for all kinds of copyrighted works except software, for which free and open-source licenses are most common.
Within the open-access context, debate focuses on whether an article is “open” when it, like this one, is freely accessible over the Internet but still subject to the standard restrictions imposed by copyright law. The question also applies to most articles posted in PubMed Central under the Public Access Policy of the National Institutes of Health or in institutional repositories under most university policies, such as that recently adopted by the University of California, San Francisco. [1] The three major declarations of purpose for the open-access movement (the Budapest Open Access Initiative, the Bethesda Statement on Open Access Publishing, and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities) say no: openness requires making the literature freely accessible under liberal terms that permit nearly all reuses so long as the author receives credit for the work when it’s republished or adapted. [2]
The rationale for seeking open terms of both access and use is as follows. Free access provides the literature to at least five overlapping audiences: researchers who happen upon open-access research articles while browsing the Web rather than a password-protected database; researchers at institutions that cannot afford the subscription prices for the growing literature; researchers in disciplines other than that of a journal’s intended audience, who would not otherwise subscribe; patients, their families, students, and other members of the public with an interest in the information but without the means to subscribe; and researchers’ computers running text-mining software to analyze the literature. In addition, granting readers full reuse rights unleashes the full range of human creativity for translating, combining, analyzing, adapting, and preserving the scientific record, whereas traditional copyright arrangements in scientific publishing increasingly inhibit scholarly communication.
The argument for open licensing must be understood in the context of the baseline terms of use provided by copyright law. Copyright applies to works of authorship. One does not have to do anything to “get” a copyright. It attaches automatically when a work is created and stays intact even if a work is published without the copyright symbol (©). Copyright does not apply to the ideas or facts in the covered work, however, but only to the author’s expression of these.
Copyright law gives the copyright owner the exclusive rights to make and publicly distribute copies of the work, to publicly perform or display the work, and to prepare adaptations of it. Granted initially to the author or authors of a work, these rights can be assigned or exclusively licensed to a publisher or other content distributor if that is done in writing. After authors sign away these rights, they, too, must seek permission or a license from the publisher if they wish to make or distribute copies of their article, unless doing so would be considered fair use. Fair use permits certain uses that have positive social benefit, such as use in research or education, and that do not unduly interfere with the copyright owner’s ability to receive economic benefits from publishing or licensing the work.
Copyright’s terms do not restrict all uses of an article. In addition to fair use, uses of facts such as scientific data are not covered by copyright except to the extent that an author has exercised minimal creativity in their selection or arrangement. This minimal-creativity standard might prevent republication of some tables or figures, but copyright doesn’t restrict the reuse of the underlying data if they’re arranged in a different format or a conceptually new figure.
For a wide range of creators, educators, and researchers who care primarily about broad distribution of their work, copyright’s standard terms are inappropriate because they prevent reuses that these authors wish not simply to permit but to encourage, such as translation into other languages. Creative Commons is an organization that has responded by producing a suite of six copyright licenses that offer standardized terms of sharing to permit a range of uses beyond fair use, subject to certain conditions. [3] The four conditions are combined into six permutations reflecting the types of copyright restrictions that people who otherwise choose to share their works for free might like to retain (see Table 1, Creative Commons Licenses). The licenses, designed to allow all uses except those prohibited by a specified condition, have been adopted by a variety of institutional and individual copyright owners.
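To see why four conditions yield exactly six licenses, the combinations can be enumerated in a few lines. This is only an illustrative sketch: the short codes (BY, NC, SA, ND) and the rule that ShareAlike and NoDerivatives cannot appear together are assumptions brought in from the commonly published license names, not stated in the text above, and the names `cc_licenses`, `COMMERCIAL_OPTIONS`, and `ADAPTATION_OPTIONS` are hypothetical.

```python
from itertools import product

# Assumption: Attribution (BY) is part of every license, NonCommercial (NC)
# is optional, and ShareAlike (SA) and NoDerivatives (ND) both govern
# adaptations, so a license carries at most one of the two.
COMMERCIAL_OPTIONS = ("", "NC")
ADAPTATION_OPTIONS = ("", "SA", "ND")


def cc_licenses():
    """Return every allowed combination of the four conditions."""
    licenses = []
    for nc, adaptation in product(COMMERCIAL_OPTIONS, ADAPTATION_OPTIONS):
        # Start from the mandatory BY condition and append the chosen options.
        parts = ["BY"] + [code for code in (nc, adaptation) if code]
        licenses.append("CC " + "-".join(parts))
    return licenses


LICENSES = cc_licenses()
for name in LICENSES:
    print(name)
```

Two choices for commercial use times three choices for adaptations gives the six licenses; the first one listed, CC BY, is the Attribution-only license.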
All Creative Commons licenses require that users who republish or reuse a work in a way that would otherwise infringe copyright give attribution as directed by the copyright owner. That’s the only condition included in the Creative Commons Attribution license — the only Creative Commons license meeting the definition of “open access” endorsed by the Budapest, Bethesda, and Berlin declarations. This license is used by leading open-access publishers such as PLOS and BioMed Central, recommended by the Open Access Scholarly Publishers Association, and adopted by the World Bank for its internally published research. Commercial science publishers that have launched publications funded by article-processing charges also use Creative Commons licenses, but they either use a more restrictive license or offer authors choices. The Nature Publishing Group’s Scientific Reports, for example, allows authors to choose from three Creative Commons licenses, including the Attribution license.
Other adopters of Creative Commons licenses impose additional conditions on users. Two of these conditions, called ShareAlike and NoDerivatives, concern adaptations of the licensed work. The Wikipedia community, for example, has adopted the Creative Commons Attribution ShareAlike license, which requires both attribution and that any adaptations be licensed under the same license. MIT OpenCourseWare, from the Massachusetts Institute of Technology, adopted the license with the Attribution and ShareAlike conditions but added a NonCommercial condition, prohibiting commercial uses. The various creators of the online educational materials in the University of Michigan Medical School’s Open Michigan database have adopted nearly the full suite of Creative Commons licenses.4 The broad adoption of these licenses reflects a belief that a work is not “open” until it’s freely accessible on the Internet and under a public license offering more liberal terms of use than copyright law provides. Though options offered by Creative Commons licenses address the needs of copyright owners in various contexts, in the open-access context, the Attribution license in my opinion remains the gold standard.

For the Sake of Inquiry and Knowledge — The Inevitability of Open Access

It’s difficult to have a measured conversation about open access — the term widely used to refer to unrestricted online access to articles published in scholarly journals. People who believe that free and unrestricted access to peer-reviewed journal articles will undermine the viability of scholarly journal publishing disagree sharply with those who believe that only open access can expedite research advances and ensure the availability of that same scholarly literature. Arguments for and against open access tend to focus on implementation details, ignoring the powerful motivations underlying the phenomenon.

The open-access movement cannot be appreciated without an understanding of the complex and interdependent system that produces, evaluates, and distributes scholarly research results. For the past 60 years, five stakeholder communities have contributed to the system that enables the production of peer-reviewed research literature. In the simplest terms: funding agencies and foundations provide funds to conduct research; universities and other research organizations host the intellects who conduct the research, maintain the research facilities, and educate and train future researchers; authors, with no expectation of monetary compensation, write research articles describing their research findings; publishers accept contributed research papers on condition of copyright transfer, facilitate the editorial process, and manage the production and distribution processes needed for disseminating the articles; and libraries use institutional funds to purchase, organize, and preserve this publisher output and make it available for current and future research and teaching.
In a system this interdependent, destabilization at any one point perturbs critically important relationships. The advent of the Internet and digital formats was just such a disruption. Initially greeted with enthusiasm on all sides, the transition to digital formats and network distribution channels did not play out as all the stakeholders anticipated or would have liked. As publishers introduced restrictive contractual business models, raised prices (often disproportionally), experimented with digital rights management, and advocated for federal legislation favorable to their own business interests, other stakeholders became concerned about balance in the system and began to look for alternatives.
Authors in this system write to have impact, not for royalties. A distribution system that controls and constrains access to articles is anathema to researchers who seek wide influence rather than remuneration. Alternative options, which could fulfill the promise of the Internet as a tool for open and compatible digital publishing, gained early support in discussions. In 2002, the Declaration of the Budapest Open Access Initiative1 was the first formal call to action, followed the next year by both the Bethesda Statement on Open Access Publishing2 and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities.3 The central concept of each of these calls to action was simple: peer-reviewed research articles, donated for publication by authors with no expectation of compensation, should be available online, free, and with the smallest possible number of usage restrictions.
A vision of open access to research results is not new. In July 1945, writing in the Atlantic Monthly, Vannevar Bush, then director of the U.S. Office of Scientific Research and Development, described just such an environment in his essay “As We May Think.” A staunch advocate of federal support for research in the physical and medical sciences, Bush challenged his fellow scientists and engineers to turn their postwar attention to the task of “making more accessible [the] bewildering store of knowledge.” Bush’s firm belief, which is still shared by academic authors, was that “a record if it is to be useful to science must be continuously extended, it must be stored, and above all it must be consulted.”
The extent to which access to knowledge is constrained and controlled by publishers’ business models is at the heart of the discontent researchers have for the current journal-publishing system. Peter Suber, a leading advocate of open access, articulates the view from the academy as follows: The “problem is that we donate time, labor, and public money to create new knowledge and then hand control over the results to businesses that believe, correctly or incorrectly, that their revenue and survival depend on limiting access to that knowledge.”4 Today, as in 1945, barriers to access to current and past knowledge are viewed by researchers as profoundly at odds with the advancement of knowledge.
Yet producing high-quality peer-reviewed articles has a cost. The fact that faculty members and researchers donate to publishers the ownership of their research articles — as well as their time and effort as reviewers — does not mean that there are no expenses associated with the production of high-quality publications. For all its known flaws, no one wants to destroy peer-reviewed publication. But the nonpublisher stakeholders in the scholarly communication system can no longer support the prices and access constraints desired by traditional publishers.
Discontent with the system extends well beyond authors. Government agencies have good reason to want the research they fund with taxpayer money to be broadly accessible and rapidly built upon; indeed, some would argue that public funders have an ethical imperative to demand open access. Charitable foundations similarly want to share the fruits of their investments in research and, like governments, need to be able to assess the impact and effectiveness of their funding. Recent policy decisions by Research Councils UK and the European Union5 demonstrate a broad and compelling international interest in increasing access to publicly funded research results.
Over the past decade, researchers, research institutions, and funding entities have been experimenting with channels of scholarly communication that serve as alternatives to traditional publishing. Many academic disciplines now utilize large open-access databases (such as arXiv and SSRN, the Social Science Research Network) to share research articles in the pre–peer-review stage. Hundreds of academic institutions and funding agencies now host open repositories of post–peer-reviewed articles that have been authored by grantees or members of their communities. Search engines, which are increasingly popular avenues to scholarly content, facilitate discovery and document use.
These and other experiments and alternatives to traditional publishing are leading the way to a digital, Internet-based, more open publishing system for peer-reviewed journals. The Directory of Open Access Journals (www.doaj.org) lists more than 8000 open-access journals, many of which are highly regarded according to conventional metrics of excellence. Emerging business models include publication fees paid by authors once an article has been accepted for publication, direct support from research grants, and contributions from research institutions willing to contribute financially to publication systems for more openly accessible articles.
Research culture is far from monolithic. Systems that underpin scholarly communication will migrate to open access by fits and starts as discipline-appropriate options emerge. Meanwhile, experiments will be run, start-ups will flourish or perish, and new communication tools will emerge, because, as the Bethesda Open Access Statement puts it, “an old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.”
There is no doubt that the public interests vested in funding agencies, universities, libraries, and authors, together with the power and reach of the Internet, have created a compelling and necessary momentum for open access. It won’t be easy, and it won’t be inexpensive, but it is only a matter of time.
Listen to an interview with Dr. Martin Frank and Prof. Michael Carroll on traditional and open-access scientific publishing.

ChemistryOpen – Issue 1/2013 is now available online!

ChemistryOpen’s first issue in 2013 again reflects the great diversity of this open-access general chemistry journal. Thematically the contributions range from betulin-based polyurethanes for CO2 adsorption to Cu2O-decorated TiO2 nanotubes. In addition to Full Papers, Communications and a Thesis Summary, this issue presents an Editorial by the Co-Editors-in-Chief and a newly featured Cover Profile.

In their Editorial entitled Show me the Money – How, as a Chemist, Can I Find Funding for Open-Access Publishing? the Editors of ChemistryOpen address some questions about open-access funding and provide authors with ideas for potential funding sources. Fuelled by new policies from funders, more and more chemists are looking to publish their research results in an open-access forum. Of course, high-quality publishing costs money, and as a consequence, the so-called “gold road” or “author-pays” model has emerged, where an Article Publication Charge is payable by the author. So the question now arises of how authors can meet the associated cost.

ChemistryOpen’s cover receives a new look in 2013 and features the Full Paper by Knut Rurack and co-workers (BAM Federal Institute for Materials Research and Testing, Berlin, Germany) on fluorinated BODIPY dyes for dual-method surface analysis. The associated Cover Profile lets readers around the world take a closer look at his group in Berlin and offers a glimpse at the motivation behind their work: “When one tries to assess the functionalization degree of a support quantitatively, that is, the concentration of chemical groups across the entire support, one realizes that there are no reliable methods available today.”

All articles published in ChemistryOpen are open-access and free to all readers.

Heart Health Awareness Month

Before this month comes to a close, let us not forget to honor February as American Heart Month.

According to the CDC, heart disease, also known as coronary artery disease or cardiovascular disease, claims 600,000 lives in the U.S. each year. Heart disease refers to plaque buildup in the walls of the arteries, which can result in a heart attack or stroke. Other heart conditions include arrhythmia, congenital defects, heart failure and hypertension (high blood pressure).

Researchers continue to study the best ways to properly care for and treat the beating organ within us. Last September, we discussed cardiovascular health among women, and highlighted related published articles. Today, in honor of American Heart Month, we bring you recently published research that increases awareness of and insight into heart health.

Did you ever feel there was a connection between your heartbeat and self-image? PLOS ONE authors have attempted to answer this question by investigating the relationship between self-objectification and the beating heart in a recent article. Using a heartbeat perception task and questionnaire, researchers found that women who were better able to perceive their own heartbeat were less likely to objectify themselves, suggesting yet another link between heart health and overall wellbeing.

In another recently published study, researchers explored the connection between white blood cell count and heart disease risk in young adults. The authors tested the white blood cell counts for over 29,000 healthy young men over an average of seven and a half years and also screened the participants for signs of coronary artery disease. Their investigation found that a higher white blood cell count correlated with coronary artery disease risk in young men. They concluded that white blood cell count may help in identifying young men with low or high risk for heart disease progression.

In a third article published by PLOS ONE, researchers from the University of Granada investigated heart rate variability and cognitive performance. Participants were divided into a high-fit group and a low-fit group, and the authors measured the effects of three cognitive tasks on the participants’ heart rate variability. The researchers found that cognitive processing has an effect on heart rate variability, and the main benefit of fitness level was associated with processes involving sustained attention.

These articles are just a taste of the PLOS ONE research into cardiovascular health and the prevention of heart disease. As American Heart Month comes to an end, explore more research on the topic here.


Ainley V, Tsakiris M (2013) Body Conscious? Interoceptive Awareness, Measured by Heartbeat Perception, Is Negatively Correlated with Self-Objectification. PLoS ONE 8(2): e55568. doi:10.1371/journal.pone.0055568

Twig G, Afek A, Shamiss A, Derazne E, Tzur D, et al. (2012) White Blood Cell Count and the Risk for Coronary Artery Disease in Young Adults. PLoS ONE 7(10): e47183. doi:10.1371/journal.pone.0047183

Luque-Casado A, Zabala M, Morales E, Mateo-March M, Sanabria D (2013) Cognitive Performance and Heart Rate Variability: The Influence of Fitness Level. PLoS ONE 8(2): e56935. doi:10.1371/journal.pone.0056935

Image Credit: natalie419 on Flickr

#rds2013 Managing Research Data

SPOILER ALERT talk outline follows – please wait till I give it on 2013-02-27

Peter Murray-Rust, University of Cambridge and Open Knowledge Foundation

[1][2] [links mainly to PMR blog]

[note; this talk may upset some and enthuse others].

Neelie Kroes. (Vice President European Commission). “This is personal for me. I am 71; I don’t have to do this job. But I want to. I want to because I am inspired by this new generation.” [PMR: I’m exactly the same]



Note: I concentrate on the LONG-TAIL of scholarship.

Where are we at and who are we? (Scale, “market”)

  1. Values matter; then community; technology and protocols then follow

  2. Our current problems are people problems not technology

  3. Communities and ideas that have worked – demos

  4. What can and should we do?


[#opendataday PMR]

We should demand a global knowledge commons

Midsummer Common (Cambridge) – traditional grazing

also [hackathons]

The City of Palo Alto teams with Stanford University to complete the City’s first hack-a-thon. The challenge: build an application in twenty-four hours that uses geographical information system data provided by the City.


30-40 people at #opendataday


Values and Principles



We must remember Aaron

Closed data mean people die. (Jack Andraka, 14, invented a new diagnosis for pancreatic cancer)

“This was the [paywall to the] article I smuggled into class the day my teacher was explaining antibodies and how they worked. I was not able to access very many more articles directly. I was 14 and didn’t drive and it seemed impossible to go to a University and request access to journals”.

We are in the middle of a digital revolution. We are fighting for our digital commons against digital enclosure.

“Bliss was it in that dawn to be alive,

But to be young was very heaven!” (Wordsworth)


[Bastille: Wikipedia]

Ideas from Ranganathan in the data age.

16 principles for managing research data. (PMR)




Current problems in managing research data. (Vested interests, academic apathy, intrinsic difficulty, finance)

Karen Yacobucci, Karen Hanson NYU Health Sciences Libraries.

Plot: Panda wants to use bear’s data. There are so many problems (discovery, location, format, metadata) that after months Panda gives up.

Walled gardens


Animal Garden video at Serpentine Gallery. Plot: Some animals grow flowers and give them away. Other animals build walls and sell them back. Happy ending? Who knows.

A garden is walled if you cannot fork or download the whole contents. Unwalled gardens MUST have clear governance.


Solutions and communities that work

Wikipedia (piezoelectricity)


(building a world map for social good)

Melbourne Bicycle Map

Linked Data


Bitbucket commits




We (community) build our own Tools

Crystaleye (a chemistry/solid state repository)

Quixote (compchem repo)

Avogadro + NWChem


Chemical Tagger (tags chemistry and geo)


Content Mining

Ross Mounce has got an AMI-award



Blue Obelisk

Semantic web for materials science






The Stilettoed Mathematician

Open Science Atop 5 Inch Heels… http://sophiekershaw.wordpress.com/author/sophiekershaw/





Europe must legitimize Text+DataMining


We are creating

Collaborate with us!


My fellow citizens of the world: ask not what [knowledge] will do for you, but what together we can do for the freedom of [knowledge]. (adapted from J F Kennedy)


[1] [Power corrupts, Powerpoint corrupts absolutely]

[2] disclaimer: I asked to be freed from Elsevier sponsorship so I could speak my mind. [4 years wasted]

Many thanks to Columbia, to our group, both at Cambridge and throughout the world