# #rds2013 Managing Data and Liberation Software; we must remember Aaron Swartz

Something is seriously wrong with our current values in academia.

The world is changing, and the tension between digital openness and digital possession-for-power-and-gain grows stronger daily. We cannot manage research data unless we manage our values first. In January this year the broken values were made very public with the death of Aaron Swartz.

[picture and text from EFF https://www.eff.org/deeplinks/2013/01/farewell-aaron-swartz ]

Aaron did more than almost anyone to make the Internet a thriving ecosystem for open knowledge, and to keep it that way. His contributions were numerous, and some of them were indispensable. When we asked him in late 2010 for help in stopping COICA, the predecessor to the SOPA and PIPA Internet blacklist bills, he founded an organization called Demand Progress, which mobilized over a million online activists and proved to be an invaluable ally in winning that campaign.

I’ve blogged before on Aaron. I never met him, though I know people who have. I can’t write with authority, so I’ll quote Tim Berners-Lee, who tweeted: “Aaron dead. World wanderers, we have lost a wise elder. Hackers for right, we are one down. Parents all, we have lost a child. Let us weep.”

And I had the privilege of hearing Tim two weeks ago; the central, passionate, coherent part of his speech was about Aaron.

Why does this matter to #rds2013? Because if we take the simple step of recognizing that knowledge must be Open, and fight daily to make it happen – politically, socially, technically, financially – then the tools, the protocols, the repositories and the ontologies follow.

I’ve said that access to public knowledge is a fundamental human right. And now I find myself, remarkably, in strange company: Darrell Issa [the proposer of the anti-NIH Research Works Act (RWA)] [I don’t understand US politics] said:

“[Aaron] and I probably would have found ourselves at odds with lots of decisions, but never with the question of whether information was in fact a human right … Ultimately knowledge belongs to all the people of the world — unless there’s a really valid reason to restrict it.” [PMR emphasis]

If we take that axiom, then we have to build the global knowledge commons. It’s an imperative. Tomorrow I shall announce my own, initially very small, tribute to Aaron. I’ll finish with [parts of] his guerrilla manifesto [2008]. Ultimately this is not about technology; it’s about fairness and justice. [my emphases]

Information is power. But like all power, there are those who want to keep it for themselves. The world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations. Want to read the papers featuring the most famous results of the sciences? You’ll need to send enormous amounts to publishers like Reed Elsevier.

Scanning entire libraries but only allowing the folks at Google to read them? Providing scientific articles to those at elite universities in the First World, but not to children in the Global South? It's outrageous and unacceptable.


"but what can we do? The companies hold the copyrights, they make enormous amounts of money by charging for access, and it's perfectly legal — there's nothing we can do to stop them." But there is something we can, something that's already being done: we can fight back.


Those with access to these resources — students, librarians, scientists — you have been given a privilege. You get to feed at this banquet of knowledge while the rest of the world is locked out.


But sharing isn't immoral — it's a moral imperative. Only those blinded by greed would refuse to let a friend make a copy.


Large corporations, of course, are blinded by greed. The laws under which they operate require it — their shareholders would revolt at anything less. And the politicians they have bought off back them, passing laws giving them the exclusive power to decide who can make copies.


There is no justice in following unjust laws. It's time to come into the light and, in the grand tradition of civil disobedience, declare our opposition to this private theft of public culture.


We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that's out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks. We need to fight for Guerilla Open Access.


With enough of us, around the world, we'll not just send a strong message opposing the privatization of knowledge — we'll make it a thing of the past. Will you join us?


PMR: I join. #ami2 is developed as Liberation Software. We need software just as revolutions need arms and transport. Liberation software is designed, at least in part, to make knowledge free.

And “free” means completely free – free-as-in-speech and free throughout the world.

*** And, completely coincidentally, the following news broke just after I had written this: http://crooksandliars.com/emptywheel/doj-used-open-access-guerilla-manifesto#comment-2225359

It appears that, only by researching the Manifesto, a First Amendment protected publication that largely espoused legal information sharing, did the government even get around to treating [the JSTOR action] as a crime.

# Universal Green is the Path From Fool’s Gold to Fair Gold

The price of Gold OA today is absurdly, arbitrarily high.

Most journals (and almost all the top journals) today are subscription journals. That means that whether you pay for hybrid Gold to a subscription journal or for “pure Gold” to a pure-Gold journal, double-payment is going on: subscriptions plus Gold. Institutions have to keep subscribing to the subscription journals their users need over and above whatever is spent for Gold.

In contrast, Green OA self-archiving costs nothing. The publication is already paid for by subscriptions.

So it is foolish and counterproductive to pay for Gold pre-emptively, without first having (effectively) mandated and provided Green.

(That done, people are free to spend their spare cash as they see fit!)

So what RCUK should have done (and I hope still will) is to require that all articles, wherever published, be immediately deposited in their authors’ institutional repository — no exceptions. (If it were up to me, I’d allow no OA embargo; but I can live with embargoes for now — as long as deposit itself is immediate and the email-eprint-request Button is there, working, during any embargo: Universal immediate-deposit mandates will soon usher in the natural and well-deserved demise of OA embargoes.)

(That done, whether or not authors choose to publish or pay for Gold is left entirely to their free choice.)

Paying instead for Gold, pre-emptively, for the sake of CC-BY re-use rights, today, is worth neither the product paid for (Gold CC-BY) nor, far more importantly, all the Green OA thereby foregone (for the UK as well as for the rest of the world) whilst the UK’s ill-fated Gold preference policy marches through the next few years to its inevitable failure.

So it’s not about the price of the Gold. It’s about the price of failing to grasp the Green that’s within immediate reach today — the Green that will not only pave the way to Gold (and as much CC-BY as users need and authors want to provide), but the same Green whose competitive pressure will — (here comes my unheeded mantra again) — drive the price of Gold down to a fair, affordable, sustainable one, by making subscriptions unsustainable, forcing publishers to cut costs by downsizing, jettisoning the print and online editions, offloading all access-provision and archiving onto the Green OA institutional repositories, and converting to Fair-Gold in exchange for the peer review service alone, paid for out of a fraction of the institutional subscription cancelation savings windfall.

The difference between paying for Gold then, post-Green OA — and hence post-subscriptions and double-payment — and double-paying for it now, pre-emptively, is the difference between Fair Gold and Fool’s Gold.

# #rds2013: Why academia must look outward; “closed data means people die”

@McDawg (Graham Steel, indefatigable fighter for openness and patients’ rights) has just blogged a powerful story (http://figshare.com/blog/Open_Access_Is_Not_Just_For_Scientists_It%27s_For_Everyone/72 ) of how a teenager made a medical breakthrough despite the publishing industry’s paywalls. From Jack Thomas Andraka:

“After a close family friend died from pancreatic cancer, I turned to the Internet to help me understand more about this disease that had killed him so quickly. I was 14 and didn’t even know I had a pancreas but I soon educated myself about what it was and started learning about how it was diagnosed. I was shocked to discover that the current way of detecting pancreatic cancer was older than my dad and wasn’t very sensitive or accurate. I figured there had to be a better way!”

He began to think of various ways of detecting and preventing cancer growth and terminating the growth before the cancer cells become pervasive. Andraka’s breakthrough nearly didn’t happen. He asked around 200 scientists for help with his research and was turned down every time. (PMR: Of course lots of professional scientists get turned down all the time, and I admire Jack’s perseverance.)

Luckily he eventually established contact with Dr. Anirban Maitra, a Professor of Pathology and Oncology at Johns Hopkins University, who provided him with lab space and served as a mentor during the test’s development.

“This was the [paywall to the] article I smuggled into class the day my teacher was explaining antibodies and how they worked. I was not able to access very many more articles directly. I was 14 and didn’t drive and it seemed impossible to go to a University and request access to journals”.

In an interview with the BBC, Andraka said the idea for his pancreatic cancer test came to him while he was in biology class at North County High School, drawing on the class lesson about antibodies and the article on analytical methods using carbon nanotubes he was surreptitiously reading at the time. Afterwards, he followed up with more research using Google Search on nanotubes and cancer biochemistry, aided by online Open Access scientific journals.

Earlier this month, Andraka had a guest post published on the PLOS Student Blog entitled “Why Science Journal Paywalls Have to Go”.

Can you read the next sentence without a feeling of anger?

“I soon learned that many of the papers I was interested in reading were hidden behind expensive paywalls. I convinced my mom to use her credit card for a few but was discouraged when some of them turned out to be expensive but not useful to me. She became much less willing to pay when she found some in the recycle bin!”

An encapsulation of why we are suffering. No one should have to buy articles to find out they aren’t any use (and, if you are new to this area, many papers cost over 50 USD).

Also earlier this month, Andraka was invited to the State of the Union Address, where he met and spoke with President Obama and Michelle Obama about his work. See ‘Mr. Speaker, The President of the United States…and Jack Andraka!’ for more details.

“Open access would be an important first step. I would love to see research that is publicly funded by taxes to be publicly available. It would make it so much easier for people like me to find the information they need. If I can create a sensor to detect cancer using the Internet, imagine what you can do”.

And meanwhile academics and publishers collude in paying huge amounts of money into the Academic-STM-Publisher complex, which generates reputations for academics by preventing access to publicly funded information. I’ve written earlier that “Closed Access means people die”. There was a howl of protest from some academics that I didn’t have evidence. Well, Jenny Molloy did. But the statement is self-evidently true. One story like this should be enough.

Closed data from public research is immoral, unethical and unacceptable. Don’t argue. Just fix it. (and I shall do my best).

# #rds2013 Current Problems in Managing Research Data

I am going through the various sections in my presentation to http://cdrs.columbia.edu/cdrsmain/2013/01/esearch-data-symposium-february-27-2013/ . I’ve got to “Problems in Managing Research Data”. Warning: This section is uncomfortable for some. In rough order (I might swap 1 and 2):

• Vested commercial interests. There are at least these problems:
  • STM publishers. I’ll concentrate on this because until we have an Open Data Commons we can’t work out how to manage it. STM publishers not only stop me and others from getting the data, they stifle innovation. That leaves STM about 15 years behind commerce, civics, and the Open movement in terms of technologies, ontologies and innovation.
  • Instrument manufacturers. Many instruments produce encrypted or proprietary output which cannot be properly managed. In many cases this is deliberate, to create lock-in.
  • Domain software. Some manufacturers (e.g. of computational chemistry software) legally forbid the publication of results, probably to prevent benchmarking of the performance and scientific correctness of their software.
  • Materials. Many suppliers will not say what is in a chemical, what the properties of a material are, etc. You cannot build ontologies or create reliable metadata on guesswork.
• Academic apathy and misplaced values. I continue to be appalled by the self-centeredness of academia. The debate on data is “how can my data be given metrics”, not “how can I make data available for the good of humanity”. Yes, I’m an idealist, but it hasn’t always been this way. It’s possible to do good scholarship that is useful and that is recognised. But academia is devising systems based on self-glorification. With different values, the publisher problem would disappear. The Super Happy Block Party Hackathon (Palo Alto) shows how academia should be getting out and working for the community.
• Intrinsic difficulty. Some research data is hard; a lot of bioscience, for example. But bioscientists tackle that by meeting all the time to agree how to describe the data. You can’t manage data that you can’t describe. I’ve been working with Dave M-R on integrating the computational declarative semantics of chemistry and mathematics. That’s completely new ground and it’s hard. It’s essential for reproducible computational chemistry (a billion-USD+ activity). Creating chemical ontologies (ChemAxiom, Nico Adams) is hard. Computational ontologies (OWL) stretch my brain to its limits. Materials science is hard. To understand piezoelectricity you have to understand a 6×6 tensor.

But that’s what the crystallographers have been doing for 30 years. And they have built the knowledge engines.

• Finance. The least problem. If we want to do it, then the cost is a very small proportion of total research funding, and a minuscule fraction of what we pay the STM publishers. OpenStreetMap was built without a bank balance.

It’s simple. You have to WANT TO DO IT. The rest follows.

# With New REF Mandate Proposal, UK Rejoins OA Vanguard

HEFCE’s post REF-2014 Open Access proposal looks very promising, if I have understood it correctly.

The proposal is to mandate that, in order to be eligible, all peer-reviewed journal articles submitted to REF after 2014 must be deposited in the author’s institutional repository immediately upon acceptance for publication, regardless of whether the article is published in a subscription journal or in a Gold OA journal (no preference, and no restriction on author’s journal choice), and regardless of whether the publisher embargoes Open Access to the deposit (for an allowable embargo period that remains to be decided).

The proposed HEFCE REF OA policy looks much better than the current RCUK OA policy. Let us hope that the RCUK policy will now be brought into line with the proposed HEFCE REF policy.

It is also very reassuring to hear that the policy will be based on collaboration and consultation.

This may help the UK regain its former worldwide leadership position in OA. The new US policy developments (following, a decade later, in the UK’s pioneering footsteps) are extremely welcome and timely, but they still have many rough edges. Let’s hope it will be the UK that again shows how to smooth them out and propel us all unstoppably to global OA.

Readers may find it amusing to look at this 2003 proposal to the RAE:
“Being the only country with a national research assessment exercise [1], the UK is today in a unique position to make a small change that will confer some large benefits. The Funding Councils should mandate that in order to be eligible for Research Assessment and funding, all UK research-active university staff must maintain (I) a standardised online RAE-CV, including all designated RAE performance indicators, chief among them being (II) the full text of every refereed research paper, publicly self-archived in the university’s online Eprint Archive and linked to the CV for online harvesting, scientometric analysis and assessment. This will (i) give the UK Research Assessment Exercise (RAE) far richer, more sensitive and more predictive measures of research productivity and impact, for far less cost and effort (both to the RAE and to the universities preparing their RAE submissions), (ii) increase the uptake and impact of UK research output, by increasing its visibility, accessibility and usage, and (iii) set an example for the rest of the world that will almost certainly be emulated, in both respects: research assessment and research access.”

Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35.

# #rds2013: My reply from Elsevier on publishing supplemental data

Two weeks ago I wrote to Elsevier’s Director of Universal Access about making research data Openly available: http://blogs.ch.cam.ac.uk/pmr/2013/02/11/i-request-elsevier-to-make-experimental-data-cc0-and-release-crystallography-from-ccdc-monopoly/ (the title is fairly self-explanatory). I have just got her reply, which I publish in full below. She describes my request as “rather long and involved” and says “its description of various events do not always align with ours”. (The latter statement is meaningless, as there were no events involved.) To save readers referring back, the essence of my mail was as follows.

Many closed-access publishers (ACS, RSC, Nature) publish authors’ supplemental data under an apparently licence-free/PD/CC0 approach (though not always explicitly). Elsevier either puts data behind a paywall or sends it to the closed database of the CCDC, which is based on a subscription model. I asked (clearly, I thought):

I am therefore asking you do the following:

·         Announce that all supplemental data accompanying Elsevier papers IS licensed as CC0.

·         Require the CCDC to make all primary CIF data from Elsevier publications CC0. (The author’s raw deposition, not CCDC’s derivative works)

·         Extend this policy to all other experimental data published in Elsevier journals (in chemistry this would be records of synthesis, spectra, analytical data, computational chemistry, etc.). When you agree to this I can give public advice as to the best way to achieve this.

I leave you to judge whether Elsevier has answered any of my requests or (as I read it) sidestepped them and added a list of platitudes. But I am probably very slightly biased as I have tried for 4 years to get any straight answers out of Elsevier. I might just fail to hide my bias when I speak tomorrow.

Dear Peter,

Thank you for your message.  It is rather long and involved, and its description of various events do not always align with ours, but it is an important issue that you raise and I am very happy to respond on behalf of Elsevier. Datasets are sometimes published as supplementary information to journal articles. Authors provide Elsevier with only a non-exclusive license to publish/promote these supplementary datasets and so only the authors can decide to use a CC0 license for these datasets.

This having been said Elsevier shares your vision for open data and a future in which data are much more broadly managed, preserved, and reused for the advancement of science.  Professional curation and preservation of data is, like professional publishing, neither easy nor inexpensive.  The grand challenge is to develop approaches that maximise access to data in ways that are sustained over time, ensure the quality of the scientific record, and stimulate innovation.

Here at Elsevier we:

• believe rich interconnections between publications and scientific data are important to support our customers to advance science and health
• work with others to identify, if needed develop, and deploy standard approaches for linking publications and data.
• encourage authors to document their data and to deposit their data with an appropriate open data centre or service and to make their data available for reuse by others, ideally prior to publication of articles based on analysis of these data, and with a permanent standard identifier to link from the publication to the dataset.
• recognise that scientists’ invest substantially in creating and interpreting data, and their intellectual and financial contributions need to be recognised and valued
• believe data should be accompanied by the appropriate metadata to enable it to be understood and reused.
• help to communicate the benefits of data curation and reuse for different stakeholders in the scholarly communication landscape including authors, funders, publishers, researchers, and university administrators.
• encourage authors to cite datasets that have been used in their research and that are available for reuse via a data curation center or service.
• deploy our expertise in certification, indexing, semantics, and linking to add value to data
• champion the importance of long term preservation of data, and accreditation systems/standards for digital curation services.

You and your readers might find this short video by my colleague, IJsbrand Jan Aalbersberg, of interest.  It is a 5-minute flash presentation from a recent STM Innovation seminar on this topic: http://www.youtube.com/watch?v=3KuBToc4Nv0 .

Last but not least, our policies in this space are similar to those of other publishers.  There are two industry position statements that many of us adhere to, and which your readers may find of interest.  They are: http://www.stm-assoc.org/2006_06_01_STM_ALPSP_Data_Statement.pdf and http://www.stm-assoc.org/2012_12_04_STM_on_Data_and_IP_For_Scholarly_Publishers.pdf

In closing, we at Elsevier welcome your thoughts and are committed to working with researchers to realize our shared vision for open data.  I will post this response to your blog comment stream as well.

With very kind wishes,

Alicia

Dr Alicia Wise

Director of Universal Access

Elsevier | The Boulevard | Langford Lane | Kidlington | Oxford | OX5 1GB

# #rds2013: #okfn Content-mining: Europe MUST legitimize it.

I’m on an EC committee looking at how to make content available for mining. (At least I thought that was the point – it seems it isn’t).

“Licences for Europe – A Stakeholder Dialogue”

Working Group 4: Text and Data Mining

Unfortunately I haven’t been able to attend the first meeting as I have been in Australia, but @rmounce has stood in and done a truly exceptional job. The dialogue as a whole is about licences, and WG4 is on content mining. Ross reported back on Saturday and was disappointed. It seems that WG4 has been told it has no course of action other than to accept that licences are the way forward.

This is unacceptable in a democratic system. It is difficult enough for us volunteers to compete against the rich media and publishing community. If I go to Brussels I have to find the money myself. These WGs meet monthly. That’s a huge personal cost in time and money. The asymmetry of fighting for digital rights is a huge burden, and it carries a large opportunity cost: rather than writing innovative code, we have to write letters to Brussels. And that’s what we have done (I’m not a signatory, but I would have been). Here’s our letter.

Quotes:

We write to express our serious and deep-felt concerns in regards to Working Group 4 on text and data mining (TDM).  Despite the title, it appears the research and technology communities have been presented not with a stakeholder dialogue, but a process with an already predetermined outcome –namely that additional licensing is the only solution to the problems being faced by those wishing to undertake TDM of content to which they already have lawful access. Such an outcome places European researchers and technology companies at a serious disadvantage compared to those located in the United States and Asia.

The potential of TDM technology is enormous. If encouraged, we believe TDM will within a small number of years be an everyday tool used for the discovery of knowledge, and will create significant benefits for industry, citizens and governments. McKinsey Global Institute reported in 2011 [1] that effective use of ‘big data’ in the US healthcare sector could be worth more than US$300 billion a year, two-thirds of which would be in the form of a reduction in national health care expenditure of about 8%. In Europe, the same report estimated that government expenditure could be reduced by €100 billion a year. TDM has already enabled new medical discoveries through linking existing drugs with new medical applications, and uncovering previously unsuspected linkages between proteins, genes, pathways and diseases [2]. A JISC study on TDM found it could reduce “human reading time” by 80%, and could increase efficiencies in managing both small and big data by 50% [3]. However at present, European researchers and technology companies are mining the web at legal and financial risk, unlike their competitors based in the US, Japan, Israel, Taiwan and South Korea who enjoy a legal limitation and exception for such activities.

Given the life-changing potential of this technology, it is very important that the EU institutions, member state governments, researchers, citizens, publishers and the technology sector are able to discuss freely how Europe can derive the best and most extensive results from TDM technologies. We believe that all parties must agree on a shared priority, with no other preconditions – namely how to create a research environment in Europe with as few barriers as possible, in order to maximise the ability of European research to improve wealth creation and quality of life. Regrettably, the meeting on TDM on 4th February 2013 had not been designed with such a priority in mind. Instead it was made clear that additional relicensing was the only solution under consideration, with all other options deemed to be out of scope. We are of the opinion that this will only raise barriers to the adoption of this technology and make computer-based research in many instances impossible.

We believe that without assurance from the Commission that the following points will be reflected in the proceedings of Working Group 4, there is a strong likelihood that representatives of the European research and technology sectors will not be able to participate in any future meetings:

1. All evidence, opinions and solutions to facilitate the widest adoption of TDM are given equal weighting, and no solution is ruled to be out of scope from the outset;
2. All the proceedings and discussions are documented and are made publicly available;
3. DG Research and Innovation becomes an equal partner in Working Group 4, alongside DGs Connect, Education and Culture, and MARKT – reflecting the importance of the needs of research and the strong overlap with Horizon 2020.

The annex to this letter sets out five important areas (international competitiveness, the value of research to the EU economy, conflict with Horizon 2020, the open web, and the extension of copyright law to cover data and facts) which were raised at the meeting but were effectively dismissed as out of scope. We believe these issues are central to any evidence-based policy formation in this area and must, as outlined above be discussed and documented.

We would be grateful for your response to the issues raised in this letter at the earliest opportunity and have asked susan.reilly@kb.nl (Ligue des Bibliothèques Européennes de Recherche) to act as a coordinator on behalf of the signatories outlined below.

Participants:

Sara Kelly, Executive Director, The Coalition for a Digital Economy

Jonathan Gray, Director of Policy and Ideas, The Open Knowledge Foundation

John McNaught, National Centre for Text Mining, University of Manchester

Aleks Tarkowski,  Communia

Klaus-Peter Böttger, President, European Bureau of Library Information and Documentation Associations (EBLIDA)

Paul Ayris, President, The Association of European Research Libraries (LIBER)

Brian Hole, CEO, Ubiquity Press Ltd.

David Hammerstein, Trans-Atlantic Consumer Dialogue

PMR: My colleagues and I are now technically able to mine the scientific literature in vast amounts. #ami2 takes about 2 seconds per page on my laptop. Given about 10 million papers a year at 10 pages each, that’s 10^8 pages, or 2.0E+8 (200 million) seconds of computing per year. A year is about 31.5 million seconds, so roughly 6 CPUs – a trivial amount – can mine and index this literature at the rate it appears, and we get machine-readable tables, graphs, trees, chemistry, maps and masses else. It’s a revolution.
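The throughput estimate above can be checked with a few lines of Python. This is a minimal sketch that uses only the post's own rough figures (about 10 million papers a year, 10 pages each, 2 seconds per page for #ami2 on a laptop) and assumes each CPU runs flat out all year; the helper name `cpus_needed` is hypothetical and not part of #ami2.

```python
# Back-of-envelope check: how many CPUs, running full time, keep pace with
# publication? Inputs are rough figures from the post, not measured benchmarks.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds

def cpus_needed(papers_per_year: float, pages_per_paper: float,
                seconds_per_page: float) -> float:
    """Hypothetical helper: CPU-seconds of work per year divided by seconds per year."""
    total_cpu_seconds = papers_per_year * pages_per_paper * seconds_per_page
    return total_cpu_seconds / SECONDS_PER_YEAR

if __name__ == "__main__":
    total = 10e6 * 10 * 2  # papers/year * pages/paper * seconds/page
    print(f"CPU-seconds per year: {total:.1e}")
    print(f"CPUs needed: {cpus_needed(10e6, 10, 2):.1f}")
```

With these inputs the total is 2×10^8 CPU-seconds, which works out to a little over six full-time CPUs. Real throughput would also depend on I/O, PDF quality and how the work is parallelised, so the figure is an order-of-magnitude estimate only.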

I am legally allowed to read these papers.

But if I try to mine them I will be sued.

The planet and humanity desperately need this data. It does not belong to “publishers”. It’s the world’s right to mine this.

# #rds2013: Managing Research Data : Ideas from Ranganathan

Ranganathan is one of the great visionaries of the C20 and over 80 years ago created http://en.wikipedia.org/wiki/Five_laws_of_library_science. These laws are as true today as they were then. I’ve urged that libraries and academics understand the true points of Ranganathan – they aren’t business rules, they are rules for a fair social system for information. In their simple form:

1. Books are for use.
2. Every reader his [or her] book.
3. Every book its reader.
4. Save the time of the reader.
5. The library is a growing organism.

I spoke on these 3 years ago and proposed 12 action points, somewhat off the cuff. There’s a good report here http://frommelbin.blogspot.com/2009/10/peter-murray-rusts-12-point-action-plan.html . I hoped they might spark discussion, but there was very little (at least back to me; Melbin says “I am surprised that there has not been more debate about his address”). Here are the 12 points in no particular order.

1. We should act as citizen librarians towards a common or shared goal.

3. Text mine everything.

4. Put 2nd year students in charge of developing educational technology resources

5. Actively participate in obtaining science grants

6. Actively participate in the scientific publishing process.

7. Close the science library and move it all to the departments.

8. Handover all purchasing to national Rottweiler publishing officers.
9. Set up a new type of university press.

10. We should develop our own metrics system.

11. We should publicly campaign for openness.

12. We should make the library an addictive “game”.

Most are still true, though given the lack of response I think I’d regard 9 as a lost cause. I wrote 2 before I knew of Aaron Swartz. 1, 3, 4, 10 and 11 are key. Academic libraries have very little time left: 5, 6, 7, 8 and 12 will be irrelevant if we have no libraries. So here’s another interpretation of Ranganathan for the data age.

1. Data belongs to the world. We are on a sick planet and data is a critical part of any solution. Data should not belong to people or institutions but to the people of the world and their machines.
2. Data is for use. I wish this were self-evident.
3. Every reader their data. I don’t have a good modern word: I am using “reader” to encompass humans and machines. This means that a reader should be able to access any data they need.
4. Every data its reader. This means that there is potentially at least one person/machine interested in data that you might produce.
5. Save the time of the reader. Make it as easy as possible to discover, understand and use data. Make it as easy as possible to create data.
6. The data community is a growing organism. This is, excitingly, true, though not generally in universities.

The word “reader” is asymmetric. I’d like to add another law such as

7. Every reader is an author and every author a reader. This was not true in Ranganathan’s time, when books were physical objects requiring much effort to produce. But now everyone can take part at every level.

# Mandate Institutional Deposit — Then Harvest Where You Please

1. The only substantive issue is how to get peer-reviewed journal articles to be made Open Access (OA), today.

2. Twenty years of evidence shows that — except in the very few subfields that self-archive spontaneously, unmandated — the only way to get those articles to be made OA is to mandate (require) that they be made OA.

3. Institutions are the source of all peer-reviewed journal articles, in all fields, funded and unfunded.

4. Authors who do not self-archive spontaneously, unmandated, can only be mandated to do it once (not multiple times, in multiple places).

5. The only ones that can systematically monitor and ensure that all of their research output, in all fields, funded and unfunded, is self-archived in compliance with self-archiving mandates are the authors’ own institutions.

6. The only way institutions can systematically monitor and ensure that all of their research output is self-archived is if it is deposited, convergently, in their own institutional repository — not if it is deposited, divergently, here and there, institution-externally. (Institutional back-harvesting of its own institution-external content is so unrealistic as to be hardly worthy of discussion.)

7. The metadata of institutionally deposited articles can be (and are being) harvested institution-externally by many harvesters, foremost among them Google and Google Scholar.

8. The full texts of institutional deposits are being harvested too (certainly by Google and Google Scholar), although for most purposes users only need a link to the full text in the institutional repository.

9. The power and functionality of OA harvesters can and will be enhanced dramatically — but not until much, much more of their target content is OA than is OA today.

10. Till then it’s simply not worth most people’s time to enhance functionality over such sparse content.

11. Which brings us straight back to the need for effective OA self-archiving mandates, systematically (hence institutionally) monitored to ensure compliance.

12. arXiv’s functionality does not come from the fact that its authors deposit directly in arXiv: it comes from the fact that they deposit, and deposit reliably (near 100%), unmandated.

13. Ditto for those who share protein or crystallographic data centrally, unmandated.

14. The real problem is the vast majority of OA’s target content that is not being deposited at all, either institutionally or institution-externally, because deposit has not yet been mandated.

15. Immediate deposit of all peer-reviewed research output can be mandated by both institutions and funders.

16. Immediate Open Access to the deposit would be desirable, but access to deposits can be embargoed if there is a wish to comply with publisher embargoes on OA.

17. This compromise can and should be made, if necessary, in the interest of hastening and facilitating the universal adoption of immediate-deposit mandates by all institutions and funders.

18. (Institutional repositories’ email-eprint-request Button is there to tide over user needs during embargoes.)

19. The other compromise that can and should be made, because it is indeed necessary, is not to insist prematurely on further rights — over and above free online access — that publishers are not yet willing to allow, such as text-mining, re-mix and re-publication rights.

20. First things first: Don’t fail to grasp what’s already within reach by over-reaching for what’s not yet within reach: Don’t let the perfect be the enemy of the good.

21. Mandate institutional deposit — and let harvesters harvest where and when they please.

# Flaws in BIS/Finch/RCUK mandate were not just publishers’ fault

Maybe there will be an eventual realization that the failure of the new BIS/Finch/RCUK OA policy was not just due to publisher counter-lobbying but also to premature and disastrously counterproductive insistence on Gold and CC-BY by certain overzealous OA advocates.

Notice that the new US Presidential OA Directive we all now applaud makes no mention of Gold OA or CC-BY, just free online access (and of course the way it will be implemented will be largely via Green OA self-archiving). That’s exactly what the UK Select Committee proposed in 2004.

Gold OA — and as much CC-BY as users need and authors wish to provide — will come, inexorably. But its coming is only slowed by grit-toothed insistence on having it first, at the expense of the free online access that all (not just some) research needs far, far more urgently than it needs Gold or CC-BY — and that will pave the way for Gold and CC-BY.

First things first: Don’t let the “best” become the enemy of the better.