# A commentary on Sci-hub: 2/n. Why it matters to me and ContentMine

In my previous post (https://blogs.ch.cam.ac.uk/pmr/2016/04/30/a-commentary-on-sci-hub-1-scholarly-publishing-is-broken/), catalyzed by Sci-Hub, I argued that scholarly publishing is completely broken. It’s now lost a huge amount of respect; it’s unwieldy, unfair and mired in bickering. It pays no attention to readers. It’s becoming a write-only system where authors write not to communicate but for glory – self advancement. There’s no clear political goal …

… and no clear technological goal.

And that’s the problem.

Because we desperately need the ability to search and analyze the scientific and medical literature in a 21stC manner. While we’ve been creating ContentMine (http://contentmine.org) we’ve discovered many researchers who have to “read” 10,000 papers in a day or two. They use 20thC methods – click and read – taking weeks where they should take hours. ContentMine software (completely Open) has been built to solve this problem by filtering out the papers you don’t want – often 90% of the first search. (And it does much more – it can extract complex objects.) It’s Open to everyone and it works (see previous posts).
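To make the filtering idea concrete, here is a minimal sketch in Python. It is not ContentMine's actual code: the `Paper` class, the `filter_papers` function and the sample records are all hypothetical, and the only technique shown is a whole-word, case-insensitive term match over titles and abstracts.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record: a real pipeline would carry far more metadata.
    title: str
    abstract: str


def matches_any(paper, terms):
    """True if any term occurs as a whole word (case-insensitive) in the title or abstract."""
    text = f"{paper.title} {paper.abstract}"
    return any(
        re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)
        for term in terms
    )


def filter_papers(papers, terms):
    """Keep only the papers that mention at least one of the terms."""
    return [p for p in papers if matches_any(p, terms)]


# Invented sample data, purely for illustration.
papers = [
    Paper("Terpenoids of Lantana camara", "We isolate compounds from leaves."),
    Paper("Traffic flow in cities", "A queueing model of urban congestion."),
    Paper("Invasive plants of Sulawesi", "Lantana spreads rapidly in disturbed forest."),
]

kept = filter_papers(papers, ["lantana"])
for paper in kept:
    print(paper.title)
```

A first-pass filter like this only discards papers that are obviously off-topic; extracting chemistry or other structured objects from what remains is a much harder, separate step.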

When I came to Cambridge I had the vision of building an “artificially intelligent chemical reader”, part of which was the World Wide Molecular Matrix, a system for capturing and sharing versioned semantic chemistry. Bits of it are being built in ContentMine. I built systems where I could draw chemical formulae by speaking to the machine. We’ve built the de facto tool for chemical name recognition (OSCAR) and interpretation (OPSIN). I thought it would take 5 years to create my chemical amanuensis – scholarly assistant. With help from the publishers and scientists it probably would have. Now, after 15 years, it’s still a dream, frustrated by stagnant thinking on all sides, and deliberate opposition (e.g. nullifying European legislation).

So Stackoverflow, Github, Bitbucket, Apache, GNU, Jenkins, OuterCurve, Mozilla and many others are creating the human-machine technology of tomorrow. This encourages innovation from predictable and unpredictable sources. It works – it’s exciting and we are all part of it.

In contrast, the scholarly publishing industry has created nothing in the last 20 years. (The Scholarly Kitchen hailed the “big deal”, a pricing strategy to increase sales, as one of the greatest achievements of scholarly publishing.)

20 billion dollars per year – that’s 200 billion since I started at Cambridge – and nothing positive to show for it.

The current technology of the mainstream publishing industry is just awful. Really awful. It’s often built by outsourcing parts to people and companies who do not care how the result is used. The methods used – awful PDF and really awful HTML – are for the publisher’s convenience, not for the reader. And every publisher complains about how awful the tools are. They can’t change, they can’t innovate, they’re locked in. Add that every publisher feels they have to use a different technology to differentiate themselves from the others and it’s a complete tower of Babel. (I have spent 2 years of my life trying to solve this awful mess – and ContentMine can untangle a good deal.)

What’s even worse is that most of the publishers spend effort on STOPPING people reading the literature. The obstacles to getting to a paper grow every month. These include (from my own experience):

• Pixel maps rather than characters.
• “Glass screens” that can’t be copied (Readcube from Nature/Springer).
• Monitoring every download and requiring libraries to stop researchers. (Elsevier, Wiley).
• Automatically cutting off 200 universities for a single click (Amer. Chem Society).

Why does this matter?

Because there is so much we are missing out on. New medicinal knowledge, new ecology, new astronomy, materials, chemical reactions, … and innovation…

I should be able to ask a computer (in speech):

“Find me all chemical compounds that occur in Lantana species south of the Wallace line and compare their chemical and plant evolution. What types of compound might we see in the future, particularly due to invasive species?”

And get a result in minutes… it’s not as hard as it looks. It’s knowledge-driven science.

(Sadly, all I WILL get in minutes is a cease-and-desist letter from publishers demanding that I shouldn’t “steal their content”.)

So because we cannot innovate in this area we are 20 years behind the mainstream.

So why do I want Sci-hub? (Note carefully that I haven’t said what I am going to do and, until I do, you cannot judge my intention. I haven’t said I’m going to use it. You’ll have to wait till the next blog post).

I want Sci-hub because it’s technically BETTER than anything else we have. Much better.

And it’s the perfect complement to ContentMine.

Sci-hub has all the world’s scientific knowledge in one logical place. It doesn’t matter that it’s spread over Torrents and other fragmentation – logically it’s all there. And it’s run by someone who knows what she’s doing technically – unlike many publisher sites. And, I assume, she and colleagues will be receptive to technical requests and suggestions. (No one has any chance of getting conventional publishers to innovate.)

Using Sci-hub would advance my and ContentMine technology enormously. ContentMine and Sci-hub fit together perfectly – because they are both designed with the 21stC mentality. Because they react to what readers want. Yes, READERS; the marginalised community of scholarly publishing. 21stC projects create a community round them. They are organic and vibrant. They respect machines and humans equally.

ContentMine + Sci-hub could be the greatest search engine in scholarship, especially for science, technology and medicine. Because it’s semantic. Because all the literature is trivially accessible in one place and one format. I don’t know of anything that remotely comes close. We can search and index diagrams – extract 15 million chemical reactions a year. (Even if a publisher tried to develop it they could only use it on “their own” content.)

BUT! BUT! BUT!

But for many, including the law, Sci-hub is forbidden fruit.  Run by She-who-must-not-be-named. The arch-pirate. The criminal. (These terms are used). Peter Murray-Rust cannot use it (and I haven’t). ContentMine cannot mine it (and we won’t). We’ve looked at the legal and political aspects and I’ll analyse these in a subsequent post.

But 21stC citizens (me, ContentMine, taxi-drivers) really, really want Sci-Hub.

The only things stopping us are copyright law, prosecutors and an intransigent, uncaring, out-of-touch, money-driven and self-seeking publisher-academic complex.

I’ll deal with the politico-legal in the next post.

# Elsevier defends its value after Open Access disputes | The Bookseller

Not even the opening paragraph is OA. [Note that the paywall is from The Bookseller, not Elsevier.]

Update (May 3, 2016): The article is now OA. Excerpt:

“Elsevier has sought to set aside public criticism of its Open Access (OA) and pricing policies and to restate its value for the academy, emphasising how, as a profit-generating company, it has the means to invest in innovation to serve researchers’ fast-changing needs.

The publisher’s record of success is clear: 2015 results from parent company RELX Group show Elsevier with operating profits of £760m on revenue of £2,070m, with underlying revenue growth of 2% and underlying profit up 3%. The prediction for 2016 is of further profit growth.  But public perceptions of Elsevier have been dogged by accusations of profiteering through excessive charges and reluctance to make its material available through OA, most notably from the online academic protest group The Cost of Knowledge (www.thecostofknowledge.com) which has racked up 16,000 signatories to its Elsevier boycott over five years.

Other widely aired disputes—a year-long deadlock with Dutch universities over institutional subscriptions; the departure of the entire editorial board of journal Lingua in 2015 in a row over OA—have added fuel to the fire for Elsevier’s critics. But director of access and policy Alicia Wise, vice-president of global corporate relations Tom Reller and policy director Gemma Hersh say criticism from a vocal minority is unrepresentative of the publisher’s regular contact with millions of researchers. The trio say that detractors obscure a key fact: that Elsevier is seeking to negotiate the new landscapes of OA and content-sharing in such a way that its economic sustainability, and therefore ability to maintain quality, is not compromised….”

# Digital Solutions in India 2016: Unfolding the Next Chapter in Digital Content Proposition

“More publishers, especially those in the journals segment, are looking for highly automated fast-track workflows for handling large-volume multidisciplinary open access journals, Arora says….Reduced production costs, a shorter publishing cycle, and increased author involvement are the factors behind the creation of e2e, OKS Group’s cloud-based workflow platform…. “From a collaboration and cost standpoint, e2e is ideal for open access publishers,” Khanna says….”

# Call for Contributions to the 8th Conference on Open Access Scholarly Publishing – OASPA

“The Open Access Scholarly Publishers Association, OASPA, is delighted to be holding the 8th Conference on Open Access Scholarly Publishing (COASP) in the US this year at the Westin Arlington Gateway hotel in Virginia, on the 21st and 22nd September 2016. Now established as a key event in the scholarly publishing calendar, COASP brings together the open access community and major stakeholders to discuss critical issues, developments, innovations, and best practices in the industry.  Further information on the event is available via our website.   As in previous years, the Program Committee have set aside one of the sessions within the conference program to provide six Show & Tell opportunities for showcasing new projects, ideas or initiatives relating to open access publishing.  Organisations are invited to submit a proposal to us for one of the six available 10 minute presentations. All proposals should be submitted by 31 May 2016 at the latest to info@oaspa.org.  The Program Committee will then review the suggestions by the end of June.  Please note that while we will be able to cover the registration costs for the authors of successful proposals, we are not able to cover any of the travel expenses that may be incurred in attending the conference …”

# Open Professionals Education Network | Free support for U.S. Department of Labor TAACCCT grantees

“OPEN is funded by the Bill & Melinda Gates Foundation and offers free services aligned with the DOL’s [US Department of Labor’s] grant requirements, designed to support your project. Our partners’ long-standing experience in beneficial service areas will help you achieve your project goals while maximizing the use of your resources.

Support areas include:

• Open Educational Resources (OER) practices & policies
• Creative Commons (CC) licensing
• Universal Design for Learning (UDL) and Accessibility
• Evidence-based online technology use
• Effective course and learning design
• Help finding existing OER…”

# A commentary on Sci-hub: 1. Scholarly publishing is broken

Many of you will already have read Science Magazine’s account of Sci-Hub, the “pirate” site for scholarly publications. “Science” is often seen as one of the “top three” outlets, along with Nature and Cell. Here’s the original:

And here’s a typical commentary which applauds the research in the article but criticizes the accompanying editorial showing that Science has an ethically flawed business model.

This (and the following) blog post is one of the most important I have written, and I shall choose words carefully. I shall include facts, opinions, and what I intend to do and not do, and why. I am always open to criticism and try to be polite and constructive. My message is already spreading to more than one posting. This one sets the scene.

This blog is nearly 10 years old. I’d like to believe that I have tried to help make scholarly publishing fit for the 21st Century (C21). I’ve seen Tim Berners-Lee’s vision of the semantic web for scholarship – I was there in CERN in 1994 – and it made sense then and even more so now. (“I” includes many collaborators, but I use “I” to make it clear that the views here are mine and mine alone. Special thanks to Henry Rzepa, my wonderful ex-group in Cambridge, Open Knowledge, ContentMine, Blue Obelisk, Crystallography Open Database (COD), librarians in Cambridge and others. Please accept this pronoun.)

I write and use Computer Programs.

• I write programs and deposit them in Github/Apache/OuterCurve/BitBucket, etc. People use them, build on them and acknowledge me. I’ll use “Github” as a generic pointer.
• Others write programs and deposit them in Github. I use them and build on them and I acknowledge them.
• I offer constructive criticism.
• I’ve set up the Blue Obelisk where chemists can commit programs and make them interoperable.

This represents the pinnacle of what is possible in C21, with very modest/no funding and a collaborative intent. It works. It makes my heart soar. It’s wonderful. I’m proud to be a small part of it. Everyone wins.

There’s a similar ethos in Wiki/pedia/media/data. (“Wikipedia”).  Everyone can be a Wikipedian – all you need is to do it.

• I have used Wikipedia for enhancing my knowledge and have contributed my knowledge to it.
• I have used systems built by Wikipedians and I have contributed systems for use.
• I have been to Wikipedia meetings, worked with the Wikipedians.
• I promote Wikipedia.

I have been on the Advisory Board of Open Knowledge Foundation since it started. I have used OKF resources. I have contributed to them.

And so on… groups that I use and would contribute to if I had the time

• Open Streetmap
• Geograph
• Open Corporates
• MySociety
• Mozilla

Most of these are cash-starved, and find innovative ways to generate enough income to make their primary products free and Open. (“Open” == “Free to use, free to re-use, free to re-distribute”, “Free as in speech”, “Free as in liberty”).

C21 makes knowledge-sharing communities possible. It’s very, very wonderful. If you don’t understand what I am saying then maybe you have to try it. Contribute to Wikipedia, add a photo to Geograph, “Write to Them” to your MEPs, FOI with “What do They Know”.

And you can start to be a C21 citizen at a very early age. The knowledge century is a wonderful place to live.

BUT…

Scholarly publishing in the 21st Century (C21) is completely broken

It’s a 20 Billion USD industry.

That’s $20,000,000,000 of citizens’ money.

It’s probably 1000 times more money than the average project mentioned above. Maybe even more.

So how is it broken? (If you know and love Github or Stackoverflow use them as a comparison of the wonderful against the broken.) I am not going to apportion blame to publishers, libraries, authors, funders. They have all, wittingly or unwittingly, contributed to one of the most dysfunctional knowledge systems on the planet.

And it matters. It’s not just money. It’s:

• Human lives. I coined the phrase “Closed Access Means People Die”. I have been attacked for it. If it makes you feel more comfortable: “Open Knowledge saves lives”.
• The planet. To work out what is going to happen from anthropogenic (“human-made”) change of all sorts we need as much knowledge as possible. We are being deprived of it.
• Citizens. It’s an unacceptably divisive system. Only 1% of the UK population (those in universities) are involved. Most of those are passive. They get told what to do. Citizens – doctors, teachers, politicians, businesses, taxi-drivers – are excluded. Yes! Until taxi-drivers have a right to be involved in scholarship we are a divisive society.
• Values. It’s distorting values. Ask a librarian/researcher/administrator why scientific publications should be free to everyone and you’ll probably get: (1) “The Funders require it”; (2) “You’ll get more citations if you publish Open Access”. The moral and ethical imperative (“we have a responsibility to make knowledge free to everyone”) often isn’t mentioned.
• Community. For me “Open” is not primarily about money, it’s about working together, and being transparent.

… and in detail …

• It’s criminally expensive. Publishers receive ca $5000 for each paper. It’s largely public or personal (e.g. student fees) money. It actually costs around $300 (administration: reviewers don’t get paid, authors don’t get paid). Maximum. Many people publish for $0 and give their time and marginal resources. That money could be used for research, could be used for teaching. The amount spent on journal subscriptions in the UK (ca $1 billion/year) is similar to the cost of postgraduate education.
• It’s criminally inefficient. Much of the work is carried out by humans when C21 systems could do the same for 5% of the cost. Stackoverflow manages 10 million questions.
• It’s criminally slow. Postings to repositories take fractions of a second. The great Physics/Maths site arXiv can do this. But many publishers take years to publish a paper.
• It’s elitist and probably corrupt. It stresses “top” journals. I am all for public competition and the best winning, but this isn’t that. It favours “top” institutions (I heard of one large research org that negotiates with a “top” publisher on how many papers they are allowed per year – before the work is done).
• It destroys the real purpose of publication. I believe that science requires that you tell the world (not an elite) – fully (not in summary):
• What you did
• Who did it
• Why you did it
• How you did it (verification and re-use)
• When you did it
• Where you did it
• What you discovered (or didn’t discover)

And invite the world to confirm/refute/help/criticize continually and continuously. Some competition is valuable. But competition has now become an end in itself and is destroying the other values.

I am involved in trying to bring these ideas into scholarly publishing. I have very largely been unsuccessful, when measured against the other Open activities where I have been able to help create the C21 knowledge community.

• I’ve developed semantics for chemistry (Chemical Markup Language, CML). Chemists, chemical publishers, universities ignore this.
• I’ve developed open data bases (CrystalEye/COD). Publishers and universities ignore these.
• I’ve prototyped semantic publication. Ignored.
• I’ve pushed for a fully Open community of scientific scholarship. The Blue Obelisk. Ignored.
• We’ve developed new tools for University Libraries. (Open Bibliography and BibJSON). Ignored.
• I’ve campaigned for reform of Copyright. Ignored by academia and publishers.
• I’ve developed tools for using machines to help everyone read the scholarly literature. Active opposition.

Everyone blames everyone else. Some suffer, some get super-rich. Everyone is losing out.

It must change. Completely. If not from within, then from without.

Sci-hub is one of the external factors that could change scholarly publishing.

Completely.

