2018: best year yet for net growth of open access

Highlights: this edition of the Dramatic Growth of Open Access features charts that illustrate that 2018 showed the strongest growth to date for open access by number of documents searchable through BASE, PubMedCentral, arXiv, DOAJ, texts added to Internet Archive, and journals added to DOAJ.

A Bielefeld Academic Search Engine (BASE) search encompasses over 19 million more items at the end of 2018 than a year earlier; about 60%, or 11.4 million, are open access. This brings the total documents searchable through BASE to close to 140 million (about 84 million open access).

PubMedCentral added 600,000 items in 2018, and surpassed a milestone of 5 million items this year (now 5.2 million items).

arXiv added 140,000 items in 2018, bringing the total close to 1.5 million items.

The DOAJ article search grew by more than 800,000 articles in 2018, bringing the total number of articles searchable through DOAJ to about 3.6 million.

2018 was also the best year to date for DOAJ net journal growth. 1,707 journals were added for a current total of over 12,000 journals. Negative growth in 2016 illustrates the impact of the DOAJ weeding / re-application process.

4.5 million more texts are available through Internet Archive, bringing the total close to 20 million.

The following table provides data on total number of items as of December 31, 2018, growth in 2018 by number and percentage, in descending order by growth in percent. In interpreting percentage growth, consider total and numeric growth. bioRxiv more than doubled in size this year, indicating a fairly new but healthy and rapidly growing service; but this reflects growth of about 20 thousand documents, a small fraction of the 600,000 items added by PMC for a 13% growth rate.

2018 growth (percent) | Item | 2018 total | 2018 growth (number)
110% | bioRxiv # articles | 39,570 | 20,748
74% | Internet Archive software | 346,320 | 147,320
39% | SCOAP3 # articles | 25,163 | 7,121
30% | Internet Archive texts | 19,570,789 | 4,570,789
30% | DOAJ searchable articles | 3,624,154 | 832,453
29% | Internet Archive audio (recordings) | 4,909,271 | 1,109,271
28% | DOAB # books | 13,253 | 2,938
25% | Internet Archive collections | 389,778 | 76,778
24% | Internet Archive videos (movies) | 4,701,129 | 901,129
21% | DOAJ journals searchable at article level | 9,479 | 1,670
16% | PubMed keyword search: cancer – last year – free fulltext | 65,766 | 9,154
16% | DOAJ # journals | 12,434 | 1,707
16% | BASE # documents | 139,476,029 | 19,092,606
16% | Internet Archive television | 1,733,000 | 233,000
15% | DOAB # publishers | 285 | 38
14% | PMC journals some articles OA | 758 | 94
13% | PMC # items | 5,200,000 | 600,000
13% | RePEc books | 39,086 | 4,449
12% | RePEc journal articles | 1,785,335 | 193,994
12% | PubMed keyword search: cancer – last 2 years – free fulltext | 153,875 | 16,026
11% | BASE # content providers | 6,732 | 694
11% | Internet Archive webpages (in billions) | 345 | 35
11% | RePEc online (fulltext) (downloadable as of March 2012) | 2,528,831 | 249,692
11% | PubMed keyword search: cancer – last 5 years – free fulltext | 391,691 | 37,230
10% | arXiv http://arxiv.org/ | 1,482,864 | 140,139
10% | OpenDOAR http://www.opendoar.org/ # repositories | 3,799 | 335
9% | RePEc chapters | 51,278 | 4,360
9% | PMC journals selected articles | 4,908 | 414
8% | RePEc working papers | 858,360 | 64,235
8% | Total Policies (ROARMAP) | 960 | 71
8% | PubMed keyword search: cancer – free fulltext | 1,027,541 | 75,655
7% | PMC journals immediate free access | 1,964 | 132
7% | DOAJ # countries | 129 | 8
7% | PubMed keyword search: cancer – last year – all results | 184,024 | 11,341
6% | PMC journals deposit all articles | 2,217 | 124
6% | Elektronische Zeitschriftenbibliothek – Electronic Journals Library # journals that can be read free of charge | 62,681 | 3,441
5% | PubMed keyword search: cancer – last 5 years – all results | 839,960 | 43,565
5% | PMC journals actively participating | 2,578 | 132
5% | PubMed keyword search: cancer – all results | 3,784,638 | 192,126
5% | PubMed keyword search: cancer – last 2 years – all results | 357,370 | 17,970
4% | RePEc software components | 4,206 | 178
4% | Internet Archive live music (concerts) | 192,534 | 7,534
3% | PMC journals all articles OA | 1,529 | 51
3% | ROAR # repositories | 4,735 | 138
2% | PMC journals NIH portfolio | 335 | 6
-12% | Internet Archive images | 3,247,253 | -452,747
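
For readers who want to check or reuse the percentages: they appear to be calculated against the start-of-year total, that is, the 2018 total minus the 2018 growth. This is an assumption about the method, but it reproduces the figures in the table. A minimal Python sketch using four rows from the table (the function name growth_percent is illustrative only):

```python
def growth_percent(total_end: int, growth: int) -> float:
    """Growth as a percentage of the start-of-year total.

    Assumption: start-of-year total = year-end total - growth during the year.
    """
    start = total_end - growth
    return 100 * growth / start


# (2018 total, 2018 growth) copied from the table above
rows = {
    "bioRxiv # articles": (39_570, 20_748),
    "PMC # items": (5_200_000, 600_000),
    "DOAJ # journals": (12_434, 1_707),
    "Internet Archive images": (3_247_253, -452_747),
}

for name, (total, growth) in rows.items():
    print(f"{name}: {growth_percent(total, growth):.0f}%")
```

Run as written, this should give 110%, 13%, 16% and -12%, matching the table. It also shows why percentages need to be read alongside the raw numbers: the 110% for bioRxiv rests on about 20 thousand documents, while the 13% for PMC rests on 600,000.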

Full data can be downloaded from the Dramatic Growth of Open Access dataverse: https://hdl.handle.net/10864/10660. This post is part of the Dramatic Growth of Open Access series. From 2004 – June 30, 2018 the series was posted on a quarterly basis. As of September 30, 2018, I continue to gather data quarterly but plan to release the series less frequently, most likely on an annual basis.

Canada’s Statutory Review of the Copyright Act, 2018: my individual submission

Update December 10: the original was over the 2,000 word count. Following is the final version under 2,000 words, followed by the original in case anyone is interested in what was cut. 
House of Commons
Standing Committee on Industry, Science and Technology
Individual Submission to: Statutory Review of the Copyright Act
December 10, 2018
Dr. Heather Morrison
Associate Professor
School of Information Studies, University of Ottawa
This is an individual submission drawing on my background as Principal Investigator of Sustaining the Knowledge Commons (SKC), a research program funded through a SSHRC Insight Grant. The goal of SKC is to develop evidence to support the economic transition of scholarly publishing from demand to supply side to support the potential unprecedented public good of a global knowledge commons,  a collective sharing of the knowledge of humankind, free for anyone to access and free for all who are qualified to contribute to. I also draw from my broader interest in and value of the arts and culture, and my expertise in the area of development of information policy to support such values. This submission strongly supports the expansion of fair dealing exceptions to copyright that were introduced in the 2012 Copyright Modernization Act. I present evidence to support the retention of sections 29, 29.1, and 29.2 in their present form. In brief, broad fair dealing exceptions for education (section 29) are inherently generally fair because the majority of works consumed are produced and/or supported by people in the educational sector who do the work for the public good rather than private gain. In the university context, academic researchers and students create the vast majority of works consumed and, with some exceptions, do not expect or receive economic benefit from their copyrightable works. There is a strong and growing trend for academic researchers to make work freely available to everyone as a public good. Provincial education systems develop curriculum, approve and sometimes commission textbooks. Schools and school boards pay for textbooks and the majority of other resources used by students. I acknowledge that there are creators whose work is important to Canada (local authors, artists, musicians and publishers) who do not benefit from K-12 or post-secondary budgets. 
For this sector, I recommend development of a plan to provide direct support for Canadian creators working outside of the formal educational systems (K-12, universities) to replace the current copyright collectives and to develop new models of creative collaboration to take advantage of recent technological developments to develop new, more effective approaches to support for creativity in Canada. I make this recommendation on the grounds that direct subsidies to creators would be more cost-effective than the current system that is in effect an indirect subsidy. Currently, we provide very limited support to creators in an indirect and non-transparent way as follows: federal transfers to provinces for education; provincial transfers to universities, colleges, and school boards (supplemented by student tuition in the post-secondary sector); purchase of resources and payment of additional fees or licenses for additional copying to copyright collectives; disbursement of $ from copyright collectives (subtracting administrative costs) to a variety of types of copyright owners, ranging from global for-profit corporations to individual creators. I argue that we should investigate whether it would be less costly and more effective for Canada’s creative community to simply give $ directly to creators through generous subsidies. For clickable links see https://poeticeconomics.blogspot.com/2018/12/canadas-statutory-review-of-copyright.html.


The creative contributions of Canada’s educational sector
(Why broad fair dealing exceptions for education (section 29) are inherently generally fair)
This section will focus on universities, my area of expertise. As noted in the Universities Canada (2018) submission to the Copyright Act Review, there are more than 75,000 faculty members and university teachers in Canada’s university system, making this the largest group of Canadian authors. This data understates the creative contributions of universities as it does not take into account the work of students. Most graduate students and other early career researchers are required to publish and many are prolific researchers and authors. For example, graduate students today are typically required to publish their theses (monograph-length works) online through their institutional repository as open access, that is, free to read. For example, from 2010 – 2018, University of Ottawa students posted more than 10,000 theses in the University of Ottawa’s institutional repository: https://ruor.uottawa.ca/handle/10393/242
Students as well as faculty publish articles in peer-reviewed journals, book chapters, and scholarly monographs. Students are taking advantage of the ease of publishing on the internet to develop their own open access peer-reviewed scholarly journals. Two examples: Stream: Inspiring Critical Thought, currently in its tenth year of production: http://journals.sfu.ca/stream/index.php/stream. And the University of Ottawa Journal of Medicine | Journal Médicale de l’Université d’Ottawa http://www.uojm.ca/
In the classroom, many professors like myself are taking advantage of current technologies to develop pedagogical approaches based on active rather than passive learning. In a passive approach, students absorb information provided in textbooks and lectures. In active learning, students are doing hands-on work including conducting and publishing research. Examples from my classes: students create an open access journal in which they peer-review and publish their term papers and create and publish professional open access blog posts.
As a faculty member and author, my experience is fairly typical. The cost of doing my research is paid for by my salary as a university professor and my research grant funds. Both are heavily subsidized by the Canadian taxpayer, and student tuition fees today account for about half of university budgets. As an author, I receive and expect no remuneration when I publish peer-reviewed journal articles or book chapters. As a peer reviewer, I receive and expect no remuneration. I did receive modest royalties from sales of a scholarly monograph; however, from a financial point of view, I (like many other authors of scholarly monographs) would be much farther ahead had I devoted the time required to write the book to a minimum wage job. In retrospect, I wish that I had published this material as an open access book or wiki as the publisher is no longer actively marketing the book. By transferring copyright to the publisher, I made my work less accessible and far more difficult to update.
I seek to make all of my academic writing open access (free to read for everyone), a steadily growing trend in academia globally. As of December 2018, there are over 12,000 fully open access, peer-reviewed scholarly journals listed in the Directory of Open Access Journals https://doaj.org/. According to industry research (Ware and Mabe, 2015) there are about 34,550 peer-reviewed journals published worldwide; the percentage of these that are fully open access is about a third. Many more journals provide free access to back issues after an embargo period.
The Directory of Open Access Repositories, OpenDOAR, lists over 3,800 repositories worldwide http://v2.sherpa.ac.uk/view/repository_by_country/countries=5Fby=5Fregion.html. The Bielefeld Academic Search Engine https://www.base-search.net/about/en/ provides a cross-search service of repositories and journals and lists over 120 million documents from over 6,000 sources, of which about 60% are open access, about 72 million documents. This free access to academic works, supported by academic authors, universities, and research funders is a reflection of the fact that academic research is not inspired by, and does not require, the economic benefits of copyright. The moral rights of copyright (attribution and integrity of the work) are important to academic authors.
The traditional scholarly publishing industry is in the process of transitioning from demand side economics (purchase of books and journal subscriptions) to production-based funding. Today, the largest open access journal publishers by number of fully open access journals are all traditional commercial scholarly publishers (Morrison, 2018). As of the end of November 2018, Elsevier has 347 fully open access journals and offers an open access publishing choice for 2,040 other titles, almost all of their journals (Elsevier, 2018). As of December 7, 2018, the Directory of Open Access Books https://www.doabooks.org/ lists 285 publishers; 3 of the 4 publisher sponsors listed on their website are traditional commercial scholarly publishers (Brill, Springer Nature, and DeGruyter).
There is a related growing trend towards open access to educational materials, in order to lower costs for post-secondary students and school boards and permit updating and local modification of materials. Some resources for further information:
·       e-campus Ontario https://www.ecampusontario.ca/
·       BCcampus https://bccampus.ca/
·       Open School BC https://www.openschool.bc.ca/k12/
In addition to transitioning traditional formats developed before the internet (e.g. journals and books), faculty and students are beginning to explore the potential of the digital medium and the internet. My most important publications today are published primarily in non-traditional formats. Since 2004, I have maintained a scholarly blog called The Imaginary Journal of Poetic Economics http://poeticeconomics.blogspot.com/ where I post, for example, contributions like this to government consultations. In 2014, I developed a research blog for the Sustaining the Knowledge Commons https://sustainingknowledgecommons.org/ (SKC) project. The SKC blog provides a venue for myself and my student research assistants to publish early findings. This is excellent training for students as it gives them a means and incentive to develop and publish small sub-research projects. Data gathered through the SKC project is published as open data in the OA APC dataverse: https://sustainingknowledgecommons.org/open-access-article-processing-charges-apcs/. These new formats require access to technology and hosting services, but there is no longer any need for a publishing intermediary as was the case when academic work relied on the print medium and postal system.
Transition support for creation
As a prolific academic author, I never have been and never will be represented by Access Copyright. The work of Access Copyright is antithetical to the purposes of my work (to serve the public good). I recommend the abolition of Access Copyright and redirection of funding by universities and school boards to directly support open access in academia and the K-12 sector (e.g. funding for open access monographs, journals, and textbooks).
This will not meet all of the needs of Canada’s creative communities. In my opinion, Canada’s artistic creators (authors, artists, musicians, independent publishers and intermediaries who work closely with and for the artistic community) deserve our respect and support, and are not well served by our outmoded approach to copyright collectives. I argue the continuing existence of these collectives is counter-productive as it entrenches outmoded approaches and business models when creators would be better served by developing new types of collectives to take advantage of new technologies to create new relationships with society and consumers.
For example, imagine a collective of Canadian musicians working together to develop packages of music for use in places like coffeeshops and restaurants (perhaps based on genre) that is integrated with the business’ wifi so that customers can:
·       instantly purchase and download a piece of music they enjoy
o   connect with the website of the musician(s)
o   find out about upcoming live gigs
o   purchase merchandise
·       suggest musicians / music to include
I argue that this approach would be far more effective in creating a healthy and productive relationship between our artists and society than the current approach of requiring payment of tariffs, which positions copyright collectives as impersonal, non-transparent enforcers of rights.
To accomplish this vision, I recommend financial support for artists in the transition phase as well as targeted funding to develop mechanisms for transition such as research and education on the use of new technologies to support more productive artist / society relationships. As I explain in the introduction to this submission, direct support would likely be more cost-effective than the current system of indirect, non-transparent subsidies.
References
Elsevier (2018). Pricing. Retrieved November 27, 2018 from https://www.elsevier.com/about/policies/pricing
Morrison, H. (2018). Global OA APCs 2010 – 2017: major trends. Connecting the knowledge commons: from projects to sustainable infrastructure. Elpub 2018: the 22nd international conference on electronic publishing. Toronto June 22 – 24, 2018. Retrieved December 7, 2018 from https://elpub.episciences.org/4604

Universities Canada (2018). The changing landscape of Canadian copyright and universities: Universities Canada’s submission to the Standing Committee on Industry, Science and Technology’s statutory review of Canada’s Copyright Act / June 2018
Ware, M. & Mabe, M. (2015). The STM report: an overview of scientific and scholarly journal publishing. The International Association of Scientific, Technical and Medical Publishers. Retrieved Dec. 4, 2018 from https://www.stm-assoc.org/2015_02_20_STM_Report_2015.pdf

Following is the original version that I was not able to submit as it was over the 2,000 word count.

House of Commons

Standing Committee on Industry, Science and Technology
Individual Submission to: Statutory Review of the Copyright Act
December 10, 2018
Dr. Heather Morrison
Associate Professor
School of Information Studies, University of Ottawa
This is an individual submission drawing on my background as Principal Investigator of Sustaining the Knowledge Commons (SKC), a research program funded through a SSHRC Insight Grant. The goal of SKC is to develop evidence to support the economic transition of scholarly publishing from demand to supply side to support the potential unprecedented public good of a global knowledge commons, a collective sharing of the knowledge of humankind, free for anyone to access and free for all who are qualified to contribute to. I also draw from my broader interest in and value of the arts and culture, and my expertise in the area of development of information policy to support such values. This submission strongly supports the expansion of fair dealing exceptions to copyright that were introduced in the 2012 Copyright Modernization Act. I present evidence to support the retention of sections 29, 29.1, and 29.2 in their present form. In brief, broad fair dealing exceptions for education (section 29) are inherently generally fair because the majority of works consumed are produced and/or supported by people in the educational sector who do the work for the public good rather than private gain. In the university context, academic researchers and students create the vast majority of works consumed and, with some exceptions, do not expect or receive economic benefit from their copyrightable works. There is a strong and growing trend for academic researchers to make work freely available to everyone as a public good. Provincial education systems develop curriculum, approve and sometimes commission textbooks. Schools and school boards pay for textbooks and the majority of other resources used by students. I acknowledge that there are creators whose work is important to Canada (local authors, artists, musicians and publishers) who do not benefit from K-12 or post-secondary budgets.
For this sector, I recommend development of a plan to provide direct support for Canadian creators working outside of the formal educational systems (K-12, universities) to replace the current copyright collectives and to develop new models of creative collaboration to take advantage of recent technological developments to develop new, more effective approaches to support for creativity in Canada. I make this recommendation on the grounds that direct subsidies to creators would be more cost-effective than the current system that is in effect an indirect subsidy. Currently, we provide very limited support to creators in an indirect and non-transparent way as follows: federal transfers to provinces for education; provincial transfers to universities, colleges, and school boards (supplemented by student tuition in the post-secondary sector); purchase of resources and payment of additional fees or licenses for additional copying to copyright collectives; disbursement of $ from copyright collectives (subtracting administrative costs) to a variety of types of copyright owners, ranging from global for-profit corporations to individual creators. I argue that we should investigate whether it would be less costly and more effective for Canada’s creative community to simply give $ directly to creators through generous subsidies. For clickable links see https://poeticeconomics.blogspot.com/2018/12/canadas-statutory-review-of-copyright.html.

The creative contributions of Canada’s educational sector
(Why broad fair dealing exceptions for education (section 29) are inherently generally fair)
This section will focus on universities, my area of expertise. As noted in the Universities Canada (2018) submission to the Copyright Act Review, there are more than 75,000 faculty members and university teachers in Canada’s university system, making this the largest group of Canadian authors. This data understates the creative contributions of universities as it does not take into account the work of students. Most graduate students and other early career researchers are required to publish and many are prolific researchers and authors. For example, graduate students today are typically required to publish their theses (monograph-length works) online through their institutional repository as open access, that is, free to read. For example, from 2010 – 2018, University of Ottawa students posted more than 10,000 theses in the University of Ottawa’s institutional repository: https://ruor.uottawa.ca/handle/10393/242
Students as well as faculty publish articles in peer-reviewed journals, book chapters, and scholarly monographs. Students are taking advantage of the ease of publishing on the internet to develop their own open access peer-reviewed scholarly journals. A few years ago while pursuing my doctoral studies I had the pleasure of participating as an editor, reviewer, and journal manager of the student created and led peer-reviewed open access journal Stream: Inspiring Critical Thought, currently in its tenth year of production: http://journals.sfu.ca/stream/index.php/stream.  Similarly, medical students at the University of Ottawa have created and run a student-led open access journal, the University of Ottawa Journal of Medicine | Journal Médicale de l’Université d’Ottawa http://www.uojm.ca/
In the classroom, many professors like myself are taking advantage of current technologies to develop pedagogical approaches based on active rather than passive learning. In a passive approach, students absorb information provided in textbooks and lectures. In active learning, students are doing hands-on work including conducting and publishing research. Following are just a few examples from my classes (master’s level, information studies): a publishing class created an open access journal in which they peer-reviewed and published their term papers; students in an introductory class create and publish their own professional blog and posts, in which they publish independent research; and this fall students collaboratively conducted and wrote a literature review and analysis of current issues on a particular topic in the field.  
As a faculty member and author, my experience is fairly typical. The cost of doing my research is paid for by my salary as a university professor and my research grant funds. Both are heavily subsidized by the Canadian taxpayer, and student tuition fees today account for about half of university budgets. As an author, I receive and expect no remuneration when I publish peer-reviewed journal articles or book chapters. As a peer reviewer, I receive and expect no remuneration. I did receive modest royalties from sales of a scholarly monograph; however, from a financial point of view, I (like many other authors of scholarly monographs) would be much farther ahead had I devoted the time required to write the book to a minimum wage job. In retrospect, I wish that I had published this material as an open access book or wiki as the publisher is no longer actively marketing the book. By transferring copyright to the publisher, I made my work less accessible and far more difficult to update.
I seek to make all of my academic writing open access (free to read for everyone), a steadily growing trend in academia globally. As of December 2018, there are over 12,000 fully open access, peer-reviewed scholarly journals listed in the Directory of Open Access Journals https://doaj.org/. According to industry research (Ware and Mabe, 2015) there are about 34,550 peer-reviewed journals published worldwide; the percentage of these that are fully open access is about a third. Many more journals provide free access to back issues after an embargo period.
The Directory of Open Access Repositories, OpenDOAR, lists over 3,800 repositories worldwide http://v2.sherpa.ac.uk/view/repository_by_country/countries=5Fby=5Fregion.html. The Bielefeld Academic Search Engine https://www.base-search.net/about/en/ provides a cross-search service of repositories and journals and lists over 120 million documents from over 6,000 sources, of which about 60% are open access, about 72 million documents. This free access to academic works, supported by academic authors, universities, and research funders is a reflection of the fact that academic research is not inspired by, and does not require, the economic benefits of copyright. The moral rights of copyright (attribution and integrity of the work) are important to academic authors.
The traditional scholarly publishing industry is in the process of transitioning from demand side economics (purchase of books and journal subscriptions) to production-based funding. As recently as 2014, very few of the large traditional commercial scholarly publishers were reflected in the Directory of Open Access Journals (DOAJ). The largest, Elsevier, had 8 titles listed in DOAJ. Today, the largest open access journal publishers by number of fully open access journals are all traditional commercial scholarly publishers. The largest is Springer Nature (including subsidiary BioMedCentral), and second largest is Elsevier (Morrison, 2018). As of the end of November 2018, Elsevier has 347 fully open access journals and offers an open access publishing choice for 2,040 other titles, almost all of their journals (Elsevier, 2018). As of December 7, 2018, the Directory of Open Access Books https://www.doabooks.org/ lists 285 publishers; 3 of the 4 publisher sponsors listed on their website are traditional commercial scholarly publishers (Brill, Springer Nature, and DeGruyter).
There is a related growing trend towards open access to educational materials. For example, provincial K-12 and post-secondary education is in a process of transitioning from support for textbooks through curriculum development, assessment, and purchase, to funding production for textbooks so that they can be open access, reducing the costs of education for post-secondary students and school boards in K-12. In addition to lowering costs, open access educational resources are typically open for transformation. This makes it possible for educators to update sources such as textbooks, link to additional resources, or customize to meet local needs. For example, a good basic textbook developed in the U.S. could be modified to reflect the Canadian context and include local examples, or the reverse for a textbook developed in Canada. Some resources for further information:
·       e-campus Ontario https://www.ecampusontario.ca/
·       BCcampus https://bccampus.ca/
·       Open School BC https://www.openschool.bc.ca/k12/
In addition to transitioning traditional formats developed before the internet (e.g. journals and books), faculty and students are beginning to explore the potential of the digital medium and the internet. My most important publications today are published primarily in non-traditional formats. Since 2004, I have maintained a scholarly blog called The Imaginary Journal of Poetic Economics http://poeticeconomics.blogspot.com/ where I post, for example, contributions like this to government consultations. In 2014, I developed a research blog for the Sustaining the Knowledge Commons https://sustainingknowledgecommons.org/ (SKC) project. The SKC blog provides a venue for myself and my student research assistants to publish early findings. This is excellent training for students as it gives them a means and incentive to develop and publish small sub-research projects. Data gathered through the SKC project is published as open data in the OA APC dataverse: https://sustainingknowledgecommons.org/open-access-article-processing-charges-apcs/. These new formats require access to technology and hosting services, but there is no longer any need for a publishing intermediary as was the case when academic work relied on the print medium and postal system.
To summarize this section: the fair dealing exception for education (29) is inherently generally fair because the educational sector is a net creator. Academic faculty are the largest single group of creators of copyrightable works. The creation of copyrightable works by post-secondary students is substantial if not fully known, and the trend is towards more creation of copyrightable works by students. The post-secondary and K-12 sectors are moving towards production-based support of educational resources such as textbooks to provide for free access to enhance the affordability of the educational system. Creation in the educational sector is done primarily for the public good, and the economic benefits of copyright are generally unnecessary, as illustrated by the growing trend towards open access, that is, access that is free of charge to anyone. The constrictions on readership associated with copyright protection for economic reasons are counter-productive to the creation and sharing of knowledge.
Fair dealing exceptions for research by academics (29.1) and news reporters (29.2) are necessary so that individuals and organizations cannot use copyright in a way other than originally intended, e.g. to suppress criticism or to deny what they have said in the past. For example, my research involves studying the pricing and business models of scholarly publishers based largely on information posted on their websites. This material constitutes the evidence on which my research is based, and I need to be able to publish excerpts of this material to substantiate my claims. Publishers do not always appreciate this research, for example when I document price increases far beyond inflation. Overly strong copyright without this balance would make it possible for publishers to weaken criticism by suppressing evidence.
Transition support for creation
As a prolific academic author, I never have been and never will be represented by Access Copyright. The work of Access Copyright is antithetical to the purposes of my work (to serve the public good). I recommend the abolition of Access Copyright and redirection of funding by universities and school boards to directly support open access in academia and the K-12 sector (e.g. funding for open access monographs, journals, and textbooks).
This will not meet all of the needs of Canada’s creative communities. In my opinion, Canada’s artistic creators (authors, artists, musicians, independent publishers and intermediaries who work closely with and for the artistic community) deserve our respect and support, and are not well served by our outmoded approach to copyright collectives. I argue the continuing existence of these collectives is counter-productive as it entrenches outmoded approaches and business models when creators would be better served by developing new types of collectives to take advantage of new technologies to create new relationships with society and consumers.
For example, imagine a collective of Canadian musicians working together to develop packages of music for use in places like coffeeshops and restaurants (perhaps based on genre) that is integrated with the business’ wifi so that customers can:
·       instantly purchase and download a piece of music they enjoy
o   connect with the website of the musician(s)
o   find out about upcoming live gigs
o   purchase merchandise
·       suggest musicians / music to include
I argue that this approach would be far more effective in creating a healthy and productive relationship between our artists and society than the current approach of requiring payment of tariffs, which positions copyright collectives as impersonal, non-transparent enforcers of rights.
To accomplish this vision, I recommend financial support for artists in the transition phase as well as targeted funding to develop mechanisms for transition such as research and education on the use of new technologies to support more productive artist / society relationships. As I explain in the introduction to this submission, direct support would likely be more cost-effective than the current system of indirect, non-transparent subsidies.
References
Elsevier (2018). Pricing. Retrieved November 27, 2018 from https://www.elsevier.com/about/policies/pricing
Morrison, H. (2018). Global OA APCs 2010 – 2017: major trends. Connecting the knowledge commons: from projects to sustainable infrastructure. Elpub 2018: the 22nd international conference on electronic publishing. Toronto June 22 – 24, 2018. Retrieved December 7, 2018 from https://elpub.episciences.org/4604
Universities Canada (2018). The changing landscape of Canadian copyright and universities: Universities Canada’s submission to the Standing Committee on Industry, Science and Technology’s statutory review of Canada’s Copyright Act / June 2018
Ware, M. & Mabe, M. (2015). The STM report: an overview of scientific and scholarly journal publishing. The International Association of Scientific, Technical and Medical Publishers. Retrieved Dec. 4, 2018 from https://www.stm-assoc.org/2015_02_20_STM_Report_2015.pdf
-->

The trouble with scientific faith, in this case, in AI

This post was originally posted to the Global Open Access List (GOAL) on July 12, 2018 with the following title:  Why translating all scholarly knowledge for non-specialists using AI is complicated. http://mailman.ecs.soton.ac.uk/pipermail/goal/2018-July/004896.html
To view the full conversation, go to the GOAL archives for July 2018. 
 
On July 10 Jason Priem wrote about the AI-powered systems “that help explain and contextualize articles, providing concept maps, automated plain-language translations”… that are part of his project’s plan to develop a scholarly search engine aimed at a nonspecialist audience. The full post is available here:

http://mailman.ecs.soton.ac.uk/pipermail/goal/2018-July/004890.html

We share the goal of making all of the world’s knowledge available to everyone without restriction, and I agree that reducing the conceptual barrier for the reader is a laudable goal. However, I think it is important to avoid underestimating the size of this challenge and potential for serious problems to arise. Two factors to consider: the current state of AI, and the conceptual challenges of assessing the validity of automated plain-language translations of scholarly works.
Current state of AI – a few recent examples of the current status of AI:
Vincent, J. (2016). Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day. The Verge.

https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist

Wong, J. (2018). Amazon working to fix Alexa after users report bursts of ‘creepy’ laughter. The Guardian https://www.theguardian.com/technology/2018/mar/07/amazon-alexa-random-creepy-laughter-company-fixing

Meyer, M. (2018). Google should have thought about Duplex’s ethical issues before showing it off. Fortune http://fortune.com/2018/05/11/google-duplex-virtual-assistant-ethical-issues-ai-machine-learning/

Quote from Meyer: 
As prominent sociologist Zeynep Tufekci put it: “Google Assistant making calls pretending to be human not only without disclosing that it’s a bot, but adding ‘ummm’ and ‘aaah’ to deceive the human on the other end with the room cheering it… horrifying. Silicon Valley is ethically lost, rudderless and has not learned a thing.”
These early instances of AI applications involve the automation of relatively simple, repetitive tasks. According to Amazon, “Echo and other Alexa devices let you instantly connect to Alexa to play music, control your smart home, get information, news, weather, and more using just your voice”. This is voice to text translation software that lets users speak to their computers instead of using keystrokes. Google’s Duplex demonstration is a robot dialing a restaurant to make a dinner reservation. 

Translating scholarly knowledge into simple plain text so that everyone can understand it is a lot more complicated, with the degree of complexity depending on the area of research. Some research in education or public policy might be relatively easy to translate. In other areas, articles are written for an expert audience that is assumed to have spent decades acquiring a basic knowledge in a discipline. It is not clear to me that it is even possible to explain advanced concepts to a non-specialist audience without first developing a conceptual progression. 

Assessing the accuracy and appropriateness of a plain-text translation of a scholarly work intended for a non-specialist audience requires expert understanding of the work and thoughtful understanding of the potential for misunderstandings that could arise. For example, I have never studied physics. If I looked at an automated plain-language translation of a physics text I would have no means of assessing whether the translation was accurate or not. I do understand enough medical terminology, scientific and medical research methods to read medical articles and would have some idea if a plain-text translation was accurate. However, I have never worked as a health care practitioner or health care translation researcher, so would not be qualified to assess the work from the perspective of whether the translation could be mis-read by patients (or some patients).
In summary, Jason and I share the goal of making all of our scholarly knowledge accessible to everyone, specialists and non-specialists alike. However, in the process of developing tools to accomplish this it is important to understand the size and nature of the challenge and the potential for serious unforeseen consequences. AI is in very early stages. Machines are beginning to learn on their own, but what they are learning is not necessarily what we expected or wanted them to learn, and the impact on humans has been described using words like ‘creepy’, ‘horrifying’, and ‘unethical’. The task of translating complex scholarly knowledge for a non-specialist audience and assessing the validity and appropriateness of the translations is a huge challenge. If this is not understood, and plans are not made to conduct rigorous research on the validity of such translations, the result could be widespread dissemination of incorrect translations.
best,
Heather Morrison
Associate Professor, School of Information Studies, University of Ottawa
Professeur Agrégé, École des Sciences de l’Information, Université d’Ottawa
Heather.Morrison@uottawa.ca

Ceased and transferred publications and archiving: best practices and room for improvement

In the process of gathering APC data this spring, I noticed some good and some problematic practices with respect to journals that have ceased or transferred publisher.

There is no reason to be concerned about OA journals that do not last forever. Some scholarly journals publish continuously for an extended period of time, decades or even centuries. Others publish for a while and then stop. This is normal. A journal that is published largely due to the work of one or two editors may cease to publish when the editor(s) retire. Research fields evolve; not every specialized journal is needed as a publication venue in perpetuity. Journals transfer from one publisher to another for a variety of reasons. Now that there are over 11,000 fully open access journals (as listed in DOAJ), and some open access journals and publishers have been publishing for years or even decades, it is not surprising that some open access journals have ceased to publish new material.

The purpose of this post is to highlight some good practices when journals cease, some situations to avoid, and room for improvement in current practice. In brief, my advice is that when you cease to publish a journal, it is a good practice to continue to list the journal on your website, continue to provide access to content (archived on your website or another such as CLOCKSS, a LOCKSS network, or other archiving services such as national libraries that may be available to you), and link the reader interested in the journal to where the content can be found.

This is an area where even the best practices to date leave some room for improvement. CLOCKSS archiving is a great example of the state of the art, but CLOCKSS’ statements and practice indicate some common misunderstandings about copyright and Creative Commons licenses. In brief, author-held copyright and CC licenses are not compatible with journal-level CC licensing. Third parties such as CLOCKSS should not add CC licenses as these are waivers of copyright. CC licenses may be useful tools for archives; however, archiving requires archives, and the licenses on their own are not sufficient for this purpose.

I have presented some solutions and suggestions to move forward below, and peer review and further suggestions are welcome.

Details and examples

Dove Medical Press is a model of good practice in this respect. For example, if you click on the title link for Dove’s Clinical Oncology in Adolescents and Young Adults a pop-up springs up with the following information:

“Clinical Oncology in Adolescents and Young Adults ceased publishing in January 2017. All new submissions can be made to Adolescent Health, Medicine and Therapeutics. All articles that have been published in Clinical Oncology in Adolescents and Young Adults will continue to be available on the Dove Press site, and will be securely archived with CLOCKSS”.

Because the content is still available via Dove’s website, the journal is not included on the CLOCKSS list of triggered content. This is because CLOCKSS releases archived content when it is no longer available from the publisher’s own website.

CLOCKSS Creative Commons licensing statement and practice critique

One critique for CLOCKSS: – from the home page:  “CLOCKSS is for the entire world’s benefit. Content no longer available from any publisher (“triggered content”) is available for free. CLOCKSS uniquely assigns this abandoned and orphaned content a Creative Commons license to ensure it remains available forever”.

This reflects some common misperceptions with respect to Creative Commons licenses. As stated on the Creative Commons “share your work” website:  [your emphasis added] “Use Creative Commons tools to help share your work. Our free, easy-to-use copyright licenses provide a simple, standardized way to give you permission to share and use your creative work— on conditions of your choice“.

The CLOCKSS statement  “CLOCKSS uniquely assigns this abandoned and orphaned content a Creative Commons license to ensure it remains available forever” is problematic for two reasons.
1. This does not actually reflect CLOCKSS’ practice. The Creative Commons statements associated with triggered content indicate publisher rather than CLOCKSS’ CC licenses. For example, the license statement for the Journal of Pharmacy Teaching on the CLOCKSS website states: “The JournalPharmacyTeaching content is copyright Taylor and Francis and licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License”.

2. This would be even more problematic if it did reflect CLOCKSS’ practice. This is because CLOCKSS is not an author or publisher of the scholarly journals and articles included in CLOCKSS. Creative Commons provides a means for copyright owners to indicate willingness to share their work. When a third party such as CLOCKSS uses CC licenses, they are explicitly or implicitly claiming copyright in order to waive their rights under copyright. This reflects an expansion rather than limitation of copyright that may lead to the opposite of what is intended. For example, if one third party is a copyright owner that wishes to claim copyright in order to grant broad-based downstream rights, another third party could use the copyright claim to support their right to claim copyright in order to lock down others’ works. A third party that is a copyright owner providing free access today could use this copyright claim in future as a rationale for toll access. This could come into play if in future toll access seems more desirable from a business perspective.

The CLOCKSS practice of publisher-level copyright (see 1. above) is problematic because Creative Commons’ first release of CC licenses was in December 2002. Scholarly journal publishing predates 2002 (the first scholarly journals were published in 1665), and not every journal uses CC licenses even today. Retroactive journal-level CC licensing would require re-licensing of every article that was published prior to the journal’s first use of CC licensing.

For example, the copyright statements of volume 1 dated 1990 on the PDFs of the CLOCKSS-triggered Journal of Pharmacy Teaching read: “Journal of Pharmacy Teaching, Vol. l(1)1990 (C) 1990 by The Haworth Press, Inc. All rights reserved”. This suggests that all authors in this journal at this point in time assigned full copyright to The Haworth Press, although actual practice was probably more complex. For example, if any authors were working for the U.S. federal government at the time, their work would have been public domain by U.S. government policy. Any portions of third party works included would likely have had separate copyright. Even assuming the simplest scenario, all authors had and transferred all rights under copyright to Haworth Press, the authors would retain moral rights, hence it would be necessary to contact all of the authors to obtain their permission to re-license the works under Creative Commons licenses.

The idea of journal-level CC licensing is at odds with the idea of author copyright. This confusion is common. For example, the Licensing FAQ on the website of the Open Access Scholarly Publishers Association states: “one of the criteria for membership is that a publisher must use a liberal license that encourages the reuse and distribution of content” and later “Instead of transferring rights exclusively to publishers (the approach usually followed in subscription publishing), authors grant a non-exclusive license to the publisher to distribute the work, and all users and readers are granted rights to reuse the work”. If copyright and CC licenses really do belong to the authors, then journal-level Creative Commons license statements are incorrect.

Even more room for improvement

The above, while leaving some room for improvement, appears to reflect best practices at the present time. Other approaches leave even more room for improvement. For example, in 2016 Sage acquired open access publisher Libertas Academica. The titles that Sage has continued can now be found on the Sage website. The Libertas Academica titles that Sage no longer publishes can be found as triggered content on the CLOCKSS website. However, the original Libertas Academica website no longer exists and there is no indication of where to find these titles from the Sage website.
Titles that were formerly published by BioMedCentral are simply no longer listed on the BMC list of journals. For example, if you would like to know where to find Gigascience, formerly published by BMC, you can find information at the site of the current publisher, Oxford. A note on the SpringerLink page indicates that BMC maintains an archive of content on its website. However, if you look for Gigascience on the BMC journal list, it simply is not listed. It would be an improvement to follow the practice of Dove and include the title, link to the archived content, and provide a link to the current publisher.

Solutions? Some suggestions

If journals and publishers were encouraged to return copyright to the authors when a journal is no longer published, or a book is no longer being actively marketed (in addition to using their existing rights to archive and make works freely available), then authors, if they chose to do so, could release new versions of their works. For example, a work currently available in PDF could be re-released in XML to facilitate text and data-mining, or perhaps updated versions, and authors could, if desired, release new versions with more liberal licenses than journal-level licenses that must of necessity fit the lowest common denominator (the author least willing or able to share).

Education is needed, both within the existing open access community and beyond. First, we need to understand the perhaps unavoidable micro-level nature of at least some elements of copyright under conditions of re-use of material. For example, if a CC-BY licensed image by one photographer or artist is included in a scholarly article written by a different person that is also CC-BY licensed, the moral rights, including attribution, are different for the copyright holder of the image and that of the author of the article. In academia, attribution and moral rights are essential to our careers.

The intersection of plagiarism and copyright is different in academia. If one musical composer copies another’s work, copyright law is likely the go-to remedy. If a student presents someone else’s work as their own, academic procedures for dealing with plagiarism will apply, regardless of the copyright status of the work. For example, the musician using a public domain work need not worry about copyright but the student using a public domain work without attribution is guilty of plagiarism and likely to face serious consequences. Evolving norms for other types of creators (amateur or professional photographers, video game developers) may not work for academia.

For CLOCKSS, a statement that all triggered content is made freely available to the public, and that additional rights may be available for some works, with advice to look at the work in question to understand re-use rights, would be an improvement.

Your comments and suggestions? 

This is an area where even today’s best practices are wanting, and the solutions / suggestions listed above are intended as an invitation to open a conversation on potential emerging practices that may take some time to fully figure out. Peer review and suggestions are welcome, via the comments section or e-mail. If you are using e-mail, please let me know if I may transfer the content to this post and if so whether you would like to be attributed or not.

This post is cross-posted to the Sustaining the Knowledge Commons research blog and forms part of the Creative Commons and Open Access Critique series. Comments and suggestions are welcome on either blog.

Dramatic Growth of Open Access June 2018

Congratulations to DOAJ for recently surpassing a milestone of over 3 million articles searchable at the article level!

The outstanding growth story by percentage for the second quarter of 2018 was bioRxiv. From March 31 – June 30, bioRxiv grew by 5,290 articles for a total of 28,070 articles, a growth rate of 23% for this quarter and 129% (more than doubled) over the past year.

38 of the limited set of indicators that I track had growth rates this quarter of 2% or more, equivalent of 8% annual growth, more than double the base rate of growth of scholarly journals and articles of 3 – 3.5% (de Solla Price, 1963; Mabe & Amin, 2001).
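
The annual equivalents quoted above come from multiplying a quarterly rate by four. A minimal sketch of that arithmetic, using the bioRxiv figures from this post; the function names are illustrative only, and the compounded variant is shown for comparison as an assumption, not as the method this series uses:

```python
def quarterly_growth_rate(total_now: int, added: int) -> float:
    """Growth over the quarter as a fraction of the starting total."""
    start = total_now - added
    return added / start


def annualized_simple(quarterly_rate: float) -> float:
    """Annual equivalent as used in this post: quarterly rate times four."""
    return quarterly_rate * 4


def annualized_compound(quarterly_rate: float) -> float:
    """Alternative (an assumption, not the series' method): compound four quarters."""
    return (1 + quarterly_rate) ** 4 - 1


# bioRxiv, March 31 to June 30, 2018: 5,290 articles added, 28,070 total.
biorxiv = quarterly_growth_rate(28_070, 5_290)
print(f"bioRxiv quarterly growth: {biorxiv:.0%}")

# The 2% quarterly threshold used for the indicator count.
print(f"simple annual equivalent: {annualized_simple(0.02):.0%}")
print(f"compounded annual equivalent: {annualized_compound(0.02):.1%}")
```

The two annualizations differ only slightly at rates this small, so the simple multiplication is a reasonable shorthand for comparison with the 3 – 3.5% background rate.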

My best guesstimates of “how much open access there is” are based on the meta-search tool BASE (the Bielefeld Academic Search Engine). BASE harvests metadata from repositories and open access journals using OAI-PMH. BASE now contains over 130 million documents from 6,444 sources. About 60% are open access; collectively, the OA movement now makes available about 78 million open access documents. This quarter, BASE grew by over 13 million documents for a quarterly growth rate of 11%.
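
For readers unfamiliar with OAI-PMH, the harvesting mentioned above works through plain HTTP requests carrying a `verb` parameter. The following is a minimal sketch of how such request URLs are formed, not a description of how BASE itself is implemented; the endpoint address and function names are hypothetical, and no network call is made:

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
EXAMPLE_ENDPOINT = "https://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build the first ListRecords request for a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url: str, resumption_token: str) -> str:
    """Build a follow-up request; the token is sent with the verb only."""
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(EXAMPLE_ENDPOINT))
print(resume_url(EXAMPLE_ENDPOINT, "abc123"))
```

A harvester repeats the follow-up request until the repository stops returning a resumption token, which is how a large collection is retrieved in batches.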

The Internet Archive as usual showed robust growth in a number of services – software components grew by 11% this quarter for a total of just over 230,000; audio recordings grew by 8% and are now over 8.8 million; collections also grew by 8% and are now over 325,000; close to a million texts were added this quarter for a growth rate of 6% and a total of over 16.5 million texts; there are close to 200,000 more videos for a growth rate of 5%; webpages and television each grew by 3%. There was a decrease in the number of images this quarter, down 18% or close to 700,000 images (does anyone know why? – if so please comment), in contrast with the annual growth for images from last year of 115% (more than double).

For OA publishing, this quarter SCOAP3 grew by 1,772 documents or 9%. The Directory of Open Access Books added 826 books and 17 publishers, 7% growth this quarter for both indicators. RePEc added over 2,000 books for a quarterly growth rate of 6% (journal articles and total downloadable items each grew by 2%). DOAJ added about 7 new titles per day this quarter for a total net growth of 624 journals, a growth rate of 6%; DOAJ also grew by 6% in the number of journals and articles searchable at the article level and, as noted above, surpassed a milestone of over 3 million articles searchable at the article level. DOAJ also added 4 countries this quarter.

A PubMed keyword search for “cancer” limited to the last year returned 5% more free fulltext this quarter. However, the same search with no date limit resulted in a slight (1%) decrease in free fulltext (does anyone know why? If so please comment). The same search with date limits of 5 years or 2 years resulted in a 2% increase in free fulltext. The number of items in PubMedCentral grew by 4% this quarter, adding 200,000 items for a total of 4.9 million (watch for the 5 million milestone coming soon). PMC journal participation grew by 2% this quarter on several indicators: the number of journals actively participating in PMC, the number of journals providing immediate free access, the number of journals depositing all content in PMC, and the number of journals that deposit some content in PMC.

arXiv grew by 3%; ROARMAP OA policy listings by 2%, as did the total number of journals that can be read free of charge listed by the Electronic Journals Library.

Congratulations and thank you to every one of the thousands of journals, repositories, publishers, and related services, and the millions of authors choosing to make your work open access.  Please accept my apologies for not tracking everyone, due to my human limitations. I encourage everyone to applaud and celebrate your own, and your neighbour’s, accomplishments and milestones – and share them with everyone in the OA movement by joining the OATP tag team.

To download the data go to the DGOA dataverse.

This post is part of the Dramatic Growth of Open Access series.

References

Mabe, M., & Amin, M. (2001). Growth dynamics of scholarly and scientific journals. Scientometrics, 51(1), 147-162.
Price, D. J. d. S. (1963). Little science, big science. New York: Columbia University Press.

Dramatic Growth of Open Access March 2018

As usual open access is showing strong growth in many directions: more open access archives, documents, journals, articles, and books. This quarter focuses on the large number of indicators of growth beyond the usual background growth of scholarly journals and articles of 3 – 3.5% per year. Newcomer bioRxiv, with 21% growth this quarter (equivalent to 84% annual growth), is far above this background growth. This quarter, DOAJ added a net total of 378 journals, or more than 4 journals per day, for a total of 11,105 journals. The number of journals searchable at the article level has increased by 236 for a total of 8,045 journals. The number of articles searchable at the article level is just under 3 million. The number of documents searchable through BASE grew by 3.5 million for a total of just under 124 million (about 60% of these, over 74 million, are open access). BASE added 121 content providers for a total of over 6,000 content providers. The percentage of PubMed records for a search for “cancer” that retrieve full-text is 27% overall, with a high of 45% for records published in the last 5 years. The percentage of full-text retrieval is rising at a steady rate.

The overall growth rate for scholarly articles and journals has been fairly steady over the past few centuries, in the range of 3 – 3.5% growth annually (Price, 1963; Mabe & Amin, 2001). As noted in the following chart, in the past quarter alone there have been 43 indicators of growth above that level, at least 1% in the quarter (equivalent of 4% annually). 
 

Quarterly growth (percent)   Item   Total as of 03/31/18   Quarterly growth (number)
21% bioRxiv articles 22,780 3,958
13% DOAB books 11,685 1,370
10% SCOAP3 article 19,778 1,736
9% Internet Archive Video 4,128,556 328,556
8% Internet Archive Collections 338,578 25,578
8% Internet Archive Recordings 4,094,506 294,506
7% Internet Archive Television 1,607,000 107,000
7% DOAJ # of articles searchable at article level 2,984,612 192,911
6% DOAB # publishers 261 14
5% PubMed keyword search: cancer- last year – free fulltext 59,695 3,083
5% Internet Archive Texts 15,760,271 760,271
5% RePEC chapters 49,294 2,376
5% Internet Archive Webpages (billions) 325 15
4% Internet Archive Images 3,865,878 165,878
4% RePEc journal articles 1,659,120 67,779
4% PubMed keyword search: cancer- last 5 years – free fulltext 367,509 13,048
4% Internet Archive Software 206,098 7,098
4% DOAJ # journals 11,105 378
3% PubMed keyword search: cancer- last 2 years – free fulltext 142,572 4,723
3% RePEC downloadable articles 2,354,480 75,341
3% ROARMAP # OA policies 916 27
3% DOAJ # journals searchable at article level 8,045 236
3% PubMed keyword search: cancer – free fulltext 980,174 28,288
3% BASE # documents 123,932,954 3,549,531
3% PMC journals with some articles open access 682 18
2% DOAJ # countries 124 3
2% arXiv  articles 1,375,438 32,713
2% PMC select deposit journals 4,588 94
2% BASE # content providers 6,159 121
2% RePEC books 35,263 626
2% PubMed keyword search: cancer – last 5 years – all results 810,024 13,629
2% RePEc working papers 807,624 13,499
2% OpenDOAR # repositories 3,517 53
2% Elektronische Zeitschriftenbibliothek – # journals that can be read free of charge 60,129 889
1% chapters (OECD ilibrary) 60,300 840
1% PubMed keyword search: cancer – all results 3,639,629 47,117
1% PMC journals with immediate free access 1,852 20
1% ROAR # repositories 4,643 46
1% RePEc software components 4,068 40
1% OECD ilibrary tables and graphs  175,500 1,650
1% PMC actively participating journals 2,466 20
1% OECD ilibrary working papers  5,600 40
1% PMC journals that submit all articles 2,108 15
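
Each percentage in the table can be recomputed from the other two columns: the previous total is the current total minus the numeric growth, and the rate is the growth divided by that previous total. A minimal sketch of that check on a few rows copied from the table, assuming rounding to the nearest whole percent (an assumption inferred from the figures; the function names and the parsing pattern are illustrative only):

```python
import re

# A few rows copied from the table above: percent, item, total, numeric growth.
ROWS = """\
21% bioRxiv articles 22,780 3,958
13% DOAB books 11,685 1,370
4% DOAJ # journals 11,105 378
3% BASE # documents 123,932,954 3,549,531
2% arXiv  articles 1,375,438 32,713
"""

# Percent, then a free-text item name, then two comma-grouped integers.
ROW_PATTERN = re.compile(r"^(\d+)%\s+(.*?)\s+([\d,]+)\s+([\d,]+)$")


def parse_row(line: str):
    """Split one flattened table row into (percent, item, total, growth)."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    pct, item, total, growth = match.groups()
    return int(pct), item, int(total.replace(",", "")), int(growth.replace(",", ""))


def recomputed_percent(total: int, growth: int) -> int:
    """Growth as a whole-number percentage of the previous quarter's total."""
    previous = total - growth
    return round(100 * growth / previous)


for line in ROWS.splitlines():
    stated, item, total, growth = parse_row(line)
    print(f"{item}: stated {stated}%, recomputed {recomputed_percent(total, growth)}%")
```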

References

 
Mabe, M., & Amin, M. (2001). Growth dynamics of scholarly and scientific journals. Scientometrics, 51(1), 147-162.
Price, D. J. d. S. (1963). Little science, big science. New York: Columbia University Press.
This post is part of the Dramatic Growth of Open Access series.  Full data can be downloaded from here.

Dramatic Growth of Open Access December 31, 2017

Highlights

As usual the open access movement has much to celebrate as 2017 draws to a close, and the whole world has much to look forward to from open access in 2018. As of today there are 4.6 million articles in PubMedCentral, thanks in large measure to constantly increasing participation by scholarly journals; sometime in 2018 this is likely to exceed 5 million. DOAJ added a net 1,272 journals (3.5 / day) and showed even stronger growth in article searchability; a DOAJ milestone of 3 million searchable articles is likely to come in 2018. The Directory of Open Access Books nearly doubled in size and now has more than 10,000 books from 247 publishers. Bielefeld Academic Search Engine, the best surrogate for overall growth, continues to amaze with over 120 million documents, growth of 17.3 million in 2017, a 17% growth rate on a very substantial base; a 20% growth in content providers is an indication of the overall growth of the repository movement. arXiv’s growth rate was 10%, while newcomer arXiv clones SocArXiv and bioRxiv grew by 187% and 151% respectively. RePEc grew by 13%, SCOAP3 by 32%. Internet Archive grew by 31 billion web pages, 4 million texts, 2.4 million images, 800,000 movies, and 600,000 audio recordings. Following are selected details indicating the content numbers at the end of 2017, 2017 growth by number, percentage, and where warranted, by day.

Full data can be downloaded from here: https://dataverse.scholarsportal.info/dataverse/dgoa

Details (selected)

Totals are from December 31, 2017. Annual growth: Dec. 31, 2016 – Dec. 31, 2017

Free journals

Directory of Open Access Journals

10,727 journals

  • 2017 growth: 1,272 journals (3.5 / day), growth rate 13%

7,809 journals searchable at article level

  • 2017 growth:  1,175 (3.2 / day), growth rate 18%

2,791,701 articles searchable at article level

  • 2017 growth: 391,443 (1,072 / day), growth rate 16%

Milestone to watch for in 2018: 3 million articles searchable at article level
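
The per-day figures in this list are the annual growth divided by 365. As a rough illustration of what the milestone implies, the sketch below carries the 2017 daily rate forward in a straight line; this linear projection is an assumption made purely for illustration, not a forecast from the data, and the function names are hypothetical:

```python
import math
from datetime import date, timedelta

DAYS_IN_2017 = 365


def per_day(annual_growth: int, days: int = DAYS_IN_2017) -> float:
    """Average items added per day over the year."""
    return annual_growth / days


def days_to_milestone(current_total: int, milestone: int, daily_rate: float) -> int:
    """Whole days needed to reach the milestone at a constant daily rate."""
    return math.ceil((milestone - current_total) / daily_rate)


# DOAJ articles searchable at article level, as of December 31, 2017.
articles_total = 2_791_701
articles_growth_2017 = 391_443

rate = per_day(articles_growth_2017)
days_needed = days_to_milestone(articles_total, 3_000_000, rate)
projected = date(2017, 12, 31) + timedelta(days=days_needed)

print(f"{rate:,.0f} articles / day")
print(f"3 million reached after about {days_needed} days, around {projected.isoformat()}")
```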

Electronic Journals Library 

59,240 journals that can be read free of charge (2017 growth: 3,678 (10 / day), 7% growth)

Free books

OECD ilibrary

11,690 e-book titles (2017 growth 640 (2 / day), growth rate 6%)

Directory of Open Access Books

 10,315 academic peer-reviewed books, 247 publishers

  • 2017 growth: 4,713 (13 / day), growth rate 84%, increase of 80 publishers

See also Internet Archive below

    Repositories
    Bielefeld Academic Search Engine

    120,383,423 documents

    • 2017 growth: 17.3 million documents (47,000 / day), growth rate 17%

    6,038 content providers

    • 2017 growth: 1,015 (3 / day), growth rate 20%

    OpenDOAR

    3,464 repositories – 2017 growth 179 (0.5 / day), growth rate 5%

    Registry of Open Access Repositories

    4,597 repositories – 2017 growth 232 (1 / day), growth rate 5%

    PubMedCentral

    4.6 million items – 2017 growth 500,000 (1,370 / day), growth rate 12%

    2,446 journals actively participating in PMC – 2017 growth 120, growth rate 5%

    1,832 journals in PMC with immediate free access – 2017 growth 112, growth rate 7%

    1,478 journals in PMC with all articles open access – 2017 growth 52, growth rate 4%

    664 journals in PMC with some articles open access – 2017 growth 95, growth rate 17%

    2,093 full participation journals (deposit ALL articles in PMC) – 2017 growth 120, growth rate 6%

    329 NIH portfolio journals (deposit NIH funded articles in PMC) – 2017 growth 5, growth rate 2%

    4,494 selective deposit (deposit some articles in PMC) – 2017 growth 421 (1 / day), growth rate 10%

    33% of articles keyword “cancer” freefulltext within 1 year of publication (41% at 2 years, 45% at 5 years, 26% with no date limiter)

    Milestone to watch for in 2018: 5 million items

    arXiv

    1,342,725 items – 2017 growth 123,501 (338 / day), growth rate 10%

    SocArXiv

    1,814 preprints – 2017 growth 1,183 (3 / day), growth rate 187%

    bioRxiv

    18,822 articles – 2017 growth 11,322 (31 / day), growth rate 151%

    RePEC

    2,279,139 downloadable items – 2017 growth 257,605 (706 / day), growth rate 13%

    Internet Archive

    310 billion webpages – 2017 growth 31 billion webpages (85 million / day), growth rate 11%

    3.8 million video (movies) – 2017 growth 800,000 (2,192 / day), growth rate 27%

    3.8 million audio recordings – 2017 growth 600,000 (1,644 / day), growth rate 19%

    15,000,000 texts – 2017 growth: 4 million (11,000 / day), growth rate 36%

    3.7 million images – 2017 growth: 2.4 million (6,575 / day), growth rate 185%

    SCOAP3

    18,042 articles – 2017 growth: 4,410 (12 / day), growth rate 32%

     This post is part of the Dramatic Growth of Open Access series.

    Dramatic Growth of Open Access September 30, 2017

    Happy Open Access Week!

    In brief: best guesstimate – there are approximately 70 million OA documents today (subset of BASE’s 115 million, about 60% OA), with OA documents at BASE growing at a rate of about 18,000 OA documents per day. Where do these come from? Thousands of OA archives – with PubMedCentral the largest by far at 4.5 million articles and active participation by thousands of journals. This quarter, by the numbers, the DOAJ team set a new record with a net growth of 689 journals, or 7.7 titles per day. However, percentage-wise the most remarkable quarterly growth was all about archives, with bioRxiv and SocArXiv topping the growth list by percentage, and as usual several sections of Internet Archive well up on the growth list. On an annual basis, Directory of Open Access Books was the fastest growing in terms of both # of books and # of publishers.

    To download the raw data, go to the DGOA dataverse.

    Detail

    Bielefeld Academic Search Engine (BASE), in addition to being a great OA search engine, provides the best (if rough) guesstimate of how much we are achieving together. BASE added 2.7 million documents this quarter for a total of 115 million. About 60% of the content in BASE is OA, so this is roughly growth of 1.6 million open access items over the past quarter, or about 18,000 documents per day, with a total of about 69 million open access documents.

    While the growth of open access is always amazing, sometimes it’s more evident by the numbers, other times by the percentage. By the numbers: this quarter DOAJ net growth was 689 titles – that’s 7.7 titles per day, a record for DOAJ! As of September 30, DOAJ included 10,114 titles. As the chart shows, growth in DOAJ at the searchable article level is particularly remarkable, growing from just over 60,000 in 2004 to close to 2.5 million articles today. Over at PubMedCentral there are now 4.5 million documents with close to 7 thousand journals actively contributing content.

    By the percentages, it was a particularly good quarter for open access archives. Newcomers bioRxiv and SocArXiv top the quarterly growth by percentage with growth rates of 25% for bioRxiv (equivalent to doubling in a year) and 22% for SocArXiv (just under doubling in a year). bioRxiv now has 15,000 preprints, SocArXiv close to 1,500. As usual growth at Internet Archive was very impressive, 14% growth in texts (now 14.5 million free texts), 12% growth in the recently added collections category (now close to 300,000 collections) and 9% growth in software (close to 200,000). The RePEC book collection grew by 12% to over 33,000*.
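The annualized equivalents given in parentheses above appear to multiply the quarterly rate by four. The sketch below compares that simple approach with compounding, on the assumption that the same quarterly rate holds for four consecutive quarters; both function names are hypothetical.

```python
def annualize_simple(quarterly_rate):
    """Four quarters at the same rate, without compounding."""
    return 4 * quarterly_rate


def annualize_compound(quarterly_rate):
    """Four quarters at the same rate, each building on the previous total."""
    return (1 + quarterly_rate) ** 4 - 1


for name, q in [("bioRxiv", 0.25), ("SocArXiv", 0.22)]:
    print(f"{name}: simple {annualize_simple(q):.0%}, "
          f"compounded {annualize_compound(q):.0%}")
```

If a quarterly rate of 25% were sustained with compounding, a service would more than double over a year, so "doubling in a year" is the conservative reading of these figures.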

    On an annual basis by percentage, Directory of Open Access Books is at the top for growth both in # of books (65% growth, now close to 9,000 titles) and # of publishers (40% growth, 225 publishers). BASE continues to amaze with a 23% increase in content providers over the past year (edging up towards 6,000), and 15% growth in content (now at 115 million documents).

    * The RePEC book chapter category also showed amazing growth, but perhaps this is an artefact due to a recent clean-up project as numbers were significantly down last quarter.

    This post is part of the Dramatic Growth of Open Access series.

    Dramatic Growth of Open Access June 30, 2017


    Correction: DOAJ will soon surpass 2.5 million articles, not a quarter of a billion as originally reported. 

    Highlights

    Open access continues to demonstrate robust growth on a global scale, in terms of works that are made available open access, ongoing growth in infrastructure (new repositories, journals, book publishers), strong growth for new initiatives such as SocArXiv, bioRxiv, the Directory of Open Access Books, and SCOAP3, as well as ongoing strong growth in established services such as BASE, PubMed / PubMedCentral, Internet Archive (check out the new Collections including a Trump archive and FactChecker), DOAJ (almost 2.5 million articles searchable at the article level), RePEC and arXiv. Ongoing growth in infrastructure and OA policy gives every reason to expect this growth to continue.

    Open Data Version

    Morrison, Heather, 2014, “Dramatic Growth of Open Access”, hdl:10864/10660, Scholars Portal Dataverse, V17,

    Details

    This edition of the Dramatic Growth of Open Access highlights two of the new kids on the OA block – SocArxiv and BioRxiv, modeled on early OA success story arXiv, topping the quarterly growth by percentage with percentage growth of about 30% each! SocArxiv now has 1,200 documents and BioRxiv 12,800.

    Similarly, a relative newcomer, the Directory of Open Access Books, is in both first and second place for annual growth by percentage, with 68% growth in OA books and 40% growth in OA book publishers in the past year, for a total of 8,172 open access books and 217 OA book publishers.

    SCOAP3, a global initiative to transform high-energy physics publishing to open access, is showing remarkable growth, 39% in the last year and 8% in the last quarter for a total of 15,790 articles funded.

    To celebrate the growth of all OA services two pictures are presented of the growth of the largest collective OA search engine that I am aware of. Together, the 5,000 content providers who contribute metadata to the Bielefeld Academic Search Engine (BASE) have made available over 112 million documents. Around 60% of these are open access, so the number of OA documents in the world can be said to be somewhere about 67 million. BASE also posts their own online statistics table and chart – check it out here.
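The estimate above is a single multiplication, sketched here so it can be rerun as BASE grows. The 60% open access share is the rough figure quoted in the text, not a measured value, and `estimate_oa` is a hypothetical helper name. The document count is the June 2017 BASE total listed in the annual growth table.

```python
def estimate_oa(total_documents, oa_share=0.60):
    """Rough count of OA documents, assuming a fixed OA share of the total."""
    return total_documents * oa_share


base_total = 112_458_360  # BASE documents, June 30, 2017
print(f"about {estimate_oa(base_total) / 1e6:.0f} million OA documents")
# prints "about 67 million OA documents"
```

Because the share is only approximate, the result is best read as an order-of-magnitude figure rather than a count.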

    I wish I had the time to applaud and celebrate the growth of each and every OA service, but with 5,000 services contributing to BASE (and others that don’t), if I worked on this 365 days a year I would have to cover 14 initiatives every day. So please feel free to help out by applauding and celebrating the services most relevant to you – the journals in your discipline, your institutional repository, the services you find most helpful to search.

    Below you will find tables listing the top services by quarterly (5% or more) and annual growth (10% or more). For the full numbers download the open data version (link above). As usual Internet Archive is well represented, with 5 items in the list of the top 13 services by quarterly growth and the top 18 services by annual growth. Internet Archive also offers 2 intriguing new services under Collections – a Trump Archive with over a thousand videos and a Fact Checker collection with over 400 items, available at https://archive.org/details/tv

    Of course PubMed and PubMedCentral are up there in the growth charts, in this quarter for total number of items (5% quarterly growth) as well as what looks (to me) like hesitant new steps by a substantial number of journals, with a 26% increase in the number of contributing journals that provide some OA and a 14% increase in the number of journals that provide OA to selected articles. The number of journals providing immediate free access and/or all articles open access continues to increase, so this is clearly growth, not backsliding.

    DOAJ is included in the top growth services with 14% growth in the number of articles searchable at article level. DOAJ now has over 2.49 million articles searchable at the article level and should soon surpass 2.5 million articles.

    arXiv and RePEC are on the list for strong growth in articles, and ROARMAP for growth in OA policies.
     


    Quarterly growth (percentage) June 2017
    32% SocArxiv preprints 1,200
    29% BioRxiv all articles 12,280
    18% # of academic peer-reviewed books (DOAB) 8,172
    18% # publishers (DOAB) 217
    8% SCOAP3 articles 15,790
    8% Internet Archive Software 178,635
    7% Video (movies)  (Internet Archive) 3,437,542
    7% Texts  (Internet Archive) 12,821,051
    5% Images (Internet Archive) 1,476,743
    5% # of content providers (BASE) 5,621
    5% Audio (recordings)  (Internet Archive) 3,477,033
    5% Webpages (Internet Archive) (in billions) 298
    5% PubMedCentral (number of items) 4,400,000

     
    Annual growth (percentage) 06/30/17
    68% # of academic peer-reviewed books (DOAB) 8,172
    40% # publishers (DOAB) 217
    39% SCOAP3 articles 15,790
    34% Video (movies)  (Internet Archive) 3,437,542
    33% Internet Archive: Software 178,635
    29% # of content providers (BASE) 5,621
    27% Texts  (Internet Archive) 12,821,051
    26% PMC journals some OA 609
    25% Internet Archive: Images 1,476,743
    20% # of documents (BASE) 112,458,360
    17% Audio (recordings)  (Internet Archive) 3,477,033
    17% RePEc journal articles 1,491,037
    14% # of articles searchable at article level (DOAJ) 2,493,835
    14% PMC select deposit journals 4,296
    13% RePEC downloadable 2,143,844
    13% Total Policies (ROARMAP) 872
    13% PMC # items 4,400,000
    10% arXiv  http://arxiv.org/ 1,278,739

     This post is part of the Dramatic Growth of Open Access Series. Feel free to copy and share - with love. Note that images are compressed by the software to reduce file size, and they are also quickly outdated. You are welcome to use the images, but my recommendation is to download the data and make your own graphics. It's easier than you think with tools like modern spreadsheet software.
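For readers who prefer a script to a spreadsheet, here is a minimal sketch of making your own graphic from rows in the style of the tables above. It assumes each row has the form "percent, label, total" as plain text; the regular expression and the function names `parse_row` and `text_chart` are illustrative, and the output is a simple text bar chart rather than an image.

```python
import re

# One row looks like: "68% # of academic peer-reviewed books (DOAB) 8,172"
ROW = re.compile(r"^(\d+)%\s+(.+?)\s+([\d,]+)$")


def parse_row(line):
    """Split one table row into (percent, label, total)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    percent, label, total = match.groups()
    return int(percent), label, int(total.replace(",", ""))


def text_chart(rows, width=40):
    """Return one line per row, largest growth first, with a proportional bar."""
    parsed = sorted((parse_row(r) for r in rows), reverse=True)
    top = parsed[0][0]  # largest percentage sets the full bar width
    return [
        f"{label[:30]:<30} {'=' * round(width * pct / top)} {pct}%"
        for pct, label, _ in parsed
    ]


rows = [
    "68% # of academic peer-reviewed books (DOAB) 8,172",
    "40% # publishers (DOAB) 217",
    "10% arXiv  http://arxiv.org/ 1,278,739",
]
for line in text_chart(rows):
    print(line)
```

The parsed tuples could equally be handed to a plotting library; plain text is used here only to keep the sketch free of dependencies.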
     

    Critical Data Literacy, why and how: an Open Education Resource (OER)

    This OER was developed for presentation at the Data Power 2017 conference held at Carleton University, Ottawa, Ontario June 22 – 23. This is primarily a framework for how to go about teaching critical data literacy in the student-centered tradition of Freire, supplemented by the work of Tygel and colleagues. A sample introduction developed for Canadian university students, and a few references, are included. My definition of critical data literacy as used in this OER is: 

    critical data literacy is the ability to understand and critique how the beliefs and values of people and groups (including government) influence what data is created, how it is shared and how it is used to tell compelling stories by storytellers whose beliefs and values shape the kind of stories they choose to tell and how they tell the stories. Critical data literacy also means having the ability to create and tell one’s own stories using data. 

    This OER is released under the terms of copy and share – with love, my latest statement on sharing which can be found at the bottom of this post. The Freire tradition of popular education involves starting with the lived experience of students. In this context, following is what I recommend for anyone who wishes to develop a full critical data literacy program based on the framework. I think that this framework could be adapted for teaching at any level, from community-based learning (led by community groups or organizers or as a participatory action research project) to graduate classes (that’s where I teach). Some of the details would change. For example, if you are teaching at a university, some parts of the process are likely to involve formal evaluation (marking), but if you are teaching the general public or a community group, this would not make sense. Please adjust as needed for your own context.

    The overall approach:

    1. Identify your student group. Think about what kinds of issues or problems they might have that could potentially be helped by data, the kind of data stories they might be familiar with. 
    2. Develop an introduction to critical data literacy. Tygel and colleagues (2015, 2016) found that this was necessary. One way to think about the difference between critical data literacy and basic literacy (reading) is that, in recent history, people who do not know how to read are still likely to be aware of reading as something that other people do. Data literacy / critical data literacy is not at this point in time as broadly understood as reading.
    3. Plan the 3 phases of the framework that follow directly from the Freire tradition: investigation, thematisation, and problematisation. In these phases, students should lead the learning process (active learning), pursuing problems and questions of their own devising. The teacher’s role is to provide support. 
    4. Plan a systematisation (synthesis) wrap-up approach that makes sense for your student group. In some cases this might be left for the students to decide the approach, and the teacher only helps to guide the students towards this closure. In a formal educational setting, this might involve a pre-determined assignment.
    5. Implement!

    The 5 phases are: introduction, investigation, thematisation, problematisation, and systematization (synthesis). Details follow. The introduction section is the most fully developed as this is the only teaching portion that involves imparting knowledge; all others begin with the student.

    Introduction

    As noted above, it will not be obvious to everyone what data literacy or critical data literacy is or why they should learn about it, as discovered by Tygel and colleagues (2015, 2016). For this reason, an introduction to the topic may be helpful. In this phase one might invite in guest speakers from the community who use data in their storytelling and/or to provide examples of data storytelling. This is also where definitions of critical data literacy could be introduced. In addition to my definition (see above), I like this definition of data literacy from the Data Journalism Handbook  because it includes the element of critical thinking; not every definition that I have seen includes this, to me a significant omission.

    data literacy is the ability to consume for knowledge, produce coherently and think critically about data [emphasis added] (Gray, Bounegru & Chambers, 2012)

    Following is a sample introduction developed for an audience of Canadian university students. If you are teaching a different type of student group, I recommend that you develop your own introduction tailored to your group. If you do and you are willing to share this with others, please send me a link (via e-mail to Heather dot Morrison at uottawa dot ca) or as a comment to this post and I will include a link to your work in this post. If you would like to use this introduction as is, please see the link to the full presentation.

    Introduction slide 1

    This slide presents two conflicting stories that are told using basically the same underlying data. One of these (tax freedom day) will be very familiar to the audience, while the other will not as it is relatively new. 

    This slide illustrates two very different perspectives on taxation in Canada. On the left, we see the Fraser Institute’s Tax Freedom Day. The Fraser Institute, a right-wing think tank, uses data to tell their story of over-taxed Canadians, working more than half the year for the government before earning a dime for themselves. The idea of tax freedom day has been very effective in Canada over the past few decades. On the right, we see one of the images from the Broadbent Institute’s report The Brass Tax which was published very recently. The left-wing Broadbent Institute challenges the numbers behind the Fraser Institute’s analysis, argues that Canadian taxation is pretty reasonable compared to other countries, and presents a different picture. In this case this graph illustrates Canada’s progressive approach to taxation and makes the point that people with little to no income pay no income tax and only a small percentage of Canadians age 25 to 54 are in the top income tax bracket, paying more than 30% of income in taxes. These are 2 groups of people with a different vision of what society should be like, using the same underlying data to tell 2 very different stories. If we go directly to the data source, will this eliminate the impact of the storyteller? Let’s see.

    The following two slides might be more effective as a live demo or in-class lab activity. 
     
    One of the underlying datasets used by both groups is the statistics provided by OECD. If you go to the OECD website there are some neat online tools that let us quickly visualize data in different ways. One of the elements of the data story told by the Fraser Institute is that individual families pay too much in taxes. I wondered if there has been any change in the portion of tax revenue contributed through personal and corporate taxes over the years. Here is what I found using the OECD website. It seems that more tax is gathered from personal rather than corporate taxes, but over the past few years the portions don’t seem to have changed much. This is the default view that shows trends from 2000 – 2015. If this had fit what I already believed, I suspect I would have stopped here. But I seem to recall a relative decrease in corporate taxation over the past few decades so I decided to slide the years covered…
    And this is what I found. If we slide the start date of the visualization tool back to 1965, it does appear that there has been a relative increase in tax revenue from personal sources and a relative decrease in tax revenue from corporate sources. This shows how easy it would be for two people with different perspectives on what a data trend is likely to be to go to exactly the same dataset and make a slight change to how the data is visualized to tell two very different stories. 
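The effect of sliding the start year can be sketched in a few lines. The numbers below are invented placeholders chosen only to mirror the pattern described above (flat since 2000, declining since 1965). They are NOT OECD figures, and `change_in_share` is a hypothetical helper.

```python
def change_in_share(series, start_year, end_year):
    """Percentage-point change in a share between two years of a {year: share} series."""
    return series[end_year] - series[start_year]


# Invented placeholder values for illustration only; NOT OECD figures.
toy_corporate_share = {1965: 14.0, 1980: 12.0, 2000: 10.0, 2015: 10.0}

for start in (2000, 1965):
    delta = change_in_share(toy_corporate_share, start, 2015)
    print(f"{start}-2015: {delta:+.1f} percentage points")
```

Same series, same end year: only the chosen start year decides whether the story is "no change" or "long-term decline".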

    Kaulfuss uses OECD data to tell a story about U.S. health care spending on a blog called Beyond Economics. The story is that the U.S. spends two and a half times the OECD average on health. It doesn’t surprise me that the U.S. spends more than the OECD average on health, but I am surprised that the difference is this much. What I found even more intriguing is the author’s claim that U.S. public spending on health is above the OECD average. Who knew? Disclaimer: what I am doing here is presenting stories told through data; I have not examined the data itself so cannot comment on the accuracy of the story.
     
    Wikipedia has a section called Health Care in Canada. Here in Canada many of us – I include myself – think highly of our public health care system, and I think I see this perspective here. This section states that “most health statistics in Canada are at or above the G8 average” in a paragraph that is followed by the table pictured above. The table draws from a number of data sources and appears to me to demonstrate above-average data literacy skills. However…
     
    When you look at the statistics that are presented and calculate the averages, Canada is above average on 3 of 8 measures. This is not “most”. This suggests a need for data literacy. If you look at the specific measures where we are above average, an argument can be made that being above average in life expectancy is a good thing. However, an above-average infant mortality rate is probably not such a good thing. We are also slightly above average on % of government revenue spent on health, but what does this mean and is it a good thing? Looking at some of the areas where we are below average – such as the # of doctors & nurses per population & % of health costs paid by government – might give one reason to re-consider our narrative that we Canadians are above average in public health. This illustrates a need for critical data literacy. In other words, our beliefs might be getting in the way of understanding what our existing data tells us.
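The check described above (compute each measure's average, then count how many times one country sits above it) can be sketched as follows. The table values are invented placeholders, NOT the Wikipedia figures. The country labels "A", "B", "C" and the function name `above_average` are hypothetical, and the average here is assumed to include the country being tested.

```python
from statistics import mean


def above_average(table, country):
    """Return the measures on which `country` exceeds the mean of all countries."""
    return [
        measure
        for measure, values in table.items()
        if values[country] > mean(values.values())
    ]


# Invented placeholder values for illustration only; NOT real health statistics.
toy_table = {
    "measure_1": {"A": 82.0, "B": 80.0, "C": 81.0},
    "measure_2": {"A": 3.0, "B": 4.9, "C": 3.5},
    "measure_3": {"A": 2.5, "B": 3.5, "C": 3.0},
}

hits = above_average(toy_table, "A")
is_most = len(hits) > len(toy_table) / 2
print(f"above average on {len(hits)} of {len(toy_table)} measures; most: {is_most}")
```

Counting is only the data literacy half of the exercise: whether being above average is good news still depends on the measure, which is where the critical reading comes in.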
    Some approaches and suggestions for creating a meaningful introduction
    The introduction section exists because, as Tygel and colleagues found, there is a need to start with some explanation about what data is and how people use it. There are many potential approaches to introducing the topic, such as having guest speakers come to explain how they make use of data and data visualization. 
    Suggested sample activity
    One activity that would fit here is to have students create their own demonstrations. In the case of tax data, students could do a Google search for tax data and limit to images. This search will yield lots of material to work on. The idea is to have students find out who created the visualization and what the story behind the visualization is. If this is done for evaluation purposes, I recommend a pass/fail approach because student success will depend a lot on which images are selected. Being there to hear the findings of all the students is sufficient for this learning exercise. A teacher in an area where computers are not readily available could bring in copies of materials to work with. This introductory phase may be more relevant for some student groups than others, for example university students. If this doesn’t seem to fit, you could skip this stage. 

    Investigation, Thematisation & Problematisation

    Two key points to keep in mind in these 3 phases: 1) the core focus should be lived experience, not imparting abstract knowledge, and 2) teaching involves helping people seek and find answers. This is important because in teaching data literacy one might be tempted to start with the data, teaching people how to understand and work with data. Keynote speaker (and BC First Nations data activist) Gwen Phillips at Data Power 2017 provided a brilliant example of why not to start with the data: the existing data might not be what is wanted at all. As Gwen said, we should measure what we do want (e.g. youth vitality), not just what we don’t want (e.g. teen suicide). This introduces a challenge to develop new metrics, but one that seems worthy of pursuit. If we start by teaching about existing data we risk missing the opportunity to identify gaps like this.

    Disclosure: in understanding the following 3 phases, it may be helpful to know that although I teach at a university and am very engaged in pedagogy, I do not have an education degree and do not consider myself an expert on pedagogy. If you would like to know more about how to teach in the Freire tradition, I suggest starting with the Tygel references below and if desired supplementing with general educational books and articles covering the Freire tradition. My contributions below are limited to providing a very quick introduction and making the connection with critical data literacy.

    Investigation

    The investigation phase is the first of 3 phases that follow the Freire tradition. The idea is to begin with lived experience, with real-world problems. If this approach is used for self-teaching by community groups independently or with an academic consultant as a participatory action research project, this is closest to the classic Freire scenario and the best example of a pure investigation stage. To modify this for an education setting, students could either choose problems or issues of direct interest to them, for example student debt, or they might brainstorm a particular target group whose problems they are familiar with such as First Nations, a salient issue here in Canada as many of us struggle to implement the recommendations of our Truth & Reconciliation commission. Classroom activities could include a brainstorm session, individual or small group reflection, and/or presentation of the results of the investigation stage.
    Thematisation
     
    Thematisation is the first analytic stage. Before searching for what data is available, the idea is to focus on the real-world issue and figure out what kind of data might help to understand or resolve the issue. Examples based on today’s case studies on taxation and health spending could include learning what sorts of taxes are collected and by which governments, or comparing public collective health spending with individual spending.
     

    Problematisation 

    After thematisation, with some back-and-forth, comes problematisation. This is where we get into research on what kinds of data actually exist that are relevant to the problem, who collects the data and why. Some examples of the types of data sources students might look into at this point if they choose to focus on taxation and spending:

    • Canada Revenue Agency
    • OECD
    • Federal and provincial budgets
    • Academic Research 
    • NGO / Think Tank research (e.g. Fraser Institute and Broadbent Institute) 
    One question that might be raised is whether the existing data is actually sufficient or not; that is, the scope of the inquiry is not focused just on understanding what data is available, but rather on what is needed to understand and resolve the problem of interest. 
    Systematization
     
    Finally, in the systematization stage we put what we have together to come up with an action plan. The nature of the action plan might vary quite a bit depending on the students. An activist community group might want to develop an action campaign or an infographic or other data story to facilitate an existing action campaign. One approach to action could involve citizen data collection. In a graduate class on information policy, like the classes that I teach at the University of Ottawa’s School of Information Studies, developing a policy briefing and recommendations for evaluation as academic work might make sense. 
    References

    Fraser Institute (n.d.). Tax freedom day calculator. Retrieved June 9, 2017 from https://www.fraserinstitute.org/tax-freedom-day-calculator

    Gray, J., Bounegru, L., & Chambers, L. (2012). Data Journalism Handbook. OKFN. (as cited in Tygel & Kirsch 2016)
    Kaulfuss, R. (2017). Health care: human right or expensive entitlement? Beyond economics. Retrieved June 15, 2017 from  https://beyondeconomics.org/2017/03/15/health-care-human-right-or-expensive-entitlement/
    OECD (2017), Tax revenue (indicator). doi: 10.1787/d98b8cf5-en (Accessed on 15 June 2017)
    Shillington, R. & Shaban, R. (2017). The brass tax: busting myths about overtaxed Canadians. Ottawa: Broadbent Institute. Retrieved June 9, 2017 from http://www.broadbentinstitute.ca/the_brass_tax

    Tygel, A., Campos, M., & De Alvear, C. (2015). Teaching open data for social movements: a research strategy. The Journal of Community Informatics 11:3. Retrieved June 19, 2017 from http://ci-journal.net/index.php/ciej/article/view/1220/1165

    Tygel, A., & Kirsch, R. (2016). Contributions of Paulo Freire for a critical data literacy: a popular education approach. The Journal of Community Informatics 12:3 pp. 108 – 121. Retrieved June 19, 2017 from http://ci-journal.net/index.php/ciej/article/view/1296
    Wikipedia (n.d.). Healthcare in Canada. Retrieved June 15, 2017 from https://en.wikipedia.org/wiki/Healthcare_in_Canada 

    Terms:  Please copy and share with love.
    What does this mean? In brief, I have no interest in using intellectual property law to prevent anyone from using or re-using my work with intentions such as furthering the collective knowledge of humanity (truth with justice and compassion), protecting or restoring the environment or making the conditions of life of humanity better. That is what I mean by with love. If your motives in using my work are something other than love, such as making a profit for yourself or a corporation that you work for, or subverting truth, justice, or compassion, then note that I reserve all rights under copyright. Please use attribution as appropriate. For example, if you use my work in an academic or journalistic context, you need to acknowledge me as author in order to avoid plagiarism (and confusion).

    This post is part of the Creative Globalization series