# Critical Data Literacy, why and how: an Open Education Resource (OER)

This OER was developed for presentation at the Data Power 2017 conference, held at Carleton University, Ottawa, Ontario, June 22–23, 2017. It is primarily a framework for teaching critical data literacy in the student-centered tradition of Freire, supplemented by the work of Tygel and colleagues. A sample introduction developed for Canadian university students and a few references are included. My definition of critical data literacy as used in this OER is:

critical data literacy is the ability to understand and critique how the beliefs and values of people and groups (including government) influence what data is created, how it is shared, and how it is used to tell compelling stories by storytellers whose beliefs and values shape the kinds of stories they choose to tell and how they tell them. Critical data literacy also means having the ability to create and tell one’s own stories using data.

This OER is released under the terms of copy and share – with love, my latest statement on sharing, which can be found at the bottom of this post. The Freire tradition of popular education involves starting with the lived experience of students. In this context, the following is what I recommend for anyone who wishes to develop a full critical data literacy program based on the framework. I think that this framework could be adapted for teaching at any level, from community-based learning (led by community groups or organizers, or run as a participatory action research project) to graduate classes (that’s where I teach). Some of the details would change. For example, if you are teaching at a university, some parts of the process are likely to involve formal evaluation (marking), but if you are teaching the general public or a community group, this would not make sense. Please adjust as needed for your own context.

The overall approach:

1. Identify your student group. Think about what kinds of issues or problems they might have that could potentially be helped by data, and the kinds of data stories they might be familiar with.
2. Develop an introduction to critical data literacy. Tygel and colleagues (2015, 2016) found that this was necessary. One way to think about the difference between critical data literacy and basic literacy (reading) is that, in recent history, even people who do not know how to read are likely to be aware that reading is something other people do. Data literacy and critical data literacy are not, at this point in time, as broadly understood as reading.
3. Plan the 3 phases of the framework that follow directly from the Freire tradition: investigation, thematisation, and problematisation. In these phases, students should lead the learning process (active learning), pursuing problems and questions of their own devising. The teacher’s role is to provide support.
4. Plan a systematisation (synthesis) wrap-up approach that makes sense for your student group. In some cases the students might decide on the approach themselves, with the teacher only helping to guide them towards closure. In a formal educational setting, this might involve a pre-determined assignment.
5. Implement!

The 5 phases are: introduction, investigation, thematisation, problematisation, and systematization (synthesis). Details follow. The introduction section is the most fully developed as this is the only teaching portion that involves imparting knowledge; all others begin with the student.

Introduction

As noted above, and as Tygel and colleagues (2015, 2016) discovered, it will not be obvious to everyone what data literacy or critical data literacy is or why they should learn about it. For this reason, an introduction to the topic may be helpful. In this phase one might invite guest speakers from the community who use data in their storytelling, and/or provide examples of data storytelling. This is also where definitions of critical data literacy could be introduced. In addition to my definition (see above), I like this definition of data literacy from the Data Journalism Handbook because it includes the element of critical thinking; not every definition that I have seen includes this, which to me is a significant omission.

data literacy is the ability to consume for knowledge, produce coherently and think critically about data [emphasis added] (Gray, Bounegru & Chambers, 2012)

Introduction slide 1

This slide presents two conflicting stories told using basically the same underlying data. One of these (Tax Freedom Day) will be very familiar to the audience, while the other will not, as it is relatively new.

This slide illustrates two very different perspectives on taxation in Canada. On the left, we see the Fraser Institute’s Tax Freedom Day. The Fraser Institute, a right-wing think tank, uses data to tell its story of over-taxed Canadians, working more than half the year for the government before earning a dime for themselves. The idea of Tax Freedom Day has been very effective in Canada over the past few decades. On the right, we see one of the images from the Broadbent Institute’s report The Brass Tax, which was published very recently. The left-wing Broadbent Institute challenges the numbers behind the Fraser Institute’s analysis, argues that Canadian taxation is pretty reasonable compared to other countries, and presents a different picture. Here, the graph illustrates Canada’s progressive approach to taxation and makes the point that people with little to no income pay no income tax, and that only a small percentage of Canadians aged 25 to 54 are in the top income tax bracket, paying more than 30% of income in taxes. These are 2 groups of people with different visions of what society should be like, using the same underlying data to tell 2 very different stories. If we go directly to the data source, will this eliminate the impact of the storyteller? Let’s see.

The following two slides might be more effective as a live demo or in-class lab activity.

One of the underlying datasets used by both groups is the statistics provided by the OECD. The OECD website has some neat online tools that let us quickly visualize data in different ways. One element of the data story told by the Fraser Institute is that individual families pay too much in taxes. I wondered whether there had been any change over the years in the portions of tax revenue contributed through personal and corporate taxes. Here is what I found using the OECD website. It seems that more tax is gathered from personal than from corporate taxes, but over the past few years the portions don’t seem to have changed much. This is the default view, which shows trends from 2000 to 2015. If this had fit what I already believed, I suspect I would have stopped here. But I seemed to recall a relative decrease in corporate taxation over the past few decades, so I decided to slide the years covered…
And this is what I found. If we slide the start date of the visualization tool back to 1965, it does appear that there has been a relative increase in tax revenue from personal sources and a relative decrease in tax revenue from corporate sources. This shows how easily two people with different expectations about a data trend could go to exactly the same dataset, make a slight change to how the data is visualized, and tell two very different stories.
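The slider experiment can be sketched in a few lines of Python. This is a minimal illustration, assuming yearly personal and corporate tax revenue figures have been copied into a dictionary keyed by year; the function names and the toy numbers are hypothetical, invented only to show the mechanics, and are not OECD figures. The point is that the reported change in the corporate share depends on which start year you pick.

```python
# Sketch: how the choice of start year changes the apparent trend.
# Assumes revenue data shaped as {year: (personal, corporate)}; values below are hypothetical.

def corporate_share(revenue: dict[int, tuple[float, float]], year: int) -> float:
    """Corporate tax revenue as a fraction of personal + corporate revenue in one year."""
    personal, corporate = revenue[year]
    return corporate / (personal + corporate)


def share_change(revenue: dict[int, tuple[float, float]], start: int, end: int) -> float:
    """Change in the corporate share between two years (negative = relative decline)."""
    return corporate_share(revenue, end) - corporate_share(revenue, start)


if __name__ == "__main__":
    # Invented toy values for illustration only, NOT real OECD data.
    toy = {1965: (8.0, 3.0), 2000: (12.0, 3.5), 2015: (11.5, 3.3)}
    for start in (2000, 1965):
        change = share_change(toy, start, 2015)
        print(f"{start}-2015: corporate share changed by {change:+.1%}")
```

With these toy numbers the 2000 to 2015 window shows almost no change (about -0.3%), while the 1965 to 2015 window shows a drop of about five percentage points: same data, different story.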

Kaulfuss uses OECD data to tell a story about U.S. health care spending on a blog called Beyond Economics. The story is that the U.S. spends two and a half times the OECD average on health. It doesn’t surprise me that the U.S. spends more than the OECD average on health, but I am surprised that the difference is this large. What I found even more intriguing is the author’s claim that U.S. public spending on health is above the OECD average. Who knew? Disclaimer: what I am doing here is presenting stories told through data; I have not examined the data itself, so I cannot comment on the accuracy of the story.

Wikipedia has a section called Health Care in Canada. Here in Canada many of us – I include myself – think highly of our public health care system, and I think I see this perspective here. This section states that “most health statistics in Canada are at or above the G8 average” in a paragraph that is followed by the table pictured above. The table draws from a number of data sources and appears to me to demonstrate above-average data literacy skills. However…

When you look at the statistics that are presented and calculate the averages, Canada is above average on 3 of 8 measures. This is not “most”, which suggests a need for data literacy. If you look at the specific measures where we are above average, an argument can be made that being above average in life expectancy is a good thing. However, an above-average infant mortality rate is probably not such a good thing. We are also slightly above average on the percentage of government revenue spent on health, but what does this mean, and is it a good thing? Looking at some of the areas where we are below average (such as the number of doctors and nurses relative to population and the percentage of health costs paid by government) might give one reason to reconsider our narrative that we Canadians are above average in public health. This illustrates a need for critical data literacy. In other words, our beliefs might be getting in the way of understanding what our existing data tells us.
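The check described above (calculate each average, then ask whether being above it is actually good) can be sketched in Python. This is a minimal sketch under stated assumptions: the table is typed in by hand as {measure: {country: value}}, the average is taken over every country listed (Canada included), and a hypothetical `higher_is_better` flag records the direction of each measure. The measure values in the usage lines are invented placeholders, not the Wikipedia figures.

```python
from statistics import mean


def compare_to_average(
    table: dict[str, dict[str, float]],
    country: str,
    higher_is_better: dict[str, bool],
) -> dict[str, tuple[bool, bool]]:
    """For each measure, return (is_above_average, is_favourable) for `country`.

    The average includes every country in the table, `country` itself included.
    """
    results = {}
    for measure, values in table.items():
        avg = mean(values.values())
        value = values[country]
        above = value > avg
        # "Above average" is only good news when higher values are desirable.
        favourable = value > avg if higher_is_better[measure] else value < avg
        results[measure] = (above, favourable)
    return results


if __name__ == "__main__":
    # Hypothetical placeholder values, not real health statistics.
    table = {
        "life expectancy": {"Canada": 81.0, "Country B": 80.0, "Country C": 79.0},
        "infant mortality": {"Canada": 5.0, "Country B": 4.0, "Country C": 3.0},
    }
    direction = {"life expectancy": True, "infant mortality": False}
    results = compare_to_average(table, "Canada", direction)
    above_count = sum(above for above, _ in results.values())
    print(f"Above average on {above_count} of {len(results)} measures")
    for measure, (above, favourable) in results.items():
        print(f"{measure}: above average={above}, favourable={favourable}")
```

In the toy example Canada is above average on both measures, but only one of those positions is favourable.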
Some approaches and suggestions for creating a meaningful introduction
The introduction section is needed because, as Tygel and colleagues found, it helps to start with some explanation of what data is and how people use it. There are many potential approaches to introducing the topic, such as having guest speakers come in to explain how they make use of data and data visualization.
Suggested sample activity
One activity that would fit here is to have students create their own demonstrations. In the case of tax data, students could do a Google search for tax data and limit the results to images. This search will yield lots of material to work on. The idea is to have students find out who created each visualization and what the story behind it is. If this is done for evaluation purposes, I recommend a pass/fail approach, because student success will depend a lot on which images are selected. Being there to hear the findings of all the students is sufficient for this learning exercise. A teacher in an area where computers are not readily available could bring in copies of materials to work with. This introductory phase may be more relevant for some student groups, for example university students, than for others. If it doesn’t seem to fit, you could skip this stage.

Investigation, Thematisation & Problematisation

Two key points to keep in mind in these 3 phases: 1) the core focus should be lived experience, not imparting abstract knowledge, and 2) teaching involves helping people seek and find answers. This is important because in teaching data literacy one might be tempted to start with the data, teaching people how to understand and work with it. Gwen Phillips, a BC First Nations data activist and keynote speaker at Data Power 2017, provided a brilliant example of why not to start with the data: the existing data might not be what is wanted at all. As Gwen said, we should measure what we do want (e.g. youth vitality), not just what we don’t want (e.g. teen suicide). This introduces the challenge of developing new metrics, but one that seems worthy of pursuit. If we start by teaching about existing data, we risk missing the opportunity to identify gaps like this.

Disclosure: in understanding the following 3 phases, it may be helpful to know that although I teach at a university and am very engaged in pedagogy, I do not have an education degree and do not consider myself an expert on pedagogy. If you would like to know more about how to teach in the Freire tradition, I suggest starting with the Tygel references below and if desired supplementing with general educational books and articles covering the Freire tradition. My contributions below are limited to providing a very quick introduction and making the connection with critical data literacy.

Investigation

The investigation phase is the first of 3 phases that follow the Freire tradition. The idea is to begin with lived experience, with real-world problems. If this approach is used for self-teaching by community groups, either independently or with an academic consultant as a participatory action research project, this is closest to the classic Freire scenario and the best example of a pure investigation stage. To modify this for an education setting, students could either choose problems or issues of direct interest to them, for example student debt, or brainstorm a particular target group whose problems they are familiar with, such as First Nations, a salient issue here in Canada as many of us struggle to implement the recommendations of our Truth and Reconciliation Commission. Classroom activities could include a brainstorming session, individual or small group reflection, and/or presentation of the results of the investigation stage.
Thematisation

Thematisation is the first analytic stage. Before searching for what data is available, the idea is to focus on the real-world issue and figure out what kind of data might help to understand or resolve the issue. Examples based on today’s case studies on taxation and health spending could include learning what sorts of taxes are collected and by which governments, or comparing public collective health spending with individual spending.

Problematisation

After thematisation, with some back-and-forth, comes problematisation. This is where we get into research on what kinds of relevant data actually exist, who collects the data, and why. Some examples of the types of data sources students might look into at this point if they choose to focus on taxation and spending:

• OECD
• Federal and provincial budgets
• NGO / Think Tank research (e.g. Fraser Institute and Broadbent Institute)
One question that might be raised is whether the existing data is actually sufficient; that is, the inquiry is not focused just on understanding what data is available, but rather on what is needed to understand and resolve the problem of interest.
Systematization

Finally, in the systematization stage we put what we have together to come up with an action plan. The nature of the action plan might vary quite a bit depending on the students. An activist community group might want to develop an action campaign or an infographic or other data story to facilitate an existing action campaign. One approach to action could involve citizen data collection. In a graduate class on information policy, like the classes that I teach at the University of Ottawa’s School of Information Studies, developing a policy briefing and recommendations for evaluation as academic work might make sense.
References

Fraser Institute (n.d.). Tax freedom day calculator. Retrieved June 9, 2017 from https://www.fraserinstitute.org/tax-freedom-day-calculator

Gray, J., Bounegru, L., & Chambers, L. (2012). Data Journalism Handbook. OKFN. (as cited in Tygel & Kirsch, 2016)
Kaulfuss, R. (2017). Health care: human right or expensive entitlement? Beyond Economics. Retrieved June 15, 2017 from https://beyondeconomics.org/2017/03/15/health-care-human-right-or-expensive-entitlement/
OECD (2017). Tax revenue (indicator). doi:10.1787/d98b8cf5-en. Retrieved June 15, 2017.
Shillington, R. & Shaban, R. (2017). The brass tax: busting myths about overtaxed Canadians. Ottawa: Broadbent Institute. Retrieved June 9, 2017 from http://www.broadbentinstitute.ca/the_brass_tax

Tygel, A., Campos, M., & De Alvear, C. (2015). Teaching open data for social movements: a research strategy. The Journal of Community Informatics, 11(3). Retrieved June 19, 2017 from http://ci-journal.net/index.php/ciej/article/view/1220/1165

Tygel, A., & Kirsch, R. (2016). Contributions of Paulo Freire for a critical data literacy: a popular education approach. The Journal of Community Informatics, 12(3), 108–121. Retrieved June 19, 2017 from http://ci-journal.net/index.php/ciej/article/view/1296

# Comparing OA article processing fees with academic salaries

Update September 12: data added from Ayalew’s (2012) research indicating that the gross annual salary for an associate professor in Ethiopia is 56,400 Ethiopian birr, or approximately US$3,400 (ETB 16.6 = US$1.00). Assuming that at least the equivalent of $400 is paid in tax, a scholarly publisher charging $3,000 for an open access article processing fee is charging enough to cover a full year’s after-tax salary for an associate professor in a country like Ethiopia. My recommendation is that research funders supporting work in the developing world should think carefully before supporting gold open access article processing fees. Do the math. Instead of paying publishers like Elsevier and Springer $3,000 to make a single article open access, why not require green open access archiving and use the funds to support a full-time academic in the developing world?

My advocacy interests include both open access and sustainable scholarly publishing. One area in need of further attention is the impact of publisher costs (whether subscription or open access) on resources to support academics (salaries and research funding). One concern that I have with the push for payment of gold open access article processing fees (the RCUK approach) is that publishers are likely to set standards based on the UK approach which then affect scholars around the world, because scholarly publishing is global in nature. One area of research in need of attention is the comparison of open access article processing fees with academic salaries. This is particularly needed in the developing world, but even in the developed world it is worth noting that the $3,000 OA fee charged by some scholarly publishers is more than many adjuncts in the US are paid to teach a course (see, for example, this table by the American Psychological Association): http://www.apa.org/workforce/publications/12-fac-sal/table-32.aspx
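The arithmetic in the September 12 update can be checked in a few lines of Python. This sketch uses only the figures given in the update (56,400 birr, ETB 16.6 per US dollar, an assumed minimum of $400 in tax, and a $3,000 fee); the function name `net_salary_usd` is hypothetical.

```python
def net_salary_usd(gross_local: float, local_per_usd: float, assumed_tax_usd: float) -> float:
    """Convert a gross annual salary to US dollars and subtract an assumed tax amount."""
    return gross_local / local_per_usd - assumed_tax_usd


if __name__ == "__main__":
    GROSS_ETB = 56_400      # associate professor, gross annual salary (Ayalew, 2012)
    ETB_PER_USD = 16.6      # exchange rate cited in the update
    ASSUMED_TAX_USD = 400   # the update's assumption: at least this much is paid in tax
    APC_USD = 3_000         # article processing fee charged by some publishers

    gross_usd = GROSS_ETB / ETB_PER_USD
    net_usd = net_salary_usd(GROSS_ETB, ETB_PER_USD, ASSUMED_TAX_USD)
    print(f"Gross: ${gross_usd:,.0f}  Net (at most): ${net_usd:,.0f}")
    print("One APC covers a year's net salary:", APC_USD >= net_usd)
```

Because the tax figure is a lower bound, the net salary of roughly $2,998 is an upper bound, so the comparison with the $3,000 fee holds.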

Considering the high percentage of voluntary labour involved in scholarly publishing (unpaid writing of research articles, peer review and much of the editing), it is possible to publish using models that involve extremely low dollar costs. See, for example, Shieber’s post An efficient journal, detailing how the Journal of Machine Learning Research publishes at an average of $10 per article. Valuable as open access is, the research and writing still need to be done, and we academics should be asking ourselves whether we want universities and research funders to be paying a $3,000 article processing fee, or whether we would prefer DIY publishing at $10 or so per article, directing the funds to academic salaries and research grants instead.

A research collaboration by the “Laboratory for Institutional Analysis (LIA) from the Higher School of Economics (HSE) in Moscow, Russia, and the Boston College Center for International Higher Education (CIHE) in the United States in collaboration with experts from 28 countries around the world” found that, even after adjusting for differences in currency using the purchasing power parity (PPP) method, the local equivalent of $3,000 is more than a month’s salary for an academic at the top rank in 7 of the 28 countries studied (25%), and more than a month’s salary in 11 of the 28 countries studied (more than a third). Even the PLoS ONE fee of $1,350 is more than a month’s salary at the PPP equivalent at the top academic rank in 3 countries (China, the Russian Federation, and Armenia). Method: go to Quantitative Data, download the data for academic salaries (only PPP provided), and sort by rank (I used the top rank and rank 3).
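The method described for the LIA/CIHE data (download the salary table, sort by rank, compare each salary with the fee) can be sketched in Python. This is a minimal sketch, assuming the monthly PPP salaries for one rank have already been pulled into a dictionary of {country: salary in PPP US dollars}; the function name and the toy entries are hypothetical and do not reproduce the LIA/CIHE data. The last line only restates the shares quoted in the text.

```python
def countries_where_fee_exceeds_salary(
    monthly_salary_ppp: dict[str, float], fee_usd: float
) -> list[str]:
    """Countries whose monthly salary (PPP US$) is lower than the fee, lowest salary first."""
    below = [(salary, country) for country, salary in monthly_salary_ppp.items() if fee_usd > salary]
    return [country for _, country in sorted(below)]


if __name__ == "__main__":
    # Hypothetical placeholder salaries, not the LIA/CIHE figures.
    toy = {"Country A": 1_000.0, "Country B": 2_500.0, "Country C": 6_000.0}
    for fee in (1_350, 3_000):
        hits = countries_where_fee_exceeds_salary(toy, fee)
        print(f"Fee ${fee:,}: {len(hits)} of {len(toy)} countries -> {hits}")

    # Shares quoted in the text: 7 and 11 of the 28 countries studied.
    print(f"{7 / 28:.0%}, {11 / 28:.1%}")
```

Running the same comparison once per rank (for example, the top rank and rank 3) reproduces the structure of the counts reported above.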

This is very preliminary data, shared in the spirit of open research and also as an illustration of the kind of research that is needed before global approaches to paying for open access are even considered. This data would appear to suggest that in some countries, not necessarily the world’s poorest, even the local equivalent of open access article processing fees at the rates of PLoS ONE or Springer would cost more than a month’s salary for a top-ranked academic.

As a next phase, I’m thinking of looking into the actual data not adjusted for currency differences to compare the actual academic salaries with open access article processing fees.

Reference

Ayalew, E. (2012). Salary and incentive structure in Ethiopian higher education. In P. Altbach, L. Reisberg, M. Yudkevich, G. Androushchak, & I. Pacheco (Eds.), Paying the professoriate: a global comparison of compensation and contracts. New York and London: Routledge. [Note: I have access to a copy of this book through the excellent collection at the University of Ottawa’s library. How many academics in less affluent areas would have ready access to a work like this?]