We are living at a time when the relevance of scientific expertise is called into question in many different ways. One challenge comes from the very grounds on which research is built: data. In particular, there is the idea that, with so much data being accumulated in the form of digital “Big Data”, scientific methods are now “driven by data” and are therefore ultimately obsolete. Current popular discourse around “Big Data” and related analytics (AI) treats data as neutral facts, and asserts that the accumulation of data, combined with automated systems of analysis, suffices to create reliable knowledge about the world.

This raises many hopes but also worries. If Big Data are all that is needed to produce good science, why do we still have scientific controversies, uncertainty and disagreement? What data are relevant to establishing a fact? What about cases where the same data are used as evidence for incompatible claims, as in climate modelling, coronavirus contagion trends or (as we saw in the recent US elections) polling predictions?

To answer these questions, we need extensive investigation of how researchers across different fields actually use Big Data to produce new knowledge. For example, how biologists, environmental scientists and medical researchers share data on health and environment is crucial to understanding how the spread of disease is affected by seasonal changes. Data extracted from social media and other non-scientific sources inform research in public health, and data extracted from plant science laboratories and field trials around the world are used to improve crop varieties and make sure that enough food is produced to feed the planet. In all these cases, Big Data are not neutral facts that can be stacked up like Lego bricks to produce knowledge. In other words, data do not speak for themselves, but rather need enormous efforts and expertise to be marshalled into reliable and intelligible knowledge claims. Hence humans, and the methods humans use to analyse data, play an ever more crucial role in transforming Big Data into knowledge.

This is because what is taken to count as data in the first place depends strongly on the context, premises and goals of the investigation. Take for instance the apparently simple and crucial matter of counting death tolls due to the coronavirus pandemic. Such numbers can refer to people who test positive for infection and die in hospital, or include people testing positive who die elsewhere; but they could also include people who die of complications caused by the virus or by the social conditions (for instance, lack of medical care or companionship) caused by the pandemic. All such numbers have validity depending on the measuring system used, the interests of analysts and the goals of related research.

Does this mean that data are not trustworthy, that empirical research is tantamount to opinion, and that data-intensive scientific methods in the 21st century are not credible? I do not believe this to be the case. Rather, these findings mean that now more than ever, it is indispensable for scientists to identify and counter bias, errors and problematic interpretations of the data. There are two key “ingredients” for this. First, one method does not suffice. Rather, scientists use a plurality of methods with varying combinations of models, theories and data; different types of instruments and designs; and many different forms of contributing expertise. All of these are chosen and integrated in ways closely tailored to the object, goals and context of the research at hand.

Second, the application of different methods needs to be flanked by robust systems for research dissemination and scrutiny, including novel forms of data sharing, research venues and institutions devoted to fostering scientific exchange, and publishing platforms encouraging open assessment: in other words, what we now often refer to as “open science”. This includes regular engagement with a wide set of expertise, including the qualitative social sciences and humanities, as well as non-academic expertise and citizen science. It also includes keeping track of the histories of data: as we increasingly rely on sophisticated forms of data, we need to remember where data come from in the first place, and how they have been processed (Leonelli and Tempini 2020).

Through these two ingredients, scientific methods confer the ability to triangulate data sources, compare and scrutinize findings, and systematically and fairly question any interpretation. In other words, scientific methods enable us to contextualize data, what data are taken to represent and why, and the implications of potential uses of data as evidence. This in turn helps researchers to catch misleading, wrong or irresponsible interpretations, and to produce reliable knowledge.

So the opportunity to use Big Data and Artificial Intelligence is indeed transforming research methods, but in ways that differ from the simplistic vision of data “driving” the research process. Rather, data play a crucial role within research that is ever more diverse and inclusive, as well as more interdisciplinary and distributed. International dialogue among researchers and various publics is foundational to knowledge production, and relies on the availability of well-maintained infrastructures, such as databases and repositories, as well as skills coming from a variety of disciplines and forms of expertise, including especially the social sciences and the humanities. Researchers need to be trained to critically question what role human experience, marshalled in very specific ways, plays in scientific reasoning, and how this is communicated and accounted for within and beyond science. Framed in this manner, scientific methods can continue to challenge facile narratives of relativism and post-truth, and can help to escape what could appear to be a disastrous combination of having too many data but no agreement on what counts as facts.

Sabina Leonelli


References: Leonelli S and Tempini N (2020) Data Journeys in the Sciences. Springer. Volume available to download in Open Access format thanks to funding from the European Research Council.

 
