Okay, we ought to be streaming now. Wait a second, as usual. Can you catch up? Yes? Okay, good. All right, yes, we're live. Welcome back, everybody. Day four: time flies. Well, at least I assume it's been flying for you. Even in spite of the effort of chairing, it's been flying for me too, just because I've been having so much fun getting to see all these talks. This has really been a kid-in-a-candy-store four days for me, so thank you all for being back for day four. As per usual, we'll be starting with a keynote, our last keynote address. I'm really excited to be able to introduce to you Sabina Leonelli, who I'm sure many of you know, from the University of Exeter, where she's professor of philosophy and history of science and co-director of the fantastic Egenis, the Exeter Centre for the Study of the Life Sciences. She is someone who has been doing, as I think I said in a promo thread on Twitter, absolutely unmissable work, required reading, on the role of big data in science and the way that data has been changing the production of scientific knowledge. So I'm really excited to think now a little bit about how we can rethink HPS through the lens of digital studies of science. So please, take it away. Thanks so much. Thank you very much, Charles. It's really a great pleasure to be here. It's a fantastically exciting event, and it's really wonderful to see these different communities coming together and talking to each other at last. So thank you very much to you and all your team for putting this together so effectively. One of the things I've particularly loved about the conversations over the last few days is the fact that they really pushed to the fore some of the social aspects of science, and particularly the study of communication.
Many of the talks I've heard (unfortunately I didn't manage to hear all of them, because I was teaching this week) have really focused on some meta-level aspects of how scientists communicate, how we as researchers communicate, and I think that's partly what I'm going to be focusing on in my talk as well. One thing I really should premise my talk with is that we're going to change gears slightly here, because I'm not going to be talking as much about wonderfully quantitative work; I'm going to be talking about qualitative work I've been conducting, looking at what people are doing in different scientific domains when they're working with large amounts of data. So it will be a meta-level account, and I guess it's going to bring back, also at a meta-level, the importance of bringing together qualitative and quantitative aspects when we are doing digital studies of science. Many talks have of course pointed to this, but I think mine is going to be another step in that direction. So what I'm going to do is talk a little bit about some past work I've been doing, looking at dimensions of data-intensive research. Then I'm going to look at some of my current research topics, and then look a little into the future and see which of the things we've been learning through these studies may be applicable as a reflection for all of us here who are involved in this kind of work. Many of these thoughts I will hopefully start to unravel as we go along in this big tour of different projects, ideas and initiatives, but hopefully by the end we will have enough thoughts to have a good discussion.
So first of all, I have done quite a lot of work in the past on the role of big and open data within the sciences. The interests that prompted that work included, for instance, the observation that data have acquired a new status within research; I think the very existence of this meeting proves it. Data have become publishable in their own right, and there is renewed attention to trying to reuse available data and to decrease the waste, or loss, of data. There is a lot of attention, of course, to data-driven research planning and discovery, to the role of semantics and standards in enabling the movement of data and therefore their reuse, and to the use of data in machine-readable formats to fuel artificial intelligence tools. This has been accompanied by the emergence of new institutions and new modes of dialogue around the use of data. To some extent this is allowing us, at the moment, to reinvent, or at least to rethink, how we exchange scientific results, how we make inferences from results, and how we collaborate across different disciplines and countries. Of course, data and data-sharing questions have always had that power within the sciences; this is certainly not a new phenomenon. But the ways in which, for instance, the discourse of open science is now being mobilized, and the structures, the interests and the ways of financing that sharing at the moment, are very particular.
One thing that has also become very topical to think about in this era is how we incentivize and reward these kinds of shifts towards a more data-driven or data-centric way of working, and how we go beyond the hyper-competitive publishing climate we've all grown up with, where, whether you're in the humanities or in the sciences, a premium is put on spending less time tinkering with data and thinking about ways to productively collaborate, and more time producing sole-authored or first-author publications in high-level journals. More broadly, there has been, and continues to be, attention to how this new emphasis on the role of data can become an opportunity to improve the relationship between science and society, and can in fact become a platform to debate what counts as science in the first place: for instance, whether technical work is actually part of scientific work, what the role of scientific infrastructures in knowledge production is, the role of scientific governance, and how results should be credited and disseminated in the first place. So my focus for many years now has been on trying to think about how big and open data become mobilized, and whether looking at the ways in which these data are mobilized tells us something about how comprehensive and how reliable these data are, and how they could be reused. That is, of course, a very widely shared position for anybody working in contemporary AI: AI needs to be grounded on data that are shared across different contexts and can be reused for a variety of purposes. And this is a major challenge, particularly when we are looking at large, complex and heterogeneous data sets which have been put together by linking many different locations, many different sources, and research done on different phenomena, without even talking about the divergent interests that can underpin such research.
And so in trying to understand these issues, I've been doing a lot of work on what I call data journeys: the ways in which data get mobilized across contexts. One of the ways in which I've been doing this (I'm sorry, some of you have seen this slide before, but just to summarize) is to look at how data move, first of all, from sites of data creation to sites of data mobilization, which can typically be any kind of data infrastructure, from databases to repositories, data banks, and so on, and then to sites of data interpretation. I've been particularly interested in data journeys where the sites of data interpretation actually differ from the original sites of data creation. So I've been focusing my work, philosophically and empirically, on how data need to be decontextualized in order to be mobilized in particular ways. Data cannot travel with absolutely every piece of information that accompanied their production; typically there is a selection of which bits of information about data provenance may actually be relevant to their potential reuse. This process of decontextualizing data is really crucial to putting together data resources, and of course many of you have provided wonderful examples of this already in the last few days. Very importantly, this process of decontextualization typically happens, and in my view certainly needs to happen, with an idea already in mind: a vision for what potential recontextualization could happen with the data. What would be needed so that people who are based in different locations, and have a different background from the people who originally produced the data, can fruitfully reuse the data, and do so in a way that still makes the data set reliable and does justice to the efforts that went into generating the data in the first place?
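To make the decontextualization step concrete, here is a minimal sketch in Python. It is purely illustrative, not any real infrastructure's workflow: all field names and values are invented, and the point is simply that only a selected subset of provenance information travels with the data, chosen with an eye to future recontextualization.

```python
# Hypothetical sketch: only selected provenance fields travel with the data.
# All names and values here are invented for illustration.

def decontextualise(records, provenance, keep_fields):
    """Package records with only the provenance fields deemed relevant for reuse."""
    travelling_metadata = {k: v for k, v in provenance.items() if k in keep_fields}
    return {"data": records, "metadata": travelling_metadata}

# Full provenance at the site of data creation...
provenance = {
    "instrument": "spectrometer-A",
    "lab_notebook_page": 112,          # local detail, judged irrelevant downstream
    "organism": "Arabidopsis thaliana",
    "growth_conditions": "22C, long-day",
}

# ...but only the fields judged relevant to recontextualisation travel.
package = decontextualise(
    records=[{"sample": "s1", "value": 0.42}],
    provenance=provenance,
    keep_fields={"organism", "growth_conditions"},
)
print(sorted(package["metadata"]))  # → ['growth_conditions', 'organism']
```

The judgment embedded in `keep_fields` is exactly the curatorial decision the talk describes: it cannot be automated away, because it anticipates who might reuse the data and for what.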
Some of this work started by looking at how data get mobilized in model organism communities: large communities of researchers, typically highly distributed around the globe, who nevertheless focus on the study of particular organisms. I devoted a lot of work to the distribution and reuse of data on Arabidopsis thaliana, and this is a little graph showing the different elements involved in the journey of such data and their reuse through particular databases and platforms. I've been paying attention specifically to the relationship between data platforms and types of data reuse, and to the ways in which the original samples and specimens from which data were garnered are stored: stock centers for organisms, and in the case of plants, seed centers and seed banks. I've also been looking in a lot of detail, and continue to study this, at systems of semantics. How do we frame the keywords, terminologies, concepts and assumptions that underpin not only the ways in which we share data but the ways in which we recontextualize them? What assumptions are linked to the dissemination of data? What keywords, what ontologies (computational ontologies) are used to disseminate these data? These have huge implications for the systems we're using and the ways in which they're being reused. In doing that, I paid a lot of attention to the fact that what we're looking at in these kinds of systems is fundamentally a type of distributed reasoning.
We are looking at systems where there are potentially thousands of people who are indirectly collaborating: different groups of people working on different infrastructures, different data resources, different types of repositories. They relate to each other because the data they're caring for link to many other types of data and services available in the wider data ecosystem, but very often these people are not working directly together; they may not know each other, and typically they don't. And yet each of these groups is, in a sense, custodian of one essential element within this data infrastructure and ecosystem, an element that matters for understanding the ecosystem and what kind of meaning the whole system is assigning to the data. So we're looking at a situation where there is no single individual who can really understand the whole system; it is a fundamentally distributed system. This has fascinating implications in epistemic terms, for how we think about knowledge, but also in ethical terms. So in the last few years I've been very interested in thinking about what this distributed quality of data infrastructures means in terms of accountability for people's work on these systems, which is of course very relevant for us in HPS, because the question becomes: when we start to put together data infrastructures and to think about specific kinds of data reuses and visualizations, what does this contribution to the ecosystem of knowledge and data in the field actually contribute to future work, and in which way are we accountable for the quality and reliability of the data infrastructures and interpretations we're putting out? This relates, to an extent, to the broader controversy around reproducibility which is raging in scientific fields.
I really wonder how many of you think about this question in relation to your own work in digital studies; we can hopefully come back to it in discussion. Now, one of the more specific studies I've been doing concerns data linkage: the idea of making different data repositories interoperable with each other, so that they remain independent and can still be used autonomously, but so that if somebody is interested in bringing together data sets that sit across these repositories, it is actually possible to do so. I've been looking specifically at situations where you're trying to implement data linkage between databases that target data taken from non-humans and databases that target data taken from humans. One of the interesting cases I looked at is the use of data taken from yeast, which you might regard as a rather humble organism compared to the complexity of the human body, data which would in fact be reused for cancer research in humans. One thing I noticed as being extremely important in cases such as this, where you're doing a rather complex passage of data linkage, is the extent to which the responsibility for data curation, and the expertise used for data curation and annotation, are distributed, and what that actually means for the trustworthiness of the infrastructures that are thereby produced.
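A toy sketch can show the shape of this kind of linkage: yeast phenotype records joined to human disease annotations via an orthology mapping. The gene names, phenotypes and mapping below are invented for illustration; real linkage relies on curated ortholog databases and far more careful annotation work than this suggests.

```python
# Invented example data: a yeast repository, an ortholog mapping, and
# a human-disease repository. None of these values are real annotations.
yeast_phenotypes = {"RAD51": "DNA repair defect", "CDC28": "cell-cycle arrest"}
ortholog_map = {"RAD51": "RAD51", "CDC28": "CDK1"}  # yeast gene -> human gene
human_cancer_annotations = {"RAD51": "breast cancer", "CDK1": "various tumours"}

def link_yeast_to_human(yeast_data, orthologs, human_data):
    """Join yeast phenotype records to human annotations via orthology."""
    linked = []
    for yeast_gene, phenotype in yeast_data.items():
        human_gene = orthologs.get(yeast_gene)
        if human_gene in human_data:
            linked.append((yeast_gene, human_gene, phenotype, human_data[human_gene]))
    return linked

for row in link_yeast_to_human(yeast_phenotypes, ortholog_map, human_cancer_annotations):
    print(row)
```

The epistemically loaded step is the `ortholog_map` itself: somebody, somewhere, curated that correspondence, and everyone downstream inherits both their expertise and their mistakes, which is exactly the distributed-responsibility point above.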
Again, one of the interesting things I kept noticing is that these issues were much more easily handled by databases that may still comprise a lot of data, but are managed by relatively few people and address a relatively self-contained community. In the case of fission yeast, for instance, this is a scientific community which is relatively well contained; it still comprises a few hundred researchers, but they tend to more or less know each other, at least by name, and many of them are related through academic genealogy, having been trained in the same labs. That actually provides a cohesion to the data collection method. It also meant that this was pretty much the only community I've worked with where there was a 50-50 split between the curators of the database taking responsibility for annotating the data, and the people actually producing the data taking some responsibility for how they would appear in the database and contributing some crowdsourcing to it. Another case we've been looking at (this was with Niccolò Tempini, my collaborator here in Exeter) was the question of how one uses databases in a complex, translational field as a site of trusted expertise that allows you to immediately put data into action in a very concrete manner. This was the case of cancer genomics and the role played by a database like COSMIC, which is devoted to somatic mutations, in curating data from a very wide variety of sources, from publications to experimental work, and making them available to people working at the interface with the clinic and personalized medicine, so that these data could in fact be actioned into the production of clinical diagnostics and eventually also the diagnosis of individual patients.
Another example we looked at is the very important role that thinking about information security can have in producing trustworthy data infrastructures. To look at this we worked with the Secure Anonymised Information Linkage (SAIL) Databank, which is based in Wales. This is one of the most prominent data banks, certainly in the UK and arguably in the world, dedicated, at least ostensibly, to the anonymization of very sensitive health-related data. What they've been doing, now for almost 20 years, is to garner data coming from medical practitioners, from cancer registries, from clinical trials and from individual research projects, bring them all into their data system, and anonymize them at different levels of anonymization, as required by the sensitivity of the research. This made it possible for researchers interested in reusing those data to go to this data bank and make an agreement about how they could reuse the data, even though it was such sensitive data and therefore very difficult to handle. One of the things we found here is that the information security system set up by an infrastructure like this ended up having a very strong epistemic role, in addition, of course, to being infrastructurally very important in actually allowing people physical access to the data. What these systems managed to achieve was to provide and maintain a reliable chain of evidence when it came to the reuse of the data. In fact, very often the people involved in this data bank ended up providing assistance to researchers interested in reusing their data: helping them reframe their own research questions and think about other data types that might be relevant to those questions, and therefore really becoming an integral part of the research effort, rather than just being people who somehow take care of storing the data.
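The idea of "different levels of anonymization" can be sketched schematically. To be clear, this is not SAIL's actual method; it is a hypothetical illustration in which a direct identifier is replaced by a salted hash, and increasingly sensitive fields are suppressed as the required level rises. All field names and levels are invented.

```python
import hashlib

# Hypothetical policy: which fields to suppress at each anonymisation level.
SUPPRESS_BY_LEVEL = {1: set(), 2: {"postcode"}, 3: {"postcode", "birth_year"}}

def anonymise(record, level, salt="project-specific-salt"):
    """Return a copy of the record pseudonymised and stripped per the given level."""
    out = dict(record)
    # Replace the direct identifier with a stable, project-specific pseudonym.
    pseudonym = hashlib.sha256((salt + out.pop("nhs_number")).encode()).hexdigest()[:12]
    out["pseudonym"] = pseudonym
    for field in SUPPRESS_BY_LEVEL[level]:
        out.pop(field, None)
    return out

record = {"nhs_number": "123456789", "postcode": "SA2 8PP",
          "birth_year": 1980, "diagnosis": "asthma"}
print(sorted(anonymise(record, level=3)))  # → ['diagnosis', 'pseudonym']
```

Because the pseudonym is stable within one salt, records about the same person can still be linked to each other inside a project without the identifier itself ever travelling, which is one way a security mechanism doubles as the "chain of evidence" mentioned above.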
Another example we looked at, which was particularly fascinating to me, was the attempt, commonly referred to as a data mashup, to bring together data from very disparate sources, in full awareness of the fact that this generates all sorts of epistemic problems, because the communities producing these data operate on very different assumptions, but with the idea that if one can find some parameters common to all these data sets, and assume to some extent that these parameters are invariant, then it actually makes sense to try and just smash the data sets together and see what kind of inferences one can make out of them. The case we looked at was the so-called MEDMI project, a platform bringing together environmental data, health data, socioeconomic data and climate data, in an attempt to allow, for instance, the mapping of the spread of seasonal diseases in England and how this would affect health services and the provision of hospital beds. What we saw here was that, on the one hand, this was predicated on the idea that things like the locations of patients, and of the spread of potential pathogens, could be pinpointed rather accurately, so that by juxtaposing data sets that apply to the same location you could start to see interesting correlations, for instance between the number of patients hospitalized for any kind of pulmonary or respiratory disease, the type of weather conditions, and the types of pathogens present in the area at that point in time. What we found is that these invariants, which can sometimes look so obvious, like location (and of course time, as we all know, is another big one), are in fact highly varied.
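A toy example can make the "invariant that isn't" concrete: two data sets both record "location", but one uses point coordinates and the other coarse grid cells, so they can only be joined after both are harmonized to a shared spatial key. The grid resolution and all records below are invented for illustration; real harmonization in a project like MEDMI is vastly more laborious.

```python
def to_grid_cell(lat, lon, cell_size=0.5):
    """Snap a point coordinate to a coarse grid cell usable as a join key."""
    return (round(lat // cell_size * cell_size, 2),
            round(lon // cell_size * cell_size, 2))

# Hospital admissions recorded by point coordinates (invented data).
admissions = [{"lat": 50.72, "lon": -3.53, "respiratory_cases": 14}]
# Weather recorded per grid cell (invented data).
weather = {(50.5, -4.0): {"temp_c": 4.1, "humidity": 0.91}}

# The join only works once both "locations" share the same representation.
linked = []
for rec in admissions:
    cell = to_grid_cell(rec["lat"], rec["lon"])
    if cell in weather:
        linked.append({**rec, **weather[cell]})

print(linked[0]["respiratory_cases"], linked[0]["temp_c"])  # → 14 4.1
```

Even this tiny sketch embeds a consequential judgment, the choice of `cell_size`: too coarse and distinct places collapse together, too fine and nothing matches, which is one small instance of the hidden curatorial work described below.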
There are almost infinite ways of measuring location, and many of these ways are instantiated in these kinds of data sets, so the researchers on this project actually took an inordinate amount of time to reanalyze and curate their data in order to provide reliable inferences. Again, this speaks to the incredible amount of hidden, very often manual, work, and the kind of informed judgments, that are necessary when putting together these very large data resources. I also looked at the question of how one automates some of these processes, and specifically at the ways in which imaging data are now being automated in research on plants. That brought us to think a little more about what it means to relate data to models: to what extent data actually represent parts of the world, and to what extent the modeling of data is responsible for doing this. I'm not going to go into much detail about this now, but you know, my position is that data do not actually represent at all; it is data models that do the representational job. This was certainly another very interesting window into the huge complexity and the enormous amount of constant calibration needed to produce imaging, even in a situation where the conditions under which the imaging is produced, as in this case in one of these automated experimental stations, are highly standardized. So what kind of lessons did we learn from some of this work?
Well, certainly we learned that the technical know-how necessary to manage data at this kind of scale is to some extent elusive, and certainly represents a whole new set of skills and expertise, which, as I'm sure many of you who have done this kind of work will agree, are very difficult to bring together on the same level with the other types of expertise needed for this work. Indeed, we come back to this problem of this being highly distributed work, where you need people who have specific expertise in data management and in what best practice might actually mean in a particular domain. Here, for instance, you see a very broad typology of the kinds of data portals that people working in plant science really should at least be aware of when thinking about managing their data. As you can see, it's a very long list, and the specific examples of each of these typologies change very quickly over time, as some infrastructures fall out of use and others are invented, so it's actually very difficult to keep up with these developments. At the same time, of course, you also want people who retain the domain knowledge that is absolutely necessary to contextualize these data. We also ended up thinking a lot about what kinds of incentives underpin these data ecosystems, and we found that the incentives at this point are wrong, and in fact very worrying when it comes to the robustness of the ecosystem. There is a lack of recognition for data creation and donation, and that limits the amount of data and metadata actually available online, which in turn means that the kinds of data collections we can find online represent very highly selected data types from a very small proportion of available sources. We've seen that time and again in previous presentations this week. Thankfully, in the case of HPS, I think
everybody is trying to be very careful in really qualifying the extent to which their original sources can speak to broader issues, but of course there's always this creeping problem of representation when doing this kind of data analysis: the wish that the data we're analyzing could actually provide a bigger sample of the world than what we actually have in our hands. There is also, generally, a lack of business models to develop and especially to update online databases, and this in turn limits the comprehensiveness, the usability and the reliability of their contents. So one of the things we kept finding, and keep finding all the time (in fact, I think things have gotten much worse over the last ten years), is that the selection of data used for particular research domains and questions is based on convenience, on the tractability of the data themselves, and on the socioeconomic conditions of data sharing: for instance, when people insist on using Twitter data, because it's much easier and can to some extent be done for free, rather than using, say, Facebook data. These are not really epistemic choices, and they're not necessarily methodologically justified choices. And of course, the fact that we don't yet really have reward structures and criteria to value this work within academia in general (and I think this also impacts the HPS community to a large extent) means that, again, we have worries about the quality of the data we're using and about how trustworthy the infrastructures that have been set up are, given that people are very often not really acknowledged for doing good work with them. Generally, we also saw a lot of misalignment between the IT solutions that engineering domains are offering and the research needs of the people trying to put data to work in the field.
What this meant is really thinking about the digital landscape as a highly fragile landscape, where there is an exponential growth of data quality concerns, and where the sustainability of the landscape we're working with is really unclear and certainly limited. This also connects to the fact that data travel, and data journeys are constantly reshaped by institutional, national, disciplinary and cultural boundaries, while at the very same time challenging those boundaries all the time. So we are looking at a landscape that is highly dynamic in all sorts of ways, and this again makes it very difficult to keep it well maintained without proper resources. Of course, this also goes for sustainability in a more ethical sense, because protecting the rights of individuals and communities which may be affected by data reuse requires both local investments and a long-term shared vision of what it means to actually care for the data subjects we are dealing with in our work.
There is also, of course, a risk of conservatism when we keep reusing the same data sources. And what looms large, certainly in the uses of data in the sciences, is the fact that the vast majority of data, certainly in domains like agriculture but also in health, are privatized or commodified in various ways, which means they are either inaccessible to people working for publicly funded institutions, or just very difficult and very expensive to handle. So let me say something about the work I'm doing at the moment, and have been doing over the last year or so, which builds on some of these insights and tries to apply them in a variety of ways. One thing I've come to realize, at least in my own little corner of the field, concerns something that science and technology studies people have been saying for decades, and that in this audience we all pretty much take for granted: the idea that what we're looking at here is a sociotechnical problem, an issue where technical and conceptual considerations are completely intertwined with the social conditions under which they are taken up and achieved. I realized that this was really not acknowledged in any particularly deep way when it came to the setup of many data infrastructures and approaches to data linkage and data reuse. So I decided to focus a little more on one of these areas and think about how, specifically, plant data are now being linked between different infrastructures around the world, and to see whether all the discourse, which had followed quite closely around the standards, the semantics and the technical features of the software and systems used to enable this kind of data linkage, was matched by attention to the social implications of linking these data, the ways in which these data actually cross national boundaries,
the ways in which these data in fact belong to particular heritage traditions, and how this set of considerations intersects with the technical realm. So we started this project, From Field Data to Global Indicators, as part of the Alan Turing Institute's projects, to look at how plant data can be reliably linked across data infrastructures around the world, and how this can be done responsibly. Here's an example of one of the cases I've been looking at for a few years now; I did my field study of this case in 2017, and I'm now working with people based in different parts of this case to take it further. This is a field in Nigeria at the International Institute of Tropical Agriculture in Ibadan, close to Lagos in the south of Nigeria, one of the main world institutes for sustainable agriculture, and one where a lot of data are collected on crop trials, including, in this case, a trial on cassava, which is a root crop (you can see it here in this picture) and a very important food staple for much of the population in the global south. What we're looking at here is the way in which data collected from these kinds of field trials on different varieties of cassava end up informing research on cassava varieties, on improving cassava to be more resilient to the environment and to plant pathogens, and also the way in which the commercialization of cassava is actually implemented. One of the things I looked at in a lot of detail, and collaborated on, is the development of the Crop Ontology, which is a semantic system that captures information about plant traits.
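A simplified sketch can show what it means for a trait observation to carry a shared ontology term, so that data collected in different trials can be pooled. The trait identifiers below are invented placeholders, not real Crop Ontology identifiers, and the whole workflow is a hypothetical miniature of what a field-data pipeline might do.

```python
import json

# Hypothetical trait descriptors; real Crop Ontology IDs look different.
TRAIT_ONTOLOGY = {
    "root_weight_kg": "CO_EX:0001",   # invented ID for "storage root weight"
    "plant_height_cm": "CO_EX:0002",  # invented ID for "plant height"
}

def record_observation(plot_id, trait, value):
    """Attach the ontology term so the observation is interpretable elsewhere."""
    return {"plot": plot_id, "trait_id": TRAIT_ONTOLOGY[trait],
            "trait": trait, "value": value}

observations = [
    record_observation("IBADAN-017", "root_weight_kg", 3.4),
    record_observation("IBADAN-018", "plant_height_cm", 212),
]

# Export in an open, machine-readable form ready for upload to a server.
print(json.dumps(observations[0], sort_keys=True))
```

The `TRAIT_ONTOLOGY` table is where the social work lives: every entry presupposes an agreement among stakeholders about what counts as, say, "root weight", which is the consensus problem discussed in what follows.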
You see an example of this here: there's a particularly big cassava root, and these are the types of terms associated with some of the plant traits in this case. I've also been looking at the ways in which this kind of information, and these criteria for what people should be looking at when collecting data, are implemented through the use of field books, which researchers and technicians on the ground can use to collect data directly as they go across the fields and look at newly dug cassava roots, and then export directly, in a kind of open-data manner, to servers that will then disseminate these data all around the world, as I'm indicating here. So from Nigeria, and of course from the many, many other trials of this type going on around the world, these data are immediately made available to all sorts of institutions working on these same issues, and to both commercial and public organizations interested in them. So what have we learned here?
Well, in keeping with the questions around how this is really a sociotechnical issue, what we learned is that the idea many people in this field have, that one should try to produce some sort of environmental intelligence (that is, use AI to monitor environmental conditions and produce better results out of the interaction between humans and the environment, including agriculture), really needed a large amount of social intelligence as well. Take even very technical questions around how one puts together data sources that speak to different levels of organization of the same plant, and to environmental factors, which is very important when you're trying to examine gene-environment interactions in this kind of research. It was important to mix quantitative, observational and imaging data. This meant that one needed common trait descriptors, which meant that there needed to be some sort of agreement among stakeholders about what appropriate trait descriptors would be. It also meant that there needed to be metadata to inform the further contextualization of these data, and again there needed to be agreement on what kinds of metadata to use. But of course this is a very difficult kind of consensus to achieve when the cultures of data exchange in this field are so widely diverse: across the scales one is looking at, going from a little field somewhere in a smallholder farming community in Nigeria all the way to multinational corporations that control seed and production in the majority of countries; across borders, with every nation having different perceptions of how they want to think about agricultural development, and many international organizations being highly involved in trying to regulate all of this, with their own stakes in it; and of course across the relationship between public and private agencies in this space. And so actually finding sharing,
access and reuse agreements amongst stakeholders becomes very fraught and so is defining appropriate data governance to define what constitutes lawful and adequate data use in this case this of course is complicated by the fact that there is a strong interest right now in heritage crops particularly those that come from the global south and have been less studied and they're probably going to become much much more relevant for sustenance and food security in the global north due to climate change and in these cases it's very important to try and acknowledge and reward the provenance of the data the work made by indigenous communities in producing a particular and specimens in particular breeds and also to allocate the responsibility for what kind of uses are made of this data and being able to pinpoint mistakes and concerns all of this is crucial to try and identify who is being excluded by the systems who is not being rewarded appropriately and who is not being credited appropriately but of course as you can imagine given the scales and the different interest involved in the system it becomes very complicated very quickly so there is a strong recognition that trying to compare and integrate data from across the globe is absolutely crucial to producing good results in the area of agriculture and precision medicine and there is a lot of emphasis on trying to develop global data infrastructures and related semantics but of course that raises these questions around how do we acquire consensus to do this so what kind of infrastructures are we looking at very concretely we've been working with infrastructures which are almost by definition transdisciplinary they need to involve experts from technical side as well as from the domains as well as on the territories and the types of crops that are involved here as well as from the humanities and social sciences as in my case who are looking at this from the perspective of what actually constitutes a sustainable 
infrastructure. These are also deliberately transnational initiatives, of course, and there are many of them; I've cited them here if you want to have a look. These are very much at the level of the FAO, the United Nations and similar initiatives, trying to really bring together different types of data stakeholders and have them communicate with each other. Now of course there are huge governance challenges in trying to do that. There is an underlying idea in many of these initiatives that what we're looking at is an idealization of global plant data resources as a common good that should be harnessed for the survival of humans and of the planet. But of course there's no such thing as a global data resource: these are all highly local data resources, and whether or not it makes sense to conceptualize them as a common good is an incredibly fraught question. Open data seem to be very important in this space, the fact that you can freely share the data, and yet this is very tough to reconcile with the fact that you've got to recognize the rights of indigenous groups and local breeders, particularly in a situation where these groups may not be very happy to share their data, especially not with multinational corporations; and also to reconcile this idea of openness with the fact that the vast majority of these data are still produced by breeders sponsored by agro-tech companies, privately managed and completely inaccessible. So governance is very often pointed to as the key to addressing these issues, but the question becomes: governance among whom, established by whom? And in this sense we come back to this being a huge socio-technical problem. Even the idea of what counts as data production in relation to crops is a very fraught issue. You can think, for instance, that data production is the result of growing plant specimens: that's when you're producing data, right? Or you could say, no, this is about when you 
select the specific strains that the data are going to be about, and everything else doesn't matter. Or you could say, no, actually it's about when you design the field trials, because this is when you're actually setting up what will count as data for you; ultimately you're setting up your instruments and your methods. Or you can say, no, it's about the measurement tools you're using: this is really what's producing the data, so that's what you need to focus on. Or you can talk about who is designing the data storage and data infrastructures for these data. All of these ways of thinking about data production could potentially be equally valid, but depending on how you answer this question, you get different answers to the question of who is the legitimate owner of the data and who actually has control over their use. And the lack of clarity on key questions like this is what in fact leaves the door open to bioprospecting and to what some people are calling digital feudalism: countries in the global north, whether publicly sponsored researchers or private companies, going to the global south and basically preying upon and appropriating all of their resources relating to plants. And this of course is nothing new: it builds on a long history of exploitation and discrimination that is built into the very food production system we use every day, everywhere in the world. So, to try and think about this question in a slightly more data-based way: this is going to be the only time I show you an attempt within our own group to do a very specific data-intensive exercise. We started to think about how to map these data infrastructures, and we started with the idea of trying to map them geographically. This is all very preliminary; it's work I'm doing with Hugh Williamson and Michelle Durengs here at the University of Exeter, and it's very much in flux, also because we're still trying to find more funding for it. But the idea was, of course, to try and locate them 
geographically, and most importantly, to locate these initiatives diachronically: thinking about the time at which different initiatives, which bear a lot of responsibility for how data are being mapped and put together, were developed. When did this start? Who was involved in these initiatives? What were the key points of change within them? When were they discontinued? And so on and so forth. You can see here one potential visualization of the timing of some of these initiatives, and of course we are compiling profiles of each institution and platform that we're looking at, so that we can start to put these elements together. This is far from being something we can already do a lot of data-intensive analysis on, but the intention is eventually to be able to do this. And partly this is because there's been a lot of work done within the plant sciences themselves on mapping data infrastructures and initiatives, much better than anything we may be doing here, which is anyhow limited to the ones we have intersected with. What is really lacking is a more historical perspective on these kinds of issues. Helen Curry at the University of Cambridge is doing wonderful work trying to supply some of this, and many other people are thankfully participating, but I think this is a space where data-intensive approaches could really help to further these kinds of studies. So, when it comes to these studies of research initiatives and data-centric initiatives around the world in different domains, what kind of solutions did we come up with, and what have we been discussing with these different stakeholders when thinking about what could be done to improve the current situation? Well, first of all, the idea that data stewardship should be valued and should try to foster critical data reuse. Rather than just black-boxing whatever is happening to the data and 
producing nice visualizations that cannot be unpacked and disaggregated, it is really important to provide tools to be able to go back and analyze the steps of data processing and data visualization, and potentially question them, partly because, as I've been saying, the context of data use is going to be really important in determining what matters about the data and what we really want to look at. And of course there is also trying to build responsible practices into the technical specifications of data infrastructures. There are principles being put forward, certainly in the sciences, around this, the so-called CARE and TRUST principles; we can come back to this in discussion if you want, but this seems to be a very important thing to try and do. We are right now actually in the middle of carrying out a big workshop where we're trying to bring stakeholders together to push this along a little. And the idea that data providers and users should really be involved in the development of data infrastructures is absolutely fundamental; that came out of pretty much every study I've done in a qualitative sense in this area, and there are initiatives in relation to crops that are trying to stimulate that, so-called communities of practice, and there is in fact a big exercise at the FAO right now to see what this kind of long-term involvement could look like and how it can be incentivized. And more generally, and this really brings us back to what we are doing in HPS, there is trying to encourage debate, as explicit as possible, over the overarching goals and in fact the concepts of human development that underpin data sharing practices. Certainly in the case of crop data this is absolutely essential, because as some of you will be aware, there are very particular ways of thinking about what constitutes agricultural development that tend to take all of the attention and not even be questioned by people who are working in 
this field, and that has potentially really dramatic and bad results. So now let's think briefly about looking into the future. I think it's pretty clear from what I've been saying so far that one of the things that has become more and more relevant for me in thinking about these problems is the question of inequity, injustice and exclusion in digital systems and from digital systems, and also the fact that the ways in which these systems try to capture environmental variability are extremely diverse and very often very problematic, because variability happens on many different scales and this needs to be recognized in these systems. And these two things, I think, are related: the fact that we have limited ways of recognizing environmental variability is, in my view, deeply linked to the kinds of exclusions and digital divides that are happening in the contemporary digital realm. In the coming year I will be starting a five-year research project looking specifically at what it means to think about open science in a situation where you have highly diverse research environments, where this notion would actually signify different things to different people, especially when it comes to the sharing of data systems. And I've done some work over the last year, as probably many of you have, on how one thinks about these ideas in the case of the current COVID crisis and the pandemic. It's certainly important to mention this at the moment, because the pandemic has brought an incredible acceleration of digital transformation. Those of us who were already somewhat in the digital realm have been pushed forward enormously, and there is a lot of evidence of the penetration of digital services and the decentralization of infrastructures around the world, particularly in the global north, and of course this is further amplified by the launch of 5G 
networks. At the same time, there is also a very strong recognition of the fact that this is in fact amplifying the digital divide to an incredible extent, because people who were excluded from digital transformations at the beginning of the pandemic now find themselves excluded from any kind of social or medical assistance, as many of these services, like digital passports showing you've been immunized, now pass through digital systems. And so the World Economic Forum has put out this wonderful idea of the Great Reset: we need to rethink digital platforms so that we have a new social contract that honors the dignity of every human being. This was specifically meant to address the marginalization that exists in this sphere, but in fact even here we've seen an enormous emphasis on the technical as the great solution, as an alternative to tackling the much more difficult questions around social conditions. Across different countries at the beginning of the pandemic we saw a rush towards data science solutions, like tracing apps on smartphones and attempts at data aggregation across countries, and of course this increased capacity in already very powerful big technology corporations like Apple and Google, who provided the technology for some of this, and it decreased even further the capacity of low-resourced environments that arguably needed it most. Which brought me to a preliminary assessment of this great idea of the Great Reset as a combination of surveillance capitalism and some sort of lip service to social responsibility; certainly not something that has really helped much to improve the situation. And again, I think this has pointed us to the fragility of the system. There are still huge limits to data access, even when it comes to COVID-related data: we've seen it happening from the medical front line to social services and tracing, and data interoperability and linkage are not working very well even now. It also 
highlighted a problematic relationship between different governments, and between governments and corporations, and questions about the role of international agencies like the WHO, and of course the dire consequences of using digital platforms for surveillance and the lack of trust among people who are exposed to these kinds of systems. One thing I particularly want to highlight here, and it's hopefully absolutely trivial for this audience, but I think it's really worth repeating: when we're doing digital studies there is this lure of neutrality that comes from using some of these technologies. Hopefully we're all very aware of the fact that data science and digital studies are not neutral; they're anything but. Unfortunately, data science as a field especially keeps selling itself as a neutral field that can be put to the service of different masters, as needs must, right? And I think this is really problematic. We've had a big discussion very recently in the Harvard Data Science Review on this point, where I published a paper that was then commented on by lots of different experts in the field coming from different domains, and it's quite an interesting exchange if you want to have a look at it, trying to consider what it means to move away from the lure of neutral and value-free data science and instead work collaboratively with domain experts and communities towards forging socially beneficial solutions in a very explicit way. So let's spend just a few minutes on the implications for HPS. I've hopefully been pointing to these as I was taking you on this tour of data studies in research, and one of the sets of arguments I've been making over and over again, apologies to those of you who have heard it before, is that situating and analyzing data is a practice of valuing, and unavoidably so: the procedures through which data are processed and ordered crucially affect their interpretation. I hope that 
at least the quick examples I gave are enough to give you a sense of this, and it will also be part of your own experience. And of course databases do not store some sort of ready-made facts: it is the way in which evidential value is attributed to data that determines the epistemic significance of data towards knowledge claims. This evidential value is not just determined by scientific or intellectual considerations; it also depends on other forms of value in data, which can range from affective (you like a certain data type more than another), to economic (for instance, access to data), to personal and cultural considerations. One argument I've been making, in giving a philosophical reading of this work, is that the triangulation of data, which is typically seen as one of the great solutions for making datasets better, does not actually reliably counter the kinds of bias we see in the data landscape, because it doesn't necessarily counter the bias introduced by the diverse methods of data collection, storage, dissemination and visualization, which are already sitting within a highly layered and unequal landscape. And so one of the conclusions for me is that pluralism in methods and standards contributes enormously to the robustness of data analysis and reduces the loss of system-specific knowledge. A big lesson for me in doing this work has also been the role of interdisciplinarity and wide engagement with data sources and analysis. Multidisciplinary teams are indispensable, and many of you are already working in this way. It's certainly been my experience that I understand very little of many aspects of the things I'm trying to analyze and research, and so I benefit hugely from a wide network of collaborators, friends and peers that provides me with input and feedback on what is actually important here. This is particularly important when trying to understand the context and social 
significance of the data, and to do that it is of course also very important to try and engage beyond professional research per se; in many cases this can add data to what one is doing, and can also add robustness to existing data and to how one contextualizes and validates interpretations of data. So I want to come back very briefly, in the context of HPS, to the kinds of solutions I hinted at before when thinking about scientific efforts to manage and reinterpret big data, and see how we fare when we look at those same criteria. First, these are in no way meant to be comprehensive or exhaustive or anything like that. But the first idea was that data stewardship is needed to foster critical data reuse. What does that mean for us? Well, first of all, I think for many people in HPS, probably excluding all of you, there is still a strong need to recognize what counts as data and why, and this I think should happen in a relational framing: really recognizing what it is that you're using as evidence, acknowledging that whatever you're using as evidence is in fact counting as a form of data for your research, and that therefore some of these considerations actually apply, rather than being something one can just discard. It also means having an inclusive debate on which data get to travel and why. This is particularly important when we're looking at historical archives, where of course we have a huge path dependence in what is being recorded: which figures are given prominence, which kinds of journals, for instance, are tracked and which are not, and the relevance of languages is very important here; the work that Christophe Malaterre was presenting two days ago is absolutely wonderful in that way. But basically: keep questioning sources, and also the ways in which data have been processed over time. This also means, when we are processing data ourselves, 
trying to make our own data processing as trackable as possible: providing people who are looking at our data visualizations or at our papers with some indication of how they can deconstruct our arguments and our visualizations, go back to thinking with us, and potentially think differently from us about how we're using our data sources. This is something I try to do for qualitative data: I've been publishing, with of course a lot of work on the ethics of this, some of the transcripts that came out of the interviews I've been carrying out, on Zenodo, one of the big repositories, hosted by CERN, and I will certainly continue to do this on a larger scale as I continue empirical research in these kinds of fields. But it's just a small step, of course, and there's a lot of work still to be done on how we set up our own infrastructures to make data journeys trackable, and on thinking about how much standardization we really need and how this sits compared to local solutions adopted by specific communities and domains dealing with the same data. Turner, for instance, talked about this when discussing the ambiguity of standards, and in fact how fruitful it can be to really build on that when we're thinking about data infrastructures. And of course, think about values as much as possible; think about the labor systems that we're using for responsible sharing and the implications that sharing certain data can have in a much broader sense than just our research. This brings me to the second point, which is building responsible practice into the technical specifications of data infrastructures. I think there continues to be relatively little work in HPS on the ethics of data sharing; thankfully many of you touched upon this in presentations during this conference, but I think even in HPS there still is a strong attention on the technical side which is not necessarily matched by 
attention to the side of responsible innovation: what does it mean to actually share these data ethically? One of the big components of this, of course, is to think very carefully about which services we're actually using. What are we relying upon when we're doing this work? Are we using Amazon Web Services? Are we using Google Cloud? What kind of software, what kind of platforms, and what implications does this have? I think Saradevi started to point in that direction in her talk about Twitter as a platform, but I think there's a lot of work to be done, especially in HPS, to really start to get a better awareness of what this means. And of course that extends to publications and the whole question around open access and the dissemination of our own research results in our own journals and our own book series. It's very important to recognize these issues early and place them at the heart of technical work, and there are questions around how principles that have been trialed in the space of the natural sciences could actually apply to work in HPS. I think that's a long discussion, and I'm not going to spend much time on it now because I'm almost at the end of my talk. And of course there is a question around being careful in discussions around open science, careful not to romanticize the idea that open science is automatically democratic and a wonderful thing; in fact, I think this is not always the case, and especially when it comes to data sharing we need to pay a lot of attention to what we are sharing and how. The third solution we looked at is the long-term involvement of data providers. This is of course particularly relevant for those of us who work with living actors, people who are still alive and want to document their work, or who are investigating research practices which are ongoing now. What could direct engagement with these subjects of our work look like? What benefits would it bring? I think these are very important questions at 
least to ask, even if one has no resources to really put this in motion in one's own work. Certainly what I've seen in my own work over the years is that when I started many of my investigations by just looking in, just being a participant observer of some scientific initiatives, very often I've become part of them, and now I would consider many of them to be collaborations rather than just case studies. That's an interesting shift which comes with all sorts of interesting accountability problems, but also advantages. And of course a big question for all of you who are historians is how that applies to historical sources, if at all. A bigger question here is how we choose the topics we're going to investigate and who we talk to: which audiences do we have in mind, and which implications do these choices of audience and public actually have? The fourth point, and this hopefully should be the strongest point for our field, is to try and encourage debate over the overarching goals and concepts of human development underpinning data sharing practices. I think in digital studies we're all pretty good at trying to think about this in relation to some of the sources we're using, and maybe the practices that we're analyzing; the question is how this applies to our own work, of course, and I think this is really an important question for all of us to think about. And with this I'll finally stop talking. Thank you very much for your attention, and I look forward to a discussion. Fantastic, thank you so much. 
While I wait for people to post their questions as the tape delay catches up with us, I wanted to ask something related to what you were just touching on at the very end. I've been interested in these kinds of questions for a while now, especially these questions about involving various kinds of stakeholders, and I wanted to ask you, because I think you're better positioned than probably anybody I know to answer this: how have you engaged, and how do you think we ought to engage, these processes of building trust? Because I think that's so often at the heart of it: a population sees someone coming out of an academic research unit looking to talk about big data, and the shields go up, and they probably should, right? So what has that trust-building process been like, and what kind of thoughts might you have to share about it? Yeah, thank you very much, Charles. It's a complex question, of course, because one has to be careful here. It is true that, partly because of the way in which research is organized and valued at the moment, the work that we do in looking at the meta-level of what's going on in research is actually not valued at all as it should be, I would argue, and I think many of you will agree with me. So we're always starting as the underdog in these kinds of collaborations; as anthropologists would call it, this is very often studying up, particularly when we're involved in collaborations with very prominent scientists or a local context. And first of all, of course, it completely depends on the very specifics of the research; that's partly why I wasn't trying to give overly detailed precepts here, because it will change enormously depending on the situation one is in. 
I can give a couple of examples from my own experience. Certainly the fact that I have built up a track record of working with some communities, spending many years sitting in committees and helping out, being part of the service structure of some communities, ended up giving me more credibility when it came to having discussions with people around the topics I was interested in, what they thought about them, and how we could work on them together. So there is indeed a lot of personal work involved in setting up these networks of trust, but once a few of them are on their way, it seems to become easier, just because you're slightly better recognized as somebody who can be useful in that way. And it turns out, at least for pretty much all of the domains I've been working with, that people can be doubtful or a little worried, as you said, when they see people who do digital studies start to muck around in their field, but my absolutely overwhelming experience is a lot of interest and almost gratitude that there was that interest and an opening to try and discuss things. Of course I've also had my share of big critiques and big clashes with people. Some of them were very useful to me, because I think I really could have done things better to prevent them, and to have had preliminary discussions that would have avoided the need for that kind of clash; in other cases it's just the nature of the game, and certainly now that I'm doing research in fields that are very politicized there's absolutely no way you can avoid it. But then there is also the question of building up an awareness and reflexivity about who are the audiences that interest you, because I think for me that has really expanded in the course of the last few years. 
So I started out, maybe 10 or 15 years ago, with the big aim of being able to talk to the people who were doing the semantics for data infrastructures, because I thought, this is amazing, they're changing the way science is done, this is really where the future is, and so on. So, very coherently with my own philosophical framework, I spent a lot of time trying to understand what they were doing, talking to them, understanding how they were thinking, and trying to work out what this meant philosophically. But as I got more and more exposed to some of the implications of adopting those frameworks, in terms of the inequity that data resources can cause, that also shifted my attention to publics. And to be honest it is also, very often, a question of being invited to sit on committees; it's bizarre, at one point you start to realize that there are more publics than you thought. I regard it as part of the exploratory work of research, which is wonderful: you keep discovering new people that you didn't know existed, who have jobs you didn't know existed, and who turn out to be really important for the kinds of things you're interested in. So I'm sorry it's not a very systematic answer, but at least it gives a flavor. No, that's super helpful, thanks. I think we often tend to underestimate, to put it briefly, the advantage of showing up, right? Especially in these kinds of service-based contexts that may not get the kind of recognition that other scholarly engagements do. Let me get to questions. Eugenio Petrovich asks: do you think that the same kinds of ethical issues of inequity, injustice and exclusion regarding data in the biosciences or in medicine also concern the data that are used in the digital humanities, considering that the digital humanities arguably have a 
significantly lower impact on the future of humanity than the biosciences? Well, thanks a lot for the question. I think that's for all of you to answer, in a sense; I was trying to be provocative and just get you to think about this. When it comes to the work I'm involved in, which you could in a sense qualify as digital studies of science, there are very big questions, and I think in fact that field is co-extensive with the scientific work itself. I may even go as far as to say that in some respects it has even more responsibility, because, while of course many people would just not care about what we're doing, there's also a sense in which we very often end up speaking for a certain field when examining particular questions. We end up providing a particular reading of what's going on: for instance, when people are analyzing metabolism across different crop species, what is the relevance of thinking about particular structural gene-environment interactions when producing a crop that's going to be drought resistant? Or how do we think about modeling COVID vaccination, something many people are doing at the moment? These are all examples of work that can be done very usefully with the help of digital methods, which help you to scope what's going on in the world, spot different trends, identify the interesting actors, and extend your vision in so many different ways. But at the same time, when you are presenting it, at least when we do this kind of work in HPS, we are also providing an important source of representation for a whole domain, and almost by definition that work involves a lot of exclusion and a lot of selection, right? And I'm thinking also of the work of my collaborators Mike Dietrich and Rachel Ankeny, who used a lot of digital-studies-type approaches to look at the mapping of the use of 
different organisms around the world. That has had a huge impact, because people look at that work and think: ha, these researchers say that my organism is particularly important, or that my organism hasn't been important enough, and so on. And this has implications, for sure. So I think digital humanities can have a very big impact in that sense. Of course, some fields will be less impactful than others, and it's also a question of whether you want to participate in projects which have those connotations, but I certainly don't think the digital humanities are excluded from this. Whether the same ethical principles and questions that arise in the sciences are valid for digital studies, I'm less sure about, and I think there needs to be a lot of work on this; as I said, I really don't think there's been enough work on this question. When it comes to HPS, I've tried to point to some of the specificities of what's going on in my field, and I'm sure there's much more that can be said, but it would be great to see more discussion happening in that sense. I think sometimes we almost get programmed to think that because we're all humanists we're good people, and anyhow nobody cares about what we do, so ethics is not that relevant because it's in our DNA — I hate that expression, but I used it anyhow. But you see what I mean; I actually think there are potentially even bigger risks precisely because we tend to overlook this. That's really nice, thanks. The next question, from Rose Trapas, asks whether you've been able to effect any changes in the communities or systems you've been working with, and how that worked — what was the change-generation process like? Yeah, I actually got that question very recently — yesterday, as a matter of fact — from a group of students, and I think that's a
very interesting question, especially because it points to one of the problems with incentives: the fact that impact, however you want to define it, is really difficult to track in these kinds of fields. And that means that responsibilities and accountabilities are also very difficult to track, precisely because things are so distributed. That's precisely the problem — it's a two-faced problem: on one hand, responsibility becomes more diluted and more difficult to track; at the very same time, potential positive change also becomes problematic to track. So could I say that the work I've been doing has made a big impact? It's very hard to say. I suppose the work which I'm pretty sure had some impact in a more verifiable way — because we're using it for the Research Excellence Framework at my university at the moment — is when I ended up doing a lot of work for the European Commission on the basis of the insights I'd gotten from my research. In fact, doing that work meant spending an inordinate amount of time thinking about how one presents results originally intended for an HPS audience to a public which is completely different and is interested in what constitutes sustainable data infrastructures. I would argue, and have argued quite vocally, that this was incredibly useful for me as a researcher, because it really did shift some of the ways in which I was thinking about my philosophical frameworks and about my historical sources. So I would recommend that kind of engagement as a way to become a better researcher — not just because of whatever good you may be doing in the world. And of course there are other types of interactions that give you some sense of impact, however you want to define it. I tend to do a lot of collaborations, even with small startups and
people who are working, for instance, on data-centric systems in genomics or sometimes in clinical spaces. I'm quite happy to talk, on a very independent basis as an independent expert, with groups of people working in companies in the private sector, to think about how some of these really big ethical issues apply to their sector. When it comes to data studies, not doing that is almost a problem, because — certainly for contemporary data work — this is where most of our materials are filtered through, if they don't come directly from there. And I've been writing reports for governments specifically that were then used to produce policy. So for me these are things I'm interested in doing. What I'm hoping for the project I'm going to start next year is that we'll get a better sense of these kinds of engagements and what they lead to, because for the first time we're really bringing in people who are working on different data infrastructures around the world, and especially in the global south, as partners in the project from the very beginning. So part of this will be to see how the group develops, and what changes in us — and in the different people involved from the different domains — as a result of this exercise.
Many of you will probably hate me for this, but I'm very allergic to the quantification of impact. I really don't think it works very well. I know it's very important for this community, but I've yet to see an example where it works particularly well for these kinds of diffuse, distributed practices. So no, I haven't particularly thought about, or been willing to think about, how we could quantify this. But of course it's a big discussion, and something a lot of governments and organizations are looking for now, so any insight from all of you on this would be great. Great, thanks. More questions: next up is from Christoph Malatech, who says, extremely inspiring talk, Sabina, thanks so much. I have a question about the Dataverse-type repository services that we're starting to see pop up for academics. It's a real jungle out there, with services offered by large institutes, by some of the best-funded universities like Harvard, by some traditional publishers, and by some online publishers. What are your thoughts about these?
Yeah, in short, it's a freaking nightmare — that's my thought about this. I just find it rather, let's say, paradoxical (not to use a worse word) that you can spend 15 or 20 years thinking about data infrastructures — trying to get information about them, digging into their history, understanding how they're set up — and still not understand what's going on in your own field, which I think accurately depicts my situation. So I try to follow this up: every time I start a new project, I try to have a month where, together with the team, we think about what's around now, what's available to us, what we can choose. At one point I had a slide referring to a paper we did for the plant community, which I co-authored with some of the top experts in that community dealing with data infrastructures. I had spent 15 years investigating that community and its data infrastructures and mapping them, and it was still ridiculously difficult to put that paper together — to the point that we ended up covering just some headline data infrastructures, because going any further than that was already a nightmare. And then, four years after that, I received emails from people complaining: yeah, but this; no, but that; in fact this really wasn't accurate; and so on. So mapping this is a real difficulty. What I tend to do now, when it comes to open data, is that for now I quite like the Zenodo approach, because my impression — and again, you could think very differently — is that in this field our data curation practices are still very much in flux. There is very little, if any, agreement on what they should look like; in fact there's been very little discourse. I think this conference
will hopefully kick off the discourse around how on earth we're doing this, but for now everybody's doing their own thing. So having a repository which is open access, which is financed by the European Commission — which means it is hopefully as long-term as it can be — and which provides free hosting for whichever data type you want to put out, in whichever form you want to put it out, is pretty important as a first step. Of course, I'm the first to say that this really does not solve the problem, because this stuff is almost by definition not machine readable: there aren't really standards for metadata. But then, the services I've seen being developed in the UK for social science work, for instance, have not really been great either, I have to say; I found them very restrictive for my research. So it's a struggle. When it comes to cloud services it's even worse, because of course we're all working in institutions which make very particular decisions — decisions I've tried to influence: I've chaired research data management at my university for the last year and a half, and I still haven't managed to influence which provider we go for, or which people we try to get quotes from when choosing cloud services, because of course this is handled by IT, so we are totally excluded from it. It's hard, because there you enter an incredible labyrinth of codependencies where our own space of agency is very limited. But again, there are still things we can do. We can use certain kinds of reference tools, for instance, which may be open access, at least for now. And, as I said before, one of the things that is most accessible to us now is thinking about what we publish, because contrary to the infrastructure question — where, because we're a very small field, we depend on a lot of what
other people are doing — publication is ultimately where change can happen. I've been part of discussions about this for quite a while, also because I've been very actively involved in the open access and open science discourse. For instance, in the historians' community I think there's a disappointing lack of attention to the question of open access and what it means and signifies. There's a lot of backlash against Plan S, for instance, but I've seen less discussion of why we actually want it. And I can tell you, I really want it: I want to be able to publish for free, and for my work to be distributed to all the people who may be interested in it, partly because I'm interested in having different publics. So there are no obvious solutions there either — it's a very fraught discussion, with no obvious business model; the whole thing is difficult. But at least making sure that our own scholarly societies and our own colleagues are aware of this problem, and keep discussing it, will really make a difference. That's great, thank you. Let me scroll down to check on votes. Yes, the next question: absolutely amazing talk, very helpful for understanding your involvement and your deep analysis of data. I'd like to see how your conclusions impact studies on the modeling of data and algorithmic opacity. Do you think that the opacity of models of data, built mostly with AI-based algorithms, is related to your discussion of responsibility and accountability? Yeah, thanks, that's a very good question, also because I've got a few PhD students who are very deeply involved in this — we are running a doctoral center for AI and environmental intelligence here, so it's a constant question now. My feeling is actually that there's only so much we can do about the opacity and explicability of algorithms, and for reasons
that are kind of obvious: there's a sense in which these algorithms develop by themselves, so there's only so much we can do to explain exactly how they obtain certain results, and the system is so incredibly distributed. The old idea of scientific understanding that I worked on when I was younger — the thought that you can capture, that you have the cognitive ability to understand, the whole system you're working with — I think we just have to throw that out of the window; it's just not going to happen. But what I think is useful is, first of all, for people who are involved in this kind of work to take it very seriously, because I really do believe that people who are technically involved in producing the maths and the models for AI are absolutely on the front line of thinking about these issues. I'm just finishing teaching our 200-strong postgraduate class of data scientists, and I just finished writing a textbook specifically for that audience, partly because I really do think that a lot of the change will need to come from there: these are the people with a high enough level of technical understanding to potentially see the implications of some very, very technical decisions — which prior you put on a particular model, how you set up the maths, all these kinds of things. These are important components. The other thing I've been thinking about a lot, and trying to recommend in writing as well, is a governance system in which the people producing these kinds of algorithms and applying them are in regular communication with a broader set of stakeholders. And this is not because it means we will be able to predict exactly what kind of impact, benefits, and harms can come from these technologies — that's completely impossible; it's never
happened in the history of technology. I really would like to see a system that tells me exactly how people are going to be using my technology in five years' time — that would be amazing, but it's not going to happen. The closest tool we have is to try and multiply the feedback that we get when we're setting up certain systems, and to exercise our ability to walk other people through what a system can do. So I think this maybe gets close to the explicability idea, except that I don't think of explicability as you sitting down and telling me exactly what's happening in every part of the algorithm and what's going to happen as a result — I don't think that's possible or feasible for most of these systems. But the ability to regularly get together a group of the people who are working on these technologies and the people who are using them, and to try and see how things are moving — that's very important. What I witness is that this has been beneficial to the groups I've been working with: just this very simple, almost stupid social trick of trying to get people to articulate what they're doing and what they think the potential implications could be at this moment. I certainly am a bit skeptical, though I think it's very important, about modellers looking for purely technical ways of fixing these issues in the system. The trick is really to constantly compare these technical fixes with broader perceptions of what the social implications could be — never to focus only on the technical side, but to embed it all the time in a broader context. Of course, that requires a lot of resources, and it requires time, which a lot of people developing algorithms are under a lot of pressure about, so it's obviously a rather utopian idea of rethinking the system. At the same time, I've been arguing that even in the context of the COVID crisis, which
arguably really is an emergency, there has been a lot of work done far too fast: lots of algorithms implemented without really doing this kind of work properly, and that was a disaster — basically a waste of time. It would have been better to take a bit more time to set things up appropriately and to set up these feedback mechanisms more appropriately. That's great, thank you. So — we have six minutes, and we have two questions that are actually nicely related to one another. Let me start with one from Stefan Lindquist, who asks: I'm interested in cases where there's a questionable ontological, or perhaps ethical, position that influences the way data are collected and then becomes reified further along in the data journeys. It strikes me as a potentially common phenomenon, and it's also a place where philosophers of science might make interventions. Could you comment on how this process could be effectively critiqued? Yeah, that's a very live question, thank you very much, because right now we are dealing specifically with agricultural systems, and this is a space where, on one end, I've been very interested in some local practices set up with specific local communities — like my colleagues in Nigeria, for instance, where there's been a real and serious attempt to engage local breeders and local communities in thinking about what counts as a relevant plant trait, which of course for many people is also a basis of livelihood. There are considerations around how different plants taste when they're cooked, considerations around the cultural role of plants, and the role of plants in a food system — and very often these are exactly the considerations that more standard taxonomic systems don't take into account. So already there I was very interested in looking at the ways in which
these different standards start to clash. That's partly why I got involved and interested in the Crop Ontology, because I thought it was an interesting situation where you're trying at least to engender a dialogue between these different taxonomic systems and see what comes out of it — whether you can actually use synonyms in a productive way, for instance, and use the multiplicity, and the potential for multiplication, that you have in a digital, internet-based system to at least acknowledge different perspectives and make it possible not to lose them completely when you're putting together a set of standards. So that was one case, and I'm still very interested in these kinds of initiatives; I think many of them are wonderful. But many of them then end up subsumed under an umbrella set of initiatives that are supposed to provide this kind of transnational, global interoperability — and these initiatives, lo and behold, all come from the global north. They all come from very particular institutions, with very particular partners, and they have particular interests that they need to satisfy in order to be able to work. You may think this is just a cynical remark, but in fact it's a very important one, because whether we like it or not, the majority of plant and crop trials for the production of new food and crop varieties, and for food security, are done by private companies. So let me jump in, because you've just perfectly set up the last question — let me feed it into exactly the thought you're having right now. Arlie Belvo asks: would strengthening data standards and infrastructures for data sharing and collaboration protect and empower communities and indigenous groups, or is there a risk of that kind of system being abused by more powerful organizations like Amazon or Monsanto? So there's a question about gaming the
system here, right? We had a conversation yesterday about the risk that making the Google search algorithms public would make them gameable — is it the same kind of worry here? Exactly — I think that is exactly where you're going. Yeah, thank you very much. There's no question: I think it would be very problematic for anybody who works in digital studies not to start their work from the assumption that we are in the context of a highly exploitative, aggressive data system which is vastly dominated by specific private interests. And I'm not against the fact that they're private per se; I'm against the way these interests have been fed into a completely neoliberal, ultra-capitalist way of thinking about data value. Of course, in that kind of system there are problems. In the area I'm working in — which indeed, as Charles was saying, is exactly where I was going with the previous answer — you have this issue not only on the side of data facilities; you also have the issue of corporate interests, and of very particular ontologies of what it means to think about agricultural development, coming from the history of agriculture and embedded in all the international organizations, national boards, and national standards that regulate not just agriculture but the selling and trading of seeds, the standards for fertilization, all of those things. So in that situation — and that's where, again, recognizing the distributed nature of these systems is really important — recognizing distribution doesn't mean you don't recognize the power relations in all of this. That's where I really detach myself from network theory: I'm not interested in just looking at networks and flattening things out, and I think it's very important
for data studies never to flatten power relations out, because they're absolutely central to these kinds of systems. And what do you do as a philosopher if you're interested in changing the ontology, or some of these discussions, or in changing the standards? If I knew, I'd be really happy — but I've certainly been trying with some of these things for a while. Right now, for instance, we are working with agricultural development, where many of the standards rely on this ideal of genetic gain as a measure of what it means to have development in the area — which is basically the high-yield notion of agriculture, which I think we're all familiar with being controversial, at least in some environments. And yet it is the absolutely predominant idea driving most of the ways in which different national governments think about agricultural development. So I think it's an interesting way to infiltrate the system: to think about how, at the technical level, we could try to de-reify some of these issues and add multiplicity, and indeed more ambiguity, into the systems — in the awareness that the interdependencies are huge in cases like this. It would be great if some of us managed to really break a system completely in that way; I think that's probably a little unlikely. But it's also true that when Barry Smith basically became the protagonist, at least for a while, of the development of bio-ontologies, the kind of platonic assumptions he managed to impart on that field — which of course I have been a very strong critic of for a while now — have really lived on in that field and have had lots of implications. So hey, these really are spaces where philosophers, historians, and social scientists can make an impact, if they arrive at the right moment and the right time. And on that cautiously hopeful note, I'm afraid I have to bring
our discussion to a close. But thank you so much — this was a really fantastic talk. Thanks so much for sharing your work and your thoughts about a hopeful, if problematic, future. I'm excited to see how things unfold over the next