We're going to be talking about machine learning, datasets and humanities research. And we're hoping to have a bit of discussion and get some ideas from you about how and where JISC might be able to support these activities, either with practical work that we might undertake or in the way that we provide support out to the community. JISC used to be a funding body, and in the past we would award grants to organisations to undertake research, development and implementation projects. That's no longer our mission; our focus now is much more on providing digital products and services. However, the current JISC strategy does recognise that universities and colleges in the UK want JISC to use its convening power and its unique position to help drive forward agendas, to help the sector act strategically, to facilitate knowledge exchange, and to support organisations in recognising and adopting good and best practice. So that's what we'd like to do in the area of machine learning and machine-actionable data. Having said that, it's worth noting, and others at this conference have said it over the course of the last few days, that AI and machine learning are currently being referenced everywhere across education and cultural heritage. As part of the exploratory work that Peter has been doing, and we'll be hearing from him shortly, a horizon scan of this topic suggests that there are at least 32 UK and international organisations or initiatives looking at AI and ML in the education space. Which makes the question that we want to discuss today even more pertinent: where is it that JISC can operate so that we ensure that we are not duplicating effort or creating even more complexity? And incidentally, that horizon scan exercise excluded all the commercial companies operating in the AI space who are developing products or services for educational use, of which there are of course many. 
So machine learning is going to be the focus of this session, but we recognise there are other aspects to AI. JISC actually has its own National Centre for AI in Tertiary Education, but at the moment they're primarily looking at enterprise-level applications and how technology can help institutions improve services to students and other stakeholders. In practice they're looking at the use of bots or chatbots, i.e. computer programmes that simulate human conversation through voice commands or text chats or both. So humanities datasets and collections as actionable data, the sorts of things that we believe the DCDC community is interested in, are not really on their radar at the moment. That's why the teams that I'm responsible for at JISC are looking into this topic area. We work in the digital resources directorate at JISC, more specifically in the area of content and discovery, and we provide services such as the Archives Hub, Library Hub, Historical Texts and Journal Archives. Those services handle large amounts of metadata and content, so we are quite well placed to look at that data and explore how we might make resources more discoverable, or to what extent those descriptions are culturally representative, diverse and inclusive. Our colleagues working on the Archives Hub are actively engaged in some work right now, working collaboratively on one of the AHRC-funded Towards a National Collection (TaNC) projects. I'm not quite sure whether Jane or Adrian is on the call, but we might be able to hear a bit more about that when we get to the Q&A part of the discussion. Building on that practical work, we want to understand how these technologies can be employed most usefully by those undertaking humanities research, teaching or learning. 
Our focus at the moment is on supporting JISC higher education members, but obviously whatever we find out and whatever help and support we can provide might easily be applicable or beneficial to other sectors as well, including FE, of course. So this session is really about developing a conversation. We'd like to know what you think about these technologies and their application to humanities research and teaching in relation to archival and library collections, and especially in relation to making more data available. One question we might consider through the session is: what do we need to do if we want to develop useful datasets, in a practical way, for processing by machines, and do we want to develop what is sometimes referred to as ground truth sets? Peter is going to unpack that concept a bit more for us. And we're of course aware that ethics are central to the uptake of machine learning methodologies in education and in memory institutions, and those concerns, and how they affect the design of archival and library infrastructures, have got to underpin all the discussions that we have. I'm sure we'll return to that topic as part of the Q&A. But right now I'm going to hand over to Peter. Over to you.

Thanks very much, Neil. So I'm Peter Findlay, digital portfolio manager in Neil's team at JISC. I'm going to be sharing some slides with you. I'm a 50-ish year old white male with quite a grey beard, a bushy moustache and very short-cropped hair. I wear glasses; I'm wearing a blue shirt with a pink check and blue trousers. And I'm sitting in South London and it's quite warm, so hopefully I won't melt away. I'll now share my screen with you in the usual manner and go into presentation mode; sometimes that takes a moment. Hopefully everyone can see that. Good, good. Okay: machine learning, datasets and humanities research. 
One of the questions that I was given to consider was: do people need datasets to aid their work with the new technologies of AI and its subset, machine learning? Through research and a considerable amount of reading and learning, I think that what we're looking at really is explainability, or explainable machine learning dataset development. Now, whether that's right or not is a question, and I suppose that's what we're here for really, to try and determine that. But I think that's the overarching theme: people want to know more about these technologies; that is probably our ground truth. There's a lot of commentary about these technologies, and people are quite concerned about the ethical aspects. So it was interesting to come across a blog post by Clifford Lynch, the executive director of CNI in the States. He suggests that memory organisations can develop skills and workflows to help them in this domain, and that it's really about trying to improve the processing of and access to digital collections, which has been hampered in many ways, certainly in terms of description, because of a shortage of labour. One of the things that obviously concerns people is that these technologies will take away labour, but I think the suggestion here is that they help humans, and that we should utilise them in that way. There are of course consequences to all these kinds of decisions, and one consequence you may have to accept is that there will be emerging issues of quality and consistency, which may fall short of what we expect as humans. But if we think of these things as aids and tools, then perhaps that might be a way forward. It's an interesting blog post, incidentally; there's a link at the bottom of the screen. So why are we here today? Well, I think there's an increasing awareness that we need to have our collections ready for computational use. 
There's increasing demand that they should be. And of course Google demands it, because Google is essentially a big AI engine, and it's demanding all the time that we make our material available to the behemoth so that people can find it. And that so often is still the first place people go, whether they be students or academics or researchers; it's often the first starting point, and we really need to bear that in mind. Google has done a lot of work in this domain, is using algorithms extensively, and has recently developed new technologies which will enhance its capability to provide more refined results. But we have questions about those refined results: what does that actually mean? So that's one of the considerations. There are questions about how we can actually utilise explainable machine learning techniques to help us improve access to the collections for those who want to work with them at the academic end. And how can we show that our data is ready for machine-actionable collection use and access? We've often spoken about collections as data, and increasingly that is becoming important. We still have lots of stuff sitting on shelves, and that's all to the good and important in itself, but there is of course an increasing drive to turn material into data that can be consumed by machines as well as by humans. So we started off with a kind of horizon scan, and it was interesting to note that there are 32 organisations, and counting, engaged in this space, as Neil said. There's a question there about whether we need another one doing this stuff, and if so, what exactly it should be focused on. A few of those organisations were stateside, and there were a couple of big reports from OCLC and from the Library of Congress, which I think many people interested in this topic will be familiar with. 
Those reports have lots of calls to action, and part of what we're doing is responding to that call to action as one of the organisations that might do something in this space. I thought it would be useful, for those people who are not so familiar with this space, to briefly mention the basic types of machine learning. In the text domain you have the supervised approach, which basically requires you to provide some form of labelling of the input data, some kind of pre-classification, and from that the machine learns to classify further data. We call that supervised, and I suppose you might say the humans are more in the loop there; those reports talk a lot about human-in-the-loop activity in this domain. Then there's the unsupervised approach, where there's essentially a whole set of unclassified data, and we build a system, or increasingly buy one off the shelf, potentially a black box from a big provider, which takes certain actions based on the algorithms that have been developed and does that classification work in an unsupervised manner. There are a lot of questions, of course, about that, and questions about both approaches. And then there's the whole domain of computer vision. So the big domains, I suppose, are text analysis on the one hand and computer vision on the other, which we hear quite a lot about because people are concerned about how things are identified in images. We are concerned about that as well, because in archives we're dealing with an awful lot of images, and we know that these technologies have been useful in handwriting recognition and those kinds of things. All of these things have concerns wrapped around them. In terms of our work, we're engaged in developing what we call a thought leadership framework. 
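To make the supervised/unsupervised distinction concrete, here is a minimal, illustrative Python sketch using invented catalogue snippets; nothing here reflects JISC's or the Archives Hub's actual tooling. The supervised path learns from human-supplied labels, while the unsupervised path has only raw text to group.

```python
from collections import Counter

# Invented catalogue snippets; the labels are the human "pre-classification"
# that a supervised approach requires.
labelled = [
    ("letters from the western front 1916", "war"),
    ("diaries and maps from the trenches", "war"),
    ("kitchen designs for post-war housing", "design"),
    ("furniture catalogues and fabric samples", "design"),
]

def bow(text):
    """Bag of words: the crudest possible text representation."""
    return Counter(text.lower().split())

def overlap(a, b):
    """Shared-word count between two bags of words."""
    return sum((a & b).values())

# --- Supervised: classify new text by similarity to labelled examples ---
def classify(text):
    scores = {}
    for doc, label in labelled:
        scores[label] = scores.get(label, 0) + overlap(bow(text), bow(doc))
    return max(scores, key=scores.get)

# --- Unsupervised: no labels at all; just pair each text with its most
# similar neighbour (a toy stand-in for clustering) ---
def nearest_neighbour(text, corpus):
    others = [c for c in corpus if c != text]
    return max(others, key=lambda o: overlap(bow(text), bow(o)))

corpus = [doc for doc, _ in labelled]
print(classify("diaries from the front"))    # -> war
print(nearest_neighbour(corpus[0], corpus))  # -> diaries and maps from the trenches
```

Real systems replace the word-overlap score with learned statistical models, but the division of labour is the same: supervised learning consumes human labels, unsupervised learning finds structure without them.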
I think "thought sharing framework" might be a more useful label for it, but it's really about how we work with our members to try and explore some of these kinds of issues. This project follows a framework of looking at the data, looking at the subject matter, and evaluating what we can do with that data. Do we clean it? What steps do we need to take to make it machine actionable, bearing in mind things like the FAIR principles, and also thinking about Thomas Padilla's notion of machine-actionable collections as a frame? And it's really about bringing people together; today is part of that activity. So we're interested in finding people who have questions, humanists with questions, and librarians and archivists, of course, with collections who want to put them into the game. We're also looking at our own data availability, which Neil mentioned, and at how we can work together to develop explainability, and of course explainability is really for humans, so that is really about people. It's an exploratory process that we will go through: a basic understanding of what we're doing, then exploration together, and engagement following from that. So we're building some kind of community, perhaps a community of interest, and gathering more requirements: what is it that we really need and want, and what would be most useful? We've talked a lot about the idea of mapping the ethics to the pipeline, making those ethical considerations very practically based and trying to understand, in terms of practice, what you have to consider when you want to take an ethical approach. That's an overarching theme to the whole piece, really. And then we want to be able to demonstrate and show people what explainable machine learning actually looks like. That's the explore-and-decide phase. 
And then from that, I think there will be outputs: things like guidance, practical advice, guides, case studies, workflows, pipelines, reports, and also contributions. JISC, of course, contributes to sector policy. Neil mentioned the TaNC programme, and we're all engaged with that, and we obviously talk to other funding bodies, so we're interested in exploring those things, but I suppose our members are where we focus. To do all of that, we also want to work it up in the right way, so we're setting up a task and finish group to help us make some of the decisions around all of this; I'll say more about that in a moment. So what's the objective of the project, if you want to call it that? I tend to think of these things more as extended initiatives, because they may not have defined start and end points. Some parts may, but at the moment we're still very much in that explore phase, so we don't necessarily know where we'll wind up. What's it really about? Well, it's about trying to get away from this notion that things go into a black box and come out the other side and we don't know why. What's the impact of doing those things? We need to understand more about that, and humans obviously have to be right in the loop. So it's really taking a look at these things through the lens of academics and researchers: bringing them together with collection providers to work on practical examples of explainable ML in research libraries and archives, to describe that, and to take steps in response to what we learn. And the agenda, as I said, is to be defined by a task and finish group. I've already been speaking to some people, and I've interviewed various people I've come across. We often work with people who are engaged in digital humanities research and activity, and we are in some of those networks. 
So we've been talking to people about some of this already, but we want to take those conversations further, and we hope that people will want to join a group. It may initially be a defined group, but I hope we might also develop a kind of panel approach, so that we can have ongoing conversations and people can come in and out of the conversation, which I think is what community development is really about. So who's it for, who cares about it? Well, I think we're focusing on the non-specialist academic. I mentioned people involved in digital humanities research; of course they're much more specialist, using computational approaches, but there are lots of people who are interested in those things and don't necessarily know where to start. That's where I think we might focus, because big organisations like the Turing Institute are doing things at quite a significant or high level. So where can we fit in, so that we can do things that usefully support and supplement that work? It's really about understanding what humanists, or those working in the humanities, want from these technologies, and about getting beyond some of the complex language and making things more digestible; in terms of transparency, that might help too. But we're also very aware that there's a lot of work going on to embed some of these technologies in library systems, so when you start conducting research you might already be interacting with them. Some commercial companies are already working with vendors of library systems, or with other commercial companies, to bring AI technologies to bear on search, when you search the catalogue. That's potentially quite impactful, so it would be good to understand more about it. 
And of course, librarians and archivists are who we in the content team have tended to work with; our constituents, I suppose. They're very keen to ensure data integrity and the FAIR principles; I'll put in a link to those if you don't know what they are. They're really about thinking through the data that you put out into the environment and its consumption by machines. And that's a big consideration: increasingly it's machines that will come to a library and ask questions of the data held in your systems, so we need to consider that. More questions that the task and finish group might want to consider: what is responsible machine learning in research libraries and archives? What's already been done? I've already been looking at some things, but there are lots of statements, declarations and toolkits already out there. What can we do to bring some of those things together and make them more useful? Is that something that we might do, and who do we do it with, and for whom? Some of those things still need to be determined. As I said, a focus on the ethical issues will obviously be very important. What's that about? Is it about developing standards or statements of values? Again, it needs to be practically based and thinking about real data. How do we document and evaluate all these things and take actions to make machine learning more transparent? And I would underline that word "more". I was talking to a colleague the other day who suggested that it's really about more transparency, because some things are very transparent and some are not at all, and we might never get to perfect transparency; that's worth bearing in mind. 
I should point to the Archives Hub blog; there have been some excellent articles written by the team on machine learning relating to archives, and on work being undertaken by that team with JISC members. I really would point you to those; I think they're really good summaries of important matters. Further questions: is there access to data in the first place? What do good machine-actionable collections actually look like? That's defined in the OCLC report; again, links at the bottom there. Responsible Operations is a very worthwhile read, I think. And how can we support the creation of, and wider access to, those kinds of collections? And thinking, as I've already alluded to, about machines and algorithms as patrons: increasingly machines will come to your data through your catalogues and your systems. What are the use cases for that, and how can we consider them? And how can we support the creation and reuse of domain-specific training datasets, or ground truth sets? Is that something that we should be doing? Some of the literature says yes, that it's a very useful activity, but quite often it seems that ground truth sets are developed through lots of processing and lots of processes within projects or project funding. The question is: can you create those things as ready-state resources that you provide on a platform or in an environment? Is that an approach you can take? Or does it rely on having done lots of project work to get a ground truth set out that can then be shared with other projects? Questions again. So, moving on, a final slide of questions to consider. There are lots of questions around the library infrastructure, or library and archival infrastructure (I should put archival in there), that can support this kind of work. We hear a lot about infrastructure, but what do we really mean by that? So, questions again: what is the role of the library and collection provider? 
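As an illustration of what a shareable, ready-state ground truth set might physically look like, here is a hedged sketch: one JSON line per labelled item, with provenance recorded so that reusers can judge how much to trust each label. The field names and identifiers are entirely hypothetical, not any published schema.

```python
import json

# Hypothetical ground-truth records for labelled archival images.
# Field names and item IDs are illustrative only, not a standard.
records = [
    {"item_id": "GB-0000/1/2", "media": "photograph",
     "label": "electric cooker", "era": "1940s",
     "labelled_by": "archivist", "review_status": "confirmed"},
    {"item_id": "GB-0000/1/3", "media": "photograph",
     "label": "refrigerator", "era": "1950s",
     "labelled_by": "ml-model", "review_status": "needs_review"},
]

# Serialise as JSON Lines: one record per line, easy to stream, diff and share.
jsonl = "\n".join(json.dumps(r, sort_keys=True) for r in records)

# A downstream project can filter on provenance before training on the data,
# e.g. keeping only labels a human has confirmed.
trusted = [json.loads(line) for line in jsonl.splitlines()
           if json.loads(line)["review_status"] == "confirmed"]
print(len(trusted))  # 1
```

The point of the provenance fields is exactly the reuse question raised above: a ground truth set built inside one project can only be shared onward safely if the next project can tell which labels were human-confirmed and which were machine guesses.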
And how can they work more readily with researchers, and across disciplines as well? I think it's increasingly important to consider that, particularly bringing in specialists in machine learning, computer science and so forth: how do you bring those together in effective ways with people who have questions in their own research? There are questions for us about whether we should run some small grant-funded challenges, for example to explore the embedding of machine learning results in discovery interfaces; it could be other things. And what kinds of skills and expertise need to be developed to help people both understand these things more readily and potentially influence them, ask questions about them, those sorts of things? Of course, as I said, JISC is not the only agency. CILIP has a very good report, which I probably should have put a link in for, sorry; do look it up. Andrew Gray wrote the report for CILIP, and it's excellent; it focuses very much on the skills piece. So there's a question about whether we spend much time on that, but we may. And then, what kinds of outputs can we develop: sample pipelines, workflows, implementations, toolkits, vendor guidelines? These are the kinds of things that JISC, in our thought leadership activities, tries to provide, both to our own constituents and to the wider world, and usually we try to make them open. So in terms of the discussion right now, for the people on the call, here are some questions that we can hopefully try to answer a little. How can we ensure that data is ready for machine-actionable collection use and access, thinking of collections as ground truth data? How can we utilise machine learning techniques to help us improve access to collections in the first place, or rather the discovery of collections, and perhaps access to them? 
And what is currently not being focused on? Sorry, Neil mentioned the fact that we also work with further education, but in this instance we're focusing on higher education institutions. So what's currently not being focused on if HEIs are to make best use of these technologies, and where can we take action that's useful? I hope that was not too garbled. I think I've come to the end of what I wanted to say, and I'm hopeful that you can either put on your camera and join the conversation, which is always nice to see. I know that can be a bit daunting, but why not live dangerously? Or just put your questions in the Q&A. Putting questions in the Q&A also helps us to have a record of them, so that we can consider them if we can't answer them all now, or where we might have to come back to people. You can always contact either Neil or me if you want to get involved, or you're interested in this work, or you have more questions that we can't answer today. We're around in the environment, so get in touch, and thank you very much for listening. And please give us some hard questions, because we won't know the answers. Not too hard, not too hard. That's right. Thanks very much. I should stop sharing now; it says "pause share" to stop sharing. So if people do want to put their camera on and talk, or just talk, you don't need to put your camera on, you could just have your audio on. If you put your hand up, then our colleague George will be able to turn you into a panelist, and then you can turn your camera on; I think that's the way it goes. But of course you can also just put things in the Q&A or in the chat, which Julian has done. In fact, a request just came through to hosts and panelists asking for a link to the CILIP work. She says, I think you just referred to some CILIP work. 
Peter, sorry, I didn't hear what you said. Did you refer to some work that CILIP had done? Yes, the skills piece. I'll try and see if I can find that report and I'll post it in there. Is that the question? Sorry. Julian was just asking whether we could make the link available. Yeah, I'll do my best to go and find it now, if I can remember it. Yes, she's put something else in there as well, a comment. She says: buy-in from managers to assist support is sometimes a challenge when budgets are so constrained; and when the IT system is controlled by the HE institution, it can be a challenge to get them on board too if they're not familiar with library-specific programmes. Yes, something we're very familiar with, isn't it, Neil? This is so true. Quite often when we start off this work, we will start off with a sort of shaping activity, but that may then result in further activity which may have some small amounts of funding attached to it. We've done some work in the past where we did pay institutions to take part in a project, because they were providing us with data and lots of insights and so forth, and it just helped to oil the wheels. But at the moment, I think we're not quite at the stage where we've got a defined programme of work. So we would consider those things; we do consider those things. Usually if people come to events and suchlike at JISC, we help with paying expenses and things like that. We can also help people to make the case and explain what we're trying to do. In your experience, Neil, you've been around the block with those sorts of things for a long time as well. Yes, indeed, yeah. 
I mean, going back to the idea of trying to get IT departments on board, I think that's certainly been the case with digital preservation systems, but in other areas as well: getting that internal conversation in the right place, with the right kinds of advocacy, and furnishing people with the information they need to have those internal conversations, is extremely important. Jane has just joined us as a panelist, which is great. Thank you very much, Jane, for popping up. We've referenced the TaNC work that you're doing. Would you like to tell folks a little more about that, and perhaps some of the labs work? Yeah, sure, yes. So in the Archives Hub team we've got a couple of projects, one that we're leading and one that we're involved with, that are both, to a certain extent, about machine learning. The AHRC-funded TaNC project that we're involved with is led by University of the Arts London. The main area it's looking at in terms of machine learning is developing a tool that can uncover bias and issues within cataloguing. That's something that UAL's Creative Computing Institute are actually going to be developing, and we're going to be working with them; we can obviously provide data from all of the Archives Hub contributors, and provide some of our expertise and knowledge of the archives domain. What I'm hoping is that at some point those that contribute to the Archives Hub will have access to this tool as a kind of beta tool, and I guess we'll see what we find with that. That project has only just started, and it's a three-year, very big project. And then we're doing our own somewhat ambitious labs project, the Archives Hub Labs project, which actually involves both IIIF for images and machine learning, because we thought we'd make our plate really full. 
So on the machine learning side of that, we have an AWS-certified machine learning expert working with us, and it's a very open-ended, exploratory project. We're working with, I think, about eight of our contributors now, a number of universities and some smaller institutions, with their content and data. And it's really about seeing what we find, in all honesty, and seeing what the problems are as we go along. It's been an interesting process. Even getting the large quantities of data has been somewhat problematic with some of them, so that was a challenge to begin with: getting the data, how the data is labelled, how we can identify it, matching up images with archive descriptions, which is different in every single institution and can be difficult. Then we'll be moving on to testing out some machine learning tools. We're mainly using the AWS, the Amazon cloud tools, so things like text and image recognition, all those various tools, and we're going to see what we get. The last thing I would say before I hand back, and I think I've said this a little in the blog, is that it's been interesting because the out-of-the-box machine learning tools tend to be set up for, how can I put it, the present day, the modern world. When you think of archives, for example, we have some images of 1940s and 50s household furniture and items as part of a design archive. Well, the machine learning tools don't recognise the cooker and the fridge and whatever; they say this cooker is a fridge, or whatever, you know. So we're quite interested in that whole area of whether JISC has a role to play in training algorithms to recognise more historical items and artefacts, the sorts of things that you come across within archives. So I'll just leave it there; hopefully that gives you some idea. Do have a look at the blog. 
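As a sketch of how an off-the-shelf AWS tool such as Amazon Rekognition might be called, and how its confidence scores could be post-filtered, here is a minimal example. The threshold values and the sample response are made up for illustration, and the Rekognition call itself needs AWS credentials; the point is that with historical material you may want to treat low-confidence guesses (a 1940s cooker labelled "Refrigerator", say) as candidates for human review rather than accepted labels.

```python
def detect_labels(image_bytes, min_confidence=80):
    """Call Amazon Rekognition's detect_labels (requires AWS credentials)."""
    import boto3  # deferred import so the rest of the sketch runs offline
    client = boto3.client("rekognition")
    resp = client.detect_labels(Image={"Bytes": image_bytes},
                                MinConfidence=min_confidence)
    return resp["Labels"]

def keep_confident(labels, threshold=90.0):
    """Keep only label names at or above a local confidence threshold.
    Anything below it would go to a human reviewer instead."""
    return [l["Name"] for l in labels if l["Confidence"] >= threshold]

# Invented sample in the shape Rekognition returns: a modern-trained model
# being unsure about mid-century kitchen appliances.
sample = [{"Name": "Refrigerator", "Confidence": 97.1},
          {"Name": "Oven", "Confidence": 54.3}]
print(keep_confident(sample))  # ['Refrigerator']
```

The split between an automatic accept threshold and a review queue is one simple way of keeping the human in the loop that the earlier discussion emphasised.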
In the blog we're really, really trying to be honest about the problems and issues and take you through what we're doing step by step, so it's worth having a look at if you're interested. Certainly I picked up a lot of information from the blog, Jane, so it's always good to have a look at. It's obviously very much focused on collections and archives specifically, and I think there aren't many things that are quite as focused as that, which is what I found very valuable about it: thinking about things like, as you said, identifying, or not being able to identify, the things that are important to people doing archival research, which is quite different from other tasks like facial recognition, which may be important to archives too, but yeah, certainly read the blog. On facial recognition, that's something we haven't prioritised. At the moment we're prioritising, we're thinking of looking at, text in images, particularly because in a lot of archives you have posters and things like that. With facial recognition we may have the same kinds of issues, I think, with older black and white photographs: again, the tools aren't necessarily trained on our kind of materials. Yeah, there's a point just made by Melinda in the chat, a really fascinating point about historical objects not being recognised: is that also around personal signifiers? Yeah, like uniforms, Melinda says. That's interesting, as a marker of social status. Well, quite, probably. We've only just started on this, and I do think this is a particular area that we want to explore. 
So for example, as I said, with the 1940s kitchen photographs we've got, quite a lot of them are well labelled. What we'd like to try to do, and this is where it's quite difficult, and why you need a lot of data and why individual institutions might struggle, is to say: we've got some well-labelled photographs here, so we can feed those in; and over here we've got some that are similar and from a similar era, but unlabelled. Now we've done a bit of training of the algorithm, what can we get? Can we get improved outputs? That's what we're looking at. But yes, it could be the same thing with anything like that; it will be interesting to see. Yeah, so please stick around, Jane, and be part of the panel still, because of that point about individual institutions. We've got another comment from Julian there: sometimes it can help to have an expert in the meetings. And Julian, I think you might be referring to the kinds of meetings we were talking about, those internal meetings, to try to get things organised on the ground locally. That's an interesting question. One thing I would say is: this is hard. It's quite hard. We've got somebody who is certified, and we all know this when you actually come down to it... Certified in terms of AWS, right? Certified in terms of his vanity. Okay, well, possibly, possibly. You can have somebody who's done the exam, but in practical reality we're coming across issues that we didn't necessarily expect, so you need quite a lot of technical knowledge. So I think that's an interesting idea, having somebody in with a bit of expertise. Yeah, that's something we could discuss in terms of our interventions, and this is what we'll want to get to in terms of shaping JISC's activity and how we support and help.
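The train-on-well-labelled, apply-to-similar-unlabelled idea described above can be sketched as simple label propagation. Real systems would use image embeddings from a trained model; here, 2-D vectors stand in for those features, and all of the data and labels are invented for illustration.

```python
# Toy sketch of propagating labels from a well-labelled collection to an
# unlabelled one via nearest neighbours. The 2-D "feature vectors" stand
# in for real image embeddings; all values are invented for illustration.
import math

# A small "well-labelled" collection: (features, label) pairs.
labelled = [
    ((0.9, 0.1), "cooker"),
    ((0.8, 0.2), "cooker"),
    ((0.1, 0.9), "refrigerator"),
    ((0.2, 0.8), "refrigerator"),
]

def suggest_label(features, labelled, k=3):
    """Suggest a label by majority vote among the k nearest labelled images."""
    by_distance = sorted(labelled, key=lambda item: math.dist(features, item[0]))
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

# An unlabelled photograph whose features sit near the "cooker" cluster:
print(suggest_label((0.85, 0.15), labelled))  # cooker
```

The sketch also shows why scale matters, as the speaker says: with only a handful of labelled examples per institution the vote is fragile, whereas pooling similar, well-labelled collections from several institutions gives the neighbourhood enough examples to be reliable.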
There's this idea about maturity out there in the institutions, in the sector. Peter, you and I were talking a bit about it yesterday, weren't we? And you had conversations with the likes of Ruth Isaacson at Exeter. Do you want to share that conversation? Well, the maturity of institutions in terms of... yeah. So, not to quote directly, but we were talking about what the use of the web is in research, in academic activity. So often it's still a substitute for paper: reading individual documents, which there's nothing wrong with, of course, but the suggestion was that so much more use could be made of computational analysis to help with various tasks in the research pipeline. But often people don't have the time; people are time poor, so finding time to engage with new techniques and new ways of doing things can be an issue. Having expertise to hand is another; sometimes the language, as I said, can be daunting, or material is in all kinds of different places. Sometimes getting in touch with specialists can itself be daunting: if you're going to talk to a specialist, can you ask them the right questions? If you get in touch with Turing, you're probably going to meet some quite expert people, and people might not feel the confidence to do that; they might have questions but not feel at that level. Although, whenever I've spoken to Turing, they're actually very open and helpful people who explain a lot of the issues very well and are very aware of those things; of course, it is the national centre.
But are there things that we can do in terms of interpretation, or bringing people together to consider those kinds of things, and maybe try to shift the ground a little so that computational approaches become a bit more prevalent? It doesn't necessarily have to be a revolution; there's a kind of revolution going on all around us in tech, but it's really about getting people to engage with these things at a level they feel comfortable with, and then developing skills, knowledge and contacts, those sorts of things, which I think JISC is quite good at, trying to bring people together around topics like this. And blogs; I mean, Jane, again, I think the blog provides such a useful set of insights into some of the work you're doing, so documenting those things is essential. Yeah, that makes all the difference: you're documenting something practical that you're doing. We have to design these touch points in ways that make sense to institutions at the right level. We've got a question coming in the chat, but I'll just say that it seems to me one of the ways we can foster that discussion is around this idea in your presentation, Peter: you've got humanists with questions, and starting from the research question seems to me a rich way forward. Rather than saying, okay, we've got all this data, what do we do with it, you come at it from the other side and say: who's got questions? Let's identify the data sets, or the subsets of data, that would actually be pertinent or relevant to helping that person progress those research questions. That's presumably a good way forward.
Yeah, so demonstrating things and then documenting things so you can move to the next stage, really, and have a record of what's occurred. And trying to answer the question about how you get around some of these complications of terminology: the diagram I showed about the two main types is probably, as it stands, not that useful, because it's still using some of the terminology. But can we explain those things more readily, in a way that people feel comfortable with, so they're able to talk about them in the first place? What kinds of descriptions can you make? Again, I think it's based on practice, going back to the archival approach of actually doing stuff with data which relates to collections. In terms of terminology, of course, that's always a difficult question, isn't it? There's domain-specific terminology and technical terminology that can be difficult. I guess what I'd say is I'd like to think this can be one of the strengths of JISC, and certainly within our project. You know, I'm an archivist and I'm writing blogs; I'm trying to write for an archival audience, a non-technical audience, people who may have an understanding of archives but won't have technical knowledge, let alone knowledge of specific machine learning. So with many of these things, you need the technical people, whom I sit in meetings with and try to follow and understand, and then I try to take that and write it in a way I think people will understand. I usually run the blog post back past the more technical people and they'll say, oh, this is slightly wrong, or right. But I think maybe that intermediary role is increasingly important in the world we're now in. So I hope that's one way we can address the problems with terminology.
Now, I think coming at it with a slight bit of ignorance to start with is actually useful, because you've got the questions yourself. Yes. And then, over time, being able to frame those questions so you can ask someone, a technical person. You know, I passed a blog post I wanted to write past the Archives Hub team, and it was very helpful to have that conversation and get a bit of critique, which then led me to a slightly better understanding again, so it's that interpretation really. And there's a question here: humanists are very much trained to imagine research questions that are reasonable within the scope of their own discipline. Peter is saying that; I hope I'm pronouncing your name correctly, Peter. I think that's so true. People are in their domain and they're interested in that domain, but they may recognise that there are things that could be useful, and then asking the right questions to get to the part that is useful to them, to their own practice, can be really tough actually. Sorry, Peter, just to go back to the question about how you can help people imagine using these technologies beyond paper thinking. If I'm understanding that correctly, one of the things we're going to be doing on this project is to release our technical outputs on GitHub. And what I very much hope we'll do is provide not just the outputs on GitHub, which I find a little impenetrable myself, but some documentation and help, and the blog will lead up to that. So that could be a good starting point for some people: if we've already done some work with a number of archives, we can release some of our outputs in that way for people to take on and reuse. Indeed, yeah. There are some other initiatives.
I'm sorry, I've got a terrible memory for these things, so I won't immediately be able to say what it is, but there is an initiative doing some of that: trying to provide documentation on GitHub to explain the processes a project has gone through, and why the outputs that are there on GitHub are useful and what they're useful for. Quite often, for people involved in technical activities, that documentation can be tricky, so it's about finding mechanisms to help people generate documentation and explain things, and trying to abstract up a level so that people can then drill down again. Those kinds of approaches; it's really communication, I suppose, around these things. Because so much technology is like that: if you try to offer easy answers up front, that's probably not it. You've got to base it on the practice and then try to draw easier answers from that practice. And I think that's what so much of the work we want to engage in is about: developing it around the actual practical things and then describing those in an effective manner that enables people to gain access, to take those things up, whether or not they are humanists who are very much in their own domain. And we've got to remember, of course, that humanists in their own domain have started to go out and use machines to help; we talk about humanists, but what does that mean, you know? I'm interested in the motivations for people to go above and beyond their usual practice, or to think creatively about using machine-actionable data, and the richness and the rewards one might get.
I mean, when you were talking earlier about putting humans in the loop and supervised machine learning: when I started thinking about this area some time back, I spoke to our boss, Liam, about these kinds of big data sets that we might pull together, that might be used across the humanities. There was some pushback on that, and we realised that generic data sets probably aren't much use. But I also got pushback on the suggestion that perhaps there's a role for JISC in going in and doing some of that supervised learning. And, you know, is that tedious work? Is there a way to put these data sets together that is rich and rewarding, so that information professionals in our communities can come towards this work and really get a lot out of it, as well as coming up with these ground truth sets, I wonder. I think, to answer the question: there is a fair amount of work in getting these things ready. That's the other thing, isn't it, having the labour capacity to do this. Well, yes, but as I said, I think the most efficient way for us to end up with machine learning tools that will be useful, to potentially create better metadata for better discoverability, is going to be to train the tools on data that is already well labelled. And in order to do that successfully, it's probably going to have to be done at scale. So we've got a number of collections of university photographs. Those are very general: collections, maybe, of the construction of buildings over time, of laboratories, of all sorts of aspects of a university's life and history.
Now, if a university has a collection like that and they'd like to run machine learning over it and see what they can get, because they haven't had the time to catalogue it in that detail, I think the way that's going to work best is if we've already got tools that have been trained on well-labelled photographs. I'm repeating myself in a way, but if you see what I mean, it won't work so well for one university to do that on its own. If we've trained it on a number of similar collections, there's a much better chance you'll get something good out of it. That's my understanding of the way it works: you do need to train these algorithms to produce something useful. So in that scenario, you're talking about utilising collections of a similar nature from, say, five institutions or whatever. Is that what you mean? I think you did describe it in the blog. Sorry, Jane, I'm asking now for the benefit of this conversation. Yeah, I'm trying to give a few examples to make it concrete. So, as I said, a collection that relates to university buildings and laboratories, or a collection that relates to kitchen and household objects from the interwar period, or whatever it is: you can see how you can train the algorithms so that when you get other similar material, you might get good results. Then one of our example collections is images of fossil fish. So we're looking at that and going, okay, there is AI to help you identify fish, but not to help you identify old photographs of fossils of fish. And that's quite a niche thing, so it's still a case of, well, where do you go with that? Because you'd need to train it to recognise these things, and are there lots of collections of photographs of fossilised fish? Those are all the kinds of things we're teasing out at the moment. There will be some somewhere in the world.
Well, yeah, that's why I'm saying if you bring them together, you might be able to do something. Yeah, that's our thinking. So, Adele's posted a link to the InterPARES Trust, which may be of interest to folks on this call, and indeed to us. We've only got a couple of minutes left, and I don't think there's anything in the Q&A box still, so people are hopefully avidly listening rather than asking questions. But maybe we should finish with a couple of comments about bias and representation; it's part of your work in the TaNC project, isn't it, Jane? And Peter, you and I were talking about it yesterday, in terms of what's recognised these days as bias within machine learning and data sets. Peter, there was a point someone made to you about human bias and machine bias. Yeah, so I think the point is probably not an entirely new one: if we're dealing with a machine giving us answers, we're much more concerned about whether there is bias here, whether these are biased results. Most likely they are, because any data that we conceive of or put together is highly likely to have biases of one kind or another. So the answer is probably yes. But then of course, if I go and ask an archivist a question in an archive, we're probably going to get some biases in the response too. Those may not be negative biases, in the sense of holding a bias against someone or something, but rather that the archivist has a particular domain of knowledge and expertise, and might direct us to the things that are familiar to them, to the exclusion of others. So when we're thinking about finding aids, and humans as finding aids, is there a huge difference? That might be a question. Jane, what do you think? Well, yeah, it's a fascinating and difficult area, isn't it?
I think the TaNC project is going to be interesting because we're working mainly with museums, so we're the main archive representative, I suppose. In the first workshop we had with TaNC, they were showing us some labels for art objects and asking which of them were written by a machine and which by a human; that kind of fascinating thing. So they're trying to unpick that area. And again, I think working with a lot of other people, working within a community, to try to sort these things out is going to be much more effective than working individually. As you say, there's always going to be bias of some sort, but machine algorithms may actually do a better job of identifying it en masse than we're going to do, or than we have the time to do, potentially. They can also make it explicit that those biases are present. And the decolonisation agenda could in some ways be supported by showing what's missing, the gaps, what's not coming to the top, what's not surfacing, those sorts of things. So it's essential to trying to understand those problems better in some way. In terms of long-term cataloguing, it's about following the historical record of the archive and how it was shaped, those sorts of things. James Baker, of the Programming Historian and Southampton, is very strong on some of those topics and has interesting things to say there.