[Start of recording unintelligible; the chair introduces Kate Petherbridge.] As [unclear] has worked with a growing number of authors, they have found that many of the myths around OA publishing still persist, particularly in arts, humanities and social sciences, and this is a real cause for concern. In this presentation Kate will cover some of these and talk through what the experience of working through these areas is actually like in practice. [Opening of Kate's talk unintelligible.] To kick off the list, quite early on, one of the things that is quite an irritating myth for me
is that you can't be a university press if you publish open access. I'm hoping that this is moving away now. But one of the things that I think will come out all the way through this is that an open access publication is open access because of the licence that is applied at the end of the process. We are an academic publisher that publishes academic content: we have the scope of a university press, we have the admission processes of a university press and the rigour that comes with that, and the fact that we license our publications for open use really doesn't take away from how we approach publishing. You really can do both. We're having this conversation in the context of a changing landscape; I know you will know this. There's increasing emphasis on open access for monographs as we see policy evolving and differences in focus, and I also don't think we can underplay the role that the pandemic has had in surfacing this debate for our academic colleagues in a way that perhaps they've not been able to engage with before. It's been quite an abstract thing as to why we've wanted open access, and I think seeing how it works in practice, in terms of removing barriers at the point we really needed barriers removing, has focused the mind. But I also think it's brought concerns to a head, particularly in areas around arts, humanities and social science, where monographs are still very much the publication form that's seen as most relevant and where there are issues that people feel need to be tackled. Talking about this, what have we done? What's our experience that we can bring to this conversation? We've published eight monograph volumes so far, and we've got seven or so more in various stages of production post-commissioning.
All of them are in the areas of arts and humanities. I appreciate this is really quite a small sample size, but it covers a range of different projects of different types and different scales of volume, and we've seen each one of these as a really strong learning opportunity. The experience we've got from them has been really interesting for us when we compare it to the conversations that are being had, particularly around why arts, humanities and social science monographs are very difficult. I'm going to launch into the myths now. As I said, I'm going to do some of them; we could be here all afternoon if I tried to do all of them. When you see the myths come up, please don't roll your eyes and think, well, we've talked about this, we know. I suspect many of you in the room do know. However, these are still very much live issues and live concerns that we see all the time when we're talking to academics who are new to OA and who are exploring this for the first time, and I think we may be overestimating how much success we've had in communicating on some of these points. The first one is that open access means there will be no print book. We know that's not the case: there are different models, but most open access publishers will provide a print option. However, this comes up in conversation with every author that we talk to, every single time, even now, and it really seems to be that opening barrier for them, certainly in arts and humanities, around getting to grips with what open access publishing would look like for them. It's a huge concern.
We do print on demand and we do sell print copies as a press. Of our Star Carr volumes, we've sold over 250 copies of the two volumes combined, so it's not the case that this is a small-scale thing either, if you compare it to traditional print run sizes. But it's not our primary product: those 250 print copies go alongside 40,000-plus downloads. So it's not what we want to be pushing, but it does need to be visible and shown to be viable. It does mean that there's a sub-myth (I like the idea of a sub-myth): that's fine, but print on demand is rubbish. What we find in conversation is that a lot of academics really still feel that print on demand means someone is printing out a PDF in black and white on a home computer, putting it in a ring binder and posting it. So if we want to really engage and take this forward, it would be really good to show how open access publishing is compatible with high-quality print, if that is a requirement.
There's the royalties conversation, and I don't know where this comes from. It's raised at meetings I go to and at workshops I've been to, and it comes around when you have advocacy conversations on behalf of the press internally in the institutions I work for. There's always a question: but why would academics not want the royalties that come from commercial publishing? This is not a conversation I've ever had with an academic who's come to us wanting to do an open access project. That is not to say it's not a consideration for some, but it's just not something I've seen. So I don't know whether we, or other publishers, are perpetuating that myth. I don't know why we keep having these conversations, and it would be useful to put some of these things to bed if we're creating issues. Then there are the "not as good" myths, and I'm going to talk about these, but I would point out again that an open access licence applied at the end of a process shouldn't have any bearing on what the publications that come out of that process are like, in terms of the scholarly nature and quality of the content and also the quality of the production of the volume itself. If a publisher is producing something that's not high-quality scholarship, or that's not well produced as an end product, it's a publisher issue, not an open access issue, and publisher choice is the same with OA as with any other model. It's important that academics engage with publishers and ask questions about how things will be, but it's not the case that putting an open access licence on something should impact how it is produced or what is produced. Open access publishers have to be really transparent about the rigour of their commissioning processes, because if we're not, we don't get indexed in the places that we need to be in order to disseminate the content. People have also been quite surprised at the level of rigour that we bring to our
commissioning process, for example, and we're not alone in that. I can only speak to our processes. For us, we have a process that has two stages of peer review. The first is at the monograph proposal stage, which may go around a further loop of peer review if the editorial board asks for revisions to the proposal. Once we've commissioned, the final manuscript is reviewed as well, to see that what we are delivered is what we were hoping for. So it's a really big thing for us as well as for the authors that we work with. On the actual outputs being not as good: again, it's not the case that how something is shared and disseminated should really have any connection with the processes that go on behind the scenes to make a volume of quality, whatever the format is. Applying a CC BY licence, or some other sort of Creative Commons licence, to a publication doesn't come with a "no copy-editing" clause. I do wonder whether this in some ways connects different dots in people's heads. People are used to seeing green articles deposited in repositories in a pre-publication format, so they still see it as a non-formatted Word version or a PDF version, perhaps with tracked changes on. That seems to create a misunderstanding that that is open publishing, when it's not. It's a different and very valid route for meeting open access requirements for articles, but it's not what a published open access article or book would look like. We do, as all the university presses that I'm aware of do, all the standard basic services you'd expect to make sure the volume that comes out is of good quality, and we've had really good feedback. This was the book review of our first publications, the Star Carr volumes, and it was really validating to see not only the books themselves but the formats and the way they've been produced and shared be included in that review in such a positive way. So I don't think these things are the issue they seem to be, but
publisher choice and exploring a publisher is still going to be important, whatever the end format is going to be. Third-party content: now this is the biggest one, the issue that people seem to say makes open access arts, humanities and social science monographs not viable. Who's going to let you use third-party content if you're going to share it under a Creative Commons licence in an open access publication? The problem with this is that it's based on a misconception that a Creative Commons licence applied to a volume covers everything in that book. It only applies to the content created by the author for the volume. As with any other type of publication, if you're using third-party content you need to get the permission and the rights, agree the use conditions for that, and give the appropriate credit and rights statement. That's true for all publications, not just open publications, so it's puzzling in some ways to see this conversation focused on open. In our experience, we approach rights holders, or we support authors in approaching rights holders, and we have the conversations. When we meet concerns, the concern is almost uniformly about use of content in a digital publication. It's not anything to do with how that publication is licensed, whether it's open or not; it's to do with it being a digital publication. So the concern is based around format, not licensing. It is not an open access issue. However, it's not a myth that these conversations do seem to be harder to have for open access books, because they are primarily digital publications. As to why the focus is on the licensing rather than the format, I think it's because our publications are open, so it would be a much higher risk if content was used without the correct permission. So we're really hot on making sure that everything is done and in place that needs to
be. This probably means, and I certainly know, that we are asking authors to engage with this in a way that they haven't before. We get feedback that authors are being asked to really agree permissions for this volume, rather than use something they've had permission for in the past for a different publication, for example. I think the fact we're asking them to have these conversations can be frustrating, and it perpetuates the myth that it's to do with open publications and open access. But it's not; it's to do with the format, almost exclusively. There are challenges in talking to rights holders. Some are fantastic and really engaged: they already understand it and want to engage with it, or they want to know and want to be part of this change. But there really are challenges, and again, when we see these things it really is format-based. The lack of engagement with what a digital publication looks like and is, is in some ways understandable, I think. Working within HE, and within libraries particularly, we are used to the digital format as a fairly standard thing these days. But in lots of places the people who work with these things are not within the sector; they are not used to their content being embedded within digital versions of publications, and this is not their environment. For them a book is still a print thing, and when you talk to someone about a digital publication, even though they may use "ebooks" in their language, what they really think about is a website. So content can be licensed in such a way that makes it unusable in a scholarly publication: the resolution of the file, the size at which it can be used, the use of a watermark. These are things that on a website aren't necessarily that big a deal, but in a book they are not something we can deal with. And licensing models are based around numbers of copies in a print-run style, which doesn't work if your format is
primarily digital, in a way that means you can't track those sorts of uses. But it's very doable. We recently released the Capability Brown volume towards the end of last year. It's very image-heavy and very complex in terms of the number of rights holders and their different positions, but in every case where we approached rights holders, we negotiated permissions in a way that everybody was happy with. There were, I think, out of the 100 images or so, three where I wasn't completely satisfied with how we could use them in the end, which I don't think is a bad thing, and none of those were to do with open; they were all to do with the digital nature of the publication. Funding: I'm not going to pretend that funding being a challenge is a myth, because this is something that is incredibly valid. But I wanted to share with you our context for this so far. Even now, before the focus moves on to open access monographs in the way we expect it to, 40% of the projects that come to us that we're working on have got external grant funding, and we are having quite regular conversations with people who want costings for publications so that they can include these in their grant proposals and funding bids at the start. I think this shows how people are now aware of the need to have this and where it can be sourced. It's not going to solve everything, because that's not viable for everyone, so there is a wider conversation that needs to be had, but just for context, that's how we've seen it so far. So, monographs are viable. We know this because we've produced quite a few. There are issues, but although they're seen in our open access publishing they aren't specific to open access publishing; they are publishing issues. Authors may need extra support, but as we do this more and more it's very likely that it will become easier, because conversations will not be the first time that a rights holder has been engaged. Things will
become more standard, and infrastructure will improve, and this should be the start of something fantastic. Now, that's where, had we been having this conversation a year ago, I'd likely have finished. But in the time that's passed since, and in conversations that I've had with other people, it's really been brought home to me that I think there is a bigger conversation to be had, because there's a myth that I have a horrible feeling we have helped to perpetuate, if not to cause, around why we're doing this in the first place, and it's to do with compliance. When you talk to authors, they do tend to focus on open as compliance and something they have to do. I don't know, again, whether this is really to do with the way we've presented green, and the ease with which the act-on-acceptance approach helped people comply while it was just an extra step in a process they were familiar with. But that hasn't challenged authors, or the sector as a whole, to engage with why open and the benefits. That lack of challenge hasn't changed anything around the publishing status quo, and this has led to further push from funders and from institutions. I think it's important we try to combat that. OA isn't a compliance exercise. Libraries didn't go, "I know what we need to do to make our lives better: let's create an extra step for people to do that we're going to spend a lot of time checking." It's about really revolutionising the scholarly landscape and scholarly communications, making sure that we get maximum benefit from every piece of research that's done, because it's done for a reason. It must be valuable to someone, so why are we not sharing it as broadly as we can? It's of value to everybody in that infrastructure and beyond that we move to a more open publishing landscape. So really, compliance isn't the point, and that's a conversation I think we really
need to drive. It's a tool, but it's not why we do open. We don't do it for its own sake; we do it for what it brings. I've been talking about this for a little while. I think I first did a thing on myth busting in 2019 in York, when we could still be in rooms together; it was lovely. More recently, I know there was a session earlier that talked about the OAPEN OA Books Toolkit, and there's a section in that about myth busting from an author's perspective. I've been lucky enough to be one of the people working on the new university press toolkit that's coming out next week under the Jisc umbrella, which almost feels like that's planned, doesn't it? There's a section in there on myth busting aimed at people who are thinking of starting a university press. I'm also working with Graham Stone from Jisc on some events around this, hopefully for later this year, aimed at a range of audiences, to help tackle both the overarching myths around how and why we're doing this and some of the more detailed ones about how we've found working in practice and what you can do to make things happen. So thank you so much for letting me speak to you today.
That's great, thank you very much, Kate. We've had a few questions come in. Could we kick off, perhaps, with the bigger myth you talked about there? If I could say to you there is one myth I could break for you, which myth would it be?
It's tricky, because sensibly you'd think it would be that bigger myth. But actually I don't know whether that's something that's in people's consciousness as being a problem, if you see what I mean. It almost feels like a practical solution to something like the third-party rights issue, showing how it works and getting some sort of infrastructure to help with that, would go some way towards engaging people further, and then be a way in to having that wider conversation. I'm in two minds about it. I'd like to think it's the hearts and minds
culture piece, but actually I suspect that it's more about the how at this stage, and the showing that it's possible, and then the benefits will become much clearer as we go.
That's great, thank you very much, Kate. We have a question about grant funding, which wants to understand where the grant funding is coming from, whether it's included in UKRI funding bids and, if so, whether that is an allowable cost to be included in the bid.
That's not really something I can talk to, because it's really the relationship between the author and their funder, if you like. We're not necessarily told who the funder being approached is when we're asked for a costing. We're asked to provide an estimate quote of what a volume of this type would cost. The fact we're being asked implies that the funding is there somewhere. As for the funders that we have seen bring things, the Star Carr volumes were funded by Historic England, for example, as part of the funding for that Star Carr project, and there are foundations that we've worked with, rather than necessarily particular UKRI funding grants. I suspect there are people who would answer that question better than I would, because we're looking at it from the publisher end rather than from that funder-academic relationship. I can't give that much more detail, sorry.
That's fine. But presumably you'd be all in favour of UKRI funding bids being able to actually support open access monograph publishing, perhaps as part of the block grant for APCs?
The way that I always think about this is: if you're funding research, why are you funding it if it's not going to be accessible? If you think about the amount of funding that's given to support research, and the percentage of that that would then go on an open monograph that would share that research, increase its impact and really get the maximum value out of it, it seems to me like a really good
investment to make, to make the research project that is the primary focus of the funding reach its maximum potential. If you're doing it, then share it with the world.
That's great, thanks, Kate. There's a suggestion here about the potential to move away from the compliance narrative, perhaps by a change of name: calling them "open up monographs" rather than "open access monographs".
I think that anything that engages with people is fine. I think that if we're changing terms, we need to be clear as to what we're changing and why we're changing it. Open is an area where we see lots of things already: there's open research, open scholarship, open monographs, open data; there's green, gold, diamond. In my previous life, when I was a reader services librarian helping people navigate this, I found that until you're really embedded within it, it's quite confusing as it stands. If there's something that we could agree as a sector that makes it really obvious why we're doing it, that would be really helpful, but I would want to simplify rather than introduce something else, I think.
Yeah, good point. Lots of compliments coming in on the presentation, Kate, and a question: do you see many new UK open access university presses setting up?
Again, I'm probably not the best person to talk about this. In working on the new university press toolkit with Jisc and Graham and the other presses that contributed to that, there is an obvious appetite for it. I'm approached, as someone who works in this area, by institutions who are looking at setting these up, and we have had lots of initial "what was it like?" and "can you tell us what you would do differently?" conversations. So there is definitely appetite and interest out there. What those would look like, I think, would depend very much on the strategic goal of that institution, so it may well be that
it's not an open access university press; it's a university press that does a range of models, because there are university presses out there that have mixed models. It depends on the scale of the operation and how much people can invest as to what level they're going to be doing this at. Are you hosting content under a branding? Is it the editorial stuff that you're doing? What is it you're doing? I think one of the beauties of the environment at the moment is that because there are so many different flavours of university presses, and academic-led presses in this area as well, not to forget those, there should be somebody that is a fit for academics when they've got to find a publication venue. So yes, I think there are people coming through who are interested, and it'll be really interesting to see in a year's time, perhaps, what that landscape looks like in terms of new people and what they're planning.
That's fantastic, thanks very much, Kate. Perhaps when we meet in reality, rather than virtual reality, next year, we'll be able to compare notes on how many more university presses have actually started up. Let's hope it's a few. So thanks very much to Kate; thank you so much for your time. We move on to our second speaker this afternoon, who is going to talk to us about the future of discovery: artificial scholars and automated collections. I'd like to welcome Ed Fay, who's Director of Library Services and University Librarian at Bristol University. Artificial intelligence is increasingly prevalent across areas of society and is used in ways which inform or even replace human judgment. The operations of artificial intelligence, however, are somewhat opaque and increasingly suspected of bias. In the field of libraries and scholarly communication, artificial intelligence is applied in numerous ways to augment the researcher workflow. This leads to issues which should be subject to critical analysis. We have
opportunities to apply our professional practice to improve technology through a focus on the human and the ethical.
Hello everyone. I'd like to talk today a little bit about some of the specifics of artificial intelligence. We've had some great framings this week about where we see the effects of this technology within society and where we're seeing some of the emerging risks. What I'd like to do today is dig a little bit deeper into what we're seeing within higher education, particularly scholarly communication, and then where I see implications for libraries. The picture you can see there is from some work that's been done by Massive Attack, which is a Bristol band, of course. They undertook a collaboration with storage technologists and music theorists, and what they did was encode Mezzanine, which was one of their big albums in the 90s, into DNA. This references very much the way in which information storage is being transformed, and also what it means for information transmission, because they then stored this DNA within graffiti spray cans, which is what you can see there. They then used artificial intelligence to recombine the album, so what you can see in the background is a VR performance which is remixing the album based on human feedback and machine learning, referencing information retrieval and creativity. So I'd like to talk just a little bit about artificial intelligence. This is a question that has been posed for a while, and it was very helpfully posed to me as I walked through a Bristol underpass: "Are you thinking?" I want to talk a little bit about how that's been applied to computers. But before that, just to note, as we've heard earlier in the week, that we very much see implications of artificial intelligence right across society, and UKRI has commented very recently about the impacts that this is likely to have on research across really all disciplines. We've also heard very much about the limits and the cautions to purely
technological analyses. In particular, [name unclear] on the first day warned of the loud voices and skewed perspectives that can arise from just taking technology as primary, and we see in the Gartner Hype Cycle for artificial intelligence a lot of caution about where this technology may be going and how mature and effective it actually is. But this debate has been going on for a long time. You can trace the history of automata back centuries across various civilisations, through China, Mesopotamia and ancient Greece. Where we might first recognise this as applied to modern computers is around the advent of the second industrial revolution. Ada Lovelace, working on the Analytical Engine and commonly called the first computer programmer, said that it "has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths. Its province is to assist us in making available what we are already acquainted with." I think we might find that to be quite prophetic when we consider the bias which we're seeing within machine learning at the moment. Post World War Two, then, is when this question starts to be posed more explicitly. We had Vannevar Bush talking about ways of associative storage of information, and Alan Turing asking a very explicit question: can machines think? Norbert Wiener said immediately after the war that long before public awareness of the atomic bomb, "we were here in the presence of another social potentiality of unheard-of importance for good and for evil", and that's, I think, a very important ethical reflection for us today. I do want to talk a little bit about the operations of machine learning, because I think it's important that there is growing awareness of this. It's interesting that Finland as a country has a goal of exposing one percent of its population to the operations of artificial intelligence, and the question has been
posed within libraries: what would it mean for librarians to reach that same level of awareness? What we're seeing today, often referred to as the third wave of artificial intelligence, is actually just the application of methods which have been around since the 70s or the 80s, but applied today with modern computing power and, importantly, the availability of data at scale. Machine learning is the main application of artificial intelligence in which we are seeing these giant leaps forward today, and at their core these systems are basically statistical models. The issue is that there are so many layers of complexity that it's not always possible to explain the way that a machine reaches a particular outcome. What we're seeing here in machine learning is a shift in paradigm in computing. Normally, you would program a machine to follow certain rules, apply those to data and then produce an answer. But in the world of machine learning, the machines are given the input and then they're given the output: they're given data and they're given answers, and they come up with the rules themselves. This is where the risks emerge around abstraction and this black box in the middle, because it's not necessarily possible to understand the ways in which the machine has decided to link the input, which in this case is an image of George Washington, with the output, which is the tagging of that image as "George". It's interesting to look, then, a little bit under the hood at how we reach that point. We're probably all very familiar with the reCAPTCHA that appears across various websites. It started as a way to correct OCR, and more recently it's moved on to training image recognition. Those are the first two that you can see up at the top, and perhaps you think those are not particularly problematic if you're just correcting a spelling error or looking for a zebra crossing or a bike within a series
of images, but if you look at the data sets which sit behind the training of a lot of the artificial intelligence that we see in the world today, deeper problems emerge. ImageNet is one of the largest machine learning training data sets, and it has been used since really 2012 in particular to take the significant leaps forward where we have seen artificial intelligence match or exceed human performance in image recognition. The way in which it is constructed is particularly interesting. What you can see here are snippets from the ImageNet database. It's based on a classification hierarchy called WordNet that has had very, very little scrutiny, and the categorisation of images against it was put out to Mechanical Turk. This is a cloud platform run by Amazon where anyone in the world is paid per click for the work that they perform, and it is this categorisation of images that is behind a lot of the image recognition that we see today. This is also the case if we think about the GPT-3 language model, which we heard about earlier in the week: it is trained on data scraped from Wikipedia and from Reddit, amongst other sources. So there is a potential here for these models to be very significantly skewed in the ways that they link data to the output, and there is also very limited ethical oversight of the way that these things have been put together. It's really only in the last year, maybe 18 months, that a serious ethical look has been taken at the ImageNet data, which you could say is particularly overdue. The key point here is that artificial intelligence will only reproduce what is available in its data sets, and it will extrapolate from those the norms that the AI assumes to be human values.
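As a rough illustration of the point that a system given inputs and answers derives its own rules, and can only reproduce what is in its training data, here is a minimal Python sketch. It is not how ImageNet-trained models or GPT-3 work; the function names, the scene and label values, and the skewed label counts are all hypothetical and chosen only to make the effect visible.

```python
from collections import Counter, defaultdict


def train(examples):
    """Derive a 'rule' per input value: the label most often paired with it."""
    counts = defaultdict(Counter)
    for feature, label in examples:
        counts[feature][label] += 1
    # The learned rule is simply the majority label seen for each input.
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}


def predict(model, feature, default="unknown"):
    """Apply the learned rule; inputs never seen in training get the default."""
    return model.get(feature, default)


# Hypothetical crowd-sourced labels with a built-in skew (assumed data).
training_data = (
    [("kitchen", "woman")] * 9 + [("kitchen", "man")] * 1
    + [("office", "man")] * 8 + [("office", "woman")] * 2
)

model = train(training_data)
print(predict(model, "kitchen"))  # the majority label for "kitchen" in the training data
print(predict(model, "library"))  # never seen in training, so the default is returned
```

Nobody wrote a rule saying who belongs in which scene; the skew in the labels became the rule, which is the sense in which the model reproduces the norms present in its data.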
If we think then about applications within scholarly communication: I picked this image in particular from another street in Bristol. My implication here is either that artificial intelligence itself is locked, or perhaps that scholarly communication is locked behind certain barriers, and I deliberately want to let that ambiguity rest. What we see in the world of scholarly communication is an increasingly datafied world, and it is this datafied world that is leading to these advances in machine intelligence and machine learning. This is a visual representation of academic big data. It maps the open data that is available in different academic disciplines, and we can see emerging the classic big data trends that have grounded many of the applications of artificial intelligence that we see today. Crudely speaking, they are volume, velocity and variety. As we all know, the rates of publication and of data production are increasing significantly. We are seeing increasing volumes of journals, data, archives and books, and we're also seeing those increasingly fragmented across the scholarly landscape, so available on different publisher platforms, in different discovery services and so forth, with a real variety in the kinds of information that is available in the scholarly world. We are also seeing the rise of reputation influenced by social networks, both at an individual academic level and as signals used to drive reputation within search engines. This leads to a lot of issues around veracity, about the provenance of information and the authenticity of information, and to issues of trust and confidence. We can see many different kinds of artificial intelligence applied within scholarly communication. I'm not going to dwell too much on those in the interest of time, but what we can see here is a representation of some of the commonly used text mining tools within social sciences. This is about 80 or so tools, and this is just an example from one academic
discipline. Some of these are open source and openly available, others are commercial products, and others are embedded within different products which may be in use in different ways. What's of particular interest is a survey conducted in 2019 about publisher adoption of artificial intelligence, which showed that over two thirds of publishers are using at least one artificial intelligence tool within their workflows and that more are planning to do so, whether the tools are built in-house or adopted from outside. But crucially, only 10% of publishers are actively checking these artificial intelligence tools for bias. The uses are really quite interesting. They can stretch anywhere from the back-end workflow around metadata extraction, classification and summarisation, even uses within the peer review workflow in some cases, right the way through to the front-end interfaces that are used by academics and students to conduct search, and also recommender systems, which are themselves often built using attention data from library users, so the ways in which academics and students interact with those platforms. This raises significant questions about the ways in which this could be skewing perceptions of a knowledge domain. So the word on the street is that there is significant disquiet: rage against the algorithm. For anyone, I think, who went through the exams fiasco last summer or who has friends working in admissions, although that's not strictly AI, I did detect a certain rage against the algorithm at that point. At some levels this can be amusing. For anyone with a religious bent, there is something about the disruption of the spiritual and timeless in the way that a game of cricket has been classified, and also something related to biblical scripture, where at one point in time Google Translate was providing a translation of Kyrie eleison as "sir, take it easy". If you repeat that search now, it's interesting that it has been manually corrected,
but it does show some amusing examples of the ways that this can go wrong. But this is actually a very serious issue. What we increasingly see is artificial intelligence used either to augment or to replace human decision making, and if we consider the risks to the quality of those outcomes when they are then used in decision making, significant social risks emerge. The Turing Institute describes the kinds of ethical issues that can emerge around bias, discrimination, denial of personal autonomy, invasion of privacy, unexplainable or unreliable outcomes, and social isolation from the advent of filter bubbles, radicalization, disinformation and deep fakes. This analysis from New York University tracks the number of publicly recognized AI fails within a single year, and it categorizes them against bias, misuse of facial recognition and surveillance, impacts of technology on the climate, and workers' rights. So it's clear there can be very significant impacts. This is just a visual example of a couple of those types of biases as they exist within commonly used artificial intelligence libraries today. The first is a reconstruction of Barack Obama's face, and interestingly the artificial intelligence has decided that in fact he's white. The second example is putting two US senators through the same image recognition library, and we can see very clearly a gender skew in the types of characteristics that are assigned to each of these individuals. So how is this relevant to scholarly communication? Safiya Noble, in her book Algorithms of Oppression, captures this I think really succinctly. She points out that search is not just a way of retrieving results, it's not just a way of finding a paper or finding an article, but actually the way in which search operates is that it doesn't just present pages, it structures knowledge. There's something here which has been called the ontological weight of search engines: if you can't find something,
does it actually exist? Wellcome Collection, in the design principles for their new discovery layer, recognises very specifically the risks which can emerge from distorting perceptions of what exists within your discovery tool. So it is very much the case that what these algorithms select to present as relevant, as interesting to you as a scholar or as a student, very much determines what you believe exists within the domain of human knowledge. So are there ways in which this can be addressed? This image is again from Bristol, from the protests that took place around Black Lives Matter, and highlighted here is an adjustment to the memorial which says that this statue was not erected, it was rejected. In the same way that statues represent a consciously constructed form of history, we absolutely have to recognise the forms of knowledge that are being presented by the systems that present library collections and that we are providing to our communities. This is as much about recognising the cultural issues and the context in which this bias arises as it is about looking more deeply into the technology itself. So if we think back to the ways in which AI learns and is trained, the question in front of us is how we can best critique the cultural context of artificial intelligence: from what does it learn, and by whom is it trained? We've also heard about the advent of explainable AI, which is an attempt to address some of these concerns in a systematic and structured way. The Alan Turing Institute, in a consultation last year, arrived at this set of principles about ways in which AI models can improve their reliability, safety and robustness. It's very interesting to note that the way they went about this was through a series of citizen juries, working also with industry providers of these technologies, and none of these principles are particularly about technology. They are a lot about context and they
are a lot about culture. They recommend that AI systems need to be transparent, which means that we need to be open about the ways that these systems are used, and you could term this algorithmic awareness. It's crucial that they are accountable, so that there is some oversight of the outcomes and the decisions that these tools are reaching, and so that even if they remain a black box you are still able to challenge the context in which they are used. It's crucial to recognise the context in which these tools are applied: they have to be appropriate to the setting, and that's about choosing which tools to use and when, and recognising that different tools have different emphases and work in very different ways. And then, importantly, reflecting on impact: really looking beyond and around these tools to understand the purpose to which they were put and the observable effects they have on the community and the society around them. This is very much looking for issues of bias and quality in the ways that these things operate. Fortunately for us, within the last 12 to 18 months the issues around transparency have started to be picked up in earnest within the library and archive community, and we can see a number of people and organisations here who have been active in this space. This very much relates to the ways in which we can understand artificial intelligence and increase transparency of how it operates and the way that it is being applied within scholarly communication. But the question before us is what more we could do in order to address some of these things. This question was also asked of me as I took another walk through Bristol in the last lockdown, which I felt was rather existential and slightly provocative, frankly: it really asked, why are you here? If we ask that of ourselves, it seems to me that we have some specific opportunities in front of us. It's clear that there is an inevitability about the influx of artificial intelligence within our
professional practice. Artificial intelligence will be deployed within the scholarly communications workflow; it already is being deployed. It is inevitable that artificial intelligence will interact with our collections, whether that's to mine knowledge from them or to be trained on the knowledge which exists within them, and we will have to deploy artificial intelligence in order to manage our future digital collections. That's likely to include the need to archive the operations and potentially the algorithms themselves, as they are such key components within our knowledge infrastructures. So the question in front of us as information professionals is really: how do we best equip our communities to navigate these information realities, to critique algorithmic mediation of these infrastructures, and to identify and mitigate bias? Fortunately, I think there are several themes emerging, and you can see this cutting across a lot of the reporting and analysis which is out there already, and these very much link to the ways in which we work already. Some of these are local and may be applied within our own libraries, and others are ways in which we need to engage and influence the sector more widely. If we consider the Turing report, many of these relate to transparency, accountability, context and impact; they're not in themselves technological aims. The first opportunity is to really understand the information needs and behaviors of our communities in detail, through partnership working and in-depth user research, and to use this knowledge to drive the design and critique of information-seeking artificial intelligence systems. We have a role in curating corpora and curating data for training artificial intelligence, in advocating for a focus on ethical considerations and the identification of bias, and in ensuring that provenance and authenticity drive the preservation of our digital cultural heritage.
Information literacy is clearly a crucial way in which we can support our communities: training, development and support to build awareness of how to evaluate trust, not just in information sources but in artificial intelligence itself, relevant not only to study within the academy, within higher education, but also to the ways in which we engage as digital citizens within an equitable society. There is a very specific opportunity around assuring the quality of artificial intelligence systems. We are collaborative purchasers of technology on many fronts, and we have the opportunity here to help draft system specifications, to make them more transparent, and to ensure that we really critique the ways in which these systems operate in open and transparent ways. And finally, around open access: advocating continuously for more open, quality academic information is only going to reduce the siloing of information and provide artificial intelligence with greater opportunities to avoid bias and gain the ability to extract and link knowledge across the entire scholarly record and the ongoing production of human cultural heritage. So finally, I think it's inevitable that artificial intelligence is going to impact on our work in many ways; it already is. Ethical and cultural issues are still emergent within society and within academic disciplines, and they require ongoing debate and engagement. I think that libraries and archives and information professionals have very specific opportunities, but also I would say responsibilities, to engage in these debates. And that's it from me.
That's really interesting, thanks, Ed. That's a real call to arms as well. Could I kick off with a question for you? You've mentioned our communities, for instance, and I was just wondering if you've got any thoughts about what we should be talking to our staff about, how we can encourage awareness, and what kind of skills development we should be looking for for our staff in particular to help us with this issue.
Yeah, of course. I think what's really interesting when you look at the reports that are out there is that these impacts, but opportunities as well, cut right across our teams. So it's not, I think, the case that we need to be considering how we start to develop an ability to engage in a very deep technical way. I think we need a level of understanding of these technologies and the ways in which they operate, but actually it's the implications which are far more interesting, and I think that is where the urgency is for action and collaborative action. So my sense is that this cuts right across libraries and right across teams, and where we need to start is by raising awareness around some of the basics. We need to join the dots between what we see in the news and in society about concerns, and then tie this back very much to our realities and the ways in which these things already present within our libraries and for our communities, and then demystify the ways in which you can engage with these. I really like some of the sector reports around the ways in which this is relevant to different teams and is actually about understanding the context, understanding the culture, understanding the human aspects. We already understand those things. We can observe the technology and critique the technology without being computer scientists or deep tech people, and I think that's where we need to start, and that is where the gap is as well, frankly.
So there's a question coming in about the role of some of our academics in this piece. Within most of our institutions there are academics who specialise in AI, not just the technological application but particularly around ethics. Do you think there's an opportunity to work more closely with our colleagues within computer science departments, et cetera, around this issue?
I think so. I think this speaks to the theme that we're hearing about the ways in which libraries partner with the research process. What we see around these ethical debates is inherent interdisciplinarity, so we see social scientists working with computer scientists, and we see domain specialists reaching out to get greater awareness of algorithmic approaches right across the piece, and I think that's a very natural space for libraries to operate in. I think there will always be some grey areas and lines between where this is deep academic research and where this is about, maybe, public engagement, in areas of open access and areas of discovery where there are perhaps different kinds of opportunities for collaboration. But what we can observe, I think, within scholarly disciplines is that this remains emergent, and bringing these perspectives into those conversations is something that's often welcomed. Within Bristol we have a data science network and they have an ethical group. I turned up and said "I'm from the library" and everyone said "great, what do you bring?" So I think these doors are open and there is a willingness really to engage. What we see in the ways that these debates are formed is that they are being formed by our early career researchers, they are being formed at the interface between disciplines, and they are being formed by people who really want to engage and to take this very seriously. So I think all of those potentialities are there if we step up and take those opportunities.
That's great, thank you. And extending that kind of thinking about working
in partnership, what about our suppliers? How should we be working with our suppliers in terms of AI?
Yes, it's really interesting when you look at the debates that are taking place within the publishing industry and you look at some of those stats around adoption. I think there's the larger issue of how we engage with our suppliers right now, and we know there is a particularly big player at the moment with whom we are entering into negotiations nationally. It's very interesting that that company has fairly recently hired around 200 data scientists to work on the corpus of scholarly information which is within their collection, within the Freedom Collection, so this is very much happening. So I would say we have leverage as procurement consortia: we spend millions and millions and millions, and we write the specifications that have to be met. There is a degree of power within that relationship that we need to explore further. When we are procuring platform services, when we are procuring discovery services, we need to be writing in these kinds of things around transparency. I think what we see as well is very interesting moves around the open source space. What we know is that a lot of commercial products are built on open source technologies, so regardless of what some of our commercial suppliers are doing, there are actually ways into their technology supply chain through engaging with those open source technologies, and if we can become active partners in those then that is a route to bring that openness and that transparency in different ways.
That's great, thank you very much, Ed. I have no more questions coming in at the moment, so in order to keep strictly to time we'll move on. Thank you very much for your presentation, most enjoyable and, may I say, very, very challenging: a great call to arms. We look forward to looking at it once more; it will be great to review it again. So we move on to our final presentation for this afternoon's session, entitled transforming the
library services platform: why the future of libraries is open, and our presenter is Laura Wright. Laura is the assistant director for metadata production at Cornell University Library. Laura is a cataloger and an inveterate tinkerer who was frustrated with library systems and now is inspired by the FOLIO community's approach of librarians designing tools for libraries. She's also interested in inclusive descriptive practices. She's the assistant director for metadata production at Cornell University Library in Ithaca, New York, where she lives with her partner and their three dogs, and I'm told there may be extra points if you spot the dogs in Laura's presentation. So welcome, Laura, good to have you with us this afternoon, and we look forward to hearing from you.
Thank you very much, Liz, and hold on while I get my screens situated. So hello, good afternoon, or if you're in my time zone, good morning. My name has changed in the past year: I'm now Laura Daniels. I work at Cornell University Library and I am also the convener of the FOLIO Metadata Management Special Interest Group, and I'm honoured to talk today about "Transforming the library services platform: why the future of libraries is open". For most of my more than 20 years in libraries I have been a cataloger of some sort, including working with maps, serials, government documents, serials, ebooks and more serials. I was also, at my previous institution, responsible for batch loading and record load profiles. And I'm sorry, I am so distracted by chat I'm going to have to turn it off. Okay. So I am someone who can't help looking at how our tools shape not only our workflows but our fundamental understanding of our work itself, and I believe our tools need to be flexible and adaptable, that we need to be in control of our tools rather than relying on vendors to tell us what we can and can't do with our workflows and our data. So there are three basic questions around which I've structured this presentation:
what is FOLIO, why is Cornell choosing FOLIO, and how and when are we implementing FOLIO? I was tempted to add a fourth thing here, and that's what excites and scares me the most about FOLIO. Being part of this groundbreaking project and community is both scary at times and exciting. I'm usually pretty much an anti-cheerleader, but I'm truly enthusiastic about the work we are doing in the FOLIO community. FOLIO stands for "the future of libraries is open". Here's one definition, and this is taken from the folio.org website: the project is a collaboration of librarians, developers and vendors using an agile development process to rethink library technology. It's significant that this begins with collaboration. FOLIO is far more than the set of library management apps with that name. It is an open international community where design and development work is grounded in the requirements identified in various special interest groups, known as SIGs, by subject matter experts. In other words, the tools are being developed to fit the needs of users as framed by the users. FOLIO is by librarians and for librarians. Here's another way of framing it, and this is from one of my colleagues at Cornell: FOLIO is an international collective response to the trend towards commercial consolidation among library management systems vendors. I have stolen this image from Harry Kaplanian of EBSCO. One way that FOLIO is transforming libraries is its microservices architecture. Our needs will change, they will continue to change, and we need to be able to change our tools to accommodate our needs, not the other way around. This model comes with some challenges. Cross-app communication is needed, and the best way to accommodate that is not always clear. FOLIO, like all tools, has limitations and drawbacks. No system or platform could possibly meet all the needs and desires that I can identify, and there never will be that magical metadata wand I dream of. But as we identify and prioritize needs there will
be the possibility to adapt existing apps or to create new ones without reimagining and rebuilding an entire system. This partnership between libraries and vendors is also significant. Just as publishers and libraries do, vendors and libraries have different goals, different models, different values, different bottom lines. Libraries being completely reliant on vendors for tools is not sustainable. FOLIO partners include a number of libraries and the Open Library Foundation, developers from some of those libraries as well as from various other organizations, and a growing number of vendors. Now, I would be lying if I said the relationships among all these different entities and all the people representing them are completely harmonious. We disagree often, sometimes vehemently, and then as a community we decide both what needs are most important for the community and what is feasible to do with our collective resources. Identifying needs, outlining solutions and prioritizing them are ongoing work that we envision will slow down as the products and the community itself mature but will never completely end, and in acknowledgement of this the community has been working actively on creating a sustainable governance model. Part of the benefit of engagement with this community for me has been regular interaction with my FOLIO metadata colleagues at institutions around the world. This is an opportunity for us to share solutions to common problems, for camaraderie, and even for emotional support. As one of the SIG members said a while back, sometimes complaining together is team building, and in the FOLIO community complaining together sometimes also leads to the development of new solutions. Here are some of the reasons that Cornell is choosing FOLIO: it is collaborative, flexible, independent and affordable. I've already touched on collaboration. Flexibility includes the microservices architecture and the opportunity, because FOLIO is open source, for anyone to build and share new apps. There's
also a flexibility specific to library metadata that is especially important to me as a cataloger, and that is that the bibliographic data format in FOLIO is not necessarily MARC. This is a screenshot from Cornell's training environment. This is the Inventory app, which is where basic bibliographic metadata reside: enough metadata to identify a resource in order to support library functions such as acquisitions or circulation. The FOLIO instance, or bibliographic data, might be derived from an underlying MARC bibliographic record, as it is here, but it could be native FOLIO with no underlying source, or in the future it might also be derived from, for example, a Dublin Core record or from BIBFRAME data. These "view source" and "edit in quickMARC" action options are available when there is an underlying MARC record, but Inventory and quickMARC are not envisioned as full-featured cataloging tools. At Cornell we will be doing most of our cataloging in OCLC Connexion. Libraries might also choose to create Inventory instance records directly in FOLIO. For example, it rains a lot in Ithaca, it's raining here today, and we lend umbrellas, at least when we're lucky enough to be on campus, and right now we have MARC records for these in our catalog, as well as for various office supplies and equipment. In FOLIO there is no need to use MARC format to account for something like an umbrella, a room key or a power cord. We are also using this migration as an opportunity to clean up the holdings and items for these records, making them more internally consistent and easier for circulation staff to find and use. These are not records we display in our public catalog. Our public-facing discovery will still leverage underlying source data whenever it exists. Based on open source code, FOLIO is and always will be independent. At Cornell we also use Blacklight, which is another open source tool, for our public catalog, and over the years we've developed various automated processes such as
querying external sources for additional metadata. It's important for us to have complete access to and control of all our data. So while it is true that open source, like open access, often means free like a puppy, not free like beer, and there is certainly still a cost to implementing and maintaining any library services platform, open source tools are more affordable than many commercial options. It's important to acknowledge that Cornell is a relatively large, relatively well-funded institution, at least compared to many others in the United States. I hear people from smaller institutions say they just don't have the staff, knowledge or resources to run their own systems. Here is another place where the library-vendor partnerships in the FOLIO community come into play. There are various hosting and support options offered by, among others, Index Data, ByWater Solutions and EBSCO. Libraries also can choose to implement only the apps they want and need. Chalmers University in Sweden, for example, the first library to go live on FOLIO, in 2019, does not use FOLIO's acquisitions apps. In their former system they had to pay for the acquisitions module even though they never used it. With FOLIO we can allocate our resources where and how we see fit. Another fiscal advantage of FOLIO is that smaller institutions benefit from the work done by those with more resources. For example, five libraries who plan to implement this year or next, including Cornell, are sharing the cost for Index Data to build a way within the Inventory user interface to directly import and overlay single bibliographic records from OCLC's WorldCat. This is a basic functionality that we decided we could not go live without. The cost for each of these libraries is lower as we're sharing it, and it will be available, starting with the Iris release in May of this year, to any library who wants to use it. These are screen captures from one of the development environments, as this functionality is not yet actually live in production
environments. Cornell is already live with the FOLIO e-resource management, or ERM, suite of apps, and that comprises Licenses, Agreements, Organizations and eHoldings, integrated with the EBSCO Knowledge Base. An example of what this allows us to do is this: we add a license ID to individual records in our catalog index. This value then allows us to make calls into FOLIO that pull the accurate, up-to-date license data, which we can then display to patrons through the terms of use link. Once we complete our full migration in July of 2021, so that is only a few months from now, our ERM will be integrated with the rest of our library data, most notably acquisitions, for the first time. This will reduce some double entry and will enable us to more effectively connect payment information to the specific resources being paid for. As with any system migration, we have been asking ourselves a lot of questions. We've been reviewing location codes, for instance, and doing a lot of data cleanup, and some decisions are more FOLIO-specific, such as the format for holdings data. We currently use MARC format for holdings data, affectionately known as MFHD, or "muff-head", but we plan to use the native FOLIO holdings format when we migrate. This will make our holdings management easier, we hope, and we are reviewing our mappings now to make sure we will not lose any granularity that we need for reporting purposes, such as data in action notes that are currently coded as 583s. We've also been playing with how we want to map bibliographic notes. Here is a snippet of the JSON mapping for the MARC 382 field, which is the medium of performance. In our test environment we have mapped 382, 383 and 384 fields to a music details note, and this is how that note displays in the Inventory instance. You can also see here the local statistical note. This is a product of our experimentation with how we want to record cataloging-related statistics. We currently record these in 9XX fields in MARC bibliographic records. We would like to
get those out of MARC, but while we are waiting for the development and implementation of some fields specifically intended for this sort of tracking, we have to figure out where we're going to put this data. This shows both some of the benefits and drawbacks of implementing a system as it is still being developed, as well as how flexibility can be, at times, a little daunting. Our staff have many questions as we begin training in FOLIO. These include: what can FOLIO do that our current system cannot, and what does our current system do that FOLIO will not? These are a few of our answers, keeping in mind that what FOLIO can do will continue to change, based in part on available development resources and in part on what needs the community identifies and prioritizes. I won't read all of these to you, but one aspect that staff definitely have more appreciation for now, after a year of remote work, is not having to use remote desktop to access our catalog when we're not on site. Being involved with the development of FOLIO has made me rethink and question much about how we do our work and why. I've learned a great deal from my colleagues, both about what we have in common and how much variation there is across institutions. We have shared our questions, our frustrations, and sometimes our solutions. Often in the course of a discussion someone says, "That's not the way we do it now." But we're not trying to recreate the way we do it now; we're trying to meet the needs that the way we do it now may or may not be meeting. Rethinking our processes is laborious and at times painful. It also gives us the opportunity to consider not only why we do what we do but how we can do it better. Thank you. I've included my email here; feel free to reach out if you have questions or comments, or just want to chat. Also here are the FOLIO website and the dogs in sweaters. Thanks very much, Laura, that was great, and I love seeing pictures of the puppies. I have to ask: are those home-knit sweaters? They are. The dogs
were cold, and so was I. That's fantastic. And so we do have a couple of questions come in. Before I ask you a question I'd like to ask you, the first is: they absolutely love "free like a puppy, not like beer." Absolutely. And so, cost being a very interesting point: how do the overall costs, and that includes staffing and the local library technical staff time required to actually maintain, develop, and run FOLIO, really compare with outright purchase or rental? Because we do have a live in mind system. I have to say, that depends. I also have to be very honest: the development costs initially, for those of us who have been paying for development, far exceed what we have been spending. The development itself is very expensive, but we anticipate that once the development is at a minimum (I don't think development will ever stop), that will not be the case. One has to pay for hosting: one can host oneself, or one can outsource the hosting. A lot of us are outsourcing both hosting and migration, but one could outsource migration services but not hosting. But there will be no licensing paid to a vendor, and there will be no reliance on that support. Quite honestly, it was a bit ironic: it was during the last in-person FOLIO conference that our ILS went down, and our ILS support was completely unable to help, and a number of people said, "What are we paying that support fee for?" So it's those sorts of external supports, and also the idea, which is very appealing to me, of being able to act when I say I really need something. For instance, I've always wanted to be able to easily code something that's open access. There's not a great way for us to filter in our public discovery on what is or isn't open access, and at least in our catalog that we run ourselves, I would love to be able to add a code to all of our records, while with the systems I've worked with in the past it was, "Sorry, you can't add any codes; we can't do that for you." So you could add a text field, but we
know how reliable text is. Okay, that's great, thank you. Some libraries do struggle with implementing open source if it's not really supported within the institution, and I think a number of us would be familiar with that line. But is it definitely against an open access, open source movement? Is the use of it widespread in Cornell? We have some other open source, and honestly, apparently we forked the code on it a long time ago (not the library, the institution), and that's part of why we are actually going to be hosted by EBSCO for the first three years. Cornell, as a larger institution, didn't want us to be self-hosted because they wanted to avoid the pitfalls of the past. They didn't believe us that we wouldn't fork the code. So there is some support, but we've also had to advocate for it, and part of that has been saying, "Look how much we are spending every year on this other system that is no longer going to be supported, and what support we have isn't adequate to our needs." Okay, thank you. I was particularly struck by your "complaining together is team building," and I wondered, particularly in this time of pandemic, how has the pandemic provided particular opportunities or challenges for you to implement FOLIO? Yes, yes it has. Because it's an international project, we were already on Zoom, so we were very well poised for that. I hated the word "pivot" that we all heard last year, but for the FOLIO community there wasn't much of a pivot. And because of that, there are two things. One was that we were all dealing with a lot of stress, and it was proposed that we not have FOLIO meetings for a month. I reached out to the Metadata Management group that I convened and said, "Shall we cancel our meetings?" and they said, "Please don't. This is the only normal thing we have left. The only normal spot in our week is this meeting that we're used to going to." So it was really nice to have that. And it's also been really, really helpful to
hear what other institutions, not just in the US but around the world, have been doing: how they've been managing their services, and whether they are allowing catalogers to take materials home to work on, for instance. So it's been a great support system. Thank you. And it's just wonderful to bring Ed back in. Would you mind rejoining the conversation, Ed, if you are available? Hello, yes. I'm just thinking that we've got the two of you in the room now, and the thought is that FOLIO is putting the control in the hands of librarians in terms of technologies, and how we might think of librarians in AI taking control. So I was wondering if the two of you could have a very short, quick conversation on AI, libraries, and librarians as the ones behind that control of what's going on. Yeah, great. I mean, Laura, I'd love to hear your thoughts from your perspective in FOLIO. It's hard for me to have cogent thoughts because I'm not engaged with AI, but one thing that strikes me in what you talked about, Ed, and what I've heard others talking about throughout this conference thus far, is really the focus always coming back to the humans. Why are we doing any of this? It's to serve humans. The resources we're describing and making available are created by humans. And so having that human component, where we have the humans wanting to use the system and then wanting to be able to control what we can and can't do with the system, seems very relevant to the humans being able to work with the AI and to understand the AI. I love the idea of opening up the black box. I love the way you were describing that FOLIO is being designed without preconceptions, in some ways asking: actually, what do we want it to do? To the extent you can open that up, that is exactly where that human can come through, isn't it? I wondered about the kind of accrual of community interest to the platform, and particularly some of those industry relationships. So EBSCO
have a huge reach in terms of their data infrastructure and their index. So actually, if there is an interface there where a kind of community interest can accrue, and then a more dynamic conversation can happen in those data layers, about how that is open and visible and transparent and engaged, I wonder about the willingness of the industrial partners to engage there. I wonder too, because that sounds fantastic, and that's something I have struggled with in the past, working with a discovery layer as opposed to the more traditional OPAC, where I can't get at the data behind it, so I don't know why it's doing that and I can't fix it myself. Absolutely, yeah. And, you know, if that can be spun as corporate social responsibility, as good actors working visibly and ethically in an engaged community way, there's a good way forward, perhaps, for all stakeholders in that relationship. And I do hope that these partnerships might be a first, very small step toward changing those relationships. That's really interesting, thank you. We do have one last question come in for you, Laura, a very specific question: can you partner with other choices of knowledge base or other sources of bibliographic records, in addition to the examples you showed in your presentation? First, with the knowledge base: the earliest integrations have been with EBSCO and also with GOKb. Other integrations are planned, but that is somewhat dependent on the owners of that content and those platforms being willing to work to integrate with FOLIO. In theory, it should integrate with anything that can take an API call; it's all API-based. And then in terms of bibliographic data, absolutely. Chalmers actually ingests data from the Swedish union library, which is actually not MARC data; it's linked data. The German libraries are bringing in data in a format that I believe is called PICA. They're not using OCLC. The reason we're building that OCLC connection is that so many US libraries
really rely on OCLC. But you can currently bring MARC bibliographic data in from anywhere if you have a MARC file.