[Chair's introduction of Giles Bergel, of the University of Oxford; largely unintelligible in the recording.] Giles, over to you.

Thank you, Andrew. [Opening remarks unintelligible.] I'm based in a computer vision research group, which increasingly is an AI research group, in the Department of Engineering at Oxford. The purpose of the collaboration was to apply some of our methods to some materials in the National Library's Data Foundry. Specifically, it's the Chapbooks Printed in Scotland dataset, which consists of over 47,000 image files and metadata for those files. It also includes OCR, which we didn't actually use in this collaboration. But we also used, very importantly, authority data curated by the National Library from their digital gallery: standardised terms for printers, publishers and so forth. That was really important, as I hope will shortly become clear. Here is one of the items in the National Library's digital gallery, a chapbook printed in Scotland. If you're not familiar with the genre, chapbooks are small, cheap printed books: poetry, prose, very occasionally drama, and very often songs, distributed and sold by travelling pedlars or chapmen.
They are often illustrated, and the illustrations have sometimes had a rather rough ride at the hands of chapbook scholars. Ted Cowan and Mike Paterson, in what is an excellent guide to the genre in Scotland, note that it was often felt necessary to ornament the front cover with a picture, and a woodcut usually served the purpose; it was fairly crudely executed and related only indirectly, if at all, to the content. So there are two parts to this: these illustrations are cheap and simple, and they're not necessarily related to the content. They are free-floating memes, almost. Here is an original chapbook printing block, used in Newcastle in the 18th century, in the British Museum. I'm showing this in a rather nice RTI view, where one has light from various angles. I'm toggling the simulated light back and forth, which is very nice for showing the grooves, the grain of the wood and other features [unintelligible]. And on the left, you can see an impression of that specific block. That specific block was one of the things we wanted to track during this research. The procedure we carried out has three steps, and I've given a brief epitome of the technical methods for each step. The first step was finding the illustrations using an object detection CNN, or convolutional neural network. For those interested, I've given the name of that particular CNN; further technical details are available on our website. The next step was to match and group the illustrations per unique block. Again, there's some description of the methodology there: SIFT features, a form of computer vision that predates the current AI boom but is still very useful. And last, I did a bit of experimental work in classifying the subjects or content of the illustrations with, again, a CNN. I'm not sure how many people are familiar with these terms, but I'll show them in action through this collaboration.
So, step one: finding illustrations. We used a generic object detector, a form of neural network commonly used, for example, for identifying objects in the wild, such as, in this case on the left, trams, people, umbrellas and other things that you might want to isolate within a scene. There are two parts to this: the identification of a thing, and its localisation in space. On the right, you can see us applying this detector to illustrations. Here is a view of all the pages in the NLS dataset, and we are interested in just finding the illustrations. What we're doing here is annotating the pages with example illustrations, which we then want the object detector to go and find on its own initiative. Which, to cut a long story short, it did very successfully. We extracted 3,600 illustrations from the 40,000 page images. There were some speed bumps and some mis-detections, but rather few, actually; it did much better than we thought. Those are documented in the publications that have come out of this research. We then open-sourced the retrained model for reuse, and it has been used already on other illustrations. Actually, all the technology used for this collaboration is free and open source. The next step was to make these images browsable, which we do in this web-based demo here. I'll provide the link to that later in the chat. You can see, for example, that we are making use of the NLS's authority files to provide a browsable overview of the iconography available to printers, in this case in particular locations. It's a great way of browsing the popular visual culture of a particular locale at a particular time. And you can immediately see certain stylistic patterns. This is clearly the work of the same or very similar block cutters working for the firm of John Robertson in Glasgow in the late 18th to the 19th century.
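The two parts of detection mentioned above, identifying a thing and localising it, are commonly checked by comparing a detected box with a hand-annotated one. The following is a minimal sketch of one standard measure, intersection over union (IoU). It is not the project's code: the `Box` type, the function names and the pixel coordinates are invented for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixels: (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(box: Box) -> float:
    # A box whose corners are "inside out" has no area.
    return max(0.0, box.x1 - box.x0) * max(0.0, box.y1 - box.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    overlap = Box(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1))
    intersection = area(overlap)
    union = area(a) + area(b) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: an annotated woodcut and the detector's slightly tighter box.
annotated = Box(100, 200, 500, 600)
detected = Box(120, 220, 500, 600)
print(iou(annotated, detected))
```

A detection is usually counted as correct when its IoU with an annotation exceeds a chosen threshold; the threshold is a design choice and none is stated in the talk.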
And I don't think, pace Ted Cowan, that all of these are crude woodcuts. These are very sophisticated designs in some cases; a lot of care has been taken. Other examples are perhaps closer to the stereotypical notion of a crude chapbook woodcut. But I would argue that, nonetheless, these provide a powerful form of visual identity and, moreover, that they are suited to the cheapness of the paper these items are printed on and to the intended popular market. OK, so that was illustration detection and leveraging metadata to browse illustrations. Next, we want to match and group the illustrations per block using so-called SIFT features. For this, we use a software package called VISE, the VGG Image Search Engine. You can see it here being used to find buildings in Oxford from various angles. These red ellipses are features extracted from different images, which we then match. We can efficiently match millions of images quickly, very often from multiple angles and at different scales. You can see this here: there's a side shot and a front-on shot of this building in Oxford. This technique, again like our object detector, was not designed for print, but turns out to work extremely well. So here we have a view of our browse and search tool. The video playing is pulling down the metadata for a publisher. I'm going to Hutchison, again in Glasgow, and I'm selecting an item. You can see things here, and I can bring up one of them that has no metadata, or rather only sparse metadata, in its record. But I can lasso a woodcut and perform a visual search, and it very quickly returns matching illustrations. I can go in on one of them and reassure myself that I am looking at the same block: an impression from the same block and not a close copy. If I like, I can go in and see the visual features that have been matched. I will first look at the metadata for the match, and I can see there's much more metadata here.
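Once pairs of illustrations have been matched, grouping them per block can be treated as finding connected sets of matches. Below is a minimal sketch using a union-find structure. It assumes the pairwise matches have already been verified and is not how VISE itself is implemented; the function name and the image identifiers are invented for illustration.

```python
from collections import defaultdict


def group_by_block(image_ids, matched_pairs):
    """Group images so that any two linked by a chain of matches share a group."""
    parent = {image_id: image_id for image_id in image_ids}

    def find(x):
        # Walk up to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for image_id in image_ids:
        groups[find(image_id)].append(image_id)
    # Largest groups first, so heavily reused blocks appear at the top.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical identifiers for extracted illustrations.
images = ["ship_a", "ship_b", "ship_c", "portrait_a", "portrait_b", "thistle"]
matches = [("ship_a", "ship_b"), ("ship_b", "ship_c"), ("portrait_a", "portrait_b")]
for group in group_by_block(images, matches):
    print(len(group), group)
```

Because the grouping is transitive, a single false match between a block and a close copy would merge two groups, which is one reason to inspect matches by eye as described above.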
So you can see how the visual search is bridging between materials that have much metadata and those that don't. There are some other features here: you can upload a picture if you want. In this tool we can also view clusters of matched illustrations. Here are sets that have two images, and we go up to 22, which I believe is the most in this set. It is therefore easy to see the popular imagery, the blocks that are being reused over and over again in this corpus. And again, those are the two blocks that we saw earlier, which actually move around between Edinburgh and Glasgow, with close copies of them in use elsewhere. So that was matching. Next, classification, mostly using a piece of software called VIC, the VGG Image Classification engine. This is the sort of thing one might use to find objects or broader classes of imagery, whether concrete things such as cats or dogs, or more abstract properties such as style, colour or texture. For example, here I'm supplying a small set of printers' ornaments taken from Google: just random printers' ornaments that are not part of the chapbooks dataset. I want to use these as a visual query. So I use VIC to index these images and present them to VIC as, effectively, a search term: a search term made up of just printers' ornaments. And we find there are some things that are relevant. But if you're familiar with machine learning, you'll see the usual mixture of things that are absolutely what you're looking for and things that are perhaps less closely related. On the second row, for example, you can see an image of Adam and Eve leaving the Garden of Eden. This figure is not, strictly speaking, an ornament. But to some extent you can see what the system means: you can get an idea of the features that are, quote unquote, similar across the corpus.
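The idea of using a handful of example images as a search term can be sketched as ranking a collection by similarity to the examples. This is a deliberately simplified stand-in, assuming each image has already been reduced to a feature vector; the classification engine used in the talk may well work differently (for instance by training a classifier), and all names and numbers below are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise average of the example vectors, used as the query."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def rank(query_vectors, collection):
    """Return (score, image_id) pairs, most similar to the examples first."""
    query = mean_vector(query_vectors)
    scored = [(cosine(query, vector), image_id) for image_id, vector in collection.items()]
    return sorted(scored, reverse=True)


# Hypothetical three-dimensional features; real CNN features have far more dimensions.
ornament_examples = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]
collection = {
    "ornament": [0.9, 0.0, 0.1],
    "adam_and_eve": [0.5, 0.5, 0.3],
    "ship": [0.0, 1.0, 0.1],
}
for score, image_id in rank(ornament_examples, collection):
    print(f"{score:.2f} {image_id}")
```

In this toy data the mid-ranked item shares some features with the query without being an ornament, which is the kind of "similar but not quite" result described above.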
As you go down within the set, you see things that are decreasingly relevant, as you should. I had more luck finding sailing ships. There are some things at the bottom that aren't relevant, but they have been tagged as not relevant with this yellow triangle, this disclosure triangle here, whereas things that are more relevant have a green tick. This is not a ship either; this is a lady in a kind of triangular dress. But again, you can see the features that the classifier is building on, and you can see how this rather serendipitous form of search works. Things like portraits work rather well: portraits of Robert Burns in this case, but any kind of portrait at all was relatively straightforward. Research continues on providing meaningful categories that users might want to look for: if not a specific keyword, then a category that lets them use this technology to browse the collection without necessarily having prior knowledge of what it contains. Next steps are to devise a cataloguing standard for blocks, their impressions and related blocks. Woodblocks do survive, or can be inferred, as I've shown, from their impressions. Where they do survive, there are currently no standards that bridge their description as it might take place in, for example, museums and as it might in libraries. Nor are there good ways of describing interrelationships between blocks and close copies of blocks, as one might want to do in order to distinguish them and to show visual links in what was the most far-reaching corpus of popular imagery, as well as song, prose and poetry, available to working people in Scotland in the late 18th and 19th centuries. A further step would be to add identifiers to the illustrations in the browse tool based on this standard, and to publish the improved metadata, in some cases the newly identified printer and place of printing, for the anonymous material.
We also want, on a technical level, to scope the integration of this technology using IIIF, a standard that clearly many institutions are investing in, and we're actively considering that right now. Just a few thanks. There are numerous people I won't go through: the National Library of Scotland, a fantastic collaboration which continues. I'd also like to thank my colleague Abhishek Dutta, who is a research software engineer in VGG at Oxford, and a number of chapbook scholars and collections elsewhere that were very helpful. You can get these slides at this URL, which I'll put in the chat; the presentation will of course be available as well. There is a live demo, the one I showed with the matching and the metadata browse, at this address. The code, as I said, is open source. The data is of course still available from the NLS, and we have various other tools, demos and research agendas that you can see on our website. I'm very happy to take questions at the end of this panel, and also by email later. Thank you very much.

Great. Well, Giles, thank you. That was terrific, and I'm sure there'll be a lot of questions on the back of that very stimulating presentation. Thank you very much. And now I think we move directly on to our second presentation, which is pre-recorded and entitled "A new vision for the National Archives catalogue: revealing the richness of records", which Alex Green and Faith Lawrence from the National Archives have produced. They're going to be presenting Project Omega, the National Archives' project to rejuvenate the catalogue's editorial systems and to create a pan-archival catalogue which brings together data from across the National Archives. So, again, without further ado, we will have the presentation. Thank you.

My name is Faith Lawrence, and I am a data analyst for the Catalogue, Taxonomy and Data department, and also product manager for Project Omega.
I am joined for this presentation by my colleague Alex Green, who is the service owner of the pan-archival catalogue, of which Project Omega is part. The National Archives is the UK government's archive and official publisher. As such, our collections reflect a thousand-odd years of British and world history. Being the record of government, our records include a lot of death, taxes and random bureaucracy. But being the record of government, there is also plenty of sex, scandal and intrigue, resulting in a diverse collection that even we at Kew are still exploring the depths of. The formats are diverse as well, even on the physical side: as well as paper files and rolls of parchment, we have maps, photographs, artworks and fabric samples, wax seals, doors and even the odd mummified rodent. And increasingly, we are now acquiring born-digital objects, including emails, databases and archived websites. In 2019, the National Archives launched its Archives for Everyone initiative, with the goal of making the archive more inclusive, entrepreneurial and disruptive. This talk will mostly focus on the inclusive and disruptive aspects, as I discuss the current and future plans for the catalogue. For many people, their first introduction to the archive is via our online catalogue, Discovery. This is the public face of the catalogue, with records related to over 25 million assets, both physical and digital, and an additional 11 million records related to items held at other archives around the country. I will now hand you over to Alex, who will describe our future vision for the catalogue.

Behind our online catalogue, there is a collection of databases and software that allow us to manage the 32 million record descriptions you can see on Discovery. When you search or browse Discovery, it may seem that little changes, but in a typical year well over half a million record descriptions are added, 5,400 are amended in response to user suggestions and many thousands become open.
All of these changes are managed using an editorial system designed in 1999. Since then, we've learnt a lot about records and how they are used. We know that more is learnt about them over time, which changes our understanding. Also since then, technology has linked the world through the internet, which allows us to connect our records in new ways, not only within our own catalogue but to others. In Project Omega, we have the chance to radically rethink how we use the information we have about our records. This slide shows a variety of linked data held by other organisations worldwide. What if our records were linked to other records: to those elsewhere in our catalogue, or in other archives in the UK, or in museums, galleries and archives all over the world? What if you could see the changes made to record descriptions in the catalogue? For example, you could view the new description, enhanced by one of the cataloguing projects such as the war diaries of the Second World War, alongside the old description and any errors that were corrected, such as a date or a spelling which might have made the record harder to find in the past. What if you could see more information about a record, such as what conservation actions have been performed on a medieval deed, what software a digital file was created in, or whether a digital record had been altered since it came into the archive? This is the vision for our new catalogue, and all of this will be possible when the project is complete. Here's our project roadmap, our plan. You can see that we have more detail for the rest of this year and less for the following year, as we can't be as certain about what will be possible.
This year we're focusing on three things: converting the core of our catalogue, the main information you see on Discovery, into linked data and validating that it's correct; designing the technical architecture so that other systems at TNA can receive data from and send data back to the catalogue, ensuring that it holds the canonical version of information about our records; and designing the new editorial interface to allow staff to create and edit the catalogue data in a way that makes their work easier and more efficient, and that preserves a record of those changes over time. I'll now hand you back to Faith to give more detail about how this is being achieved.

The public catalogue is underpinned by the internal cataloguing system and editorial process. This process is how new records are added and how corrections and other improvements are made. However, behind the scenes it is made up of multiple overlapping databases, resulting in multiple potential points of failure due to the complex interactions between the different data stores and, in some cases, the lack of a clearly definitive version. Further, the editorial application has reached what is known as end of life: that is, it is no longer supported. As a result of the age of the technology, it is not possible to make any changes to improve usability or make other quality-of-life improvements. The challenge facing us is to create a new back-end catalogue system. As we are going behind the scenes of designing this potential system, I'm going to get a little bit technical, but I will do my best to explain as I go along and keep everything clear for those who are not from a technical background. With the idea of the disruptive archive in mind, we want to take the opportunity not just to replace the existing system, but to take advantage of recent technology improvements to create something much better. When I say better, this has two parts.
Firstly, better for the archivists and editors who are updating and maintaining the catalogue, by making the workflow easier, more user-friendly and more efficient. And secondly, and indirectly, better for the public users, by making it easier to search, share and expose the data in the catalogue and by providing better support for the new front-end system that has also been developed. One of the most exciting things about Project Omega is that we had the opportunity to start from a completely blank slate rather than trying to build on an existing system. So one of the first questions we had to ask was what type of technology would be the best fit for what we and our data needed. The National Archives has an ambition to lead the world in reimagining archival practice for the 21st century, including new approaches to archival description. So we were given the chance to completely rethink our catalogue, and we wanted to take advantage of the latest technology and archival thinking. In 2020, the ICA published its new framework for archival description, Records in Contexts. We loved their view of records as part of a vast, dynamically interrelated network of people and objects situated in space and time. We know that records are initially created by an organisation, or possibly more than one in these digital times, and that other organisations subsequently inherit those records, using and adding to them over time. Then, once they are transferred to the archive, the records are used again by generations of researchers, with more and more connections being made between people, corporate bodies and records by these activities. We decided to redesign our data model to reflect all of these connections, as well as the traditional arrangement of records according to the principles of respect des fonds and original order.
Our new data model also gives us the opportunity to model records in a new way, one that separates out the aspects of each record which are unchanging, which we call the concept, from the specifics of what is delivered to the user, which we've termed the realisation. This is especially useful for born-digital records, which can have multiple versions due to redactions or versions in a different format, but we wanted to extend this to all records to allow us the freedom to describe and differentiate between every type of record we have. In addition, we recognise that the descriptions we receive from the government departments that transfer the records are not the only valid descriptions. Our new data model will enable us to incorporate alternative and complementary viewpoints from other sources by separating out different descriptions for the same concept of a record. These could come from crowdsourcing projects or even be generated by artificial intelligence. Finally, we wanted to find a way to be transparent in our archival practices, so the new catalogue will include a full audit trail of any changes made to the records and their metadata. Instead of deleting data, every change will be saved as a new version, so that users can view how a record and its catalogue entry have changed over time. So these are the key characteristics we needed in the catalogue to achieve our ambitions, but what about the technology? It was with these factors in mind that we decided to go in the direction of graph technology, which is closely associated with the idea of linked data and has grown in popularity over the last few years as it has moved from academia to enterprise-level implementation. Graph databases have some similarities to XML databases in that they encode both the data and the relationships between the data.
While an XML database is constrained by a defined hierarchical structure, graph databases have a freer network structure, although, to get the most out of a system, a data model, schema or ontology will define how the different types of data fit together. But what does this mean for archives? I would like to give you a visual idea of the difference between our current approach to cataloguing records and the possibilities of our future catalogue. The current catalogue is modelled in the blue diagram on the lower left. Each series belongs to a single department and each record belongs to one series. If that record mentions a named individual, it is only sometimes tagged as a person. More often the name is not tagged but is just part of the text of the description, so there is no way of searching specifically for a person of that name. There is no mention of earlier descriptions of the record, and you can't see differently redacted versions of a digital record that have been available in the past. The future catalogue, however, shows all of this. It enables us to see that a record not only belongs to the series it was placed in by the government department when it was transferred, but also belongs to a series of records that were loaned to the British Library for an exhibition. This catalogue would show you any other records that were included in that exhibition: not only those from TNA's collection but, ideally, the records loaned from other archives. Also in this new model, we can see that two individuals are linked to a single record but no link exists between those two people. Their link to the record, however, allows us to discover another person who may be of relevance to our research. The future catalogue also shows us different versions of a description over time, as catalogue enhancement projects, user suggestions and corrections have changed it.
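To make the network structure concrete, the relationships just described can be sketched as subject-predicate-object triples, the basic unit of linked data. This is an illustrative toy, not the Project Omega data model: the identifiers, the predicate names `memberOf` and `mentions`, and the helper functions are all invented.

```python
# Each fact is a (subject, predicate, object) triple. All identifiers are hypothetical.
TRIPLES = [
    ("record:1", "memberOf", "series:department-transfer"),
    ("record:1", "memberOf", "series:exhibition-loan"),
    ("record:2", "memberOf", "series:exhibition-loan"),
    ("record:1", "mentions", "person:a"),
    ("record:1", "mentions", "person:b"),
    ("record:2", "mentions", "person:c"),
]


def objects(triples, subject, predicate):
    """Everything the subject points to via the predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(triples, predicate, obj):
    """Everything that points to the object via the predicate."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def co_mentioned(triples, person):
    """People who share at least one record with the given person."""
    found = set()
    for record in subjects(triples, "mentions", person):
        found.update(objects(triples, record, "mentions"))
    found.discard(person)
    return sorted(found)


# A record can sit in more than one series at once.
print(objects(TRIPLES, "record:1", "memberOf"))
# Other records in the same exhibition loan.
print(subjects(TRIPLES, "memberOf", "series:exhibition-loan"))
# Two people with no direct link, discovered through a shared record.
print(co_mentioned(TRIPLES, "person:a"))
```

In a strict hierarchy the first query could only ever return one series; here membership is just another link, so adding the exhibition series does not disturb the original arrangement.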
Finally, the redacted record is still available even when the full record has been opened, allowing us to see what information was previously considered sensitive. This is just a flavour of what would be possible with the new catalogue. This level of detail is facilitated by our new data model because, unlike the current system but similar to FRBR, the Functional Requirements for Bibliographic Records, it models three levels of conceptualisation for a given record or record set. Concept: a record transferred to TNA and asserted as a new intellectual entity within TNA's control. Description: a record will have one or more descriptions, which will facilitate and document TNA's intellectual control over that entity. And finally, realisation: a record will have one or more realisations, which will facilitate and document TNA's control over the actual bits, physical or digital, that make up the record. So each piece of information in the current system needs to be recorded at the correct conceptual layer or layers. The catalogue is the heart of any archive. If we make our new catalogue and editorial system correctly, it will help keep the information flowing smoothly around the different systems that make up TNA, a fraction of which are depicted on the slide. We are going through a period of change here at the archive: not just the catalogue and the editorial workflows, but a significant number of these other systems are also currently undergoing re-evaluation and redevelopment. It is an amazing opportunity for growth and improvement, but it means that many of the decisions we have to make are part of a wider conversation about how these new systems, some beginning to go into production and others only just beginning to be thought about, will work together.
As you saw in our roadmap, the data is only one strand of many in this project, and while we have been focusing on it over the past year, in part to verify our data model and ensure we are building on a solid foundation, it is not our only concern. Deciding on the technical approach for the API, that is, how the different systems will communicate with each other, requires consensus across the different projects, and Omega is at the forefront of that conversation because our final product will connect to so many of the other systems around TNA. I hope that this talk has given you a glimpse into the work that we have been doing on Project Omega and some of the complexities that we have been grappling with as we start making our dream of a pan-archival catalogue a reality. The link at the bottom of this slide is the Project Omega web page, which contains ongoing details of the project and links to the reports and resources related to it. If you have any questions about Project Omega and the future direction of the catalogue, you can contact us at catalogprojects@nationalarchives.gov.uk. Thank you for listening.

Well, Alex and Faith, thank you very much indeed for that presentation. Really of the moment, I think, in terms of the kind of IT challenges that a number of organisations are going through, but something which will, as you say, make a big difference to users. So thank you very much. Questions will come in the final part of this session, of course, and that takes us to our third presentation, by Holly Smith from the University of Leeds Special Collections: "Documenting complex histories: balancing multifaceted representation with accessible archive navigation".
Holly is the project archivist working on the Women's Aid Federation of England archive within the University of Leeds Special Collections, and today she's going to be introducing us to the Women's Aid archive as well as discussing some of the research that she's been undertaking as part of her TNA and RLUK professional fellowship. So again, this is a recorded presentation, and after it we will move into questions. So can we play the presentation, please?

Hello, my name is Holly Smith. I have brown hair and a slightly unruly fringe, and I'm currently wearing a navy and white striped top. I'm here today as the project archivist for the Women's Aid Federation of England archive, which is currently being catalogued as part of a Wellcome Trust-funded project. As well as this, I'm currently taking part in a professional fellowship in partnership with the National Archives and Research Libraries UK. For this I chose to research inclusive cataloguing practice, namely the balancing of representation and accessibility, which I'll be talking to you about today. Documenting complex, layered descriptions that authentically represent the voices in an archive is quite the holy grail for archive projects at the moment. Equally, we can't forget the importance of ensuring simplicity, with easy navigation and standardised finding aids and access points. So these two sides seem slightly at odds with each other. Nonetheless, that's what I've decided to explore and what we'll hopefully be unpicking a little bit today. But before I start, I do have a couple of admin points. Firstly, I'd like to flag that Women's Aid is a domestic abuse charity and our archive deals with various sensitive issues around this topic. I won't be overtly mentioning the details of domestic abuse during this presentation, but I will be showing some images from the archive as well as bringing up Women's Aid and the services they provide.
Therefore I will be showing some details at the end for anyone who may feel affected by anything they've seen. The second caveat before I begin is that this presentation mainly revolves around my still ongoing fellowship. I'm about to hit the halfway mark, and so far it's been a lot of me asking quite broad questions around inclusivity, representation and access. So I'm afraid there's nothing too concrete or groundbreaking for me to share at this point. What this is going to be is more of a discussion, which I'm hoping isn't too underwhelming for you all. I hope it will inspire you to go back to your archives and have similar discussions with your teams there. So, to kick things off, I want to give you a bit of background about the Women's Aid Federation of England. Women's Aid is an acclaimed domestic abuse charity that works as the national coordinating body for local domestic abuse services. It provides information, training and resources, as well as lobbying and campaigning for rights and legislative change. Women's Aid has been at the forefront of the refuge movement for almost half a century, celebrating its 50th anniversary in 2024. Because of this, the Women's Aid archive goes back to around 1974, when the organisation was formally established. They emerged out of the activism surrounding the Women's Liberation Movement and were back then known as the National Women's Aid Federation, NWAF, as you can see from some of the things on the slide now. They were initially seen as quite radical activists, raising awareness of quite taboo subjects such as domestic abuse and gendered violence. But our archive tells an incredible narrative of perseverance and progress. In the span of just 20 years, Women's Aid transformed from being an organisation met with scepticism and aggression to one that was highly respected for its original research and expertise. They even held their 30th birthday celebrations in 10 Downing Street.
So as you can kind of start to see from this very brief introduction, the Women's Aid archive is more than a simple label of women's history. It's complex, and it spans the histories of activism, law, feminism and the refuge movement, and within Women's Aid itself, in the archive, we can see special interest groups for women who identify as black, lesbian, disabled, working class. The question is how do we document this complexity in our catalogues, and how do we make it discoverable for researchers? Truthfully, I'm still trying to figure that out, but today I'm hoping to talk you through three of the main discussions I've been having. So first off will be the importance of community communication; secondly, the idea of recording user-generated content; and thirdly, what we can provide with current cataloguing practices. So first I'm going to talk about community communication, and by this I mean how we can build up relationships with the groups represented in our archives. So for us the main contender here is Women's Aid themselves. We want the Women's Aid archive to authentically represent the functionality of their organisation. So I've made sure to have Women's Aid representatives provide feedback on things like our collection structure. I also use the expertise of staff members to double-check the names of key groups and individuals that cropped up in the records, and also ask them a lot of annoying questions, like the administrative difference between the Women's Aid Council and the Women's Aid Council of Trustees. It's just small interventions like that, and often it is simply confirming what you've found already in the archive. But in doing so we ensure a more accurate representation of Women's Aid's voice. I've also been able to use this communication with Women's Aid to discuss the nuances of representation through appropriate language. Our archive is not without its fair share of historic terminology.
One of the main examples is the term "battered wives", which you can see here on the left of the screen. It was used by Women's Aid themselves throughout the 70s and the start of the 80s. And we do want to make sure that this is documented as part of their history, but we also want to back it up with the current terminology. So through discussions with Women's Aid we learnt that "victim survivor" is the most widely accepted term at the moment. It's the one they currently use, and it's the one that women from refuges are most likely to feel represented by. The project has also benefited from close communication with Feminist Archive North, whose collections are also housed in the Uni of Leeds Special Collections. Feminist Archive North, also known as FAN, is entirely run by volunteers, a lot of whom were themselves involved in the activism of the Women's Liberation Movement. This makes FAN a really great group for engaging with us, as for a lot of them this is their story. We recently had a tour for them of their collection. We got out some stuff we thought they'd enjoy, and the main aim was to introduce them to the Women's Aid material. We actually ended up getting some great bits of info from their responses to the records. For example, you can see on the screen here the "You can't beat a woman" poster. This image of the kind of grandma cartoon comes up quite a lot in early ephemera, and FAN were able to say that they used to know the artist, Annie Smith, who was known as Vega or VEGA at the time. It's that kind of feedback that provides a level of personal detail and a level of info that you can't get from a Google search. I think it highlights how engagement with relevant groups can really enrich our understanding and documentation of a collection. This leads me quite nicely onto the topic of user-generated content, which I've been really intrigued by recently, and I know it's a bit of a debate in the archive world.
The question is how can we record and preserve the information gathered from engagement events like the one I had with FAN? I think this is intrinsically linked to the idea of representation in collections and also to accessibility. The starting point for this debate in my head was the Collections Trust's Revisiting Collections programme. For those of you who haven't heard of it before, the Revisiting Collections initiative encourages institutions to open up their collections for engagement with community groups, with the aim of collecting and preserving the information users might have. I think this is amazing in theory. It provides that link between engagement and the catalogue. But the Revisiting Collections initiative isn't without its flaws. So the Collections Trust themselves have shared evidence in a 2013 report that, despite encouraging meaningful interaction, inspiring community knowledge, et cetera, the results of the sessions were actually rarely preserved. This can partly come down to the fact that the Revisiting Collections initiative isn't without its flaws. So should we, as archivists, be recording this user-generated content? It's kind of "non-expert" metadata, in inverted commas. It isn't exactly part of ISAD(G). It's definitely at odds with the traditional idea of the archivist as a gatekeeper. So it really is a bit of a debate whether we should be recording it in the first place. But personally, I think the issue is the practicalities of it all. There isn't really anywhere to store this content in our current databases. There's a quote in the 2013 report that puts it quite nicely: Revisiting Collections is trying to put stuff in a shoebox that doesn't quite fit. So I guess a good way of explaining this is a case study from my own Special Collections here at Leeds. We house the Gypsy, Traveller and Roma Collections, which are a great documentation of nomadic culture.
It comes from an academic perspective, very much from the eye of an observer. So as part of a Revisiting Collections initiative, the project partnered with Leeds Gypsy and Traveller Exchange to hear the perspectives of people from within these communities. This created an amazing added layer of description for the archives, but the thing is, there wasn't really a place to store it in the database. This was worked around at the time by putting it on the Yarn website. For those who don't know, Yarn is a platform that allows people to collect material and write up blog-like descriptions. But the thing is that Yarn isn't supported by our organisation. It doesn't link to our catalogue in any way, and there's no confirmation of long-term preservation. This user-generated content therefore ends up being not discoverable and not accessible to researchers, which feels like it does a disservice both to them and to the communities that provided it to us. So for the Women's Aid archive, I want to make sure that any metadata gathered from Women's Aid, FAN or any other engagement is appropriately stored. But it's just not something that our databases are currently set up to do. I think in the meantime it's an archive-by-archive basis. We need to look in our databases for fields that don't already have a role, that can be represented on the online catalogue and perhaps be searchable by researchers online. It's one to think about, really, and it's one that we need to start thinking of solutions for, because more and more archives are producing similar hidden-history projects. We should start thinking beforehand about how we can preserve the user-generated content that might come from them, as I think it has real potential to open up collections to be more representative and more accessible to people by being inclusive of lots of different audiences.
So following on from that quite theoretical discussion around current database limitations, it feels like a good time to discuss the things we actually can do now to make our collections both representative and accessible. Subject classification seemed to me to be the first port of call for creating more access points for the Women's Aid archive: a way of flagging the key voices and key topics in the collection. I was taken aback by how underused it was, both in my own workplace and in the wider archives I talked to, with some archivists admitting they don't assign subjects as part of their cataloguing workflow. A recent project we've undertaken at Special Collections is the LGBTQ+ archive internship, which looks at how best to surface LGBTQ+ stories in our university archive collection. For this, subject classification was a big theme, so they were particularly grappling with the tricky issue of how different people can be identified with different terms, and trying to understand what subjects to pick out. To work through this, they carried out focus groups that brought the community themselves into the decision-making process, ensuring terms were more relevant, appropriate and sensitively applied. Setting out our approach to subject classification therefore made LGBTQ+ voices in our archive way more discoverable. I think another obvious port of call is finding aids. Using index lists in particular to supplement catalogue records can be a simple, effective way of improving accessibility. Our Women's Aid project currently has two indexing volunteer projects, one looking at the VHS collection and one looking at the newspaper clipping collection. I can already tell these are going to have amazing outcomes. They're really opening up areas of the archive that would otherwise have been incredibly overwhelming to approach single-handedly.
So that's subject indexing and finding aids, and another one that I just wanted to mention is collections guides. They provide contextual information and some hints and tips on how to start exploring collections. For us, the 50th anniversary of Women's Aid is coming up in 2024, and we're aiming to produce a guide to help users dive into this last half-century of history. This is going to include helpful information such as highlighting key people, organisations and dates, as well as listing relevant legislation, which is quite useful for the Women's Aid history, and explaining the insane number of acronyms that Women's Aid seem to have used over the years. So I guess with all of these very simple things that you can do with catalogues at the moment, it's all about putting yourself in the position of the user. What knowledge would help them access these histories? What do they need to know about the collection beforehand, and what access points can we provide for them? It's all super simple stuff, but it's maybe worth taking a step back sometimes and thinking about the different ways we can open up our catalogues. So hopefully that outlines some of the thoughts that have been whizzing around my head rather manically since I started my fellowship back in February. Until we start playing around with our databases and shaking up how we present our catalogues, some of these thoughts can be quite hypothetical. But on the other hand, a lot of them are also incredibly simple, and I am very aware I've basically just told a roomful of archive professionals how to use subject classifications and finding aids. But having these discussions, however simple, about how we can improve our cataloguing approaches to make our archives as representative, accessible and inclusive as possible is so important and should be something that is ongoing in our work. So in the second half of my TNA RLUK fellowship I hope to continue working with these principles in mind.
Time will tell how well the case study of the Women's Aid archive as an inclusive catalogue will turn out, but please do stay tuned, and thank you very much for listening to me today.
Great. Well, Holly, thank you very much for an interesting presentation, which I'm sure will excite some questions. So that is the last of our three presentations, and I'm now going to invite all the panellists to turn on their cameras and we will move into the question session. As I said at the beginning, questions should come through the chat function, and they're open. But perhaps while people are thinking I could direct a question toward Giles. I was really interested to see this work and the power of the AI search facility, and I'm just wondering whether this has an applicability in, for example, early printing and the identification of sorts, particular sorts, and the degradation of sorts as they're used by printers to put ink on the paper and so on. Probably someone's already doing this, it's probably already coming down the track, but I'd be interested to hear.
People are already doing this, and this is very exciting.
So in particular, there is some research at Carnegie Mellon University, a really great collaboration with Chris Warren, who is an early modern literature specialist, with scientists and librarians and book historians and digital humanists, on a project called Print & Probability, that has already identified the printer and publisher of some important texts, works by Thomas Hobbes and John Milton. This has been done manually, extremely painstakingly, for some time, but this is at scale, over hundreds of millions of pieces of type, or larger things. So the technology I showed does have this application for woodcuts, for ornaments, pieces of type, other bibliographical features. I mean, we're beginning to think in terms of there being a whole new sub-discipline of print studies and bibliography. But yeah, I think there's going to be just a stream of data coming out as more and more material is digitised and tools like this become more and more mainstream.
Well, most interesting. I have a personal interest in early music printing as a musicologist, and there are some really bespoke publications from the pre-1550 era for which you can't imagine there was a vast market, so where they got their type from is an enduring question. I'll just leave it lying there and hope that, well, if someone can come to my rescue, that would be great. But Giles, that's really, really interesting, thank you. So, looking then to the Q&A function, there's nothing appearing there yet, but there is a question in the chat, which is to Faith and Alex from Karen Sayers, who asks whether the changes to Discovery will affect the way that external repositories can contribute their records, and also, about changes to the records of external repositories, will that data become available?
That is a very interesting question. So there's currently a project which is looking at changing the front end, the public catalogue, which is a separate project, so
we haven't 100% worked out how the changes to the back end of the catalogue are going to be reflected in the front end of the catalogue. And there are also the two sides of it, in that we can only have the data that we're given. So it might well be that we know how we're going to redo the underlying data for our catalogue in the future, and it would be very interesting to take in the data where we're getting it from other people's catalogues. If and when we get changes, those changes will probably be held as part of the data model if we've converted it over, so you will see those changes in the audit trail, but we'll only get them when they are sent through. So we won't be creating any new information that we're not being sent by external archives, but ideally any information we have can be passed on to the reader, because it's the researcher and the reader and the person in the public who are looking at it who find that information useful, hopefully. So there are going to be some discussions about some of the levels of detail: are we saying exactly who made the change, or is it a more generic thing, just that a change was made? Because we don't like to name names in the civil service, except if we're trying to win an election or something. So there are a few discussions still to be had about exactly how that will play out, and it should be said those discussions are a bit in the future, so it's not going to be an immediate thing, because obviously we need to work with our own data first before we start thinking about other people's data. Hopefully that answered it. Alex, I don't know if there's anything you want to add to that?
Only, looking at your first question, Karen, will the changes affect the way that the external repositories contribute their records: it's the same answer, we don't know. If it does, then we will be very clear about what those changes are and give you any help that you would need. But yeah,
it may be that there isn't any change, and we try and keep it as consistent as possible, but again, it's a way off.
I think it's fair to say about the design process that we were very aware that we are getting data from, in our case, parts of the government for whom there is already a certain amount of overhead in giving us that data, and the chances of being able to change what they're giving us, or getting them to give us more and more detail, are probably fairly low. So the idea will be to minimise extra work for any external data suppliers.
Definitely.
Andrew, you're muted.
I'm not sure I did that right. So this is a long-term project, the Omega project, and it's really just: where do you see it ending? Is there an end point to it?
I realised when I started I should have said, for anybody who doesn't have visuals: I have light brown hair, which is slicked back in a bun because it's really hot, I have glasses, and an alien landscape with an alien creature behind me is my background, which is giving me a very nice halo effect. So I'm not going to offer that as an answer, but yes, I just realised we'd been asked to give the descriptions and I'd forgotten when I started speaking. So I think there is a medium end point and then a fuzzy, fuzzy end point. The first sort of end point will be when we switch over to using the new system. The current system, as we've mentioned, is on its last legs. We don't really know how long it will last, so the priority is to be able to switch over to the new system and do the work that we are currently doing with the new system, hopefully in a way that is easier for our cataloguers and editors and makes their life a bit easier. But then we've got a whole long to-do list of improvements we would like to make to the system and other data that we'd like to bring in. So we're starting with the born-physical data, because that's what the current editorial system uses, but we want to bring in the born-digital and the digitised. When we started
looking, we found over 10 different catalogues around, and I use the term very loosely: some of them are spreadsheets, some of them are probably notes on Post-its. So we'd very much like to amalgamate all that data into the single catalogue. That's what will make it the pan-archival catalogue. So that will keep us busy for years, but it will be a gradual improvement, and hopefully most of that won't necessarily be visible from the front end, except that more data will hopefully be made available as it's brought together. A few bits are available, but they're coming in by different pathways, so this will be amalgamating it and making it easier on the back end. I mean, I think the end point will probably be... the current system lasted 20 years. In 20 years we might decide that, okay, we need to build a new system rather than keep improving this one. But, you know, the catalogue will never go away. So there is going to be an end point when we're using it, but then we will have continual and iterative improvement from that point on, until we, I don't know, switch over to quantum computers or upload ourselves to the internet or whatever the next stage of digital interaction is.
Yeah, and just as a follow-up, I hope you won't think this is unfair, but are there any plans to link images into the catalogue?
Yes, short answer. So obviously most of those are part of the digital data, or even when they're coming in born physical, they've now been digitised. So we've got to do the born-physical stuff first, and then the digitised and the digital are the next things on the horizon. So yes, it's on our to-do list, it's even near the top.
Brilliant, excellent, that's very good to hear. Okay, right, well, we're accumulating some questions here. So, one for Holly, which is: with the sensitivity of your archive subject area and the possibility of quite emotive archive access, such as dealing with sensitive terminology, have you got processes in place for ensuring staff and researcher well-being?
Yes,
this is a really interesting topic when it comes to the Women's Aid archive, because obviously it's got a lot of sensitive, potentially traumatic material involved. I think particularly when it came to the volunteer projects that I mentioned in the presentation, we realised there was an additional layer of a duty of care to the volunteers. You couldn't just have them in like a normal archive project and set them working with this kind of content. So there was a bit of research. There's a trauma-informed community of practice group, which I'm sure a few people are aware of, and we also did a bit of research into general trauma-informed practice, which has been really useful. So we've actually created a volunteer handbook, which we give to the volunteers at the start of their projects, which is just very transparent about what material they might come across. So we mention the historic language and the themes that might come up, just diminishing that element of surprise a little bit, and point them towards resources that might help them. And then we also created a volunteer management checklist, which we follow as staff members during and after a volunteer project. But it's a super interesting question, because this is such an overtly, potentially distressing collection, but I think you can use it with all archives as well. I think every archivist has come across something that they've been a bit taken aback by in a collection for one reason or another. So it's an interesting debate. I think people are talking about it now.
Indeed, and several questions on your presentation, in fact. One from Michelle Williams: will there be plans for ongoing discussions with Women's Aid and other communities represented within the records, as terms and contexts might change over time?
Yeah, so I think that's a big thing with the whole historic terminology debate. You want to use terms that are valid now, so like I said in the presentation, we're going to use victim
survivor because that's what Women's Aid use. But this actually came up quite a bit in the LGBTQ+ intern project I also mentioned, because they used focus groups to figure out what terminology people represent themselves by nowadays. But we're very aware that in 10 years' time, if not 5 years' time, that might have completely changed, and the terminology we use now might be offensive or just not quite PC anymore. So I think we're actually going to put a policy in place to review the terminology we use every 10 years or so, which I think is super interesting. And in terms of the Women's Aid collection and talking to communities, I'm currently having a bit of a debate in my own head about whether talking to actual women that use services is a legitimate and ethical way of doing representation in archives, because it's their story in the collection as well, as victim survivors themselves, but the ethics of approaching them and making them potentially relive quite distressing times in their lives is something to think about. But I think even just speaking to Women's Aid or Feminist Archive North, going out and talking to people that have an expertise greater than your own, and accepting that that's the case, means that your archive is going to be documented in a way more inclusive and representative way. I hope that answered the question. I think I rambled a little bit.
No, no, I thought that was a good answer. And if I may, just a quick follow-up on terminology. I mean, this isn't new in a sense, because past generations have adjusted terminology to their own worldview, and doubtless there will be further shifts in the future, as you say. How do you ensure that the content of the record doesn't sink beneath layers of filtering?
You mean with changing terminology within records over time?
I mean, I think there's a difference, because we did a bit of work on editing past records that might have historic terminology, and I think it's important to stress that we're not changing any of the terminology that's from the actual record, because that preserves what was said at the time and it's a really important historic document. It's important that, even if it might be offensive or distressing, we maintain what was said at the time. But editing the archivist's voice and how it's described is a different matter: if it's not in the actual records, that's a different matter. But even when you're editing the archivist's comment, I think it's important to preserve that past description. So we've been playing around with that, because it's very similar to the user-generated content debate, where there's no obvious way to put legacy descriptions in our databases, but it's important that we maintain them. So we've done a bit of a thing where we've put it in narratives in our EMu database, and I think that's a way of protecting it from slipping through the cracks. You're maintaining the most recent and best description in the catalogue, but the past descriptions are also there if researchers need to go back.
That's great, thank you. And then, so this is all you, I'm afraid, Holly, there are a number of questions here. This is from Rezitsa, and the question is: you mentioned the use of contemporary terminology, for example "battered wives" equals "victim survivor". Will you publish a glossary of terms for others to use?
I guess this feeds into the collections guide section. So obviously it's fair enough using all these different terms and representing historic terms and current terms, but having it in a collections guide that explains it to researchers I think is quite important, because it provides that context and helps them search in a way that's going to discover records the best. So I think a glossary would be
pretty ideal, and that is the goal, because it's a bit of a weird situation where, almost, if you search with quite an offensive term you're going to bring up more records than if you use the PC modern term, just because that's the one that's in the most collections. So maybe explaining that, although it's not nice, if you search for "battered wives" in our collections database you're going to bring up a significant number of records. I think explaining that in a glossary, if not in just a general contextual description in a collections guide, might be a good way of facilitating access.
That's great, thank you. So, moving back to Giles, if I may. Giles, could you say a bit about specific challenges you faced in the preparation and processing of the data?
On this specific project, actually, it wasn't such a problem, because the National Library of Scotland makes their data available in extremely well documented forms with lots of metadata, both the Data Foundry data, and Sarah Ames is on the call, I'm grateful to her for this collaboration, and the authorities data we got from behind the scenes. That was very good data. I feel there's a more general point, though, about the necessity of, and the invisible labour, if you like, required in so-called data clean-up operations, and that we don't really credit that labour sufficiently. I think it's a good question, and the question brings out the difficulties of many data sets that aren't often accounted for sufficiently in project planning. Certainly my group as a whole often impresses on collaborators that this is a phase in which one really has to think carefully about what the purpose of the data is, as well as what it really represents, and therefore its provenance and all those sorts of issues. That is invariably much more difficult and takes longer than people think. The other side of this, I think, is that, if you're lucky enough to have data of the quality the NLS produce, there's an issue about how you
represent what is in effect decades of curatorial labour and collections care in those descriptions. And if, as we are doing, one abstracts that scholarship and that labour in the form of a machine learning model, I think it's an interesting question for ourselves and for the sector as a whole how we acknowledge and credit that work.
Yeah, I suppose following on a bit from that, I was thinking as you were speaking whether there is applicability of this technology, and I'm sure there is, again it may be coming down the track, of machine reading to large manuscript corpora, beyond, for example, what the French Himanis project has done, which has now been around for a while. So there are corpora like that, and then some vast runs of documents, including many from the Middle Ages, that are too big to edit or print. I don't mean too big to do it physically, but just too voluminous, too many pages to devote to that. Is there a machine reading solution to this?
There is, or there will be, or there's certainly intense effort going on: things like Transkribus and the work of the READ consortium, and various other projects that really can only approach this over the long term. You need vast amounts of data and vast amounts of annotation, so again we're talking about clean-up and labour and judgement in processing that. And this has moved on a lot since I first remember seeing demos of Transkribus, when it was purely of technical interest, but I think it's now a genuinely useful tool, and there are alternatives out there. I'm not myself a manuscript scholar, and there are different sorts of challenges. We've done things like taking illustrations from manuscripts and taking marginalia from the sides of printed books. But absolutely, I think this is possible, and I think as well, once one gets beyond the notion of reading a manuscript as the sole source of truth or information in the document, there's really interesting work on palaeography and diplomatics and scribal
identification and all the rest.
Okay, so I think again, Giles, this one's for you. Melanie asks: could you say more about how this work might enable wider access to materials, and their significance for audience development?
Yeah, so chapbooks are paradoxical in a sense, in that they are, or we think they are, popular culture, for some value of popular, for audiences that very often we don't know very much about. We don't know how good literacy statistics are for these periods. Possibly this material indicates wider literacy than the figures that we have indicate. So although they're popular in historical terms, it's not well-known material these days. The term chapbook is obscure. They're difficult things to make available, for one reason or another, even digitally. Descriptions aren't always great; the NLS is an exception, I think. So I think there is a lot of opportunity for engagement, particularly through communities' own printing histories, because the history of the chapbook is a history of the spread of the printing industry across the UK, among other places, from its emergence until the age of the machine press. So a printer sets up in a town, and the first thing they do is print a popular, well-known chapbook, or it might be a broadside ballad, because those are known commodities, if you like. They will sell, and they are commodities that are part of the bread and butter of the industry. So I think that story about what was produced in a locality is one that access to these materials can bring out. And this is digitisation in general, I suppose, but I think some of the methods that I showed provide access points to the collections, which is quite interesting. So visual browsing, where you may not know the terms that one might search upon, but you can simply search by visual themes, or you have access to text because of improvements in OCR, and there are these possibilities for increased discovery.
Great, thanks, thank you very much indeed. And then I
think probably this is likely to be the last question, because we are up against a hard stop at 3:15. It's from Matilda Seabreck, and I think it's for Alex and Faith, although actually it could be for any of our presenters. "So, a bit of a vague question, but as someone just starting out in a big digitisation project of archaeological artefacts, what would be your top tip to ensure that the digital database can be more easily searched in the different ways that you've discussed today?" So I'll throw it open to all of our speakers, but perhaps Alex and Faith first, since you've been speaking about a catalogue. Alex, do you want to?

No, you start. I think you're more aware of the data impact.

So I guess my top tip would be thinking about how you want to use the data, because you're always going to have to weigh up how much metadata you have the time and resources to capture about your artefacts or records. You're going to have to make decisions. Depending on what technology you've got, you're probably going to have some restricted vocabularies, in terms of "these are the ways you're going to describe the things", and what you make available through that is going to determine what can be searched on. So I guess thinking about your users and your user needs and what they're likely to be looking up. The problem is that you won't necessarily know; you might think you have an idea, and it's going to be different. So I guess just being au fait with the fact that you're probably not going to capture all the stuff that they want first go, and that you'll then have to review and revise and think about it as things go forward. But yeah, just have that think about the decisions you make on the metadata: what you make available through that is going to determine the type of searches that your users will be able to do, so think about the type of searches that you want to support and that you think your users need. So I think that's probably mostly unhelpful,
but that's my top tip. I'd just say, if anybody wants to discuss it further, Alex and I are on the TNA stall tomorrow lunchtime, along with one of our other colleagues who is an actual archivist cataloguer. So if anybody wants to come along and have a chat with us, please do. And I throw it over to the others for their top tips.

Great intervention, great answer. So, anything from you, Giles, or you, Holly?

I can't really improve on Faith's answer about thinking about users' needs, but also having a certain caution about one's ability to know what those are.

Yeah, thank you. Holly?

I just have a very boring answer, which is just making sure you nail the basic content. If you've got the basic description and the dates and all the boring stuff that's mentioned in my presentation, like subjects and authority files, if you've got the basic information there, you're doing the most that you can for researchers. Anything extra is pretty good. It's a very boring answer, it's basically "just catalogue it", but I think that at the end of the day that's the best you can do sometimes.