I want to tell you two things in this paper: one is about the collection of the Scottish session papers, and the other is about the technical challenges we're grappling with at Edinburgh. So this starts with stuff. This is a cross-section of typical session papers. About ten years ago I came across a run of about 600 volumes of these things, all uncatalogued. They intrigued me, and that's led to this obsession to get them all available, which I'll tell you about now.

The session papers have been described as the most valuable unstudied source for Scottish history in existence. So it's worth saying a little about Scotland and its legal system and why that's different. Scotland has a legal system separate from that of England, which was maintained after the 1707 Union, and it does mean that the materials we have are very different in many ways. The supreme court in Scotland is the Court of Session, which meets in Edinburgh. Up to the mid-19th century, every paper which came before the Court had to be printed in a small number of copies for the lawyers. They contain rich content relevant to all areas of life: social history, linguistics, even some poetry. The collection is also instrumental for understanding print culture in Enlightenment Edinburgh. James Boswell, the biographer and advocate, said: "As it is a court of papers, we are never seriously engaged but when we write. We may be compared to the Highlanders in 1745: our pleading is like their firing of musketry, which did little execution. We do not fall heartily to our work till we take to our pens, as they do to their broadswords."

So here you have some of the facts. I won't read all of these out, but basically we have a big challenge: a quarter of a million bibliographical items, no MARC records, no digital surrogates, and massive potential for all areas of study. There are three surviving collections; this is the Advocates Library in Edinburgh, which holds the largest part.
So we very quickly started discussions with the Advocates Library as one of the main holding bodies. It was obvious to us that this needed to be a partnership project between the three holding institutions. The second collection is the Signet Library, coming up here, which is the other professional institution that has session papers. Each of these libraries has multiple series of session papers, with no consistency in their location within and between the libraries. The scale and complexity rule out traditional retrospective cataloguing: without metadata, an index, or even shelf lists, how were we going to start?

So we felt that with this project we should really test what digitisation could do. We already had a large thesis digitisation project on the go. All three partner libraries were keen to see what we could do in-house rather than going down the commercial route, and we all wanted to work together to produce outputs that were free for everyone to use. Last year we ran a pilot project which captured over 13,000 images from the three collections. Here you can see some of the interesting formats we're dealing with. The pilot showed that digitisation was feasible with the right investment in equipment and binding repair, and that the variety of documents meant we needed high-quality photography as well as cheaper book scanning.

We then had some Eureka moments. The first was that a digitise-first approach did seem feasible. The pilot included technical development looking at the formulaic structure of the documents and the options for text recognition and machine learning, to see if we could use the digital images, via optical character recognition, as the source for the missing descriptive metadata.
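The formulaic structure is what makes a digitise-first approach plausible: the papers open with stock headings such as "Information for X against Y", so once OCR text exists, even simple pattern matching can propose descriptive metadata. A minimal sketch of the idea, assuming a heading of that shape (the pattern and the names below are illustrative, not the project's actual code):

```python
import re

# Session papers open with formulaic genre headings (Information, Answers,
# Memorial, Petition). This hypothetical pattern pulls out the party names.
HEADING = re.compile(
    r"^(INFORMATION|ANSWERS|MEMORIAL|PETITION)\s+for\s+(?P<pursuer>.+?)"
    r"\s+against\s+(?P<defender>.+?)[\.,]",
    re.IGNORECASE | re.MULTILINE,
)

def extract_parties(ocr_text):
    """Return (pursuer, defender) from a formulaic heading, or None."""
    m = HEADING.search(ocr_text)
    if not m:
        return None
    return m.group("pursuer").strip(), m.group("defender").strip()
```

In practice the OCR text would be noisier than this, which is exactly why the machine-learning options were part of the pilot; but the formulaic openings give even naive approaches something to grip.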
The second Eureka moment was around the International Image Interoperability Framework (IIIF) and the realisation that we could serve up these high-quality images from across all the holding institutions, allowing us to virtually reunify all the surviving documents telling the story of Scotland's session cases. However, we needed more content to test, so we started a second pilot, which begins this month. This will digitise a further 300 volumes, including volumes in poor condition, from across all the collections, with dedicated conservation support. It will also look at tools to automatically harvest metadata from the scanned text so that we can tag the documents with names with minimal human intervention.

I don't think there have been enough incomprehensible diagrams in this conference so far, so to address this, here is one of our early attempts to represent the workflows. We do need to join up the different image-creation streams, the OCR processing, and the IIIF presentation, and we'll be doing more work on this.

The third Eureka moment was when we realised that, despite these being library collections, an archival approach might actually work best. This is a screenshot from the archives catalogue of the University of Virginia Law School, which also has a small collection of session papers; they have been creating archival metadata starting with records for the actual legal cases. So we are now looking at putting records for all the reported session cases into a shared instance of ArchivesSpace, as hooks on which we can hang the records for the case documents and all the relevant digital images.

However, we're still going to need to extract metadata from the OCR text, as we can see here. These are screenshots showing some of the approaches we're taking to these documents to improve the OCR, including reduction of noise and identification of standard features of the layout.
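The noise-reduction and layout steps can be sketched very simply: threshold the greyscale page, find which rows and columns actually contain ink, and crop to that bounding box before the image reaches the OCR engine. A minimal pure-Python sketch of the idea (the real pipeline works on full-resolution images with proper imaging tools; this is illustrative only):

```python
def crop_to_ink(page, ink_threshold=128, margin=2):
    """Mask and crop a greyscale page (a list of rows of 0-255 pixels).

    Illustrative sketch of the masking/cropping step: find every row and
    column containing ink (pixels darker than the threshold) and crop to
    that bounding box, so the OCR engine sees only the text block.
    """
    rows = [i for i, row in enumerate(page)
            if any(px < ink_threshold for px in row)]
    cols = [j for j in range(len(page[0]))
            if any(row[j] < ink_threshold for row in page)]
    if not rows:
        return page  # blank page: nothing to crop
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin + 1, len(page))
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin + 1, len(page[0]))
    return [row[left:right] for row in page[top:bottom]]
```

A smaller, cleaner image is the point: the less non-text area the OCR engine has to consider, the less noise it can misread as characters.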
The inversion of black and white that you can see here is to ease visual understanding of the process. The masking helps the machine to understand where the text is on the page, allowing for a crop that makes the work of the OCR engine easier. At the moment we're focusing on training the OCR engine, which in this case is Tesseract. Once we've taken the steps described above to reduce noise, we still need to deal with all the issues of 18th-century print: variant spelling, unfamiliar fonts, damaged type, thin paper. The aim, obviously, is to create an accurate text transcription to underlie a digital facsimile.

One of the tools we've developed is this geoparser, which picks out place names, in which the session papers are very rich. We're really excited about the potential of this project to link the digitised OCR content to other datasets: maps, statistical records, records of births and deaths, and all sorts of other things.

So we think this project is really exciting. It's going to require significant resource, but there are huge benefits for other collections and for collaborative digital scholarship. Developing workflows and tools for text mining and automated indexing will open up some really interesting opportunities for archives and libraries to work together. Your interest is very much appreciated, and any comments are very welcome. Thank you.
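Returning to the geoparsing point above: the simplest version of place-name spotting is a gazetteer lookup, which already suggests how OCR text could be linked outward to maps and other datasets. A toy sketch (the gazetteer entries and matching here are illustrative; the actual geoparser does far more, including disambiguation):

```python
# A toy gazetteer: place name -> approximate (latitude, longitude).
# Entries are illustrative, not the project's reference data.
GAZETTEER = {
    "Edinburgh": (55.953, -3.189),
    "Leith": (55.976, -3.171),
    "Glasgow": (55.861, -4.250),
}

def find_places(text):
    """Return (name, coordinates) for each gazetteer entry found in text."""
    hits = []
    for name, coords in GAZETTEER.items():
        if name in text:
            hits.append((name, coords))
    return hits
```

Once a place name carries coordinates, the document it came from can be dropped onto a map alongside statistical records or registers of births and deaths keyed to the same locations.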