So I hope you can all see that fine. As I was introduced, I'm going to talk to you today about the Kew science collections digitization project. Digitizing the collections is one of the top priorities in our new science strategy, which was launched last year. The strategy sets out how we will focus our science to fulfill our mission, which is to understand and protect plants and fungi for the well-being of people and the future of all life on Earth. It is an ambitious plan to help stop biodiversity loss and develop sustainable nature-based solutions to some of our biggest global challenges. As we all know, we have a lot of challenges facing us today, and Kew really wants to play a role in trying to solve some of them. Our collections are one of the strengths that enable us to undertake this mission.
So first of all, I just wanted to give you a quick overview of our collections. I'm not sure how many of you are aware of what Kew has; we're much more than the living collections in the Botanic Garden. The largest collection is the herbarium, where we have around seven million preserved vascular plant specimens, arranged by family, genus and species for study. We won't really know the exact number until we digitize it, so that's one of the advantages of digitizing. This collection is representative of global plant diversity: it contains around 95% of vascular plant genera. The herbarium was founded in 1852, but many of its subsequently donated collections contain earlier material. We also have a fungarium, where we hold over 1.25 million dried fungal specimens. Kew's fungarium collection is the largest and one of the oldest and most scientifically important in the world. It was founded in 1879, and in 2007 the International Mycological Institute fungarium, which is owned by CAB International, was housed alongside the Kew collection.
So this adds very substantially to our overall holdings and gives us the figure of over 1.25 million. Then we have the spirit collection: plant specimens preserved in fluid and stored in glass jars. This is used when drying, pressing and mounting on a herbarium sheet is unsuitable, particularly when you want to observe the three-dimensional arrangement of flower parts or fruits, and we have a lot of orchids and palms in spirit. Many of you have probably heard about our seed collection at the Millennium Seed Bank, which is based at Wakehurst Place. It's the most diverse wild plant species genetic resource in the world, with over 2.4 billion seeds representing almost 40,000 different species. The seeds are collected through global partnerships and field research as part of the Millennium Seed Bank Partnership network, so Kew very much works with many partners around the world.
Then we have the microscope slide collection, which holds around 150,000 slides, including leaf surfaces, sections, pollen, wood, roots and chromosomes. The economic botany collection is over 100,000 objects, which are any sort of objects derived from plants and fungi that are used by people. These include things like bark cloth, baskets, botanical jewelry and Chinese traditional medicine. If you ever get a chance to go and have a look around the collection, it's so fascinating to see all the different things that are there. We also have a tissue bank and DNA collection, which contains approximately 60,000 samples representing nearly all plant families and over half the genera of flowering plants. Its composition generally reflects studies carried out at Kew over the last 25 to 30 years. And we shouldn't forget our library and archives collection. The library, again, is one of the largest collections of botanical information in the world.
We have 200,000 botanical prints and drawings, and our archives contain the official records of the Royal Botanic Gardens, Kew. We have the personal papers of many botanists, gardeners and other individuals, including Charles Darwin, Joseph Hooker and Marianne North.
So collections are the answer to many scientific questions. How, in particular, are specimens used? Well, they act like a reference library that you can return to. Each specimen is an immense source of information that can tell us what plants look like, where they are found, what environmental niche they occupy, and when they flower and produce seed. They provide the basis for modeling distribution over time and help determine which species are threatened by extinction. They are a source of material for anatomical, biochemical and phylogenetic analysis, so how species are related to each other and have evolved over time. And in the collection we have crop wild relatives, so potentially new crops, and also new medicines.
Our science strategy has five different priorities. Priority three is what we've called the digital revolution, and it has three initiatives. The first one is digitizing the collection, which really provides baseline data that can be used in the other priorities. We are prioritizing the herbarium and fungarium collections for digitization, but we also want to make links between the different collections that we've got. So we need to integrate the specimen data with all the different collections and also with our data about the plants and the fungi themselves, such as the World Checklist data, which is like a global consensus of all known vascular plant species and associated information on their distribution. The last part is all about sharing this data as widely as possible. We don't want this data to be used just by Kew; we want it to be used by as many researchers around the world as possible. So, digitization.
Well, in its purest form it's converting physical to digital information, but this is what we mean at Kew by digitization. Here is a typical herbarium sheet; I hope many of you will be familiar with it. The first step is that we add a unique identifier, so a barcode. We capture any folder-level information, then capture one or more high-resolution images, and then capture the specimen label information into a database. Key information on a specimen includes what species it is, who collected it, when it was collected and where it was collected, and newly collected specimens would have GPS coordinates.
Even though we're now embarking on a large digitization project, Kew has been doing some digitization for a long time. We really started at the same time as I arrived; as was mentioned before, I was a digitization officer on what was called the African Plants Initiative project, which was funded by the Andrew Mellon Foundation. In 2004 it did not really seem possible to digitize the vast collections that we had, and the technology really wasn't there at the time, so we prioritized type specimens. A type specimen is a particular specimen, or in some cases a group of specimens, of an organism to which the scientific name of that organism is formally attached. These specimens are cited when a new species is published, so a type is really a specimen selected to serve as a reference point when a plant species is first named. They are the most important specimens for plant taxonomic work. The project started with Africa and then, because it was successful, expanded to Latin America and the rest of the world. It was a very large project, with around 300 partner institutions in around 75 countries, and Kew was also involved in training in digitization and in sending imaging equipment to other partners. So all of our type specimens were digitized.
And that was around 346,000 specimens, although part of this mass digitization project will be finding types in the collection that we've not yet identified as types. New species are being described all the time, so we're digitizing new types as they are designated.
One of the biggest digitization projects that we have had was called Reflora. This was an initiative to increase access to and use of information on Brazilian plant diversity that is deposited in institutions within and outside Brazil. It involved digitizing our specimens: they were barcoded, folder-level information was captured, and they were imaged at Kew. Then the label images were sent to the Rio de Janeiro Botanic Garden, where their staff did the transcription, so this was making use of local expertise on localities. The additional aspect of this project was exchanging expertise. Over 110 Brazil-based scientists, including many postgraduate students, made study visits to Kew within the framework of this project. They participated in almost all aspects of the project: they selected material for digitization, they provided authoritative identifications, so they were looking at the material and naming it, they exchanged expertise in digitization approaches, and they collaborated with Kew scientists. They discovered species unknown to science as well as recording dozens of species not previously known to occur in Brazil. The images and data are available on Kew's website and also in the Reflora Virtual Herbarium. Kew had over 264,000 specimens digitized that way.
We've also had a number of smaller-scale projects over the years, generally for particular scientific purposes. For example, we have digitized all our collections of Dalbergia, which are rosewoods, Pterocarpus and a number of other legume species.
If you think of the most widely traded illegal wildlife product in the world, you would probably think of ivory or rhino horn, but it is in fact rosewood. This project was completed in collaboration with the Natural History Museum, London, and Edinburgh, so all the information from the collections was made available for research. The data has already fed into conservation assessments and into a CITES checklist (CITES being the Convention on International Trade in Endangered Species of Wild Fauna and Flora), which will be published soon, so the herbarium data was put to use after this project.
And finally, this is a fungi example. We digitized specimens of species on the UK Plant Health Risk Register and of groups generally known to be pathogenic, such as rusts and smuts. This provides a source of data to facilitate research into plant-pathogenic fungi, including data on the distribution of the fungi and their host relationships. In this case we were imaging collection labels rather than the specimens themselves, as images of the specimens are not as useful for identification purposes. We also used volunteers to help us capture the information, using crowdsourcing, which I'm going to talk to you a little bit about later.
So this is an overview: there are many projects we've been involved with, from ones that are purely digitization to ones where digitization is just a small element of the overall project. Some projects may just need to database the label information and not image specimens. Then we have individual staff, honorary research associates, volunteers and visitors who are doing certain aspects of digitization for their own research projects, and volunteers helping particular researchers. And then we have image requests.
So any researcher from another institution who wants specimens that are deposited at Kew can request them to be digitized. But there is a limit on this, just because of the number of requests that we get, so one request would only cover around 20 specimens. Currently, we have digital records of one million herbarium specimens and approximately 500,000 fungarium records. So as you see, we've still got a little way to go.
Funding and digitization efficiency often do not match, and this is sometimes a bit of a problem. Research is often geographically based or focused on a particular group of species, and the specimens to be digitized can be located all over the collection, so you are often returning to the same cupboard multiple times. The method of selecting can make a big difference in digitization efficiency. The best method is a cupboard-by-cupboard approach, where you are selecting a whole family or genus. Compared with selecting material at country level, for example for Angola, which is not that well represented in the collection, you would have a 50% reduction in digitization costs. So wherever possible we are taking a cupboard-by-cupboard approach, but if certain projects come up that want to use particular data that is scattered, we will still do that.
As I mentioned, at the beginning we didn't think we could do it, but imaging technology speeds have increased. Just think about the camera on your smartphone. In fact, when we started, there were no smartphones. If you think about how much your camera has improved, it's the same with imaging. Back in 2004 we were using flatbed scanners, because there were no digital cameras that could produce the resolution that we wanted.
There was a particular piece of equipment called a HerbScan that was developed by one of our photographers. It was a flatbed scanner turned upside down, and it would take three minutes to scan a specimen. So one person would use two machines: while waiting for one scan, you would set up a specimen on the other machine. Whereas before you could only do 80 to 100 a day, now one person can image the same amount in about an hour. For us it's very important to create a very high-resolution image, around 600 PPI. We want it to be equivalent to looking down a hand lens, so taxonomists can start looking at structures and make identifications from the images where possible. You are also able to take measurements of plant parts from the images.
So, as mentioned before, it has become possible to digitize a whole herbarium, and in fact a few others had done it before: the Paris natural history museum and the Naturalis Biodiversity Center in the Netherlands. We entered a small outsourcing pilot in 2015 with the Natural History Museum, London. We did this with an external supplier called Picturae, who had worked with Naturalis before. Most institutions that have completed a whole herbarium have outsourced their digitization. We imaged the Solanaceae, the family that contains the potato and tomatoes, and the Dioscoreaceae, which is the yam family. The specimens were shipped to the Netherlands. It's always a bit scary when you're shipping your precious collection overseas, but it was only a small part of it, and everything worked out fine. They imaged it on a conveyor belt system, which is called a Digistreet. One person would unpack the specimens from their boxes and put them in a pile by the conveyor belt.
The other person would move them from the pile to the conveyor belt and barcode them. Then the specimens move along the conveyor belt under the black box, where the specimen image is taken, and there is a person at the other end who takes the specimens off the conveyor belt and makes sure they're put back in the folders in the right order. The conveyor belt is controlled by the person who takes the specimens off at the other end, so there's no problem with specimens falling off the end or anything. The images were then transcribed by a team in Suriname.
So what were the main successes and challenges? Well, obviously there's always a worry about how non-experts are handling material. But they were very good: no complaints or issues have been highlighted since the material came back, and nothing has turned up in about five to seven years now. Kew staff did go over to train them and to make sure that the specimens were being handled properly and that the order was being maintained. We had high imaging throughput, with around 3,000 to 4,000 images created per day. Collaboration with the NHM allowed more testing of different variables and workflows. And this project has enabled us to determine staff resources and costings for the larger-scale project that we're just embarking on now.
The challenges are really the logistics of imaging off site, which is really resource heavy. You've also got the risk of moving the material itself. And to prevent pest damage or pest outbreaks, we need to freeze the material when it comes back into Kew, so it's a large logistical effort to make sure that we can freeze all those volumes going out and coming back into the building. Also, the supplier struggled with identifying sheets with more than one specimen.
We don't know exactly how many multi-specimen sheets we have, but it's probably about 5% of the collection where you'll find more than one plant specimen on a sheet. Communication was very good, but we probably needed a few more face-to-face meetings at the beginning; it was, you know, quite a small pilot. For the bigger project we will make sure we have more face-to-face meetings with the supplier. We needed an effective protocol regarding data quality standards, and we needed to quantify our expectations of image quality. The transcription quality was good, but more thought was probably needed about how much interpretation we wanted the transcribers to do, and about dealing with exceptions.
We have been falling behind other institutions on this, but we have had full support from the director, the executive board and the board of trustees. We have been trying to find funding for this project for a while, and we have costed a four-year project. It's a lot of money, because it's a massive collection: 29.3 million pounds. But we have secured 10 million pounds of Defra funding, so we're just kicking off the first two-year part of the project.
The main aims, as outlined in the business case, are to make this data available to facilitate and accelerate strategic and novel scientific activities, and to significantly increase access to the herbarium and fungarium collections. We also want to make sure that Kew's collections don't fall behind others and that Kew is still a global leader, one of the many global leaders, in plant and fungal science. We also want to protect the collections and provide a backup copy of this data. We know that it won't mitigate everything, because obviously you can't take a DNA sample or do chemical analysis from an image of a herbarium or fungarium specimen.
But if the worst did happen to the collection, then at least we'd have this digital surrogate. Part of the project is also to make tracking the use of specimens more efficient. There are a lot of requirements under developing legislation on access to genetic resources and benefit sharing, which includes the Nagoya Protocol. So you want to be able to track particular specimens from where they were collected, and under which permits, to how they're being used. We also want to enable efficient collections management, so a reduction in the cost of transaction management.
The collections digitization project has three aspects. There is the digitization itself, at this stage just the herbarium and the fungarium. There is the new database for collections, because we had out-of-date collections management systems, and all the different collections were in different systems that didn't talk to each other. And we wanted a data portal that was more user friendly, as not all collections actually have an outward-facing portal.
So work has been underway. We issued the tender for a supplier in February, and it was evaluated and awarded in May. We're now doing contract negotiations with the successful supplier, which is actually a different company, a UK-based supplier called Max Communications. Rather than a conveyor belt approach they're using copy stand workstations, and together we are aiming to digitize the whole of the herbarium in the next four years. We're just embarking on this quite ambitious timetable.
The pilot demonstrated that outsourced staff could not identify multi-specimen sheets very easily, so we decided that we'll do some of the work in house as well. We're barcoding and imaging the orchids, because orchids quite commonly have lots of plant specimens on one sheet. So here's a sheet with something like eight different specimens on it.
It's quite hard, if you're not really trained, to be able to identify how many there are on this one sheet. The same goes for the fungarium, where specimens are in lots of different packets on one sheet. However, we'll send these images over to the supplier to then transcribe. Then there are large, bulky specimens like palms that often do not lend themselves to the mass digitization approach. You have to refocus in between each one, really, and then also maybe stitch some images together.
So at the moment we're at the implementation stage, and we've got quite a lot of challenges. There's a large number of specimens to digitize in a short time frame. We don't know exactly how many specimens we have, so we'll have to monitor this and see whether we reach the end sooner or whether we have many more than we think. And we have a large recruitment drive; I've never recruited so much in my life as recently. We have approximately 36 staff to recruit. We have an external supplier who will bring in their own staff, but we still need managerial and operational staff for the project: data managers who will get the data and images into the system, quality assurance staff, curators to make sure that the specimens are handled properly, digitization officers, portal development and IT staff, as you can imagine. And as we have a lot of new staff, we have had to think a lot about working space, but hybrid working has actually helped. We thought this was going to be a massive problem, but at the moment it's okay, because with some people working from home a lot of the time, we have managed to find enough space in the herbarium.
We have a large training program to train up all the new people. We are also ensuring that our storage network and data management system are suitable for ingesting this large number of images and data: up to 10,000 images a day, at up to 250 megabytes per image. Also, while this process is going on, we're not closing the herbarium or the fungarium to visitors, so we have to minimize the impact on users of the collection. There will be some anxiety at the start, and there is, but there's strong support from staff for digitization. We just have to make sure that communication is key and that we answer people's questions and concerns: where we are going to be, and when, in the digitization, and which parts of the collection will be digitized at which point.
I mentioned image quality. We wanted to quantify image quality, because you can be very subjective if you're just looking at an image. So we quantified image quality against the FADGI guidelines, from the Federal Agencies Digital Guidelines Initiative. The supplier was asked to provide example images of the GoldenThread device-level target and an example herbarium specimen image with an object-level target. We provided a fake herbarium specimen if they needed it. The image quality was assessed using GoldenThread analysis software, which determined whether images reached two- or three-star ratings for our defined image parameters. Images were also assessed visually by taxonomists. A sample of images will be analyzed during production, and images will also be checked visually against the quality criteria outlined in the tender.
Then, generally, for data: the labels are not easy to read. There are lots of different handwriting styles.
There are many different languages, although we don't ask people to translate; they just have to understand the information well enough to know which fields it goes into. It's quite standard information on a label, but it can be in all sorts of different places on the label. There are also changes in locality names and country boundaries over time. So we need a clear transcription protocol, with example labels showing how to interpret the data, and we'll be building on the one that we used in the pilot. We also need a method for transcribers to ask questions and for us to respond promptly. There will be their data quality software system, if you like, but we may also want to use some sort of Google Doc-style document where people can post any questions that they might have.
The images will then need to be imported into our DAM, our digital asset management system, and the transcribed data will need to go into our collections management system. We're increasing our rates quite substantially, so we'll have to create quite a smooth pipeline. We recently tendered for a new collections management system, which is called EarthCape, from a Finnish company, and we've gone some way already towards replacing our old bespoke databases. The tissue bank, herbarium and spirit collections went live last year. Our transactions, so our loans, borrows and incoming material, where we're tracking consignments, went live in early 2022, and the fungarium collections will go live in a couple of weeks, with the Millennium Seed Bank and other collections to follow. When everything's in, we can link collections more easily, so a seed collection can be linked to its herbarium specimen voucher that was collected at the same time. And when you sample DNA from a herbarium specimen, that parent-child relationship can be more easily recorded.
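As a rough illustration of the links just described, here is a minimal Python sketch of how a seed collection could point at its herbarium voucher and a DNA sample at its parent specimen. The class, field names and identifiers are invented for illustration; this is not the schema of EarthCape or of any Kew system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """One catalogued item. All field names here are hypothetical."""
    record_id: str                    # e.g. the barcode added at digitization
    collection: str                   # "herbarium", "seed_bank", "dna_bank", ...
    voucher_id: Optional[str] = None  # herbarium voucher collected at the same time
    parent_id: Optional[str] = None   # specimen this item was sampled from


def linked_records(records: Dict[str, Record], herbarium_id: str) -> List[str]:
    """Return IDs of records that cite the given herbarium specimen,
    either as their voucher or as the parent they were sampled from."""
    return sorted(
        r.record_id
        for r in records.values()
        if herbarium_id in (r.voucher_id, r.parent_id)
    )


# Made-up identifiers, purely for demonstration.
records = {
    r.record_id: r
    for r in [
        Record("HERB-0001", "herbarium"),
        Record("SEED-0042", "seed_bank", voucher_id="HERB-0001"),
        Record("DNA-0007", "dna_bank", parent_id="HERB-0001"),
        Record("DNA-0008", "dna_bank", parent_id="HERB-0999"),
    ]
}

print(linked_records(records, "HERB-0001"))
```

Keeping the voucher link and the parent link as separate fields means a query can tell "collected together" apart from "sampled from", which is the distinction drawn above.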
So apart from just Kew, we want this data to be used as widely as possible, so the data will be harvested by aggregator sites including GBIF, the Global Biodiversity Information Facility; I hope many of you are aware of it. That also gives us a way of tracking specimen use. In 2021 there were 327 publications that included the use of Kew herbarium specimen data that had been downloaded from GBIF.
As I mentioned, we might still use an element of crowdsourcing. This would be to help us do some transcription but also for public engagement. We use DigiVol, which is created and managed by the Australian Museum in collaboration with the Atlas of Living Australia. We have crowdsourced over 130,000 records this way, with over 1,500 volunteers, although the majority have been done by a core set of volunteers. Information is packaged into what are called expeditions, which is a bit like going on a virtual expedition. It's not actually the specimens that were collected on one expedition; it's just a way of packaging it up, and it's usually linked to a particular project. For example, this was Kim Walker's PhD project, and this was databasing all our Cinchona bark collection for her PhD. Quinine is an alkaloid extracted from the bark of the Cinchona, or fever tree. If you've ever had a gin and tonic, you'll be familiar with the bitter taste of the tonic, which is the quinine. It's mainly used now to add flavor to the gin and tonic, but you probably all know that it's also one of the most important drugs in history, and it was used as a treatment for malaria. She was linking the bark collection to the herbarium specimens, so she used these volunteers to help her with her PhD research.
For the last part of the talk, I wanted to talk about the use of the digitized data. So, with Reflora.
It's been going a little while now, and it comprises the online virtual herbarium and also the Flora of Brazil. We wanted to know how it's currently used and its impact, particularly on conservation science, because that was the thinking behind why it was produced. So we conducted a literature survey and an online survey of people using the resource. There were 800 scientific publications in which Reflora was cited, and 81% of the 1,000 survey respondents accessing the resources mentioned conservation-related research. So it looks like people are using it for conservation purposes. The top uses were conservation assessments, looking at distribution, and also finding new species.
Two landmark publications came out that would not have been possible without the Reflora resources. In the first, the plant diversity of the Amazonian lowland rainforest was quantified for the first time based on a taxonomically verified species list, which is underpinned by herbarium specimen vouchers that have been identified by specialists, so we know that they've been identified correctly. The second landmark publication lists all known native New World vascular plant species, so it was the first catalog of the plant diversity of the Americas. So getting out this herbarium data, and what's in the collection, is vital to help all these different resources. Sorry, it went too fast.
While we digitize the collection, we think there could be many new species in Kew's collections waiting to be discovered. You might think new species are found in the field, but often they can be found just waiting to be discovered in collections. These are just a few examples. Dr. Martin Cheek and an MSc student, Charles King, came across a distinctive pitcher plant specimen at Kew when it was on loan from the herbarium at the University of Pennsylvania.
This specimen was then described as Nepenthes maximoides. They only had a chance to look at the specimen at this time because it was physically loaned to them. Eventually, when specimens go online, many more people can see them, and so more new species might be discovered. This species had not been seen since its collection 110 years ago in the Philippines. Similarly, orchid specialist Dr. André Schuiteman discovered a new species of orchid, Dendrobium azureum, while looking through unidentified Dendrobium specimens from New Guinea at the Natural History Museum, London. He thought it looked a little bit different, but what also alerted him that it might be a new species was the description of the flower color on the label. It said it was a blue orchid, and blue flowers are quite rare in botany. It did turn out to be a new species, and it was published. There will be many new species waiting in Kew's collection to be discovered.
Then there is the use of herbarium specimens in conservation assessments, which, as we've seen from the Reflora survey work, is one important use. The IUCN Red List is the most authoritative source of information on the extinction risk of species, and much of the data contributing to an extinction risk assessment can be found on herbarium specimen labels. It includes information on taxonomy, geographic range, population, habitat and ecology, threats, and use and trade. The example on the right, Pseudohydrosme ebo, was recently discovered as a new species following reexamination of specimen material at Kew. As it's known from just three sites, in the Ebo Forest in Cameroon, the species has subsequently been assessed as Critically Endangered.
So the locality information captured on herbarium specimen labels is crucial to understanding the distribution of plant species. The locality data is pulled together to produce distribution maps. Specimens may first have to be georeferenced if they don't have coordinates; that's not included in the digitization project. It would be a particular element you'd have to do after the locality data has all been transcribed. Each record then gets assigned a set of coordinates, which are used to produce the distribution maps.
When conducting conservation assessments, we evaluate whether a species is threatened according to parameters relating to distribution and population. However, for plants, comprehensive population data is very rarely available. As a result, the majority of threatened plant species are assessed as threatened on the basis of restricted distribution and continuing decline in habitat, range or population size. So as herbarium specimen data underpins extinction risk assessments and the accompanying distribution maps, it is essential that the data is transcribed as precisely as possible.
Here we have two examples of data point projections for a species. The data corresponding to the top example includes a record that has been georeferenced just to the centroid of Angola, rather than to a more specific location. According to the extent of occurrence parameter, which is the area within the polygon, the species would then be classified as Least Concern. However, removal of this imprecise point reveals that the species is actually threatened and would be assessed as Vulnerable. So accurate herbarium data is critical in producing conservation assessments.
Another way that specimen data is used is in the identification of Important Plant Areas, you know, which areas are best to prioritize for conservation. IPAs are a criteria-based system established by Plantlife International.
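To make the centroid example above concrete, here is a minimal sketch of how a single imprecise record can inflate an extent of occurrence (EOO) estimate. It is not the workflow used at Kew. It assumes EOO is approximated as the area of the convex hull around the records, uses a simple local flat-earth projection, and compares the result against the IUCN Criterion B1 area thresholds (100, 5,000 and 20,000 km²). The coordinates are invented for illustration, the Angola centroid is approximate, and a real assessment also depends on further sub-criteria.

```python
import math

EARTH_RADIUS_KM = 6371.0

def project_km(points):
    """Project (lat, lon) in degrees to local x/y in km (equirectangular approximation)."""
    mean_lat = math.radians(sum(lat for lat, _ in points) / len(points))
    return [
        (EARTH_RADIUS_KM * math.radians(lon) * math.cos(mean_lat),
         EARTH_RADIUS_KM * math.radians(lat))
        for lat, lon in points
    ]

def convex_hull(xy):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(xy))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula; the area is in the squared units of the input coordinates."""
    if len(vertices) < 3:
        return 0.0
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def eoo_km2(points):
    """Approximate extent of occurrence as the convex hull area in km^2."""
    return polygon_area(convex_hull(project_km(points)))

def b1_band(area_km2):
    """Band implied by the Criterion B1 EOO thresholds alone (sub-criteria ignored)."""
    if area_km2 < 100:
        return "CR"
    if area_km2 < 5000:
        return "EN"
    if area_km2 < 20000:
        return "VU"
    return "not threatened on EOO alone"

# Hypothetical, precisely georeferenced records (lat, lon) in northern Angola.
precise_records = [(-8.2, 13.9), (-9.5, 14.4), (-9.0, 15.0), (-8.6, 15.3)]
# The same records plus one specimen georeferenced only to the (approximate) country centroid.
with_centroid = precise_records + [(-12.3, 17.5)]

for name, records in [("precise only", precise_records), ("with centroid", with_centroid)]:
    area = eoo_km2(records)
    print(f"{name}: EOO ~ {area:,.0f} km^2 -> {b1_band(area)}")
```

With these made-up points, the precise records alone fall within the Vulnerable range for EOO, while adding the centroid record pushes the estimate above the 20,000 km² threshold, which mirrors the pattern described in the talk.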
Sites are identified based on the presence of threatened species, threatened habitats and/or exceptional botanical richness. And of the range of research activities that need to be conducted to apply the criteria, those in red are reliant upon collating herbarium specimen data as a major source of information on species and their distribution. The workflow shows how herbarium specimen data is integral to the TIPAs data workflow, and the large majority of the data that feeds into TIPAs assessments already exists, primarily within the herbarium. So digitization of data on target species is critical to the process, and we would hope that our large digitization project can help with all these different TIPAs projects.
So this is an example for Mozambique. The team started by collating and mapping all known data on the strict endemic and cross-border endemic species of the country. The map on the right shows how the point data can then be transformed into species richness data per area, in this case quarter-degree squares, to reveal hotspots of diversity. This work was published in a paper that was based primarily on herbarium data.
This shows how the mapping of richness of endemic, near-endemic and threatened species relates to the selection of TIPAs sites. In total, 57 TIPAs have been identified for Mozambique, many of which clearly correspond with the concentrations of endemic and threatened species. These sites make up less than 3% of the total land area of Mozambique, but hold important populations of 83% of the threatened species and over three quarters of the endemic species. Hence this offers an economical method of prioritizing plant diversity in Mozambique for conservation, although at present fewer than half of the sites have any form of formal protection. So it's just an example of how herbarium data is really being used by Kew scientists.
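The step from point data to species richness per quarter-degree square can be sketched in a few lines. This is an illustrative sketch only, not the published analysis: the record format, the function names and the sample records are all assumptions made up for the example.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.25  # quarter-degree squares

def cell_of(lat, lon, size=CELL_SIZE_DEG):
    """Return the south-west corner (lat, lon) of the grid cell containing the point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)

def richness_by_cell(records):
    """Count distinct species per grid cell from (species, lat, lon) records."""
    species_in_cell = defaultdict(set)
    for species, lat, lon in records:
        species_in_cell[cell_of(lat, lon)].add(species)
    return {cell: len(names) for cell, names in species_in_cell.items()}

# Hypothetical georeferenced specimen records: (species name, latitude, longitude).
records = [
    ("Species A", -15.10, 40.60),
    ("Species A", -15.20, 40.70),  # second specimen of the same species, same cell
    ("Species B", -15.05, 40.55),
    ("Species C", -18.90, 36.30),
]

richness = richness_by_cell(records)
for cell, count in sorted(richness.items()):
    print(cell, count)
```

Counting distinct species rather than specimens means that a heavily collected locality does not look richer simply because more sheets came from it.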
And then this is a final example, based on the images. So this is about the mobilization of large-scale data sets of images through herbarium digitization. This provides a rich environment for the application and development of machine learning techniques for species identification. However, limited access to computational resources and uneven progress in digitization, especially for small herbaria, still present barriers to the wide adoption of these technologies. There's been some good work on species identification, but there's still a lot more to be done, and this is an area where Kew will be looking to do more in the next few years.
Barnaby Walker at Kew is investigating using deep learning to extract representations of herbarium specimens that are useful for a wide variety of applications, so-called representation learning. Deep neural networks automatically extract relevant features, or representations, and representations learned from large image data sets can generalize to other tasks. So you could make models that are built on small data sets for specific tasks more effective. He's looking at different types of neural networks, which will give different representations, and he wants to find the type that gives the most generally useful representations.
Different neural networks need different levels of supervision. Supervised neural networks are trained to classify images using a labelled data set, whereas self-supervised networks are trained to perform tasks where the labels are created from the images themselves, such as reconstructing the original image after some transformation, or identifying which of two images is a transformed version of the target. So Baz, or Barnaby, is looking at these types of things to see how machine learning and deep learning can help us with species identification.
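As a toy illustration of what "labels created from the images themselves" means, the sketch below builds one training example for the second pretext task mentioned above: deciding which of two candidate images is a transformed version of the target. Everything here is an assumption for illustration and is not Barnaby Walker's setup: images are tiny nested lists of pixel values, the transformation is a horizontal flip, and `make_pretext_example` is a hypothetical helper.

```python
import random

def hflip(image):
    """Horizontally flip an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]

def make_pretext_example(target, distractor, rng):
    """Build (target, candidates, label) with no human annotation.

    One candidate is the transformed target and the other is a transformed
    unrelated image. The label is the index of the candidate derived from
    the target, so it comes entirely from how the example was constructed.
    """
    positive = hflip(target)
    negative = hflip(distractor)
    label = rng.randrange(2)  # randomise the position of the correct candidate
    candidates = [positive, negative] if label == 0 else [negative, positive]
    return target, candidates, label

# Two hypothetical 2x3 grayscale "specimen images".
image_a = [[0, 1, 2], [3, 4, 5]]
image_b = [[9, 8, 7], [6, 5, 4]]

rng = random.Random(0)  # seeded so the example is repeatable
target, candidates, label = make_pretext_example(image_a, image_b, rng)
print(label, candidates[label] == hflip(target))
```

A network trained on many such examples has to learn features of the specimen images in order to pick the right candidate, which is the sense in which the representations are learned without a hand-labelled data set.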
So there's a lot of potential, and we'll be expanding into this research. So that comes to the end of my talk. I'd just like to thank a lot of people, as I've talked a lot about other people's research. If you have questions on some of these areas, I admit I'm not an expert in all of them, especially the machine learning and deep learning, so if anyone has specific questions about those particular aspects I can also point them to the experts in those areas. So thanks to John Adcock, the ICMS program manager, then Jack Plummer, who's our plant assessment coordinator, Ian Derbyshire, who leads our TIPAs program, and Baz, who leads our deep learning. And also thanks to all the digital collection staff, past and present. So yeah, I'm welcome to take any questions. Thank you.
Thank you very much, Sarah. So, anyone who has any questions, can I encourage you to use either the chat or the Q&A. I'll start with a question myself to give everybody an opportunity to put questions in the Q&A. To give you some background, I know of other collections that go through this process of digitization, and I'm well aware of the stop-start process of funding: one bit gets funded and other bits not. In your talk you started from describing a set of funded projects and then moved to a more systematic, general approach, and I'm curious: how did you deal with funding, and how did you fund that larger systematic approach? What are some of the challenges around how you're funding all of this?
Yes, so I have to admit that a lot of the countries that have managed to fund whole projects have always done so with government funding. It's quite hard to get funding. There are other bits of funding, but philanthropic funding, like from foundations, is harder because the impact sometimes doesn't come until later.
Or a particular research project is for, say, one particular country, and they just really want the specimens from that country. So a lot of effort has been put into business cases with the government, really, and we have been successful in that approach. That's the only reason why we've been able to start the project, really: with government backing. But we also hope to leverage some more. We work with the foundation for some individual support, and if other funding projects come up, we might try and work out synergies between them. So yeah, it's tricky. It's not an easy thing, really.
So it's complicated to do this systematically then, in the sense of having a systematic plan of: this is my collection, and this is how I'm systematically going to go through it.
Yeah. I mean, say we had people who were interested in just Colombia, for example. We would then try and say, well, because of the way our collection is split up, we won't just do Colombia, we will do region 17, because then we know we don't need to go back into that particular section of the cupboards. And actually that is just as fast as selecting out just the ones from Colombia for that particular area, so sometimes we do bigger chunks than we would have done otherwise. And then we can mark the cupboards so that we're not going in there again. So where we can, and where it doesn't slow us down, we do a bit more. We do bigger chunks, if you like, than we would have before.
Thank you. There's a question here, and I'm going to generalize it a bit, but it's on geolocation. I assume you're dealing with specimens that were collected many years ago, when there was no GPS. So how do you get to a point where you're satisfied with a geolocation?
Yeah, exactly. So at the moment it's not been included in the digitization project, because it's quite expensive. It will be done afterwards within other funded projects, but we'll know which specimens we have and which specimens need georeferencing. The way that we've done it before has been for somebody to go in and look at maps, and that's why it's very sort of expensive. But when we have the whole volume, hopefully there'll be some economy of scale where the same locality comes up quite a few times, because the collector might go to the same collecting place. Then you'll be georeferencing to the same localities: there might be 10 specimens from one particular locality that you can all georeference at the same time. So georeferencing can be done sort of en masse. There have been some projects in the past to look at whether you can georeference automatically, using programs to look at the text and see whether you can automate the georeferencing. There has been some progress in that, but there's nothing really available at the moment. I know people are looking, and there's been lots of research on that. So we'll really have to get a researcher in to investigate, and collaborate with other institutions. So at the moment it's quite a manual process, and it takes quite a while. But having all the localities transcribed is still such a massive part of the process.
Yeah. Absolutely valuable, as you showed later with the various examples. Another question I have, which is an interesting one: as you went through the process of digitization, the quality of the imagery and the ability to image has improved over time. You have higher resolution images now compared to the past. How do you deal with that as technology advances?
How do you manage that digitization process?
Actually, because we already started at a resolution of 600 PPI, our resolution hasn't increased that much in that sense. It's just that the time it takes to produce the images has improved. But the technology always moves on, so we're always testing new camera equipment to see what comes up and what's better. In some ways you want to get the most that you can afford at the time, because it will have a lifespan. We've got an imaging specialist at Kew who will keep an eye out for what technologies are out there, and then we have an upgrade plan for our equipment, really.
Would you have to re-digitize specimens if you suddenly find that the quality of the imagery is old, or would you use a digital approach to touch them up?
Yeah, the thing is that you could wait forever while it improves and improves, but at some point you've just got to do it, and you don't really want to do it again. So that's why we've stuck with this: hopefully it's a good balance between resolution, file size and cost, and we can do it now. This is the best that we can do at the time, so let's do it, and hopefully that gives us enough uses later to future-proof us to a certain extent. But goodness knows what will happen in another 80 years, and it's a big process to do again. So that's why we chose from the beginning to produce higher resolution images rather than lower resolution ones, so that taxonomists can actually start doing identifications from them.
I have one final question. It's about time for one final question. And this is something that sits very much within the digital environment.
It's something I am usually thinking about with digital technology: the carbon footprint of this exercise. Would you have a sense of the carbon footprint of the digitization and data storage, compared to the carbon footprint of researchers visiting or of specimens being sent out? In essence, what the question alludes to is: through the digitization process, are we actually being more carbon efficient than if we were to do an equivalent kind of research in other ways?
Yeah, I think so. We do send lots of images instead of specimens on loan, so we're definitely not transporting as many. What happens is that people can narrow down their search a lot. Sometimes they still want the specimens for certain reasons, but they can narrow down what they really need, and some people don't need the physical specimens at all. What we might find, though, and what we hope to find as well, is that people now know what we have. So in some ways it could be that some people still ask for some of these specimens on loan, for other reasons, because they didn't know we had them before. So it'll be quite interesting. We do think there will be a decrease in loaning out material, but it'll be quite interesting to see. We might also have fewer loans because, well, maybe there are fewer taxonomists as well, so how much is down to the background of how much taxonomy is being done, as compared to the fact that we've got stuff online? So we need to look at that. But I think even at Kew now, rather than going to look at the specimen, people might look at the specimen on their screen. They're searching the collection online to see what they need rather than going into the collection. So, yeah.
That's a good news story to bring out around your efforts, right?
Yeah. And in the pandemic it's been really useful, because we couldn't access the collection for a long time. We could still send people scans, or data and images, and we did crowdsourcing during the pandemic for our backlog. So we were still able to serve searches. It came even more to the fore during that period.
Yep. Well, thanks a lot. I think that's all we have time for today. Again, thank you so much for a very, very interesting talk. The moment we're done here I'm going to disappear into the Kew digital records and see what I can find.
Yeah, we haven't got a nice portal at the moment. I mean, there is a portal there, but it needs some improvement. It will be coming. Watch this space.
I'd like to remind everybody that we've recorded the session and we'll make it available to watch again on the Digital Environment website and on the YouTube channel. Again, a reminder to subscribe to our YouTube channel; the link is in the chat. And I guess this is the final webinar of the series. Thanks again, Sarah. We finished with a bang; it was a very, very good seminar and I really enjoyed it. We'll take a break for now, and we will start again in the spring with a new webinar series. We'll be in touch with the precise topics as we develop that further. So with that, thank you very much for attending this session, thank you to our speaker, and enjoy the rest of the day.