to start. Thanks again for that introduction. It's a real pleasure to be here. Mike and I really appreciate the invitation, and thanks of course to our audience for coming along today. Mike and I are going to do a bit of a double act, looking at the changing role of natural science collections and why they are so critical to addressing some of our big science and societal challenges. Our talk is quite a mixture, covering collections, infrastructure, digitization and big data, and looking quite a bit at the impact of that data and how it is addressing some of these big challenges. We'll be looking at this partly from an institutional perspective, but also in the context of the various national and international programs that we're both involved in. So with that, let's make a start.

Natural science collections are probably most famous for some of their iconic specimens: things like Darwin's finches, leading to the theory of evolution; this Iguanodon tooth here, central to Richard Owen's understanding of dinosaurs; or things like this skull, looking at human evolution. But in a way that gives a false impression of those collections, and in particular of their sheer size and scale. The Natural History Museum London, for example, has something in the region of 76 million specimens, and only a fraction of those are on display, maybe 30,000. We're typically adding to that collection. What we add is a really variable number, but in any one year we might add between 30,000 and maybe 150,000 specimens. We have something in the region of 8,000 scientific visitors, at least in a normal, non-Covid year, and we're now up to something in the region of 5 million public visitors, a really extraordinary number.
We'll typically loan out large numbers of specimens, maybe up to 70,000 annually, and we have a science staff of around 300 and a really substantial PhD presence as well, up to about 100, sometimes even 150 PhD students. Increasingly, that volume of collections is becoming really important for answering some of the big science and societal challenges that we care a great deal about: food security, healthcare, biodiversity, conservation, climate change and so on. And we all know, I think I'm preaching to the converted, that these are central to our sustainability. NHM London also recently declared a planetary emergency as part of our strategy, and a lot of our work is now repositioned around trying to address that planetary emergency and providing the evidence that feeds into national policy commitments. There's a raft of those policy commitments here, where there's a direct line from the data associated with collections, through science, into the various policy commitments. And of course there's a lot of societal recognition of the role of natural history and natural science collections too. I picked this example, though I could have picked many: just recently a new GCSE in natural history was announced, trying to embed the subject into the curriculum.

So this talk between Mike and me is going to cover a little bit about our origins and how we started to make the transition to digital. We're going to spend quite a bit of time on the impact of those digital collections, how we're scaling up, how we're doing things nationally, what's going on on the international stage and how we fit into that. So with that, we're going to start with Mike, who is going to talk a little bit about the BGS origins.

So from the origin of the Survey in 1835, we've made our collections available to the public for study and for education.
The Museum of Practical Geology, which ran through from Jermyn Street to Piccadilly, was opened by Prince Albert in 1851. It was a highly specimen-rich display. All the crown jewels were there. Anybody could walk around, as indeed Darwin and Lyell did on one of the balconies while they were discussing matters in London. We even published instructional catalogs, and you can download this one from Google Books. But after the moves to South Kensington in 1935 and then to Keyworth in Nottingham, far fewer specimens were on display. Indeed, in Keyworth virtually everything was in store. And when the collections were moved, the scientific community actually expressed concern, because in the past it had been convenient to visit the Natural History Museum, study material and then pop next door to the Geological Museum.

So clearly the internet was a good means to address this concern. Since I arrived at BGS in 2000, making the collections more accessible has been one of my drivers. Fortunately, the new BGS strategy in 2000 actually included the introduction of online collections databases within five years. We started with an online GIS with borehole information, launched in November 2000. We followed over the next five years with online databases for mineralogy and petrology, paleontology, and borehole and related materials.

This made a difference in terms of specimens that were visible to the public, in other words specimens that you didn't need to make an appointment to go and see. I'm including not only the material that was on display but also material that was loaned to the public. Back in 1900, we were talking about something like 60,000 specimens on display or visible to the public without appointment. With the move to Keyworth, this was very drastically reduced.
But as soon as we went online, you can see the difference in scale: by 2020, we had over a million specimens available in databases. And as we'll discuss later, a significant proportion of these were present as detailed images and even downloadable 3D models.

So BGS, I think, was a little bit ahead of the curve compared with the Natural History Museum in terms of digital thinking. When I arrived at the Natural History Museum in 2006, I started to look at what was going on digitally, and this is a little back-of-the-envelope calculation which I did in 2008. Don't worry too much about the numbers. The headline is that there were lots of little digitization projects going along, but if you added them up and tried to work out how long it would take to digitize the collection from those little projects, it was going to be about 900 years to get the data and a further 500 years to take the pictures. So clearly we were not progressing at a particularly fast rate at that time. Fortunately, some really big transformative technologies were coming along: digital, of course, being one of them, the others critical to this being the genomic revolution and citizen science. Eventually we got ourselves sufficiently organized to start, from internal funding, a much more comprehensive program of digitization. We have this sort of Google-esque mission to collate, organize and make available one of the world's most important natural history collections. At the time, we set the ambition to digitize about 20 million specimens, and we had this really quite transformative program. Much of this high-level plan actually bore fruit and still holds true today, which is rare in big organizational settings.
We looked at all the areas that we felt we needed to transform to get us fit for digital, which of course is far more than just the act of digitization. It's about transforming our policies, our infrastructure and our people. So we put in place this high-level plan and, to get things going, started with some pilot digitization projects. There were a number of these, and I'll dip into just a few, but first let me mention what kind of data we're trying to unlock from those specimens. Often we would have a species-level list of what was in the collection, but we wouldn't have a specimen-level list. Once you start to look at the individual specimens, you can unlock a world of different data: molecular data (although obviously that requires certain technologies), chemical data, morphological data, geospatial information, taxonomic and ecological information. It's all locked either in the specimen or in the series of labels associated with it.

It's also worth asking what we mean by digitization, because it's almost a how-long-is-a-piece-of-string question; it can mean many different things to different communities. We are now in the position where we actually have a standard, so that we can compare what we're doing in digitization with others. We have this concept of the Minimum Information about a Digital Specimen (MIDS) and its different levels of digitization, with what we call MIDS-2 being roughly research ready. Typically that would contain the minimum level of information that a lot of our researchers need in order to work with those specimens.

So we started with these pilot projects, and we chose a few taxa which we thought could be quite impactful. One of those was the UK butterfly and moth collection, really quite an enormous collection.
There's a very long history of collecting butterflies and moths in this country. We focused on the butterflies to begin with; the moths are a bit harder because there are a lot more species of them. We really started to refine our digitization processes to speed things up, and we were able to get through about 800,000 specimens in about three years and bring the cost down to about a pound a specimen. Whichever way you cut it, that figure holds up with our international peers too: although there's considerable variance in the cost of digitization between different specimens, the average often ends up at about a pound, a dollar or a euro per specimen.

Another group that we looked at is our slide collection, in particular our parasitic glass slide collection. We have about 1.2 million slides, and a lot of the advantage of working with microscope slides is that they're very regular. They're quite standardized, so it's easier to extract information from them. In some cases one person was able to digitize as many as 700 or 800 specimens a day and extract that information. Often we're a lot less interested in the images than in the data on the specimen, and I think that's an important point. Images are very important, particularly for certain kinds of specimens, and of course they unlock all sorts of interesting opportunities for what you can do with that digital object, but a lot of it is the data associated with the specimen.

At that time we also had no real mechanism for getting our data out there and making it available, so simultaneously we built the Natural History Museum Data Portal, which is essentially our platform where people can access, browse, download and cite the collection. I could probably give a whole other talk about the Data Portal, but I won't.
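As a rough check on the pilot figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-envelope illustration, not the museum's own planning model: the function names are hypothetical, the 220 working days per year is an assumption that does not come from the talk, and the sketch assumes a constant rate from a single team.

```python
# Back-of-envelope throughput arithmetic using figures quoted in the talk.
# WORKING_DAYS_PER_YEAR is an assumption, not a figure from the talk.
WORKING_DAYS_PER_YEAR = 220


def years_at_rate(specimens: float, specimens_per_year: float) -> float:
    """Years needed to digitize `specimens` at a constant yearly rate."""
    return specimens / specimens_per_year


def person_years(specimens: float, per_person_per_day: float,
                 working_days: int = WORKING_DAYS_PER_YEAR) -> float:
    """Person-years needed at a given daily rate for one person."""
    return specimens / per_person_per_day / working_days


# Butterfly and moth pilot: about 800,000 specimens in about three years.
PILOT_RATE_PER_YEAR = 800_000 / 3
# Stated ambition: about 20 million specimens at about one pound each.
TARGET_SPECIMENS = 20_000_000
COST_PER_SPECIMEN_GBP = 1.0

if __name__ == "__main__":
    years = years_at_rate(TARGET_SPECIMENS, PILOT_RATE_PER_YEAR)
    cost = TARGET_SPECIMENS * COST_PER_SPECIMEN_GBP
    print(f"20 million specimens at the pilot rate: about {years:.0f} years")
    print(f"Indicative cost at one pound per specimen: about {cost:,.0f} GBP")

    # Slide collection: about 1.2 million slides at 700 to 800 per person-day.
    slow = person_years(1_200_000, 700)
    fast = person_years(1_200_000, 800)
    print(f"1.2 million slides: about {fast:.1f} to {slow:.1f} person-years")
```

Under these assumptions a single pilot-scale workflow would take decades to reach the 20 million target, which is why running several workflows and teams in parallel matters as much as the per-specimen rate.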
I'm only going to talk a little more about the tracking and data citation, but it's worth noting that getting the portal up and in operation was a huge undertaking. A huge amount of what we do now is about understanding the impact of those digital collections and demonstrating that impact, and in this next section I want to talk a little about how we do that.

People now use dashboards for pretty much everything, and we built a very early dashboard tracking use of our digital collections. Because every dataset has a DOI, every specimen record has a permanent URL and every version has a permanent URL as well, we can provide really granular data about how our collections are being used. From this dashboard I'll just pull out some key facts. Since 2015, when the Data Portal was launched, we've had 31 billion records flow out of that portal in nearly 500,000 datasets, and they have been cited in 1,744 publications. At least, that was on Monday; I looked this morning and that figure is out by nine. So in the space of just the last five days there are nine new papers using our collections, which is a really amazing level of use. We also try to track in quite a high level of detail the kinds of users, and this is our interface for doing that. On the left-hand side you can see a list of topics: climate change, conservation, ecology, human health, taxonomy obviously, agriculture. The numbers correspond to the number of publications linked to each topic.
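The per-topic counts described above amount to a simple tally over publications that cite portal datasets by DOI. The sketch below is illustrative only and is not the Data Portal's actual code or schema: the `Publication` record, its field names and the placeholder DOIs are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Publication:
    """Hypothetical record of one paper that cites collection data."""
    doi: str
    topics: Tuple[str, ...]
    cited_datasets: Tuple[str, ...]  # DOIs of the datasets the paper cites


def publications_per_topic(pubs: Iterable[Publication]) -> Counter:
    """Count publications per topic; a paper counts once for each topic."""
    counts: Counter = Counter()
    for pub in pubs:
        counts.update(set(pub.topics))
    return counts


def publications_citing(pubs: Iterable[Publication],
                        dataset_doi: str) -> List[str]:
    """Return the DOIs of the publications that cite a given dataset DOI."""
    return [p.doi for p in pubs if dataset_doi in p.cited_datasets]


if __name__ == "__main__":
    # Placeholder records, not real publications or datasets.
    sample = [
        Publication("10.0000/paper-a", ("climate change", "ecology"),
                    ("10.0000/dataset-1",)),
        Publication("10.0000/paper-b", ("taxonomy",),
                    ("10.0000/dataset-1", "10.0000/dataset-2")),
    ]
    print(publications_per_topic(sample).most_common())
    print(publications_citing(sample, "10.0000/dataset-2"))
```

Because each download is identified by its own DOI, the same lookup can run in the other direction, from a dataset to the papers that used it, which is what makes specimen-level usage reporting possible.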
We can also see things like the major publishers, so who from what country is using our data. Typically UK researchers are number two in that list at any one time, US researchers are normally at the top, and then, as you can imagine, there's a long tail. We can also understand, at least for the non-global studies, what countries people are publishing about; Brazil, perhaps not surprisingly, is top of the list. In this dashboard, each entry in this long list is a publication that is using our data. This is just one, published a week or so ago, "Climate warming changes synchrony of plants and pollinators". We can track the DOIs that they're citing, and we can also see very specifically which specimens they're using from our digitized collections. Of course most of these studies are not using just one institution's collections; they're using a real mix of data from a variety of institutions. But we can see which NHM London specimens are being cited, in this case about 10,000 specimens. So that's really quite amazing granular data on how those collections are being used.

I thought it would be useful to showcase a few examples of this from a range of different areas, and Mike and I are going to dip in and talk a little about some of them. I did a quick snapshot of news associated with digitized natural history collections when I put these slides together on Monday. There are examples like our horseshoe bats being digitized in relation to the COVID pandemic. Lots of new species are discovered in collections: there are typically about 20,000 to 25,000 new species described every year, and about half of those are found not through field research but in the back of museum drawers, and digitizing those collections of course sheds light on those specimens. There's stuff here on 3D reconstruction of Darwin specimens, a
publication on avian color gradients using NHM London collections, and work on wheat domestication and genotyping. A lot of work goes into Red List indexing as well. These are conservation threat assessments done off the back of natural science collections, usually trying to determine the maximum extent of a species' range and the impact that climate, usually, would have on that species.

Just a few more examples in a bit more depth. Probably some of the biggest users of natural science collections are those interested in using collections data for historical baselines. You can see biotic responses, particularly to climate change, when you have a long series of specimens that have been collected, and our butterfly collections have been really well used in this regard. You can see things like changes in geographic range, changes in phenology (so maybe earlier flowering times or earlier insect emergence) and changes in body size. There's a really nice, relatively recent paper from Steve Brooks at NHM London and Phillip Fenberg at Southampton, who have been looking at the impact of climate on those butterfly collections. They've essentially mapped Met Office climate data to our historical collections, in this case for just a few species, and they can see things like the bigger body size, the earlier emergence of the pupa and the changes in geographic range. They started with just a handful of species, because it's very labor intensive to generate those measurements, but now they've accelerated that through computer vision pipelines. This is some software that they've built, Mothra, which looks at our images and automatically extracts a lot of the measurements that they need. So, moving on from just three species in that first paper, they've now managed to process about half of our species and again
to take those changes in body size and look at emergence dates by mapping environmental data and linking it to our collections data. That's quite typical of the way collections are being used at the moment. Because we have that intensity of sampling, often over many hundreds of years, collections are often the only source of evidence about where a species was. If you go back beyond something like 1975, when a lot of the recording efforts really kicked in systematically in the UK, the only source of that evidence is collections, and that's why they're so critical.

This is another nice example, one that combined digitization with citizen science. This is Gavin Thomas's team, who were looking at our bird collections. They were digitizing our birds, in particular bird beaks, and looking for changes in the morphology of those beaks. The citizen science element was that the public were asked to put particular landmarks on the bird beaks, because that's quite hard to automate. Then there was a giant meta-analysis of all that data, essentially mapping it to the phylogeny of birds and looking at patterns and bursts of beak evolution across those birds. It's a really nice study and attracted quite a lot of press attention.

This is even more recent, another bird study, and the most amazing dataset of bird traits. This is Joe Tobias at Imperial College. He and his army of students were visiting various museums; Natural History Museum London accounted for probably about 50 percent of the specimens that they examined, and they were extracting various traits of birds. It covers almost every single bird species, bar about 350, so an enormous amount of work went into compiling that dataset. For anyone now interested in looking at ecological patterns of variation across birds using those traits, that's an amazing resource that has
just been published, which is really exciting. And then over to Mike.

Okay. Well, in 2009 BGS operated three core stores, and the decision was taken to move the contents of two of those into what is effectively a centre of excellence at our Keyworth core store. That meant we were going to have to move about 175,000 boxes of core from Gilmerton, up in Edinburgh, where we held the UK continental shelf hydrocarbon reference archive. I realised that, as we were going to have to open every box to add additional packaging before we moved it, it would be a waste not to take photographs at the same time. So we did that. We built a special setup, and it enabled us to take some really high-quality, high-resolution core images. In fact, at full resolution I can see more detail on the images than I can on the core without using a microscope or magnifying glass. We know these images are used by all the operators on the UK continental shelf, they're well used by academics, and they've also been used by a number of AI projects, including one at the Colorado School of Mines that has written code to extract the actual core pieces and manipulate them. At BGS we've been writing code to quantify the quality, the number of breaks and so on in the core. We also know it improves efficiency in the core store, because visitors can look at the images of the core before they come on site, and they only need to ask for those bits of core that they actually need to address their problems.
So we did the photography with what was effectively a production line. The boxes came down from storage, were opened and put in a jig, and the barcode was read, which displayed all the information from the database. The box then went underneath the camera, in this case a Phase One, quite an expensive high-resolution camera. The image was taken and checked, and everything was controlled with barcodes: if the image was fine, the photographer scanned an accept barcode; if it needed retaking, he would scan a retake barcode. Then the material went down the conveyor and extra packaging was added before it was placed on pallets ready to transport down to Keyworth. This was a very successful exercise. By using this approach we were able to do it in just over a year and a half, whereas some in industry thought it would take us 10 years to photograph the entire collection.

Another imaging project we've done has been with our thin sections. As has been said, slides are ideal because they are fairly uniform, and we now have 160,000 petrological thin sections all digitized and online. Each one is present as both a plane-polars and a crossed-polars image, so it's actually twice that number of images. You can spot them on our GIS, where every one of the little black triangles represents a specimen with a scanned image, and we know this has been well used by academics. It raises the point that when we take the photographs we don't go for the ultimate highest resolution possible, and we don't use a system that can rotate the slide to capture it in all the different orientations under crossed polars. We look on this as essentially a discovery tool, so that academics and commercial companies can decide whether they need to visit or borrow the slide for more research. But we do know that for many purposes these images have been perfectly adequate.
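The barcode-controlled core photography line described above can be pictured as a small state machine driven entirely by scans. The following is a minimal sketch under stated assumptions, not BGS's actual software: the control codes `CTRL-ACCEPT` and `CTRL-RETAKE`, the box identifiers and the function name are all hypothetical.

```python
from typing import Dict, Iterable, Optional

# Hypothetical control barcodes kept at the photographer's station.
ACCEPT = "CTRL-ACCEPT"
RETAKE = "CTRL-RETAKE"


def run_station(scans: Iterable[str]) -> Dict[str, int]:
    """Replay a stream of barcode scans from the imaging station.

    A box barcode opens a job and counts as the first exposure, RETAKE adds
    another exposure for the open box, and ACCEPT closes the job so the box
    can move down the conveyor. Returns the exposures taken per box.
    """
    exposures: Dict[str, int] = {}
    current: Optional[str] = None
    for code in scans:
        if code in (ACCEPT, RETAKE):
            if current is None:
                raise ValueError(f"{code} scanned with no box in the jig")
            if code == RETAKE:
                exposures[current] += 1
            else:
                current = None  # image accepted, station is free again
        else:
            if current is not None:
                raise ValueError(f"box {current} was never accepted")
            current = code
            exposures[current] = 1
    if current is not None:
        raise ValueError(f"box {current} was never accepted")
    return exposures


if __name__ == "__main__":
    log = ["BOX-0001", ACCEPT, "BOX-0002", RETAKE, ACCEPT]
    print(run_station(log))
```

Driving every step from a scan means the operator never types an identifier, so an image cannot be filed against the wrong box, and a box cannot leave the station without an explicit accept.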
And actually we've used a fairly simple setup to do it, and I should say that volunteers at Edinburgh and Keyworth have done most of the work.

Under another project, we realised that a lot of type fossils were first described in the middle of the 19th century, perhaps with just a cryptic comment such as "in the collections of the Reverend Green", and in many cases the present whereabouts of the types was unknown. So it seemed to me an important project to seek these out and re-photograph them. We got funding from JISC not only to re-photograph them but also to laser scan a couple of thousand of the most suitable fossils. As well as doing straight high-resolution photography, we also did stereo pairs using a simple seesaw mechanism. Given that this was over 10 years ago, we were using what is now a fairly primitive 3D laser scanner, but the results were excellent, and they're all available online and downloadable from our database.

This is the GB3D website. You can search by taxonomy, fossil type, institution and so forth, and then you can view the specimens. The high-resolution images are served in JP2 format to maximise the resolution and minimise the download time, so as you zoom in you can see very, very high resolution. In total we digitised 17,500 types and 2,000 3D digital models. What was innovative about this is that we got funding from JISC but worked with a number of partner hubs, including the National Museum of Wales in Cardiff, the Sedgwick Museum in Cambridge and the University Museum of Natural History in Oxford, and through the Geological Curators' Group we worked with a number of regional and university museums. So we had a system of, effectively, hubs and what you could call nodes. In some ways this could form a pilot for some of the work that may now take place, which Vince is going to talk about next, in DiSSCo.
So that's a nice segue into how we at NHM London started to scale up our thinking around digitisation. We really needed a much more organised programme, and organisation is key to developing high-throughput programmes that generate large amounts of data. We structured our work into a set of what we call core, test and delivery activities. Delivery covers the projects that are currently running and delivering at high throughput. Test is about developing and innovating the various workflows that we need to support certain kinds of digitisation. We literally have specimens that range from a tiny parasitoid chalcid wasp, perhaps smaller than a full stop on a page, through to a dinosaur femur, certainly larger than the room that I'm in at present, and we've got to develop workflows that cater for a really wide range of different types of specimens.

We also needed to transform the people that we had. Digitisation is very much a team sport: it requires a real range of people and it's quite labour intensive. Technology can certainly help us, and we'll see in a moment how it is helping us to speed things up, but there are a lot of people, there are an awful lot of specimens and there's a lot of manual handling. Robotics can only get us so far, and realistically at the moment it's still far cheaper to have a standing digitisation team, so at NHM London we had to build up that team.

As I mentioned, we started to work on digitisation workflows of various kinds to support the diverse kinds of objects that we have within the collection. Some types of specimens lend themselves very much to what I would call almost industrial-scale digitisation, a little like the kind that Mike was talking about, where you have a conveyor-belt style of production, and probably the kinds of specimens that most lend themselves to this are
herbarium sheets. This is the way that most botanical material is preserved: typically a piece of plant pressed flat on a sheet, usually with a nice big label carrying lots of fairly structured metadata. They're relatively easy to barcode and they certainly lend themselves to this conveyor-belt style of digitisation. In a few cases, particularly in partnership with the Royal Botanic Gardens, Kew, we've done projects of that ilk to speed up the digitisation of our botanical material.

We have to do quite a lot of innovation because of the very weird nature of our collections. As you can see, in the early days in particular we used quite a lot of Lego to help us do that, and I had to write some rather strange justifications for why we were buying large boxes of Lego. But it's a great way of innovating very cheaply and prototyping tools that we can then build in our workshop to scale up and, if you like, build a bit more resilience into our digitisation processes.

In the early days of our programme there was quite a lot of emphasis on drawers. Many natural science collections are preserved in little unit trays, little cardboard trays, which are then held in drawers, and our insect collections are really significant in that respect: we have about 30 million insects, but in only 150,000 drawers. The challenge with the drawers is that what you really want to get to is the label data associated with the specimens, and many of our processes really struggle to get to that label information. One way of dealing with that is to unpin all of the labels associated with the specimens and lay them out. What we've actually done now is build a multi-camera setup, something we call ALICE, which stands for angled label image capture equipment, and what that does is use a combination of computer vision and
machine learning to capture the specimen from multiple angles in a single shot, and then we use software to segment out the different regions of the label and stitch them together into one human-readable image, which we can then transcribe.

As Mike said, barcodes are absolutely central to our process, and we've gone to the point of encoding a lot of the metadata on specimens into barcodes, so that as you take the image you in effect instantly transcribe those key bits of information to populate our databases. In terms of digitization rates, the most we've ever done in any workflow is, I think, something in the region of 1,600 specimens a day for one person. But there's enormous variance: different workflows lend themselves to different processes, and you can see that the average for most workflows is much lower, probably more in the region of 200 to 400 specimens a day. We're also always comparing ourselves with other institutions involved in this process to see whether we're more efficient or not, and in short a lot of the NHM processes are pretty efficient.

What we're really trying to do now is focus on building up the national program of digitization. Within the UK it's estimated that there are somewhere between 100 and maybe 150 million specimens in total. From the surveys that we've conducted, partly through AHRC funding, we've now been able to track down about 137 million specimens, and we know those are present in 84 institutions across the UK. There's a beautiful dashboard, if you follow that link, bit.ly slash disco UK, where you can see some of the detail about what's in those collections and get a much stronger sense of their content. Very few of them have digitized records in any form, and only a very small proportion of those have what we would call research-ready data, so
there's a huge amount of work to do to support those institutions in getting their collections digitized and research ready. A lot of the issues relate to funding and staff. Many of these are mixed collections, so they don't just hold natural science collections, and prioritizing the natural science collections can be quite hard within institutions that have very diverse holdings. We're working on a national digitization plan, and in fact we're just about to publish the blueprint for that, a 28-page document, a sort of snazzy brochure if you like, outlining how we want to approach this from a national perspective. We've also generated a lot of training resources and have been looking at the economic case for digitization too. NHM London has just recently published a report which essentially says that if we were to digitize the NHM London collection alone, we'd unlock something in the region of 2.2 to 4 billion in benefits to the economy over the next 25 years. The DOI for that report is available there, and there are all sorts of sectors of the economy where these digitized collections are really important.

We're doing similar things internationally as well. We lead an initiative called the One World Collection, where we're surveying nearly the top 100, in fact the top 72, institutions worldwide. We can track down about 1.1 billion specimens across those institutions, and as part of that survey we're also looking at things like the taxonomic skills within those institutions. There are no real global shortcuts to funding: if we look at our peers, most of the money internationally has come from governments, and this is a quick, very dirty and very incomplete snapshot of other countries that are funding their digitization programs. A lot of the work in the UK has come under the auspices of something called DiSSCo, the Distributed System of Scientific Collections, which is really trying to
integrate UK natural science, European natural science collections and create a common digital gateway to those collections. Right now we're very much focused on building that digital gateway, the data portal for UK collections, getting a bit more of a sense of how to support the diverse nature of UK institutions in terms of their digitization capabilities, and then also of course building the funding case for supporting that. And I think, at one minute over, I will stop there and hand back to Adrian. Thanks.
Brilliant, thank you very much, Vincent and Mike. A fascinating talk, and the scale of all of this is just mind-blowing really, isn't it? A huge challenge. Really interesting stuff. We have a few questions from the Q&A, so I'll put some of those to you if I may, starting with a couple of questions about digitization rates. One is just asking to what extent new collections that are being collected now are digitized as a matter of course. And then a related question: how is the rate of digitization changing over time? And a suggestion that if that's increasing exponentially, and the digital profile of a specimen or a collection is also growing exponentially, what does that mean in terms of storage requirements for this information?
Great questions. Maybe if I answer the new collections question first: are new collections being digitized? So yes, for some. We run pilots across the Natural History Museum with certain departments, particularly insects actually, which have really embraced this and are really seeing the value. So new entomological collections are being digitized, but that's not universally true across all collection types, and in part that is because we still don't have workflows for certain very complex collections. So a lot of the material in spirit, for example, that's preserved usually in a mixture of ethanol and, actually much more rarely now, formalin. A lot
of that wet material is not digitized as it comes in. It's just too hard at the moment to do it. And Mike, where does it stand with BGS and the new material?
Well, in terms of the main existing projects I've talked about, yes, any new UK continental shelf hydrocarbon core is automatically photographed and imaged and put up, likewise type fossils and thin sections. But I have to say our resources for digitizing are very, very limited, and though I didn't stress it, all three of the main projects we've done have been financed through other sources and not through normal BGS income. So we had Jisc funding on the GB3D. The core was initially at least photographed through, in actual fact the whole project was financed through, the sale of the core store in Edinburgh. And as I said, the thin sections have all been done by volunteers. So yes, it's true that core budget is used to photograph the new hydrocarbon cores, but we are very, very tight. So the answer is we are doing relatively little, but those projects that are ongoing, we are keeping them up to date.
Yeah, that funding issue is certainly critical to us, and the only way that NHM London has been able to do it, in the absence of any national funding program, is really by scraping core funding and actually making it an institutional priority. I've had to, over many years, really hammer that message about how and why this is so important to our future mission. And actually the way that we looked at this is almost the counterfactual, so the way that I've often made the argument was: what would happen if we did not digitize our collections? The fear about our collections becoming less and less relevant is partly what's driven that case. But the truth is it requires big money, very big money, in the region of probably 150 million to do a national digitization program, and all other countries have only done it through national funding. I think the other question was about rates, Adrian?
Yeah, and
I guess, with increases in digitization rates and increases in information requirements, what does that mean in terms of the overall data?
Yeah, so the information requirements one, at least for us, I wouldn't say it's a red herring, but it's much less of an issue than you might think. Certain kinds of digitization certainly generate vast amounts of data, but typically, outside of the images, the kinds of data that we're liberating are not ridiculously large. I mean, a high-resolution image of a herbarium sheet, for example, might be 500 megabytes, and a CT scan would be upwards of 40 gig, but we're not doing that routinely; that is not part of our standard processes. So the sum of the storage requirements is actually relatively low. We often get a lot of questions about them, but I always feel they're a bit of a red herring. It's not the bigger challenge. The bigger challenge is basically turning the handle, so to speak, to speed up the rates.
I think rates for certain kinds of collections are plateauing a little bit. Our insect digitization rate has been massively sped up by the adoption of that ALICE system, using the angled cameras to capture the labels, but there are lots of exceptions where that just doesn't work. I've seen specimens with 16 labels all skewered on a pin; they're too close to make that system work. Or the labels are folded, and of course then again you can't get access to the data. So there's enormous variance. The real issue for us is probably the other workflows, trying to develop workflows for things like those wet collections. A lot of the paleontological material is really fragile, so you can't apply some of the high-throughput processes that we would like to it. Or the micropaleo material, where you'll have jaw bones the size of not quite a full stop, but really tiny, and
those we just haven't worked on yet. So there's an element of just biting off what we can and then getting the fastest rate possible out of that. Artificial intelligence and machine learning are really helping in some areas of that, and we've got relationships with Turing and Google and a few others in terms of helping us with that.
Great. A couple of questions about methods, if I may. I think a couple are for Mike, actually. One was around what GIS technology BGS used for their work to do with GIS and profiles. The second one is around the volunteers that were referred to: how are these non-professionals identified and harnessed, and who are they? Are they hobbyists, retirees? Where did the volunteers come from? So I guess that's Mike. And then one for you, Vince, is around the scanning activities for species. We heard a bit about the 3D scanning for fossils, but do you do 3D scanning for species? So I don't know if Mike wants to comment on the GIS technology and the volunteers first.
Right, on the GIS I should say that it's not my special subject, but we are heavily involved in Esri products. I have a feeling some of the earlier stuff was based on ArcMap, but what I would suggest is, if somebody wants to send me an email or get an email to me, I can put them in touch with the people best able to advise them. As far as the volunteers go, we have a mix. When we were based at the University of Edinburgh, we actually used to have quite a large number of volunteers, which included students who were looking for work experience, retired staff members and other retired people. We tend to put them on jobs that are effectively for the public good, so jobs like this, opening up the collections for the whole community on the website, that we feel are an appropriate use of volunteers. So it's a full
mix. The lady you could see, I think, was a student doing it for work experience down in Keyworth, and some of our volunteers have actually been retired staff members.
Great, thanks Mike. And then Vince, the question about the scanning activities and whether there is 3D scanning of species going on.
Yeah, so we have a lot of 3D scanning going on in all sorts of guises. The issue is it's not really high throughput. It is unlocking an enormous potential for certain specimens, but it doesn't scale so well. One area that really is being given quite a lot of attention: generating the 3D models off the back of those CT scans is quite time-consuming in multiple ways, and a lot of that involves the placement of landmarks, homologous landmarks, over those scan slices, which then form the 3D models. That's something where we're using artificial intelligence to help quite a bit. So there's a program where Anjali Goswami, within the museum, is applying AI to try and basically generate 3D models at scale. She does some amazing phenomics work, looking at the evolution of the phenome and understanding how and why the phenome has evolved over time, and she generates thousands of these 3D models. But again, that's very much reliant on an army of students.
One thing I should say we're very poor on is providing access to those 3D models. This is a bit of a mission of mine (I say a mission of mine at the moment; it's been a mission of mine for a long time, but I will try to crack it): we have tens of thousands of scans that are not plugged into our data portal at the moment. A tiny number of those make it through to various third-party 3D repositories, and I'd like to do a much better job of essentially building our back-end workflows so that a lot more of that 3D data is available. There are pockets of availability, and Phenome10K, if you google that, you'll see one
of those pockets, for example, but that's really the tip of the iceberg. So the issue, though, is still scale. It just doesn't scale as well.
Yeah, great, thanks Vince. A couple of final questions. One is: is access to all of the data free, or are there charges for accessing some of that?
So in our case it's all free. We're open by default. We have a series of eight of what we call exception areas, where we would hold data back for legitimate reasons. Often those are to do with things like sensitive information about a specimen, either ethically sensitive or culturally sensitive, or in many cases exposing the geospatial range of a species may endanger that species, and in those cases we will hold that data back. We do a little bit of scientific embargoing, not a lot, but there is a route for requesting an embargo. You can get a one-year embargo fairly easily and a three-year embargo a bit harder, and if you want to renew that embargo you've got to have a pretty good case. But yeah, it is for the most part all free. And I don't know, is that the case for BGS, Mike?
Yes, in general. Certainly the stuff that's on the website is all free. Clearly, if somebody needed a more detailed scan or photograph, if that was academic it would almost certainly be free; if it was a commercial company there could well be a charge. So it's this question of the difference between the sort of discovery resolution and the really detailed work that may be required for a specific scientific project, and there may be a charge for doing that.
Great, thank you. A final question: would you consider it worthwhile to digitize data only in the first instance and then add the images later? So, for example, for fossils, is there any sort of mileage in splitting those two things out?
Well, I could say that certainly we started off, as I tried to explain, with purely metadata, purely the digital data, the text from the registers or from the labels
with the specimens, and in terms of the fossil material, yes, something like 300,000 to 400,000 specimens like that. Then images followed later, and we have far fewer images. But I have to say I do consider the images very important, and I'm well aware that, particularly in some of the Baltic countries, there are really well digitized collections, photographed, and those photographs have actually been recognized by workers elsewhere in the world and have led to subsequent research and publications. So yes, text is important, but images are actually really important too.
Yeah, thanks Mike. Okay, I'm conscious of time, so I think we've probably reached the end of the session. So a great big thank you again to Vincent and Mike for a fascinating talk and for all that useful discussion afterwards. Thank you very much for that.