Hello, I'm Ben Scott and I'm lead architect of the Natural History Museum's new open data portal. I'm sure everyone here knows, and has probably visited, the Natural History Museum. It's one of London's most iconic tourist attractions, built to be a cathedral to nature. However, what's often less well known about the museum is that we're also a centre for research. We have over 350 scientists working in the museum, publishing over 700 scientific papers a year. Much of this research is based around our collection, in which we hold over 80 million specimens, and that spans 4.5 billion years, from the birth of the solar system to the present day.

In December 2014 we launched our open data portal to release all of our scientific data online as open data. Our primary data set is from our collection, and we now have over 2.8 million specimen records on the portal. Our scientists can also upload any of their research data, and we have some really interesting data sets up there already. This is a Lego insect manipulator instruction kit, which you can download from the portal to build your own Lego insect manipulator. We've got a 3D render of a prehistoric ray, and all of the records from our sound archives, so you can play the sounds of species that we've collected over the years. All of that is released under an open data licence, so you can do whatever you like with it.

Why has the museum taken the step of adopting an open by default policy? The largest driver is our public service mandate. We are the UK's repository for natural history specimens and we have an obligation to make them available to as many people as possible. We have a truly global collection, drawn from pretty much every country in the world. Within the museum we can only display a tiny fraction of our collection, just a few thousand items. Accessing our archives is only possible for those who can come and actually visit the museum.
Getting them online means anyone, anywhere in the world, has access to our specimens and can now reuse them in their own research. The data portal also allows people to interact directly with the collection. You can suggest corrections to our data or contact the curator responsible for each specimen. We're also experimenting with 3D imaging of specimens, so people anywhere in the world will be able to 3D print a copy of our specimens. What we're trying to do through the portal is open up our collection as much as possible and help support biodiversity research worldwide.

Another very important reason why we're doing this is that there's a vast amount of scientific value locked away in our collection. When a specimen was collected, the date and location were also recorded. Different species emerge at certain times of the year depending on how mild the weather is. So our collection represents a metric of climate change going back hundreds of years. On this slide you can see the collection time of the orange tip butterfly since 1900, and how warmer and cooler springs correlate with its collection date. We have scientists at the museum analysing patterns in these data to inform climate change models. Exactly the same data is now released openly for people to do the same studies.

To get this data, though, we first need to transcribe and database our specimens, and this is one of the biggest challenges for the museum. To start releasing open data we first need to transform a physical collection into a digital one. We're aiming to get 20 million specimens digitised and released over the next five years, and this is a mammoth task. As you can imagine, for a collection that's been built over hundreds of years by thousands of different people, the way specimens were collected and catalogued varies a great deal. For example, one of our most important historical collections is that donated by Sir Hans Sloane.
He would collect anything and everything, and his mode of organising it was haphazard. It started as a hobby but grew into one of the biggest collections of its kind, and it is actually the basis of both the Natural History Museum's and the British Museum's collections. Here you can see some of the boxes of his seeds that are in our collection, and this is actually one of the more scientifically organised parts. Other parts of the collection are just grouped aesthetically, so he would put all the red things in one cabinet and the green in another, because that's what looked nice.

Even where curatorial practices have been implemented as standard, these often don't make our digitisation work any easier. In entomology it was common practice to affix the specimen label to the bottom of the insect's pin, which makes a lot of sense: the label will never go missing from the actual specimen itself. Unfortunately it means that if we now want to digitise the records we can't see the label, and so what we need to do is take every single specimen apart, and that takes time. The specimens are old and fragile, so we need to be careful. We have a team of digitisers at the museum whose job it is, day in, day out, to take the specimens apart and digitise them, and only then can the data be pushed onto the data portal as open data.

The museum's open data portal represents a massive behind-the-scenes investment on the part of the museum and its staff. One of the key challenges we face now is how we can increase the digitisation throughput. At the current rate we won't finish our 80 million specimens until the end of the century, and we need to get more and more specimens onto the portal. But the beauty of open data is that we can now involve people outside of the museum in this process. We know our collections data is messy. It's one of the results of having such a historic collection.
But early on in the project we decided that if we waited to get our data into a perfect state we'd never release anything. It's better to release messy data than nothing at all, and by putting that data out there we now have people using it, suggesting improvements and correcting our own data. And that's the future for the portal and open data at the museum: building better interfaces to get more and more people interacting with the data.

And we've already had some fantastic outputs from this. These colour swatches were created by a computer vision designer extracting colours from most species on the portal as open data. They look lovely, but we're also now experimenting with the same technique, extracting colours from images of our specimens and using that in automatic species detection and in improving our digitisation workflows. And it's where these two paths intersect, rapid digitisation and citizen science, public involvement in our open data, that the portal is going to get really exciting.

We know we have specimens in the collection that are new to science. We've got massive amounts of stuff in our basement that no one has looked at in years. Just at the weekend, to coincide with Halloween, our museum scientists published a new species of bat that had been pickled in a jar for 30 years, sitting down in one of the basements. So this is a completely new species, unknown to science, just published. And finding a new species of mammal in the wild is an incredibly rare event. So imagine how many plants and insects we have in the museum still to be discovered. Soon members of the public will have first sight of specimens as they are digitised and output onto the portal. And as soon as that happens we will start having our first new species identified and named using open data. That's going to be a fantastic legacy for the portal, for citizen science and for open data at an institution like the museum. And I think that's the real value of open data here.
The data is not an end in itself. What's fantastic is what it empowers people to do, and what a new open generation is going to be doing with it. Thank you very much.