This is Ivan Hanigan, our friend behind the Hutchinson Drought Index record that we looked at in Thing 7. And Ivan, I'll hand over to you now to tell us the story behind this record.

Thanks, Gerry. So the reproducibility crisis has become well known. I think Roger Peng coined the phrase around 2006. What it basically boils down to is that it is surprisingly difficult to achieve the seemingly trivial task of loading up the data that a paper talks about and recalculating the exact results reported in that paper. It sounds trivial, but it is actually very, very difficult, and there are a lot of points on the road from research to data publication that stand in the way.

It's nice to hear the feedback on the Hutchinson Drought Index record being a linked record between the paper, the software code and the data, because that linkage is absolutely critical to the solution to the reproducibility crisis. So what I wanted to do today was talk about the entire pipeline from my perspective: what Peng and others have termed the reproducible research pipeline, which leads through the chain of linkages from the author to the reader in the distribution of data, code and papers.

Now, the straightforward business of being a science student, designing your experiments and measuring data in a lab in a fairly basic educational setting, turns into a much bigger enterprise when you start a large interdisciplinary research project, and "measured data" becomes a very difficult beast to define. But you can imagine that you have a fairly good idea of what data you want, you go out and get it or you create it, and then you have some electronic information. That's where coding comes in handy: it lets you take data that is often messy, dirty and full of errors, and often split across distinct data sets such as environmental, health and demographic data, and combine it all into something that you can analyze. If you want to correlate drought with a suicide incidence rate ratio, then you have to at least join some climate data, some deaths data and some population and demographic data, often with spatial and temporal dimensions (a rough sketch of that kind of join appears below).

Once you've done all that processing, you have a computational process that also creates new data, and you want to pay attention to analyzing that in a systematic and rigorous way; code comes in handy here too. One of the things the Hutchinson data set is connected to is a published software package full of analytic code that takes the analytic data, pumps out some computational results, creates some presentation-worthy output such as figures, tables and numbers, and then puts it all together with some text that we wrote into the report that you can download from the journal.

All of these components of the pipeline can go down to the reader through the distribution channel, which is what data publication and all the associated activities of the reproducible research pipeline are designed to do. Now, this addresses the big problem called the reproducibility crisis, but there is an even bigger problem: you might have heard that quite a lot of papers cannot be replicated in new study environments. Adherence to the reproducible research pipeline framework goes some way towards solving this more serious scientific problem, because it allows readers who may want to replicate a study to develop an understanding of the analytical methods, and to test different ideas and new things to do with the code, while benchmarking against published, validated computational results.
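To make that joining step a little more concrete, here is a minimal sketch in Python (pandas) of the kind of spatial and temporal join involved. It is an illustration under my own assumptions: the file names, column names and drought threshold are invented for the example, and this is not the actual code behind the Hutchinson record.

```python
# Rough, hypothetical sketch of joining climate, deaths and population data
# into an analytic table. File names, columns and the drought threshold are
# illustrative assumptions, not the actual Hutchinson Drought Index code.
import pandas as pd

# Monthly climate data per region: region, year, month, drought_index
climate = pd.read_csv("climate_monthly.csv")

# Monthly death counts per region: region, year, month, suicide_deaths
deaths = pd.read_csv("deaths_monthly.csv")

# Yearly population denominators per region: region, year, population
population = pd.read_csv("population_yearly.csv")

# Join on the shared spatial (region) and temporal (year, month) keys
analytic = (
    climate
    .merge(deaths, on=["region", "year", "month"], how="left")
    .merge(population, on=["region", "year"], how="left")
)

# Months with no recorded deaths become zero counts
analytic["suicide_deaths"] = analytic["suicide_deaths"].fillna(0)

# Flag drought months with an illustrative threshold on the index
analytic["drought"] = analytic["drought_index"] > 1.0

# Crude incidence rate ratio: drought months versus non-drought months
totals = analytic.groupby("drought").agg(
    deaths=("suicide_deaths", "sum"),
    person_time=("population", "sum"),
)
rate = totals["deaths"] / totals["person_time"]
irr = rate[True] / rate[False]
print(f"Crude incidence rate ratio (drought vs non-drought): {irr:.2f}")

# The combined table is the 'analytic data' that the analysis code works on
analytic.to_csv("analytic_data.csv", index=False)
```

The point of the sketch is simply that the analytic data set is itself a computed artefact: it can be scripted, rerun and shared, rather than assembled by hand.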
That leaves the pipeline open to be extended with new measured data, in new populations and new areas, and potentially with new methods and insights, so that the findings of the original papers can be replicated. In this way we can winnow the wheat from the chaff in our hypothesizing, and design new experiments that give us scientific progress more quickly.

So that's the overview of the pipeline, and it is how I've tried to operate in my more recent papers. The process of finding and getting the data is all systematically organized with a data management plan. I can get the data through the licensing, and I understand all the authorizations and ethics requirements. I have to put the data somewhere, and that's quite challenging because it gets quite big, and you have to back it up. Then you start doing stuff with the data, so there are multiple versions to control. The different types of measured data are often not something that the downstream reader will be able to use, or probably wants to, but the analytic data is. The reader will want to find out more about what has been done with the data to arrive at the analytic data, so the scripts can be reviewed. Then you've got the sharing of data through the distribution channel, with the digital object identifiers, the other links and the licensing all being held down there at the bottom of the pipeline.

That kind of brings me to the end of the description of the pipeline, but obviously in a real-world environment it's never as simple as that. I hope you can see that I use flowchart diagramming software to track the multiple steps in protracted pipelines, where data is fed in from the top and keeps going down to the bottom (a rough scripted sketch of that idea is included below). Actually it goes off the screen: the full diagram is about twice as big, but I could only show you the top half with any legibility.

The final comment I'd like to make comes from the famous quote that data analyses are like sausages: it's really better not to see them being made. I've extended that to say that, even so, you do want to know that what goes into them is high-quality ingredients. Thank you.
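As a rough scripted sketch of the flowchart idea, the stages can be chained so that each one reads the previous stage's output and writes its own artefact for a reader to inspect or rerun. The function names, file paths and toy analysis below are my own illustrative assumptions, not the published software package.

```python
# Hypothetical sketch of a scripted pipeline: measured data feeds in at the
# top and flows down to analytic data, results and figures at the bottom.
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

DATA_DIR = Path("data")      # measured data feeds in at the top
OUT_DIR = Path("outputs")    # artefacts flow down to the bottom
OUT_DIR.mkdir(exist_ok=True)


def make_analytic_data() -> pd.DataFrame:
    """Combine the cleaned measured data sets into one analytic table."""
    climate = pd.read_csv(DATA_DIR / "climate_monthly.csv")
    deaths = pd.read_csv(DATA_DIR / "deaths_monthly.csv")
    analytic = climate.merge(deaths, on=["region", "year", "month"], how="left")
    analytic.to_csv(OUT_DIR / "analytic_data.csv", index=False)
    return analytic


def compute_results(analytic: pd.DataFrame) -> pd.DataFrame:
    """Produce the computational results that the report will cite."""
    results = analytic.groupby("region", as_index=False)["suicide_deaths"].sum()
    results.to_csv(OUT_DIR / "results.csv", index=False)
    return results


def make_figures(results: pd.DataFrame) -> None:
    """Turn the results into presentation-worthy output (a figure file)."""
    fig, ax = plt.subplots()
    ax.bar(results["region"], results["suicide_deaths"])
    ax.set_xlabel("Region")
    ax.set_ylabel("Deaths")
    fig.savefig(OUT_DIR / "figure_1.png", dpi=150)


if __name__ == "__main__":
    analytic = make_analytic_data()
    results = compute_results(analytic)
    make_figures(results)
```

Because every stage writes its output to disk, the intermediate artefacts (analytic data, results, figures) are exactly the things that can later be licensed, given digital object identifiers and shared through the distribution channel alongside the paper.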