Good afternoon and welcome, everyone, to this webinar, Costs and Benefits of Data Provision. In the room with me, looking at the slide, is Professor John Houghton; to my left is Adrian Burton, who's the Director of Services; and probably just off-camera is Susanna Sabine. We also need to acknowledge Nicholas Gruen, who's a co-author of the report, I should say, but who's not here today. So I will hand over to Adrian to describe very briefly what ANDS does and why we're interested in this. So, Dr Burton, Costs and Benefits of Data Provision: why would ANDS be interested in such a thing?
Well, ANDS, the Australian National Data Service, for any newcomers to the webinar today, is an infrastructure program within Australia whose overall goal is for there to be more valuable research data for researchers. That is our commission: we want research data, the outputs of research and the inputs to research that are data products, to be more valuable for researchers in Australia, and of course not just for researchers but for education, industry, public policy and general citizens in Australia to have access to that data. Our focus, of course, is to make better research, so we're really framing our questions about the value of data in terms of its value for better research. If we go back a little bit: perhaps because of the information revolution and other sharing practices, it has become more possible to share data, reuse data and have it as a valuable output of research. If we wind back a bit, the general attitude towards data was that it's a waste product or a byproduct of the research industry, like carbon dioxide or other pollutants that just spill onto the floor; once you've done the project, it's just one of the things you throw away.
But now, with the sharing systems, the fact that a lot of the data is digital, and an absolutely amazing global network of information sharing, the data itself is being recognised as a valuable product. So what is the difference between a byproduct and a major product? It's where you see the value. In one sense, perhaps what we're doing now is reconfiguring our research systems to take into account this product of research and say: what is its value, and how can we make it more valuable again for better research and for the broader society? So that's why we are interested, because it's actually the whole idea behind the Australian National Data Service. Everything we do around research data management, infrastructure, policy and citation of data is so that the output of research and the input to research is more valuable. It was in that kind of context that we've worked previously with John, and we thought: okay, what about that question, how do you measure the potential for value in data?
Okay, so there are actually two reports. The first report we'll deal with first up, and the second report is written by John Houghton and Nicholas Gruen. I did not mention that Nicholas Gruen was also the author of the Government 2.0 Taskforce report, and he is from Lateral Economics. So John, I know you just can't wait to get stuck into this. Could you take us through that first report?
Yes, sure. I think it was around 2011 that we actually did the study, focusing on government information, public sector information (PSI). We did case studies of the Australian Bureau of Statistics, Geoscience Australia, and hydrological data with both the National Water Commission and the Bureau of Meteorology.
I think it's unique; I've never seen another study that actually measures the costs and the impacts before and after. It was a unique opportunity because the ABS in particular had just adopted open access and then, a year later, CC licensing. Most studies try to estimate the future benefit of open access, but this was a study in which we had the before and after, so it was a really good opportunity. Having said that, I don't think we have to say that measuring these sorts of things isn't easy, and there were limitations to the data that we had available to do the study, so in a way it was really only the Bureau of Statistics case that worked well; the other two were much more limited. We used three elements. We looked at the activity costs and cost savings of the agency, focusing on the ABS since that's the one that worked best, and we looked at what new activities were being done and what activities weren't being done any longer as they adopted open access. For example, the ABS used to have quite a lot of shop fronts where you could go in and buy reports and so on, and they greatly reduced the number of shop fronts, so these kinds of savings were available for agencies like that. We focused on the agencies and didn't do a survey of users, so there was an assumption: we assumed that the users' activity costs were kind of the mirror image of the agency activity costs. For example, one of the activities that the ABS and the other agencies were doing was answering the phone and answering queries about licensing conditions, what users could and couldn't do with the data. Now obviously there are two people on a phone call, so the assumption was that the amount it cost the ABS to run that service was also being spent by the user on the other end of the phone. We made that assumption as a simplifying way to do the study.
In the case of the ABS, we basically found that they ended up losing revenue, of course, from not selling data, but they also made some savings that we could quantify and they had quantified. There were also savings that we didn't quantify: if you stop doing the shop fronts, then everybody in the organization can concentrate on the real issue of gathering and producing statistics. Without those distracting activities to manage, they could focus on core business a bit more, so there were savings there. On the other side, we also tried to look at the wider impacts, and we used both a welfare approach and a return-on-investment-in-data approach. Both of those require some measure of the impact of open access, basically the amount of extra use, and that was where the study turned out to be much more difficult than I would have envisaged. What we tried to do was use website download statistics from each of the agencies from before and after, but of course there's a whole range of reasons why downloads go up over time. There was a general trend in the mid-to-late 2000s for people to download more than they did the year before; it's still the case. And obviously if you have more available online there are going to be more downloads, so you have to look at both the extension and the intensification of use. This proved to be extremely difficult, but we did manage to tease out both the extension and intensification of use enough to make some estimates. The bottom line for the ABS was that the overall cost of doing it, for both the agency and the users, was about five million a year circa 2006-2007, and the total benefits, the savings plus the increased returns, were about 25 million a year, so the benefits were about five times the costs in that case.
Should we bring that graphic up?
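The ABS bottom line quoted above reduces to a single division. A minimal sketch, using only the round figures mentioned in the talk (about 5 million a year in costs and about 25 million a year in benefits); the function and variable names are hypothetical, and this is not the report's actual model:

```python
def benefit_cost_ratio(annual_benefits: float, annual_costs: float) -> float:
    """Return annual benefits divided by annual costs."""
    if annual_costs <= 0:
        raise ValueError("annual_costs must be positive")
    return annual_benefits / annual_costs


# Round figures quoted in the talk for the ABS, circa 2006-2007 (dollars per year).
abs_costs = 5_000_000      # agency plus user costs of moving to open access
abs_benefits = 25_000_000  # savings plus increased returns

abs_ratio = benefit_cost_ratio(abs_benefits, abs_costs)
print(f"ABS benefit-cost ratio: about {abs_ratio:.0f} to 1")
```

The ratio describes the data and how it is used, so the same function applied to another agency's figures would not be a ranking of the agencies themselves.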
Yes. John, before you go into this gorgeous equation, could you explain that term "welfare" there, because it has other meanings, doesn't it?
Yes it does, but I think it probably means the same in economic terms, broadly speaking. What we were trying to do was estimate the change in consumer surplus. Consumer surplus is the difference between what people would have been willing to pay and what they did pay, so if you don't have to pay as much as you were willing to, then you're getting a benefit, a consumer surplus, and the various producer and consumer surpluses add to social welfare. Obviously, if you stop selling something and give it away, then there are a lot of people who were paying for the data who don't have to, so that is a consumer surplus straight away. There's also the issue that it's going to be used more, so there'll be more users who weren't willing to pay the old price but are willing to pay zero. There's a standard way of calculating the increase in consumer surplus, and that is the welfare part of the calculation. That was an economic explanation that probably didn't help at all.
I just loosely described it as new users. Glad that was the one and one here. Now, that graphic there, that's the one that ends up in five to one, doesn't it, John?
In the case of the ABS, yes. In the case of Geoscience Australia it was much higher, I think around 13 to one, for the particular type of data that they were using. But I think it's a really important point to note that that doesn't reflect on the agency. The agency that gets 13 times the benefit isn't better as an agency than the one that gets five times the benefit; it's due to the completely different sorts of data and types of uses of data. Geospatial data is highly valued, and most people use GPS.
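The welfare calculation described in this exchange can be illustrated with a small sketch. It assumes a straight-line demand curve between the old paid price and a new price of zero, which is a common textbook simplification and not necessarily what the study used; the function name is hypothetical and all numbers below are invented purely for illustration.

```python
def consumer_surplus_gain(old_price: float,
                          old_quantity: float,
                          new_quantity: float) -> float:
    """Gain in consumer surplus when the price falls from old_price to zero.

    Assumes demand is a straight line between (old_quantity, old_price)
    and (new_quantity, 0). This is an illustrative assumption only.
    """
    if new_quantity < old_quantity:
        raise ValueError("new_quantity should not be below old_quantity")
    # Existing users: they keep the full price they no longer have to pay.
    existing_user_gain = old_price * old_quantity
    # New users: each valued the data between zero and the old price,
    # so under linear demand they gain half the old price on average.
    new_user_gain = 0.5 * old_price * (new_quantity - old_quantity)
    return existing_user_gain + new_user_gain


# Invented example: 1,000 purchases a year at $100, rising to 3,000 uses when free.
gain = consumer_surplus_gain(old_price=100.0, old_quantity=1000.0, new_quantity=3000.0)
print(f"Illustrative consumer surplus gain: ${gain:,.0f} a year")
```

Existing users gain the full old price on every unit they were already buying, while new users are credited with half of it on average, since each valued the data at somewhere between zero and the old price.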
I used GPS to get here, as it happens, so everybody knows that it's valuable data. So it's important to realize that the cost and benefit relates to the data, not the performance of the agency.
Who uses this kind of data, John? The first of the two studies was basically public sector. Who uses that data?
Everybody, really, in various ways. In terms of national statistics, clearly anybody who does anything to do with the economy or policy uses national statistics. Anybody who is in government, in a non-government lobbying organization or in an industry is a big user of various sorts of statistics. In the case of water data, we looked at a Victorian example, and one of the big users was schools, education. That's common in a number of other studies that I've done with research data and with public sector information: students react a lot more and are a lot more engaged if the data is actually real. It's a depth measurement from a river that they know exists; it's not a textbook example that's purely hypothetical. It really helps students engage when they know it's real data, much more than they do with abstract hypotheticals.
So in this case the benefit-cost ratio, I should say, varies from about five to twenty; I think a previous study by ACIL Tasman put geospatial data at over 20. So we know that data is being used by government and as an input to research as well, and we'll talk later on about business and industry. But am I right, John, that you and Nicholas Gruen extrapolated this in another study, instead of just the ABS, to the whole of government?
We did. The first study that we're talking about here was a case study. There was a wonderful Dilbert cartoon that I saw once in which Dilbert went in to his boss with the pointy hair and said, "My PowerPoint has everything: I've got data for the real people and case studies for the idiots." But case studies are really interesting; they speak to what's going on. A major point of doing the first report was to try to come up with a methodology that the departments and agencies could use themselves to make the case for open access and to help them see the benefits to them of open access. But any case study is just that; you can't multiply case studies and get the macro picture. So in a study that Nick and I did for the Omidyar Network a year or so ago, we did a macro estimate of the value of public sector information and research data combined, and we got an answer of around 17 billion dollars a year in total, currently.
So the first study basically showed that the case for making public sector data freely available is overwhelmingly positive, and we have a very large number here representing the annual value of that data. I will mention that our statistics show that that first study had, I think, over 4,000 downloads in one year around the world. I think you'll see that state and territory governments and the Commonwealth government, all the government departments, have engaged to some extent in providing access to their data sets, freely available and licensed, and we'll talk a bit about licensing towards the end. So John, can we move you on now to the second report, the recent one, which is I suppose what most people signed up to find out about: the open research data report, the Houghton-Gruen report? Would you like to take us through that one?
It's a very different approach; it's not a case study approach, it's a macro approach. There are two main elements to it, and I'm hopefully going to state the
really obvious. The first question was to try to measure the value of data in public research at the moment and, to state the obvious, that's something that exists, so we could measure it. The second part of the study was to try to estimate the potential upside value of curating and sharing data from publicly funded research, and that doesn't exist yet, so we had to estimate it. So the report very much takes two completely different approaches: measuring existing value, and estimating the potential upside value of curation. The focus of the study was basically government-funded research, and there are two ways you can look at that. One is funding by sector of funder: Commonwealth government spending on research, which is about nine billion a year. But you can also look at it at a policy level, who's going to make policy and who's going to follow the policy, in which case it probably makes more sense to look at sector of execution: government as a research organization (the Commonwealth, CSIRO and so on) and higher education, for which you can make public policy. Research at that level is about 13 billion a year, so that gave us a range estimate of the sorts of things we were dealing with. We took two approaches to measuring the value of data in public research. Now I'm carefully saying those words, because what we can measure is people's use and the activity of using data, but of course research is a global activity. Not all the data used by Australian researchers is Australian; quite a lot of it isn't. So we're not talking about the value of data produced in public research, we're talking about the value of the activity of using data in public research. I'm not sure whether I'm making that clear, but it is an important distinction. The first thing we did was a really simple approach, probably the most basic costing approach you can use in economics, which is use value,
which simply measures the time and other costs involved in creating, manipulating and analyzing data. The second approach was to look at a return on investment: the amount of money spent on the activity at average returns to R&D. For both of those, for the whole report in fact, we in a sense didn't do any original research. We based it on a series of studies that I've been doing in the UK with Neil Beagrie of Charles Beagrie. We've been doing studies of research data centers: first the Economic and Social Data Service, then the Archaeology Data Service and the British Atmospheric Data Centre, and we're currently doing a survey for a study of the European Bioinformatics Institute. So what we're doing is taking the activities that we know about from the users of data centers in the UK and asking: if that was happening in Australia, what would that be worth? In the UK studies the survey respondents, of which there were many thousands by the way, reported that they, and in their opinion others in their field, spent between about 35 and 60 percent of their research time creating, analyzing and manipulating data: 35 percent if you're an archaeologist, 60 percent if you're an atmospheric physicist, which is not surprising. So, just as a simple mean, we said that about 45 percent of the research time of typical researchers across disciplines is spent with data. That's worth anywhere between two and six billion dollars a year in Australian public research. That was the first thing we did, the use value. The second thing was to look at returns to R&D. There's a complex model to do that, because the returns accrue over 20 years and you express the value in net present value from one year's expenditure. But anyway, coincidentally, because it's a totally different method, the answer was the same: it was between two and six
billion a year in net present value. So that was our estimate of the value of data. It's a very broad range because there are two ways in which you can do it, and Nick and I spent, I think, two very long, fruitless evenings arguing to and fro about which we should do, so we ended up doing both because we couldn't decide. That explains why there's a range. We also felt that, to be honest, if we gave a pinpoint number like 5.9, we wouldn't know how accurate it really is, so a broad range I think is more honest. But I would say that I'd expect the answer to be closer to the top end of that range, closer to six billion than two billion; that's an opinion rather than a calculation. So what I've been talking about was the value of data, which is the left-hand column. It says 1.9 where I said two to six billion, but that describes what we were using: research activity times from the UK, the use value and the return-on-investment calculation. Now, moving over to the other side, with the repositories heading. They are completely separate, as I say: one exists and we measured it, and the other doesn't exist so we estimated it. On that side we tried to estimate the potential upside value of the repositories. Basically the calculation is saying that if all of the publicly funded researchers in Australia realized the same benefits as the regular users of the UK data centers, then this would be the potential upside value. Of course, it could be a lot more than that. If we in Australia did better than the UK at managing and making data available, it could be quite a lot more; there's no way to know. What we have is just a way of getting a ballpark estimate. Bearing in mind that the users of UK research data centers are not necessarily in the UK, the thing they all report in the surveys we've done is that the number one impact of using data centers is the efficiency,
that they save a lot of time creating data and so forth. The second one is additional use by, simply put, reuse: use by people who could neither create the data themselves nor obtain it anywhere else. That is pure additional new use, and we can calculate the value of that additional use. So the elements of this calculation were the time saved, and of course when researchers save time they don't think, "Okay, I'll finish at 3:13 and go home"; they do more research. So the time saving is just the first step. The second step is that if you use that time to do more research, then that extra research is also going to have a return over time. So there are two elements there, and the other one was the average return to the pure new use. We estimated the sum of those impacts to be 1.8 to 5.5 billion a year; that's the right-hand column. In essence we have no idea how much of that is already being done, but as a scenario estimate we simply said that maybe 10 to 20 percent of the data we're producing is currently being curated and openly available, which is probably generous, and so 80 to 90 percent is unrealized. So the unrealized upside, as we say, 1.4 to 4.9 billion a year, is what's available to us. There are a lot of issues, obviously, around those things. We wanted to look at the value of national collections, and there are obviously some specialist data centers around in Australia. We didn't do any kind of estimate, but I think, and it's just an opinion, that given a pretty unique climate, fauna, flora and so forth, national data in Australia is probably worth more than national data about some things is worth in a European country. But that's just an opinion, and we didn't do an estimate of that. The other thing we did, which isn't on this chart, was to think about what the cost would be. We're saying the upside is maybe up to 5 billion; what's the cost? And of course we're
trying to estimate something we haven't done yet, so you can't really cost it, and it depends how you do it. Clearly, if you curate data very thoroughly, very well, it can cost you almost anything; it could be very expensive. If you do it badly, it doesn't cost you very much, but then you probably won't realize the 5 billion in potential benefits. We went round a number of circles with it, but probably the best estimate, just as a ballpark again, had to do with the UK research data centers. Historically at least (it's changing a bit now), the subject data centers were funded by the relevant research council, so the Economic and Social Research Council funded the Economic and Social Data Service, and so on. There was a government report in the UK that said that across the disciplines there was a remarkably consistent expenditure on those subject data repositories, which was about 1.4 to 1.5 percent of total funding. So if we go back to our original figure for public funding, 9 to 13 billion, 1.4 to 1.5 percent suggests that 150 to 200 million a year would be the cost of quality curation. So clearly, 200 million as a cost and 5 billion as a potential benefit is pretty much in no-brainer territory.
Yes. Well, that's just representing the current estimate and the available upside of the 1.4 to 4.9. The straight line is a straight line because it looks funny to just have two points, but even an economist wouldn't say there was a trend between two data points. I'm not sure why we drew the line there.
The graphic, yes. So that just summarizes what we think we're doing now and where we think we could get to. But in that figure there is more than just that, isn't there? In the top right-hand corner, and I'm not sure if everyone can read that, it says "all data using data infrastructure". John, talk a little bit about data infrastructure versus repositories, because
there is a bit of a difference and we want to make sure people understand. You've mentioned the cost of running these centers; it's fairly small in relation to the possible return. So let's get a discussion going: on the bottom left is an individual at the desk, and on the bottom right we have full use of data infrastructure. What does that actually mean? What's the message there?
Clearly the infrastructure is both hard and soft. Certainly from the surveys we've done elsewhere, the guidelines, the standards and all these sorts of things are highly valued and highly important. It's certainly not only about IT and networks; obviously that's essential, but the soft infrastructure is vital, and that comes across very much from the studies we've done: people use the guidelines and go to a data center because of those kinds of facilities. I'm actually a pretty regular user of the Economic and Social Data Service in the UK because they have some wonderful, simple methodological guidelines freely available on their website, which has nothing to do with the data as such. So that's a very important aspect in terms of the infrastructure, if that's what you had in mind.
It is, and there are two sides there: we're talking infrastructure and policy. So what does the policy mean in this context?
Well, in the second report we made some policy... what's the word? They certainly weren't recommendations.
Observations.
Observations, yes, okay. One of the points was the one we were just making, about both hard and soft infrastructure, and that the soft infrastructure is very important. A starting point from a policy point of view is mandates from government, from funding agencies, from institutions. And one thing has come up in other studies: I recently did a study for the Canadian research councils with a couple of people elsewhere, which was a backgrounder to their recently
announced Tri-Agency Open Access Policy. One of the things that came up in that really strongly is the importance of harmonization of policies. Research is a global activity. You don't just have one funder and one institution; you're collaborating across institutions, across countries, across funders, and if they all have different policies, different mandates, different things that you have to do to comply, it's a nightmare. The cost of compliance is high, and people just don't bother to do it unless they're really chased. So I think it's really important, when we think about mandates, to think about harmonizing what we're expecting and asking worldwide as much as possible. I know that's a simple thing, but it's something to bear in mind.
And that's all part of this going from the bottom star to the top star: it includes coverage of all the data that should be shared and reused being able to be, and part of that is the policy that promotes it.
Indeed, yes. And you've hit on another thing there in terms of policy. Obviously there are constraints on openness. Not everything can be open: there are privacy concerns, there are security and commercial-in-confidence concerns. It's vital to sort that out and make it really clear up front. There's one trend, and I'm careful not to criticize anybody, but okay, I'll criticize my own university, Victoria University, and I think it's something we've got to think about in terms of policy, so forgive me if I'm drifting. When we want to do a survey, we make a human research ethics clearance application, and we specify what data we're going to collect, who we're going to ask and what we're going to do with it. Basically we ask permission of the subjects to use the data for the purpose that we define. Now that's fine if the data is mine and I'm going to use it, but if I'm going to make it open, I've got no idea what people are going to use it
for in three years' time, so I can't possibly seek permission for that use. So I think most institutions have got to rethink the research ethics process and that permissions process. It can't be about permissions; it's got to be about protecting the subjects, or whatever it is, from foreseeable harm: confidentiality, privacy, those sorts of things. It cannot be about permission, otherwise it's never going to work, and I think that's something we need to rethink policy-wise.
And that's another example of the harmonization of these different frameworks, the ethics framework and the funding framework and other funders, and not making it hard for researchers by having competing requirements: on the one hand to make things available, on the other hand to destroy them as soon as the project is finished.
Yes, exactly, which is still the case with our ethics approval. It's "keep it for five years; how are you going to destroy it?" That is the question, not "how are you going to make it openly available?"
So these are the things that are included when we say "through policy and infrastructure": a more coherent policy framework, in the broadest possible sense, of ethics, commercialization and research funding policy, to bring us up to a broader coverage.
Yes. And since Nicholas Gruen isn't here, I'll speak on Nick's behalf. One of the things that he's very keen on policy-wise, and he runs an independent firm so it's perhaps not surprising, is that there's a trade-off in all of these things between a top-down approach and a bottom-up approach. Quite often there's a tendency to have a bit too much top-down, but that presupposes that the person designing the top-down policy knows what's going to happen in three years' time and can predict how best to deal with it, which is often not the best,
whereas if you leave people to work out how to do it for themselves, you can get more innovative solutions. So I think that the mandates and the policies in a sense need to be about setting a vision, setting an aim. They need to be about guidelines; they don't need to be about instructions. Perhaps as a concrete example I might use my own university again, and criticize it. Victoria University has an open access policy for publications, and I facetiously call it the "fill the repository" policy. It's not actually an open access policy. It defines the expectation of open access, but it says everything must be in the institutional repository. Well, that's an instruction; an open access policy is a guideline and an expectation. From the bottom up, I might prefer to do it differently, indeed I do prefer to do it differently: use SSRN, RePEc or other subject repositories, not institutional repositories. So I think in all of these things, when we're doing mandates, we should stop at the guidelines and the expectation and leave all of the implementation, how we actually achieve it, for more innovation from the bottom up. A repository policy like that rather presupposes that green open access is going to be the solution, but I'm not sure that's the case, and I don't think it is the case. So I think it's foreclosing innovation that we could have, and it may actually be quite negative. And Nick's very big on that point.
Could I ask, John, before we formally ask the audience for questions: what's your current read on the policy for publicly funded data? I'm just dissolving the barrier between public sector data and research data. What's your read? Are the guidelines clear enough? Are they out there? You haven't mentioned licensing yet, but we might get to that, because that's one of the things you need to do in order to allow people to use the data. There's no point putting it in a repository if
it's not licensed.
Yes. I'm not sure; you clearly know much more about it than I do in terms of those issues. But my sense is that there's quite a long way to go, both in terms of the guidelines and in doing it.
We have an interesting question here: "Many researchers claim that there is no second-use value for their research. Are they wrong?" For their data.
Well, if people are using data from a data center that they could not have created and could not have obtained anywhere else, then they must be wrong. Someone is using it for something that they didn't foresee or collect it for. So yes, I think they are wrong. I think our ability as researchers to imagine what we can do with data is good, but it's not good enough to imagine what everybody else would think of doing with it. If you look at some of the mashups that you get with public sector information, what people do with it is something you would never foresee, in terms of apps and those sorts of things, and I think it's true of research data too.
Perhaps whilst waiting for the next question I could clarify, or I might even ask Adrian to clarify: in order to get onto the top star, to realize that five, five and a half billion annually, it's not enough just to put your data in a repository. What else is required? What does that really mean, where it reads "increasing value of data as data access increases through policy and infrastructure"? What else? What's something that you're doing?
The first thing to note is that we're not at zero here. The bottom star is not at zero, zero, so there is activity going on. We have infrastructure and some policy that is pushing us at the moment, for example the whole NCRIS program. Obviously not all of it is to do with managing and curating data; some of that investment is to do with generating it. But as we all know, the way you generate data does
make a big difference to reuse later. So if we take, for example, the portion of NCRIS investment that is going into creating things like IMOS, that does have a management, curation, access and discovery role for marine data in Australia. Particularly in those instances, we're way off being at zero. So part of the message here is that where we have data infrastructure, it should be continued and also maintained. ANDS has been working across all the NCRIS facilities, but also with each of the research organisations, so that the organisations have a role in this, and there's been some great uptake with pretty much every research organisation in Australia.
Where we need to go further is in making these things easier. Some of these things are the right thing to do but not always easy, so the tools, the support services and the promotion that make managing, curating and publishing data as easy as it can be are part of that infrastructure. That includes the kind of information that's required for reuse. I think behind Ross's question is that data may not be totally useful for secondary use if we don't have the calibrations or the methodology, or if it hasn't been collected using the community standard, and so on. So those kinds of things are part of making data more valuable. When we say that data in some kind of data service is valuable, why is it? Because it's so much easier to use: it's there, it's been managed, it's been documented, it's accessible. All those qualities are the ones that we're trying to bring in as business as usual for data.
A part of this, again, is the incentive question, so being able to track
these reuses is an important part of the frame to consider around this: one that says that yes, there's an impact from the reuse of this data, that it's somehow measurable, and that it reflects well on the original researchers, the research organisation and the data archive. All three of them need to have some way of getting a pat on the back, to say that we've had a part in this. That's where we're getting to these things about data citation and metrics for use and reuse. They're an important part of this mix.
Again, licensing, as you talked about, is one of the ingredients. Particularly in the Australian context, assuming there is some intellectual property in the data that you've created, if you're silent then silence means no consent to reuse. So I think everyone needs to be aware that if part of this whole new world is that we're assuming the data is not a waste product but something that needs to be reused, then you need to say that somehow. Those terms and conditions need to be made explicit: yes, this can be reused in the most open way possible. That's the idea of putting the licence there. There's a very simple Creative Commons BY framework, which is a good place to start. Why wouldn't you default there and ask whether there are any reasons not to use that as a starting point? That means there's clarity from your point of view: if there are any mistakes in the data, the indemnity is already baked in. And there's clarity from the user's point of view: yes, I can freely create new innovative products in research, industry and education that are not somehow clouded with uncertainty as to whether the data can be reused.
We have some questions coming in, but I'd like to mention that we actually have a fully operational licensing system in this country called AusGOAL. So, AusGOAL: check it out, and it has a licence checker. This is the point that Adrian is making so well there: if you don't license the data, if you leave it unlicensed, then effectively in law no one can use it. All rights reserved. So let's go straight to the questions.
Well, I really like the idea of recycling. It puts me in mind of waste products and recycling. It's like being an organ donor: I'm sort of saying, I leave my data to science. It's kind of a nice idea.
And it's all part of the return. Industries, let's say manufacturing industry, can make themselves more productive and more efficient by reusing some of what were previously called waste products, and it's the same thing here in research.
Right, we've got some questions. Should I just read them out so everyone knows what we're talking about? What is the optimum investment so that data is efficiently reusable?
Don't look at me.
We are actually going to look at you.
Well, I don't know. It's one of those things that we don't know. We don't actually know what the optimum investment in research is. Even though we have tables of performance that say that we invest a lower proportion of GDP than Finland does, it's actually a very controversial point what is too much. Maybe there is a point at which it's too much. I've only ever seen one study that tried to do that calculation. So what's the optimum investment? I don't think we've got any idea, but I think it's probably more than we're currently doing.
That question could be looked at another way, going back to an earlier point you made: the investment in data infrastructure divided by the value of that data is a small number.
Next question: what quality initiatives are planned to reward researchers for good data management and help others
identify quality data available for reuse? What quality initiatives are planned to reward researchers for good data management?
I'm not answering the question, but one thing that comes out very strongly from the UK data centre work is that, in the qualitative questions and answers we're getting, knowing the source and understanding the quality and the processing of the data is what the users value above pretty much everything else. So those two questions perhaps relate to each other, the optimum investment and the quality.
A quality initiative: I'm not aware of anything formal in that area. What I'm imagining you're talking about here is something that says this is five-star data because it's accessible, easily reusable, well documented and openly licensed. So there might be some kind of standard around the quality of good data and, as you say, if your data had those five stars, then that would be a reward for good data management and would allow other people to recognise it. At the very most formal level in Australia, the ABS does have a very rigid... rigorous (I got the first three letters right), a very rigorous framework for data quality. Particularly if that data is going to be used in a public policy decision by COAG or something like that, then there is quite a formal documentation of data quality, which is available on our website as well as on the ABS website. That's probably at one end of a spectrum, very formal quality assurance. It's probably something we could do at the more informal research collaboration level.
I'm just wondering, do either of you want to comment on what the answer to that question might be if research funders counted data as research outputs rather than by-products? My understanding is that currently
Australian funders do not do that.
Yes, and that's where, to be a bit facetious, some kind of a mandate may increase the quantity of data but not necessarily produce good-quality data. So, to your point about guidelines and objectives, at that funding level it might be good to have a five-star system that says these are the kinds of things that are included in good-quality research data, built on the process that went into creating the data.
We've got two questions. The first reads: "Working on the new AusGOAL licence chooser now, and it will include questions about sensitive data and research data generally." I think we can take that as a comment, and say that that's Baden from AusGOAL who is working on that. And our final question for the day: what investment needs to be made in the education aspect of open data in the research sector? Is there any specific amount currently invested in education, or should there be a policy on investment in the educational aspect of promoting open data and its usefulness?
Just from experience, and from what you hear from various surveys and interviews, I think getting PhD students in particular into the habit of looking for data in established data centres is actually a really important thing, because the process of research is evolving. How we do research is quite different now from what it was, I'm going to say a long time ago, when I did my PhD. It's very different. So I think we do need to think about how we encourage research students, PhD students in particular, to go down that path, with that built into expectations of supervision.
And as you mentioned, infrastructure under our definition here includes hard and soft, and education is a really critical aspect to plan for.
Well, John, and Nicholas who's not here, thank you for that. As always, if you have any follow-up questions or comments, please contact John directly at the
Victoria Centre for Strategic Economic... no, the Victoria Institute of Strategic Economic Studies, or me, Greg Loughran, here at ANDS. Just noting on the final slide that we are a part of the NCRIS initiative in Australia, which is all about this kind of thing.
I'll show where you can find these on the website. You can get to all of these reports from the quick links on our website. The original one that John talked about, Costs and Benefits of Data Provision, is there, and the equation is there as well. Then, from the quick links again, the latest one, the open data report, is available there. So it's very easy to find from any page on the ANDS website, and below the screen there is that figure again. So thank you John, Adrian, Nicholas, Susanna and everyone who logged in today.