Thanks for the opportunity to speak about research data. From an ANDS perspective we see research data as actually a partnership between researchers, research institutions and national data services. I'd like to spend a bit of time talking to you about that from my perspective. So I want to cover four main points. First of all, I want to provide a bit of background to Australia's investments in research data infrastructure. I'd like to talk about the role of research institutions in research data infrastructure, then the value of taking a collections view, and then, to wrap up, something about why these investments are in place in terms of the advantages that Australia can accrue.
So let me start off with a little bit of background. ANDS was created out of a series of investments starting with the NCRIS investments, that is, the National Collaborative Research Infrastructure Strategy, which laid out a mechanism for providing Australia's researchers with world's best research infrastructure that is both collaborative and national in nature. Part of the approach there was to recognize that eResearch, i.e. data and tools and computer networks, are important elements of that agenda. So that was established and developed. It was developed under the Liberal government of the time and got underway. ANDS was established under that scheme. And then a little after that, the Powering Ideas report came out under the Labor government, indicating, I think, that the support for this work in establishing national research infrastructure is bipartisan in nature. So the Labor government expressed a view that there needed to be more work done on an innovation agenda for the 21st century. So this said that you need to do more than is currently the case: the infrastructure needs to work beyond disciplinary boundaries, and there did need to be large-scale ICT investment in that space.
This report was backed up by a further investigation that led to the report Venturous Australia, which talked about building a scientific culture where information, as much as possible, is moved out of people's heads and onto the network and into the tools. So that idea of moving data out of the lab into a shared environment, where the data itself is seen as research infrastructure, is crucial in this space. This led to the major Super Science Initiative in the 2009 Budget, with a $1.1 billion boost to critical areas of scientific endeavor. It's really important to note in this space that, along with those big cutting-edge investments in some particular areas, there was a very substantial investment made in eResearch. So what's eResearch? It's saying that research needs improved data, better computation, better tools, and better collaboration. And that means you need to have the networks and authentication tools to support that work.
So that led to this picture of investment over a period of time ranging from 2002 up to the current time. You can see that ANDS kicked in to this environment, in the data investments in the light blue, around about the end of 2009. And there's been a substantial increase in investment in that space for a little while now. And we're seeing investment in data storage kicking in around about now through RDSI. You can see, though, that there's a substantial investment right now. But I think the more important thing to see here is that the investment is being made across each of these areas.
Recently I attended the International Conference on Research Infrastructures in Copenhagen, along with an Australian delegation. One of the delegation members is a climate scientist who was looking at how research infrastructure is being put in place in Australia compared to what is happening elsewhere in the world.
And he made the observation that around the world there is lots of work putting in place infrastructure for climate science and climate adaptation investigations, but in Australia, uniquely, the investment is across the board. It is the case that investment has been put in place around data, tools, high performance computing and collaboration. So his view was that these investments are starting to establish a really strong research advantage in the country. And that's quite important.
Let's have a look now at why this investment is taking place. First of all, research is changing. And that doesn't mean things are just going to happen wherever. There is a real value in having a national strategy. And the value of doing that is that it enables us to tackle, as a nation, problems that are pressing to the nation. So let's have a look at a couple of examples.
What data environment is needed to understand where and how to build in bushfire-prone areas? I come from Victoria, where the devastation of those fires that took place a few years ago was extreme. And we don't know where it's safe to build anymore. How do you in fact enable research to inform policy on this problem?
What about understanding the largest living thing in Australia, the Great Barrier Reef? In the news just recently there was discussion around how we balance the use of the Great Barrier Reef as a national treasure against the work that occurs near the Great Barrier Reef that is developing, in particular, the economy of Queensland. How do you get the right balance in that space?
What about the effective use of Australia's soil? Australia's soil is fairly ordinary. We saw early on in the European settlement of Australia just how badly that soil could be used, which led to irreparable damage. So we now know that Australia's soil is quite fragile. How do you use that? What data environment is needed? Because you'll need to look at that from a remote sensing perspective.
Looking at the whole of the country, you also need to look at it from the point of view of micro-analysis of individual soil samples, flux towers that observe the environment in which the soil exists, for important accounts, but also the genetic sequencing of all that data. Now typically when you have those data environments, each of them is created individually around a particular research question that's being investigated. The notion here is that you want to have a data environment that enables you to explore broad problems in this space. So we can see the national need there.
There's also a need from a research perspective. Research data intensity is increasing. Now I mean by that not just that the volume of data available is increasing; the investigations over that data are changing. The data itself is changing in three important ways. First of all, the volume is increasing. Secondly, the complexity of the data is increasing. In the example I just gave of the soil, you can see that it's not actually possible to investigate this unless you've got quite a complex data environment. A third increasing challenge here is that the data velocity is going up. I mean by that that we're going to have sensor networks that are detecting our environment, but we're not actually going to be able to capture all of that. Consider the sociologists who are looking at Twitter streams or at Facebook interactions. That data is not in fact able to be stored. So how do we use that data in these sorts of environments?
Next, the problems are at a larger scale. I've just given you three examples of the fact that we can't, in many cases, simply go down the traditional path of saying, well, what's the next research question that I, as an individual researcher, would like to explore? If you're going for a multi-million dollar research grant, this would typically be done with many people and at a problem scale that means that simply answering a research question isn't the right granularity anymore.
So if we're going to build research data environments around that, it's pretty hard to imagine creating research data environments for every possible problem. And if you make them for some, how do you translate them to others? And can you do it at the research group level alone? We think not. We think that the scale of data that's needed to investigate many of these questions really is beyond the resources of individual research groups. So the question of who else is going to play, and how they play, comes into question. We have a strong view that an institutional approach to data is needed because it will lead to greater research efficiency, but we'll see there are a number of other gains available as well.
But even there, that's not going to be sufficient if you want to deal with the data that's coming off the Square Kilometre Array, or if you want to deal with the data that's being generated by geneticists around the world and make that available to researchers in the best research groups in Australia. How do you do that? So there's a need for a national approach to data as well, to tackle larger problems and to build coherency, efficiency and synergy, so that the institutional approaches are working in concert to give Australia a research data infrastructure advantage. So this is going to require new partnerships between researchers, institutions, research service providers, disciplinary initiatives, national initiatives and international initiatives. There is no simple answer to this. It's actually quite complex to create an environment that provides this sort of advantage, and yet we're on the way.
So the ANDS argument is that richer data enables better research. If you improve data management, you can support research integrity. If you improve your data capture by capturing, at the time of the data capture, the relevant metadata that enables discovery, you can improve your research efficiency.
If you've got improved research data availability, with the results of the research in a form that can be used by others, that supports evidence-based policy.
So the way ANDS has approached this problem is to take a collections approach, and I'll get to what I mean by that in a moment. What we're trying to do is move away from an individual table listed in a vast catalogue of data tables, because we feel that that would be unlikely to deliver the infrastructure that's necessary in this space. So in order for us to do that we need to work with the other large data infrastructure investments. Things like the Atlas of Living Australia, which is trying to provide a comprehensive view of all living organisms in Australia. The Integrated Marine Observing System, which has lifted the ability to observe our oceans very substantially; but it has not only said that we need to observe more, it has said we need to organise those observations, make them available, and build a special portal that enables one to access that data. The Australian Urban Research Infrastructure Network is another interesting data infrastructure investment, because that's saying that people investigating how cities work effectively need to have access to data that is not simply the data that's generated by the researchers themselves, although that's important. Much of the data that they need will be coming out of other entities; in particular, state government entities are often the sources of major information that will support that work.
So how do you in fact bring all of that data together? We're arguing that a locus of it, particularly around research data management, is at the research institutions, because research groups come and go. They don't live as long as perhaps is desirable for the data to live. So how in fact is that data looked after and cared for?
So why might an institution care about data? Well, first of all, research has become more data intensive, so the needs around data management have gone up.
So there simply needs to be an institutional capability to deal with that. Next, data is increasingly a research output rather than a research by-product or a research input. The data itself can make a difference to the research reputation of the research group in question. So you need to have data infrastructure that enables the data to be managed as an output of research rather than something that simply lies on individual laptops and local computers. Next, we've seen that the excellence in research is correlated with size of FF and data outputs, and so it's important to think about the preservation of that data, so that the data that is created as a result of the research is able to be cited. That enables the research group to lift its profile very substantially, and of course not only lift its citation profile but lift its ability to interact with others over the outcomes of that research.
Next, institutions have signed up to the Code for the Responsible Conduct of Research. One possible approach to that is to do so individually, each institution working out what it will do, but there's real value in taking a collective approach to this. It's not an area of competitive advantage; it's simply saying that we have a responsibility for the management and care of research data.
Perhaps most importantly, though, it's a key opportunity to engage in large research challenges. In particular, an institution can take a position around research data, as James Cook University recently did by saying that its Tropical Data Hub, launched recently, is going to be a place where research data is available. Now they're not saying it's about James Cook research and James Cook research data; they're saying there is value in thinking about research data for the tropics across the world, and James Cook desires to have a role in that. So many institutions see research data as an opportunity to develop their research agenda.
So how and who?
So I've argued that it can't be achieved in research labs alone; it has to be a partnership between many different players. Now one of the things about that is you can't expect a researcher then to sort through the complexity of providers and partners in all of this sort of space. So we need to understand what the roles will be and how in fact to make this work in what's clearly a complex environment.
So, the roles we see. Research labs clearly have a role in the planning and using of data. Research institutions have a central role in data management. Disciplines are going to be establishing both the standards and the specific collections that will enable new forms of research to be conducted within the disciplines. eResearch providers are going to be really important in this space, whether they're state-based, at the institution or elsewhere, to provide customized tools that enable the most effective use of the data. We can't have every research group creating their own tools and never sharing in that space. The data holders clearly have a key role in making the data available, and this will include government, but it will also include large instruments. It's reasonable to think of the synchrotron as a generator of data. Clearly the Square Kilometre Array is going to deliver a frightening amount of data, but equally those little sensors that are bobbing up and down in the ocean, generating data on a daily basis, are, from my perspective, a data instrument. They're generating data that needs to be made available. Then there's a role for national service providers to ensure that the data is published effectively, that it's connected, that it's promoted and that it coheres. And by that I mean that the individual approach that will occur without a national approach is unlikely to deliver the sort of value that is possible.
Now most researchers will say, well, yes, they kind of live in Australia and they kind of work at a particular institution, but their real partners are their international colleagues. And so the international consortia are really important here as well, and we need to be playing in an international space.
So we say data's important, but data isn't just data. Data can be more or less valuable depending on how it's managed. It is more valuable if it can be used beyond the purposes for which it was initially conceived. It can be more valuable if it can be used by more researchers, i.e. beyond the particular lab. It's really valuable if you can use it to answer new questions, and it becomes incredibly valuable when it can be integrated to explore new data spaces, to explore the big problems, the sorts of ones that I was referring to earlier on in the talk. So to do so, it's got to be managed, connected, discovered and then reused. It has to move out of the lab. So ANDS's role in this is to enable the transformation of data that is disparate into structured collections that are managed, connected, findable and reusable, so that Australia gets the advantage of being able to use a really rich research data collection that enables its researchers to work on the biggest problems with the best partners in the world. And that drives up research value.
So how? We see this boundary that exists around the data in the private domain, where in some sense you don't need to capture terribly much of the context of the data because you kind of know it. However, even if it does live only in the private domain, imagine moving three years hence. It still needs to migrate out of the current private domain into a form where you can get at that data again. I know the most valuable collection of data that I generated in my research career lived for the life of the experiments and a couple of years beyond that.
And then two years after that, when I was asked about that data, I didn't have access to it. The systems had moved, the data had moved, etc. So we do need to think about a migration process that makes the data available to a wider space, and in a space where that data can be made accessible. So what is it that you need to have to describe the data and make it useful in this context?
We think there's a key role for the Australian Research Data Commons, a meeting place for researchers and data that enables data collections to be discovered and shared, using rich descriptions of those collections and, importantly, noting the relationships that exist between the data, the researchers, the problems, the institutions, the instruments, and indeed the research projects that are underway. And then there's the infrastructure that is needed to populate and explore that commons.
So here's a picture of what that looks like. It's a network where data often won't flow out of the institution or the custodian of the data. We're leaving the data stores where they are, but what we hope is that the context will flow. The context, which is often captured in a metadata store, will make information about that data available through an institutional portal such as the one CSIRO has generated, or indeed the portal that is the Tropical Data Hub, which is arguably a domain portal, or the Australian Ocean Data Network portal, or something generic like the research data portal that provides a description of collections right across all of the research institutions and right across all of the disciplines. Our aim is to provide a comprehensive view of the data that's available for researchers in the country. So what we want to do is drive up the availability of that by adding layers of information.
What's called metadata is, in this context, seen as context: information that helps you understand whether the data is relevant to you, understand what your ability to use that data is, and that provides you with what you need to integrate that data and maybe apply relevant tools to it. So we keep on adding these layers of information that make that data more valuable. So moving from left to right in this diagram, we can see that we're just trying to add value to the data beyond the raw bits that exist in a particular computer on a single file system in a lab. So the result leads to automated metadata gathering, because if it has to be manual, the cost is often prohibitive. It enables problem formulation rather than just research questions. It enables you to discover data that would be relevant to your work, integrate that and then investigate that. And what that really does is provide you with a much richer data environment so that better research can occur. So we want to try to capture this rich context to enhance the data's value.
So ANDS has been running for about three years now. We've run projects at most of the research institutions in the country, 120 different projects. We've got an operational research data commons, with 35,000 collections across all disciplines except philosophy, and we're assured that the philosophers are going to make data available shortly. Institutions are actively engaged. We're particularly pleased with the level of partnership that exists in this space. We have many rich discussions with institutions on engaging on their research data. It is also the case that ANDS is particularly interested in establishing international research data infrastructure agreements and is working actively in that space as we speak. We do expect that, within a year, there will be a forum for international research data infrastructure bridges to be built between different areas, different players, etc.
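The commons described above holds descriptions of collections and the relationships around them, while the data itself stays with its custodian. A minimal sketch of that idea in Python might look like the following. The class names, field names, relation labels and example records are all assumptions made for illustration; none of them are taken from an actual ANDS system or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    kind: str    # hypothetical label, e.g. "collected_by" or "output_of"
    target: str  # identifier of a researcher, institution, instrument or project


@dataclass
class CollectionRecord:
    key: str
    title: str
    description: str
    custodian: str  # the data itself stays with this custodian
    relations: List[Relation] = field(default_factory=list)


class Commons:
    """In-memory index of collection descriptions (context only, never the data)."""

    def __init__(self) -> None:
        self._records: Dict[str, CollectionRecord] = {}

    def publish(self, record: CollectionRecord) -> None:
        """Add or replace the description of one collection."""
        self._records[record.key] = record

    def search(self, term: str) -> List[CollectionRecord]:
        """Return records whose title or description mentions the term."""
        needle = term.lower()
        return [r for r in self._records.values()
                if needle in r.title.lower() or needle in r.description.lower()]

    def related_to(self, target: str) -> List[CollectionRecord]:
        """Return records that declare any relationship to the given identifier."""
        return [r for r in self._records.values()
                if any(rel.target == target for rel in r.relations)]


# Hypothetical example records, invented for illustration only.
commons = Commons()
commons.publish(CollectionRecord(
    key="coll-001",
    title="Reef water temperature observations",
    description="Daily sensor readings from ocean moorings",
    custodian="institution:example-university",
    relations=[Relation("collected_by", "instrument:mooring-7"),
               Relation("output_of", "project:reef-health")],
))
commons.publish(CollectionRecord(
    key="coll-002",
    title="Soil sample micro-analysis",
    description="Laboratory analysis of individual soil samples",
    custodian="institution:example-agency",
    relations=[Relation("output_of", "project:soil-condition")],
))

reef_hits = [r.key for r in commons.search("reef")]
project_hits = [r.key for r in commons.related_to("project:soil-condition")]
print(reef_hits, project_hits)
```

In this sketch only the descriptions are published to the shared index, which mirrors the point that the context flows while the data stores stay where they are. A portal, whether institutional, domain-specific or generic, would then be one more view over the same records.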
So this picture shows you where we're investing in this work. You can see that we're building new pipes or new bridges between the data, the data sources and the institutional portals, with lots of work on capturing new context and making that data available. So these tools, these bridges, these pipes, this data provide a much richer environment for researchers in this space.
So to conclude, Australia has made substantial investments in research data infrastructure, and it is the case that Australia has a research data infrastructure advantage in the world at the moment. It's important that we continue to maintain that advantage, and the work together will be all about doing exactly that. The way to do that is, in our view, partnership, in particular with the research institutions that have a key vested interest in how research data is made available and used at institutions. We do see a value in moving up from an individual data set to a collections view, because it enables us to ask the question: do we have the data that enables us to tackle big problems? The investments that have been made to date provide a research data infrastructure for the country, in our view, and the way that will be exemplified, I think, is that Australia's researchers will be partners of choice on big problems, because we will have the ability to store, use, manipulate and exploit that data with the best tools possible. That will work most effectively if we partner. Thanks very much.