Thank you to everyone for coming today. We're very lucky to have Max Wilkinson from University College London Research Data Services, and he's given himself a wonderful title of Delivering Comprehensive Data Management Services at London's Global University. Thank you, Max, for being with us today. Thank you very much, thank you very much. Nothing if not an ambitious title. What I'd like to talk to you about today is how we've approached this particular problem at the university in central London, and in order to do that we need to understand a little bit about University College London. It's quite an unusual university and has come about through a variety of means. It's consistently ranked as one of the world's top 10 universities, depending on which tables you look at. It has an alumni list that is quite rich in Nobel Prize winners. It was founded in 1826 and was the first to admit students without regard to race, religion or gender. That meant people from other countries, women, and people that were not Church of England or Catholic, or indeed had no religion at all, were allowed to join UCL as students. We have about 8,000 staff, about a quarter of whom come from countries outside the United Kingdom. By UK university standards we have quite a high student population, with over a third of these postgraduates. We have about 5,000 researchers and an annual turnover of around 900 million pounds. We're quite multidisciplinary. In fact, I believe that the only two disciplines we don't provide are veterinary science, for some reason, and, because of our founding fathers, quite appropriately, theology. Large universities have their visions that they attend to, and this is ours. It's called UCL Grand Challenges and, more specifically for research activities, UCL Research Frontiers, and they build themselves around four main areas: global health, human well-being, intercultural interaction and sustainable cities.
Collectively these are the visions that UCL stands for. They're under review right at the moment; we have a brand new provost and president who's come in to take us into the next decade. Specifically about research IT services: we sit, or we were created, within our central IT provisioning. This was a strategic decision to build on the skills that already existed in a very large and dispersed IT service, but within that there was a dedicated unit specifically designed to attend to research IT needs. We were established in June 2012, so a relatively new establishment within the organisation, and our brief was to support the research life cycle with regard to IT. There's a little diagram there of what we consider to be a four-quadrant research IT life cycle that concerns innovating, doing active research, publishing that research and exploiting it, removing the nonsense, and going into the cycle again. Because we're relatively new, and because we don't generally fit with either an academic organisational unit or an IT organisational unit, we had to focus on innovation and skills. Innovation was our brief, and in order for that to happen we had to invest in skills. Investing in skills means investing in people and expertise, and we've found that this has been required in more established areas as well as the newer areas. So for example our high performance computing provision has been expanded, we have a brand new software development initiative that supports software development activity, and also we have the group that I work for, and that's Research Data Services. We see this as a collaborative environment, so we don't only look within our university; my brief extends to consortia that exist in the UK and are constantly being rebuilt in the UK, but actually the wider research community is all active in this area, and we're all actually doing it quite differently, which to my mind is quite a refreshing approach.
There's no top-down activity as yet; however, of course, that means it's quite a fragmented community and it needs to be brought together. Where we are fortunate in supporting this within central IT is that we can build on a lot of vendor partnerships as well. This is about technology and this is about culture. There are challenges in both of them, and I don't think it's an unusual mantra; most of the research data management community will be familiar with it: this is not necessarily a technological challenge, it's more of a cultural challenge. Well, actually, I think it's a relatively equal challenge between the two, and we can talk about that later on in the discussion, when I'll show you how we've approached it at UCL. So to my mind a research data service actually impacts all the quadrants of the research life cycle as we diagrammed it. We have to support the access and innovation that initiate the research. I think it's well established now that increasingly hypothesis-driven research is being complemented, if not supported, by a more data-driven activity. The idea of reusing data is well established in science, but the critical need as we see it right at the moment is in the undertaking of research. We see a lot of people engaged in what would be considered not very good practice. However, it's not bad practice. What it is is practice that is born out of necessity, because there is no alternative, and I think this is something that is sometimes lost in the research data management field. People have always been sharing data and have always been managing data. It's just that the burden has become too much for a number of reasons, and we've taken that abstraction as our way of approaching the provision of solutions and enabling much better practice. Now, you can't talk about research data unless you talk about research data sharing. Sharing is a very important part of research and, as I just said, I believe that it's done and has been done for a long time.
My own background and most of my colleagues' backgrounds are in the research domain, and most of our research would not have really progressed very far unless we were able to share data from others, and that is something that we attend to as the second concern. Innovation and exploitation: well, building on the shoulders of giants, or just simply reusing data so that you don't have to generate it again, is really the foundation of the business case as we see it. If you don't have to generate data, that saves you money in the long run. However, the other angle to this is that actually this is good practice in the research life cycle. People share publications all the time; they're able to do that via the publication paradigm, and actually, if you look deeply into many publishers' guidelines, they require you to provide the data that underpins your publication upon request. Now, I know as a molecular biologist that's sometimes difficult to do; however, that paradigm shouldn't be lost, because that's one of the fundamental tenets of good research practice. So UCL's case for a comprehensive research data service was really elicited through three main concepts. The first is that most departments and research groups have a long history of excellence, but there's a lost opportunity here for building on old work; this is attending to the reuse issue. UCL is highly multidisciplinary, and for a long time there have been some very concerted efforts to bridge boundaries across disciplines. I'm a very big believer in cross-disciplinary research. I'm also a very big believer that you have to enable this to happen; you can't force it to happen. I believe there's a lost opportunity for cross-disciplinary research through the reuse paradigm. Unmanaged data sets are lost data sets.
I think that really goes without saying. The number of organisations and people that I go and talk to that have racks and racks and racks of stuff sitting on discs, on tables, in drawers, everywhere, never ceases to amaze me. In fact you can generally pick when someone moved out of a research activity by the format of the media that they last have in their collection of drawers. Mine were Zip disks, which I think had a life of about six months before they were superseded, and that's where mine stopped, but increasingly people are buying two, three, four terabyte hard drives of retail grade and using those to provision their storage. This is just as bad as the bottom-drawer activity, as are the racks of servers that people put into their closets because they don't have an alternative. So this is a burden. This is an increased burden to researchers, and if we can't help with that, then we've failed. They constantly talk about backup. Backup in my mind is another consequence of having to manage their own hardware: the fight against failing USBs and hard drives, and controlling the proximity issue that I'll talk about in a second, which has to do with authentication and deciding with whom you share your data and when. So our approach was threefold. We started by saying we have to remove the burden from our UCL researchers and attend to this PC World conundrum, or Currys conundrum. I don't know what the hardware provider is in this country, but people can go down to PC World, which is not 200 metres from UCL, and buy a three terabyte external hard drive for close on 140 pounds. Actually that's about half that now; this is an old slide. If we can't compete against that, then we're in trouble. More important for the organisation is this hysteria that's been created around compliance and policy alignment. I understand that the ARC is starting along this route as well. About two years ago, actually three years ago, the research councils banded together.
The research councils joined together and produced some common research data principles. From those principles, many of them aligned their own data sharing policies, some of which were very mature and quite well established, some of which were not. The EPSRC, the Engineering and Physical Sciences Research Council, was tasked with the strongest language, in that they wrote to all vice-chancellors and provosts talking about their expectations that organisations will provide services that will enable researchers to keep their data for long periods of time and share it appropriately. They used very strong language, and they started to talk about time scales and spoke of a period of maintaining data for 10 years past its last access. This created an absolute hysteria across UK universities and, I think, probably in a lot of ways has been misunderstood. Research councils' primary goal in the UK, and mostly elsewhere, is to support good practice. They see this as good practice. Most people understand this as good practice, but there's a big gap between what we can do right at the moment and what is required to fulfil that good practice. So there's a very rich and dynamic policy landscape from the publishers, the funders, governments, charities, and actually UCL itself, and we'll come on to that in a second. But of course you can't remove the burden, and you can't encourage people to become compliant, unless you provide an incentive for them to do that. Some of the ways that people store their data right at the moment really fail at this, and if we can't provide an incentive framework, then actually I don't think it's really worthwhile starting down this path at all. So UCL and a number of other universities are thinking about this in a two-stage way, and that's what the next couple of slides are going to be talking about. So we're talking about not only an infrastructure change, we're also talking about a culture change, and this carrot-and-stick conundrum is not new to anyone, least of all parents.
So is it all carrots and no sticks? Well, we're providing an offering that we hope researchers will want and cannot doubt. We intend to remove the burden of managing storage. We're trying to educate and consult our researchers about resiliency, how backup is managed, and how data remains stable and managed effectively in a technology environment, and to remove this burden of compliance. Well, so it's not really all carrots at all; there are some sticks, and those sticks are the compliance. Research councils are increasingly expecting compliance. We've been given deadlines in the UK to provide elicitations of how data should be managed, generally in the form of data management plans, but more practically in the form of infrastructures over which people can manage data. So it's not all carrots, it's not all sticks, and one of the first things that I did when I came along to UCL, about two and a half years ago, was to say that what we need is a roadmap for how we get from where we are right at the moment, which is a rather mixed bag of service provision and intent, to the place where we want to be, which to my mind was attending to the first critical need: providing infrastructures over which people could manage their data. We split that into six different work streams. I'm not going to talk about all of them in any great depth, but just to give an overview: the first one was very important to me, because there was only myself and one other person in the organisation when we came on board, and that was to put together a team and build the skills that were required to do this. I was asked to write a policy in partnership with our Head of Library Services, and this was attending to the concerns that the organisation and the funders had; it was about how to make responsibility chains available to individuals and the university to make this work. I'll talk about policy in a second.
Infrastructure: this has been talked about for a very long time in many different areas, and I was determined that we weren't going to do any more talking until we had some people in front of some infrastructure doing stuff. I'm pleased to say that at UCL we've benefited from central support. I consider that to be absolutely essential for getting the ball rolling at UCL. The stuff I don't lose sleep too much about is data management. That's not so much a time bomb but more something to look forward to, a challenge for the future. I was very concerned about providing infrastructures over which people were able to manage their data, rather than trying to get them to manage their data first and then get them to move it into an infrastructure that was resilient and safe. There is an awful lot of data that can be used already. There are lots of registration and grant management systems and application services. We need to think about service integration, because if there's one thing that will turn people off, it's having to register something multiple times. That's an opportunity for us. It's not a high priority right at the moment, but it's something that we have to keep an eye on. Of course, even though I enjoy the benefit of centralised support right to the very top of the university, I have to make this in some way sustainable. That doesn't necessarily mean full cost recovery, but what it does mean is that we have to be open and honest about how much this costs. I've actually been enjoying immersing myself in the business of procuring, managing and storing data in enterprise-grade storage facilities, and I think what we're coming out with is certainly going to be something that's workable. However, what needs to sit across all six of these work streams is a communication activity that starts to elicit the responsibilities of the individuals and roles within and across the entire institution.
We have to make sure that the policy is understood and supports those roles and responsibilities, and that the infrastructure and the data management facilities are capable of delivering them. This was a roadmap. I don't know whether it's publicly available just yet, but it will be shortly, and anyone will be able to go to our website and download both version one of our policy and this roadmap. Just quickly, these are the people. We are a band of four; three of us know what we're doing. This is Daniel Duggan and Alastair. They both have higher degrees in high-technology areas in a variety of disciplines, and I'm very fortunate to have a very enthusiastic team. But the research data policy was the first goal on my track, and it talked about three core activities. Responsibility: who in the university is responsible for this? Compliance: how do we make people compliant and what does that mean? And good practice, which is really saying: we're helping you to undertake and better the practice that you want, because you're doing stuff the way you are because you have no alternative, but there's also a carrot there, an incentive, in the form of value-add services: long-term provision of storage, and accessible, citable objects. That's all contained in our research data policy. I welcome feedback on it. There's been a rash of policies being published in the UK, and no doubt much the same is happening here. Of course, when you talk about infrastructure and go out and collect requirements, you don't get surprised by what you hear back. Researchers generally want everything. They want everything that's familiar to them, but they want more. They want it faster and they want it shinier. They want their NFS and CIFS exports. They want their secure transfers. They want high performance transfers, and then the dreaded cloud storage that must not speak its name.
This is something that I'll talk about in a second, but suffice to say, if you cannot provide these then you have failed. So you have to work out how to provide all these things but still maintain an administrative overhead that is manageable in a group of three. It has to be value for money. I'm very open about how much it's costing us; that generally transpires into a cost per terabyte. And of course, we're in central London. I'm fortunate in that I'm not held back by financial constraints right at the moment. What I am constrained by is very expensive real estate and a limited power supply. We run two high performance, managed data centres within our footprint in Bloomsbury. That is a very costly enterprise. So the solutions have to fit all five of these requirements. And the choices that we made, and these were design choices, were first of all that we were going to start with very, very simple service propositions. We want to start with infrastructures that have been tried and trusted, and to make a challenge to this big data hysteria. I don't buy into the big data hysteria. I buy into people having more data than they're able to handle, but for very clear reasons, and I think we can solve that. We need to look at strong abstractions, because we want to avoid lock-in to proprietary technology. I've procured a single multi-component technology, but we are open to joining together with distributed technologies that are already embedded in the departments. UCL is a large and complex organisation that has grown by acquisition, and we have to be able to attend to that. But we also need to hedge our bets in migrating between storage solutions. If everything is on high-value, fast-spinning disc, that's not a very efficient way to manage storage.
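That hedging between media can be illustrated with a toy tiering rule: data that was touched recently stays on fast disc, colder data moves to cheaper media. This is only a sketch under assumed thresholds (30 days and one year), not how any production system actually decides placement.

```python
from datetime import date, timedelta

# Illustrative, assumed thresholds: recently used data stays on fast
# disc; colder data migrates to cheaper disc, and the coldest to tape.
TIERS = [
    (timedelta(days=30), "fast disc"),
    (timedelta(days=365), "slow disc"),
]
ARCHIVE_TIER = "tape"

def place(last_access: date, today: date) -> str:
    """Pick a storage tier from how recently the data were touched."""
    age = today - last_access
    for threshold, tier in TIERS:
        if age <= threshold:
            return tier
    return ARCHIVE_TIER

# Data untouched for five months would sit on the cheaper disc tier:
tier = place(date(2014, 1, 1), date(2014, 6, 1))
```

The point of the abstraction is that the tier names on the right could equally be departmental storage or a centrally managed service; the placement rule doesn't care.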
We look at the conventional tiering systems and see that they're very useful in business organisations, but possibly not so useful in a research data environment, where we envisage a much richer landscape in both media and storage facilities, and we have to be able to join those together, whether they exist in departments or in centrally managed services. We have to be able to provide for people that have and people that have not, and that's really borne out of the notion that different disciplines are funded to different degrees, and if we can't provide for all disciplines, which is my brief across UCL, then actually that's not working very well at all. Being an IT department we're based on project development, but that's not a great way to be innovative. It's a great way to develop stable and operational services. So we have to split our time between project-based development and a degree of innovative development, which is very small-scale and ad hoc, and one particular area where we're doing that, which I'll talk about in a second, is our cloud evaluation. So the first thing that we started with was to separate the concerns that people generally get mixed up. Having a live area where data are mutable, attending to the critical requirement of stopping people from going down the road and buying more terabyte drives, under a private, project-level agreement, was our first concern. The central concept was a project, not an individual. Allocations are provided to projects that have individuals, and that's very important for two reasons, which we'll come to in a second. But moving away from a live environment and transferring data into an archive environment was something that had been missed by a lot of the previous work done in our organisation. So what goes on there? You get a transfer of responsibility: things that go into an archive go in there and stay there unless they're thrown away.
There's no point in archiving something if you can't access it in some way, shape or form and exploit it, but most importantly there has to be this extension, this transfer, of responsibility. If an organisation is to look after data for long periods of time, it has to be able to take responsibility for those data and make decisions on those data, and that's a departure from the way people think. So we said this would form the basis of our first two services: a live area, which is just a simple allocation, and an archive area, which guarantees preservation for long periods of time. The second principle is that the project was the central concept. That's really important for two fantastic reasons. Firstly, it's bound by time: projects have a start and finish. Well, good projects have a start and finish; bad projects go on forever, and that's bad practice. We don't want that, but if you bind your allocation to a project that has a start and finish date, you can measure your storage allocation and data storage as a direct cost of the research. There is also a very rich group ethos that exists in research with regards to projects and groups. So when people come to me looking for an allocation, they get a project, but it also changes people's behaviour, and this proved to be one of the real eye-openers here. Instead of the normal "I am going to do my project, generate data, and right at the end have a mad run around trying to find out how to look after the data", it gets people thinking about what to do with their data after their project period right at the beginning of the project rather than at the end, and that's been a real breakthrough in changing the way that people think about data management in a research environment. So they will be ready to archive. Well, I say they will be ready to archive; we'll have to see.
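Binding an allocation to a project's start and finish dates is what makes the "storage as a direct cost of the research" arithmetic work. A minimal sketch, with a purely illustrative price per terabyte per year (an assumption, not a real UCL rate):

```python
from datetime import date

def allocation_cost(terabytes: float, start: date, end: date,
                    price_per_tb_year: float = 150.0) -> float:
    """Cost a project-bound storage allocation against the grant.

    Because the project has a start and a finish, the storage can be
    charged as a direct cost of the research. price_per_tb_year is an
    assumed, illustrative figure.
    """
    years = (end - start).days / 365.25
    return round(terabytes * price_per_tb_year * years, 2)

# A three-year project with a 10 TB allocation:
cost = allocation_cost(10, date(2014, 1, 1), date(2017, 1, 1))
```

An open-ended allocation has no `end`, so it can't be costed this way at all, which is exactly why unbounded projects are bad practice for the service.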
I think there'll be fewer last-minute rushes, but I can't guarantee that there won't be any. The third principle is that we were going to use established technology to attend to the early adopters. We did this in the form of IBM's General Parallel File System, GPFS, delivered as DDN's GridScaler. It's a conventional, high performance POSIX file system, and these early adopters are the people that have large amounts of data and actually can teach us about how to move data about. It's very important that we're able to connect to the high performance computing resources. This project used to be connected to HPC, but was moved away in order to encourage people who have no need for HPC resources, the so-called long tail of researchers, to come on board, and that's proved to be quite a good strategic decision. However, we're starting to move together now, and anyone involved in HPC will know that there's a common problem of people using scratch space for storage because there is no alternative. We'd like to supply that alternative to HPC users as well. There are many options for exports here: the native file system, Samba, SCP or NFS. It's non-trivial to manage and it has quite a high administrative overhead, but it's familiar, and the early adopters were very important to get on board. However, our fourth principle was that we required room to innovate, and we innovated by looking at the different technologies that were around that attended to the big hysteria of data, which is that it will grow in an undefined and unquantifiable way. This needed to be scalable. Generally speaking, file systems are not scalable unless you increase your administrative overhead almost exponentially. A relatively new technology at the time was object storage: an exceedingly flat service in which data essentially exist as digital objects, each with a metadata tag and a location, sitting behind a management system that is automatic.
There's a REST API, and there are lots of policies that manage how the data objects are replicated. It provides a resilience that is much more efficient than RAID, and it is highly scalable. There are actually greater-than-100-petabyte deployments right at the moment running with a low administrative overhead, and we have a native iRODS connector, which I'll talk about in a second. So, storage access: we have three main activities. There's HPC; our GridScaler system is a SAN presented as GPFS. But the most important thing for us was developing the object storage, which is essentially a NAS presentation sitting behind a very high availability, high performing virtual infrastructure, which exposes native CIFS or NFS mount points and is resilient in a very efficient way. And then there's cloud. You talk to anyone, they want to use something that looks like Dropbox, because that's what they're using right at the moment, and there are very good reasons why people use that. They're not alone: Dropbox, Google Drive, SkyDrive, any number of third-party cloud providers. However, it's not performant, it's not cheap, and it's also not very secure. So if we can't provide an alternative, then we've failed before we even start. So we went into this thinking that actually there must be some quite mature clients out there over which we could build our private cloud. These cloud activities are very widely used, quite unofficially, in academia, but we found that actually the landscape is evolving rapidly, and we've been evaluating these four cloud clients. We've not been particularly happy with any of them; they're still quite a mixed offering, and we don't have a solution right at the moment. We're very interested to hear people's experiences of how they've been finding this. Each of them has a unique selling point, which is great, but fails on a couple of other activities. So we're very keen for this to go forward very rapidly. There are two things that have been quite useful in driving this forward for us.
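The object-storage model described above, a blob plus a metadata tag behind an automatic management system, can be sketched in a few lines. This is a toy, in-memory illustration only (a real deployment exposes this over a REST API), with invented node and replica counts:

```python
import hashlib

class ObjectStore:
    """Toy object store: objects are addressed by ID rather than by a
    file-system path, carry a metadata tag, and are replicated across
    nodes by an automatic policy rather than by RAID."""

    def __init__(self, nodes: int = 3, replicas: int = 2):
        self.nodes = [dict() for _ in range(nodes)]
        self.replicas = replicas

    def put(self, data: bytes, metadata: dict) -> str:
        # Content-derived ID; doubles as an integrity check on retrieval.
        object_id = hashlib.sha256(data).hexdigest()
        # Replication policy: copies on `replicas` distinct nodes,
        # chosen deterministically from the ID.
        start = int(object_id[:8], 16) % len(self.nodes)
        for i in range(self.replicas):
            node = self.nodes[(start + i) % len(self.nodes)]
            node[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str):
        # Any surviving replica will do.
        for node in self.nodes:
            if object_id in node:
                return node[object_id]
        raise KeyError(object_id)

store = ObjectStore()
oid = store.put(b"scan run 42", {"project": "example", "pi": "Dr X"})
data, meta = store.get(oid)
```

The flat namespace and policy-driven replication are what keep the administrative overhead low as the store grows: there is no directory tree to manage and no RAID set to rebuild.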
One is that whole debacle with the NSA and GCHQ. The great thing to come out of that was that it gave people an understanding of the value of metadata, which is really great. The other thing that came out of it, particularly in organisations such as higher education where the researchers are particularly paranoid, is that they much prefer to use private clouds rather than public clouds. So we're intent on providing this, and we're very keen to hear about people's experiences of how they've been finding any clients, so do feel free to drop us a line. We're not convinced that it's evolved enough to provide a stable, production-level service with a low administrative overhead. The component that makes us agnostic to vendor snake oil is iRODS, the integrated Rule-Oriented Data System. It's a very generic policy engine that can be used to integrate multiple storage technologies. So generic, in fact, that it needs quite a lot of effort to configure. There is, however, traction gaining in the open source community to support iRODS. It's been around for a long time, and there's also traction in the vendor community to support iRODS, which we think is a great activity. So this helps us bridge between the different areas and storage types, from the conventional object and tape storage that we control, to departmental-level activities and the archiving components that we're piloting this year. There's also a metadata store that we would like to help people use for enrichment. I don't spend too much time on metadata. I'm a big believer that people will enrich their metadata as they see value in it. This is about building standards, this is about community conventions, and this is about me, an accidental IT person, getting involved in activities that are not appropriate for me to do. So I'm happy to provide facilities for people to do that, and we'll have to wait and see how that goes. Speaking of metadata: anything you can do, I can do meta.
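The flavour of a rule-oriented policy engine, rules that fire on a data object and decide what happens to it, can be sketched as follows. This is a toy in Python, not iRODS's actual rule language, and the rules and thresholds are invented for illustration:

```python
# Each rule pairs a predicate (when does it fire?) with an action
# (what should happen?). The engine just walks the rule base.

def apply_rules(rules, obj):
    """Run every matching rule against a data object's metadata."""
    return [action(obj) for predicate, action in rules if predicate(obj)]

# Invented example rules: replicate big objects to tape, and register
# anything carrying a department tag in that department's catalogue.
rules = [
    (lambda o: o["size_gb"] > 100,
     lambda o: "replicate {} to tape".format(o["name"])),
    (lambda o: "department" in o,
     lambda o: "register {} in {} catalogue".format(o["name"], o["department"])),
]

actions = apply_rules(rules, {"name": "scan.dat", "size_gb": 250,
                              "department": "Physics"})
# Both rules fire for this object.
```

Because the actions are just functions, the same engine can drive very different back ends, which is the sense in which such a layer keeps you agnostic to the underlying storage technology.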
My belief is that less is more, and so when we register projects for allocations we collect a small amount of metadata that sits principally around the investigators, the project, the members of the project groups, how long the project goes for, the funder if one is available, and a small amount of narrative that they provide. The enrichment, when data move from a live area to an archive area, is domain specific and therefore the responsibility of the researcher. We intend to help them as much as we can, but right at the moment we're quite content that what we gather is sufficient. Now, that will probably create howls of laughter across the community. There is no semantic or syntactic interoperability available here, at least at the very lowest level. That, I think, is something that we can help to change, but we can't change it in our own right, and I don't think it's something that can be done at a university level either. We have to maintain a degree of independence, and anything that has to do with providing comprehensive metadata has to be done at a national or an international level. We are very happy to be part of international and national activities, but we're not in the business of providing that at an institutional level. The second part, of course, of data management is data preservation, moving from one side of the concern to the other, and this is the second service offering that we're providing. We're piloting it this year and bringing people around to the idea that data preservation is not like pickling onions. There has to be a degree of planning; there has to be a degree of intervention. I don't expect anyone to focus too long on that image down the bottom, which is a skeleton of a data preservation activity, but this one, which I lifted ruthlessly from Arkivum, a data preservation organisation and third party in the UK, is quite sweet.
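That "less is more" registration record amounts to only a handful of fields. A sketch of what such a record might look like; the field names here are illustrative assumptions, not the actual registration schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class ProjectRegistration:
    """Minimal metadata gathered when a project asks for an allocation.
    Domain-specific enrichment is left to the researcher at archive time."""
    investigators: List[str]          # principal and co-investigators
    project_title: str
    members: List[str]                # the project group
    start: date                       # projects are bound by time...
    end: date                         # ...so allocations are too
    funder: Optional[str] = None      # recorded only if one exists
    narrative: str = ""               # short free-text description

reg = ProjectRegistration(
    investigators=["Dr X"],
    project_title="Example imaging study",
    members=["Dr X", "A. Student"],
    start=date(2014, 9, 1),
    end=date(2017, 8, 31),
    funder="EPSRC",
    narrative="Pilot imaging data for a larger grant application.",
)
```

Note that nothing here is discipline-specific: that is deliberate, since semantic interoperability is left to community standards rather than imposed at the institutional level.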
If we look at this as a 20-year time span, each transition in the shading indicates that an intervention is required. So when you're monitoring media in a large tape library, for example, you have to do that constantly, and tape libraries generally do. The monitoring and maintenance of the hardware is something that happens periodically as part of maintenance. Integrity checking is a much more involved process but has to happen quite regularly, as do hardware upgrades and software migrations, and then hardware migrations as we move, for example, from LTO-5 to LTO-6 or LTO-6 to LTO-7. We can't ignore the fact that it takes people to do a lot of this, not all of it, but staff generally create the largest cost here, and then every now and then you're completely upended by a format migration, particularly if you have, or use, a format that is in some way proprietary and locking. All those transitions have to be looked after when you preserve data for long periods of time, and this is only data preservation, not digital preservation: the idea that having software to render the data is as important as preserving the data is not attended to here, so that would be an added cost on top. Once we start talking about that, we really start thinking about how much this costs, and costs spiral quite quickly. UCL being quite large, service integration is something that really plays on my mind. It's a great way of collecting metadata about the objects that you're storing; it's a great way of enriching the scholarly record. So the idea that an institutional repository that attends to the open access agenda and publishing can be supported by an even more open access agenda in data is something that I'm very keen to develop. But this is about the professional record; people don't publish these articles for anything other than their professional record, so supplementing that with the data that they generate is something that I think we're all intent on delivering.
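Of those interventions, integrity checking is the most mechanical and easy to illustrate: record a checksum for each file on ingest, then periodically recompute and compare. A minimal sketch (the manifest format here is an assumption for illustration, not a real preservation system's):

```python
import hashlib

def fixity(path: str, chunk: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so very large
    archive files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def check_manifest(manifest: dict) -> list:
    """Compare current checksums against those recorded at ingest;
    any path returned here needs an intervention."""
    return [path for path, recorded in manifest.items()
            if fixity(path) != recorded]
```

Running this regularly is what catches silent media decay before the next hardware or format migration; the other interventions on the chart (media monitoring, migrations) are organisational rather than algorithmic.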
Of course, I'm not into the idea of duplication. If data can be held somewhere else, then I'm happy for that to happen as well; the idea that we can reference data that was generated at UCL, as a record of generation rather than anything else, is sufficient for the service integration that I see.

On business analysis, there are four points to measure here: the financial landscape, which is quite hysterical right at the moment; the responsibility, which is almost non-existent right at the moment; the continual mix-up with licensing; and the idea that when you say you have a data asset you can then go and sell it, which troubles me deeply. If there were one thing that keeps me awake at night, it's the mess and misunderstanding around licensing research data. The second thing is the long-term commitments required for this to be sustainable, commitments so long-term that they generally outlast the employment of any of the staff members involved, including the provost, and quite an unusual time frame for an IT department with its six-year and three-to-five-year hardware refresh cycles. So my business analysis doesn't only attend to the costs that researchers and their grant funding bodies bear, but also to the responsibility that my bosses and their bosses carry.

As for the current state of play: we have about 40 projects, all recruited through a soft launch by word of mouth, and about 200 users, which is a rather modest collection. About 300 terabytes have been uploaded in the last six months, so people are testing the water with this, but we are starting to develop a bit of a backlog around registration time, and we need to attend to that.
I'm quite conscious that because this is new technology we need to deploy slowly and expand the user base appropriately. That will all change when myself and the head of library services do a double act around all the departments and deans, talking about both the open access agenda and the data management agenda and reminding them of UCL's position on this; I believe we will then start to see an increase that will be a lot more impressive.

We have four main challenges that we've identified, and I don't think any of these are really new. We're very concerned about authentication: the idea of shared services, in the UK as well as here as I understand it, will not work unless you have stable trust networks and authentication mechanisms. We don't want to start our own, although we do need to find a solution. This is something we're hoping to attend to in a variety of ways, one of which is Project Moonshot, run out of Janet. It's not a product per se; it's more a standard to write your authentication services to, and, as with all standards, they take a long time to develop and are difficult to understand. We also have very big concerns about networking. We're going to be flooding our network with data movement, and that will draw attention to our very well developed and very extensive network: UCL has the largest private network in London, and we have a licence to annoy commuters by digging up roads, so anything that starts to affect that will bring the attention of very large and powerful people. Plus, some departments are not provisioned very well, and we need to identify those and attend to them as quickly as we can.
We want to provide multiple access mechanisms, and that presents an administrative overhead and a software and virtual infrastructure overhead. Those challenges are generally attended to by having a large team; while I have three people who know what they're doing, we will be needing more people shortly. And then finally there's licensing, and I'm very happy to take questions about how people see research data licensing developing. One thing that we managed to get through in our policy was that, by default, unless a declaration is made, anything that goes into our archive will be dedicated to the public domain via a CC0 waiver. We think this is a very important development, and we're going to push to have the majority of data in there dedicated to the public domain unless we have obligations under statutory legislation or some sort of contractual arrangement.

Just to finish up: if you wanted to read more about how we've implemented all of this, there are a couple of reports out. We contributed to Science as an Open Enterprise, both as UCL and personally when I was at the British Library. Recently there's been a LERU report, from the League of European Research Universities, that spoke specifically about a joined-up roadmap for research data; in there we talked about how much it costs to establish the services we have and how we arrived at our policy decisions. And there's a rather ambitious press release from one of our vendors, DataDirect Networks, that talked about a 100-petabyte cloud to share and preserve data. I think we can safely say that we've taken a step in that direction, but I don't have the space for 100 petabytes right at the moment; that too may change. I'm very interested to see how our services are taken up. If anyone wants to send questions through, I understand there's a facility for that and we can read them out and attend to them. Otherwise, I thank you all for listening. That's my email address, or a more generic one if you don't wish to talk to me, and down at the bottom is the URL that takes you to some of our rather sparse web pages, where you can download some of our materials.

We have one question: you talked about changing behaviours with data management. How was this tackled at UCL, and did the library also work with you on this?

The library is one of my key stakeholders. We work in central IT, and that generally puts you at the bottom of the university as far as I can make out, and I only have four people; we need reach across the university, and we can't do that alone, so the library is our key stakeholder for that. When I say that myself and the director of library services are going around doing a double act, that is the beginning of our outreach, and it feeds us into all the departments. The library is looking for new roles, and I think this is a fantastic new role for the library. Changing the behaviour of researchers is something that's going to be time consuming, and we need help with it. The reason I say that is our experience when we go and talk to people. I'm very much of the opinion that when you gather requirements you look at what people actually do and try to help them do it better, rather than trying to second-guess what they need. So we went and looked at what people did, and what they did was mess around moving data between portable hard drives and their computational resources. So we said: if you had a central facility, perhaps you could think about doing it this way, where all your data, your initial data and your abstracted data, cycles from a resilient centralised service to your local provision, because essentially no network is ever going to beat the performance of having your data locally, and then feeds back, wiping the slate clean. You have a central resilient service that holds all the products of your research and the artifacts of your
research, and you can pull those across as and when required for your enquiry. Actually, it only took a couple of minutes before they saw that that's a much better way of at least managing the technology, and we've had a lot of people start doing that, which has actually increased uptake. So that's the beginning, and it's only the beginning; I'm very keen that it's recognised as only the beginning. I think there's a long road ahead.

OK, you mentioned some data is held elsewhere, e.g. in discipline repositories. Do you know how much of the UCL data is not under your management?

I don't have any idea right at the moment. Anything held outside of UCL is likely to be represented in the archive rather than in any live environment; the live environment is for active projects rather than static data, so anything held outside of UCL I would consider static and therefore represented in the archive. Duplication is not ideal, but it is required sometimes, and that's what the live area is for. I don't expect anyone to try to deposit material that exists elsewhere, but I'm open to the idea that mistakes can be made, or that there is a requirement we haven't seen yet.

How many research projects are currently using iRODS, generally or at UCL?

At UCL, no one is, because we're using it as an administrative interface right at the moment. We haven't fully implemented it, and when we do, that's when it will be opened up to users. We use it as a way of managing the different storage components that we have about the place, and it's not in production service just yet. We need to understand it and configure it the way we wish it to behave; as with most policy engines, you generally have to write your own policies.

The next one is: when a researcher leaves, does UCL keep an archive copy of the data products?

Anything that goes into UCL's archive will be considered the responsibility of UCL. Unless there is a case that individual researchers
can make, it will generally remain in the archive; we don't throw things away unless we're required to. That's the starting point. Of course, the upside and the benefit of that is that researchers don't have to look after it themselves, so when people archive into the UCL research data archive they're provided with stable objects that they can reference, and we can then provide DataCite DOIs or any other persistent identifier.

Is there a discovery service for UCL?

Not yet; that is the third service offering. I've been very determined to separate out all these concerns, and I'm very conscious that I've made it a third service offering: there's no point in archiving something unless you can get it out again, but discovery will follow closely behind. I'm mostly concerned with getting people to understand the importance of archiving data and the implications of archiving data for long periods of time; of course, anything that goes in there has to be able to come back out again.

Do librarians assist researchers with challenges other than storage?

Most definitely. Librarians have very rich domain-level knowledge, and they will not only assist with storage; they will be the kingmakers in research data management, because they will be the ones able to advise on the value of metadata, and I think that is going to be a really important process here. That's the degree to which we can act locally: when I talked about national initiatives, I don't know whether you can have semantic and syntactic interoperability around the world, but you can have it within your domain. I have to say that I'm also a very big believer that people generally don't stumble across data that they find useful; they have a priori knowledge of what those data may look like and they go looking for them. Purely serendipitous discovery is quite rare; of course we have to allow for it, but I can't design against that need.

Do you have a takedown or removal policy for public data?

We don't
intend to store other people's public data. The data that we do store is taken on the understanding that people have the rights to dedicate those data to the public domain; if it turns out that they don't, then those data will be liable for takedown. Absolutely, we can't ignore requests, but requests will be looked at on an ad hoc, as-required basis; it won't be a default situation that anyone can simply say "take it down". We have quite a large data security team and we work quite closely with them in these sorts of situations. I'm a big believer that we should start from a position of making the most data available most appropriately, and I will act on advice as to whether something needs to be taken down or not.

Has there been any dialogue about creating or maintaining stable access to a global sharing network, or does one currently exist?

There's been lots of talk about a global sharing network. This happens in a number of domains, particularly in astronomy or particle physics; anyone with large amounts of data or enormous facilities generally has a sharing network. But there's no global network, as far as I'm aware, for all types of data, so you'll see domain-specific examples, but you won't see a generic sharing network; they're quite difficult to scale out. I hope that answers all the questions.

Indeed, thank you very much, Max. It's been wonderful to have you here, and actually to have you here in person; this is my old university as well, and I was an undergraduate here. Fantastic. Thank you once again to everyone for attending, and thank you to Max. Thank you.