Welcome to today's ANDS webinar. The topic for today is sharing scientific data: astronomy as a case study for a change in paradigm. Many of you will have heard people talking about big data, and when they do, they often say astronomy is a good example of it. In fact, at one meeting I was in recently, someone from astronomy said, oh yes, we've solved all these problems. So astronomy has been at the forefront of dealing with data, and with very large data volumes, for a long time. Today we're delighted to have a visitor to the ANDS offices, Françoise Genova, to talk about how astronomical data resources have been networked and then integrated globally through the astronomical Virtual Observatory. Françoise has been managing astronomical data for a very long time. She's probably best known as the director of the Strasbourg Astronomical Data Centre, which she will speak about today, but also as one of the authors of the Riding the Wave report and the follow-up document, The Data Harvest. She's associated with a number of European projects that she'll also talk about. She's a member of the Research Data Alliance and, as of a week or two ago, co-chair with me of its Technical Advisory Board. So she'll talk more about some of the projects she's been involved in as part of this talk. I'll now hand over to Françoise.
Okay, hello everybody. I know you are here. I don't see you, but everything is okay. For me, data is one of the research infrastructures, one of the elements you need to be able to do research. In astronomy it's interesting to think about this, because we are one of the disciplines that obviously have big research infrastructures: the ones we have on top of the mountains, or the ones on satellites, to make observations. So we are one of the disciplines associated with physical research infrastructures, the solid ones, you know.
But what is really interesting is that in this world where we have these big things, which require lots of effort to build, we also now consider that data is part of the research infrastructure of a discipline. I think this is the change in paradigm described in the title. There are disciplines that do not have this big physical infrastructure, but in those cases data can also be a research infrastructure, and I would say it should be a research infrastructure in all disciplines. Many people are working these days to make that happen. So the idea here is to tell you the story, or at least part of the story, or my view of the story, of what we did in astronomy along the way to get there.
So what is data in astronomy? I told you it's evident, and it does seem evident: we have these big things on the mountains, or in deserts, because not everything is on mountains, or in space. They make observations, and clearly a large fraction of astronomical data is what is observed by the telescopes. In general, there are competitive calls for proposals, the community answers, people get observation time, and this builds up a set of observations from that particular telescope. In that case, the agencies in charge of the telescopes take care of the data and disseminate it; I will come back to that later. We also have big projects that do systematic surveys of the sky: not a small bit of the sky observed for one proposal, but a homogeneous observation of the sky, which can give billions of objects with a homogeneous set of measurements, images, spectra, and so on. That is the observational side, but there are also the results. In the research process we do modeling, and we want to compare the modeling results with the observations. So the results of modeling are also part of the astronomical data we have to deal with.
We also have all the results of the research process that are linked to publications. All of this is data we have to deal with if we really want to be serious about building the data infrastructure. In our case, we also built long-term, value-added databases. The one I was the director of in Strasbourg has this kind of role. It was created in 1972, which is long, long ago, to deal with electronic data, which was rather far-sighted at that time. The idea is to gather information, in particular from publications, to check it, to homogenize it, and to distribute it. I will give an example, because this does not exist in all disciplines. We have a database called SIMBAD, which contains information about astronomical objects, something like 8 million objects now. One interesting thing is that it has all the names of the objects. You will see an example here. I'm sure you cannot see it very well, but this object has 78 different names found in the literature or in catalogs. This is typical of what happens, and there is added value in putting all the names of an object together in a single place: people who do infrared astronomy know the 2MASS catalog, and they can see that their colleagues from the radio community call the object by another name, and that it's the same object. So this cuts across sub-disciplines, in a sense. Another thing is that we store all the publications in which the object has been cited. In this case, there are nearly 3,000 papers in which the object was cited. This is very good, but it is also a nightmare for users, because in some cases the object is cited only because someone says, this object is like that one.
So over the years we have developed a tool that sorts publications by relevance. We know where the object name appears in the paper, and this gives us an idea of the importance of the paper for the object: whether the name is in the title, in the abstract or in the keywords, whether it is in a table, which means there is data about the object, or in a figure caption, and so on. There is also something about the weight of the paper itself, because we know how many times it has been cited, how many objects there are in the paper, and so on. This is a recent tool, but it is a powerful one to help users find their way through these kinds of crazy publication records.
So why share data? In astronomy, at least, it's at the real core of the science needs of the astronomers, because it is now nearly impossible to do astronomy with observations from a single telescope only. You have to put together observations made with different techniques, different telescopes and different wavelengths. So you need to use data that, in general, you did not take yourself, that someone else has taken, and compare it with your own data to understand the physical processes at work in the object. You have an example here; I like this one of the Crab Nebula. A star exploded there in 10,000, sorry, 1,000-something, long ago. If you look at optical wavelengths, you see what remains of the object: filaments of matter that continue to move away from each other. In the end there is a very compact object, and what you see in white is what happens to the matter inside, around that object; this is X-ray. You understand what happens, and what the object is, only when you do this kind of thing. We also have more sophisticated views of the sky at different wavelengths. There is also the question of time variability. Most objects are variable.
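The relevance sorting of references described above for SIMBAD can be illustrated with a minimal sketch. The talk does not give the actual scoring scheme, so everything here is an assumption: the section weights, the `Paper` fields and the `relevance` function are hypothetical, chosen only to show how the place where an object name appears and the weight of the paper itself could be combined into one ranking. It is not the CDS implementation.

```python
from dataclasses import dataclass
from math import log1p

# Hypothetical weights for where the object name appears in the paper.
# The talk names these places but gives no numbers.
SECTION_WEIGHTS = {
    "title": 5.0,
    "abstract": 3.0,
    "keywords": 3.0,
    "table": 2.0,           # a table usually means there is data on the object
    "figure_caption": 1.5,
    "body": 0.5,            # e.g. "this object is like that one"
}


@dataclass
class Paper:
    bibcode: str                  # identifier of the paper (illustrative)
    sections_with_object: list    # places where the object name was found
    times_cited: int              # how often the paper itself has been cited
    objects_in_paper: int         # how many objects the paper mentions


def relevance(paper: Paper) -> float:
    """Assumed score: position of the name, boosted by citations,
    diluted when the paper lists very many objects."""
    position = sum(SECTION_WEIGHTS.get(s, 0.0)
                   for s in set(paper.sections_with_object))
    citation_boost = 1.0 + log1p(paper.times_cited)
    dilution = 1.0 + log1p(max(paper.objects_in_paper - 1, 0))
    return position * citation_boost / dilution


def sort_by_relevance(papers):
    """Return the papers for one object, most relevant first."""
    return sorted(papers, key=relevance, reverse=True)
```

With such a score, a paper that names the object in its title ranks above a large catalog paper that only mentions it in passing, which is the behaviour the talk describes.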
There is also the comparison of theoretical models with observations, and so on, so you need to share data, and different kinds of data. And of course, if you are a politician, these infrastructures cost a lot and you are required to optimize the science return, so the data is reused for aims different from the first objectives. What happened for us is that data is available and it is used. This means we have made this kind of change of paradigm. Astronomers use remote, distributed data in their everyday work. There are more papers from data retrieved from archives than from original observations. I have here an example from the Hubble Space Telescope. This one is the papers written from archive downloads, and you see that there are more papers from the archives than from the people who got the observation time, GO. There is also the combination of new observations with archival data. If I take my small example of the CDS, we had more than 800,000 queries per day on the services last year. And it's only one of the components of the network, so this means it's really used.
So, the basic elements. We have a common data format; it began in the 70s. It has its flaws, but at least it's there, and it's the basis of data sharing. We have a very strong tradition of international collaboration. We share data. It's open, and, to be fair to the people who write very good proposals to get observation time, they get the data for one year, and then the data is open to everybody, so that everybody can take advantage of their good ideas. Also, what we do with the data is fully driven by community needs. This means that when you have an observation archive or a service, you think about your users and not about yourself. I think this is one of the keys to success.
So, networking: we began early. We were early adopters of the web, and we began to network value-added services and journals in 1993 at CDS. I will show you a bit.
The archives joined very soon after that, so before the end of the 90s we had a network of online resources around archives and bibliography. This is an example of this early networking. You have the ADS database, a bibliographic database maintained by NASA, which has links to all online resources. Among these resources are the tables published in journals, which we publish online at CDS; this is a result page, and here you have a landing page with links to online data. On the other side, this also links to the archive of the ISO satellite, which is at ESA in Spain. So you see you can build these powerful links between the results in the papers, the original data, and the catalogs and lists of objects that result from the studies. This was before 2000.
It was all good, all used and all useful, but we wanted to do better, so we decided to build seamless access to data, not only to network resources. The networking took full advantage of the fact that HTTP allows you to click from one page to another, but with text on the pages to explain what they are about. This is the full power of the internet, which we began to use early because it was useful. Going towards the Virtual Observatory means having seamless access, so it's a bit different: you just click on one button and you see all the available data in one direction of the sky, for instance. We began that at the turn of the century. It's a very hard task, because you have to define standards to make all this interoperable: to share the way to build a query, and to share the registry of resources, a list of resources with their characteristics and information about what they do, and so on. So the Virtual Observatory framework is interoperability standards, and it's also tools to access data; the idea is to be able to discover, to access and to use data. We built an alliance, called the International Virtual Observatory Alliance, to define the standards.
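The idea of seeing "all the available data in one direction of the sky" through a shared query and a registry of resources can be sketched as follows. This is a toy illustration under stated assumptions, not an implementation of any IVOA protocol: the `Source` and `Resource` classes, the `cone_search` method and the `query_all` function are hypothetical names, and the registry is just an in-memory list. The point it shows is that each provider keeps its own data but answers the same positional query.

```python
from dataclasses import dataclass
from math import radians, degrees, sin, cos, acos


@dataclass
class Source:
    name: str
    ra_deg: float    # right ascension, degrees
    dec_deg: float   # declination, degrees


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (spherical law of cosines), in degrees."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    c = sin(dec1) * sin(dec2) + cos(dec1) * cos(dec2) * cos(ra1 - ra2)
    return degrees(acos(max(-1.0, min(1.0, c))))  # clamp rounding errors


class Resource:
    """Hypothetical data provider: internal storage is its own business,
    but it answers the shared positional query."""

    def __init__(self, title, sources):
        self.title = title
        self._sources = list(sources)

    def cone_search(self, ra_deg, dec_deg, radius_deg):
        return [s for s in self._sources
                if angular_separation_deg(ra_deg, dec_deg,
                                          s.ra_deg, s.dec_deg) <= radius_deg]


def query_all(registry, ra_deg, dec_deg, radius_deg):
    """Send the same query to every registered resource and
    keep only the resources that have something in that direction."""
    results = {}
    for resource in registry:
        hits = resource.cone_search(ra_deg, dec_deg, radius_deg)
        if hits:
            results[resource.title] = hits
    return results
```

A caller would build a list of `Resource` objects, for example an optical catalog and an X-ray catalog, and call `query_all(registry, 83.633, 22.014, 0.05)` to gather what each one holds around that position. In the real framework the registry and the query are defined by agreed standards rather than by Python classes.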
It's an alliance of national VO initiatives. The procedures are inspired by those of the World Wide Web Consortium but adapted to our own needs, and every time it was possible we made sure to use generic elements, because we are not an isolated island. We could be seen as a successful but isolated island. We have OAI-PMH registries; I know those of you who are in libraries know that very, very well. And we have vocabularies in W3C SKOS and RDF, so we are clean with that: other people can reuse what we have done and other people can link to our resources. But we spent lots of time and effort, and it's not easy, to define the disciplinary elements of the data framework. If you are interested you can visit the Virtual Observatory page, the IVOA page. If you are in this country you will recognize this. We were meeting last weekend; I just came here from the meeting yesterday. We have two meetings each year to discuss the standards, which are milestones in the standards definition, to exchange on tools and so on, and the last one was here in beautiful Australia over the last few days. So, if you are interested, there are things here for astronomers. Here you have the suite of standards; if you like standards, they are all here, open. And here, if you want to know how we work, is what we do, and there is a wiki, so you can see what we do and how we are organized.
So this is the standards framework. You see the starting point of the architecture is very simple: it's functions. We need to find data, to get it, to use it, and to share it, and we need some things in the middle to be able to do all that. So these are the functions. Then these are the groups that work to enable these functions: the registry of resources for finding, and the data access protocols for getting.
Here we have query language, semantics, data models, formats and so on, and here the users and here the providers, which can be computers, archives and so on. The users can be both people and computers, so we have to be careful with that also. Of the 800,000 queries on the CDS, most come from programs that query the databases; otherwise it's crazy. Can you see people on the web typing that number of queries? No, it's not that. So here you have, just for fun, to frighten you a bit, the real standards. I'm not sure it's completely up to date, but here each data access protocol has a corresponding data model, for instance, and you have all the things we have developed; when we have a new standard it appears here, with all the standards that are related. It's a complex picture. It takes time. It's very difficult, because you need to have international agreement on standards. But I can say that the IVOA was created in 2002, and now we have something that is operational, and it has been operational for a few years.
I think it's interesting that several disciplines, or many disciplines, are building a disciplinary interoperability framework, or want to do it. These things can be very different, because the disciplines are different; they have different organizations, cultures and so on. I'm not telling you that you have to do the same thing as we do, because it's not true. What you have to do, you have to find yourself. But the characteristic of what we do is that it's inclusive and open. If you do particle physics, the world is organized differently: it's pyramidal. Can you say pyramidal in English? It's some kind of pyramidal world.
In our case we have these different telescopes on different mountains in different countries, most of them international in essence, or global, like the one you have certainly heard about in this country in particular. So we do not have, we cannot have, a central point or a top of the pyramid; this does not exist. We work, we live, in a multiple world, and if we are serious about sharing data it has to be a global endeavor. This was the starting point of what we did with the Virtual Observatory, and before that with the networking of the resources. So the idea is to have a model that is open and inclusive. This means that anyone can participate and put a resource in the registry, or build a tool to access data, and you can do that by putting a thin interoperability layer on top of the data. We don't care how you deal with your data; this is your business, it's not the VO's business. You do what you wish, but if you want to be a member and to participate, you have to be able to answer a query with certain protocols, you have to have certain vocabularies attached to your data, and so on. This works, but do what you wish: these are things which are under your own responsibility.
One of the problems we have is that the VO is invisible if it is successful, so people may think it's useless because they don't see it. So there is some kind of difficulty; you have to have clever research agencies to make it work. But it is used. People use it every day, because when they use my service and we link to another one, they use the VO without knowing it. And since the services and the tools are used, people use the VO every day. We have recently gone further in that. We put lots of collective intelligence into the building blocks of the VO, and the data providers themselves began to embed some of the VO building blocks in their own archives and services. They are not asked to do that, but they
can do it, and we see it more and more. We have an excellent example in Australia; we discovered one last week. These are people we never talked to, but they have done their own thing: they provide their data in the VO and they have used building blocks in their own services, so this means it works.
This is also to say that it's not only data which is interoperable, but also tools. We have a standard which allows you to broadcast results from one tool to another, so you can go from an image to a table of objects and their characteristics. Just by clicking on one star here, or one object here, you can find where this object is in a table, in a diagram, and so on, and this is a spectrum. So you can play a lot with different tools, and you can have different views, and study objects and observations with different tools which are connected to each other seamlessly. I think this is also a very important thing.
So, the keys to success. Seen from the science users, it's exactly that: seamless access to data and also interoperable tools, which means that you navigate from one place to another and you don't know it. That is another of the problems, you don't know, but in the end you get the information you need. Seen from the data providers, because, you know, we need to have uptake, it's not so...
it's not immediate to take up the VO framework: you have to do some work on your side to be able to answer the queries and so on. We held a workshop with European data providers two years ago, and we were surprised, in a sense, to see how many people had already taken up at least part of the framework, or the whole framework. We asked them why, and they understand very well that it gives more visibility to their data. Also, I spoke about collective intelligence before: they insisted on the fact that when you see the framework, the standards, the tools and so on, you don't need to reinvent the wheel, because people have already done the work. We work as big groups. There are something like 70 or 80 people, up to 100 or 120 people, attending the IVOA meetings, and people really work the whole year to build the standards and the tools. So these people worked and proposed solutions, as I said, both for data sharing and also elements which can be useful for the archives and the services. If someone provides a very good visualization tool, there is no need for a particular telescope to provide its own one, because they can use the one which works well for them also, so this saves time.
So the current status of the VO is that the VO framework is operational, at least at a first level; you know, you can always improve things. We want to make it better for advanced usage, and to gather priorities linked to the needs of the future large projects of the discipline; I will come back to that. To give you an idea of the methods: as Andrew told you, we have had a series of European projects funded by the European Commission, which I led, and this gives an idea of the methods used to build this kind of framework; this is why I show a few viewgraphs on that here. We have national initiatives in Europe; there is one in my country, but there are some in other countries also, and we thought it was useful to coordinate these activities in Europe. Doing that, we understood that the three
pillars for building this kind of thing are these: you have to provide support to the data providers, to help them take up the framework; you have to support the astronomers in their scientific usage of the VO, by giving them tools and also by providing tutorials and so on; and of course there are the technological activities to update the VO framework. So we had a family of several projects; the last one is the one with its logo here. It finished at the beginning of this year, and now we are in another phase: we have the VO operational and we want to do better, taking into account the future very large projects, so that their data is usable at best by the community. So we are one part of a large project which was set up because the European Commission put out a call to make clusters of ESFRIs. What is ESFRI? ESFRI is the European Strategy Forum on Research Infrastructures, so it's a list, established at the European level, of very large infrastructures for all disciplines; you have data infrastructures for the humanities in the ESFRI list. In our case, I will show you the kind of things we have among the ESFRIs. The idea is to make sure that the data from these big ESFRIs will be fully available in the Virtual Observatory, so that they are interoperable with the rest and fully in the international framework. This is fully aligned with what I told you, that the current IVOA priority is to do exactly the same, so we were happy. The core of the proposal from Brussels was to build a cluster of these big projects, and people agreed that data was an important part of this clustering. So you see here: this is the Cherenkov Telescope Array; SKA, which I'm sure you have heard about; these are neutrino telescopes at the bottom of the sea; and this is a gravitational wave telescope, which is not an ESFRI, but it's big and it's useful to have it aboard. What is interesting here: I told you that in the VO, among the people who develop the VO in IVOA, we also have people from data
archives, which represent the big guys that have big archives, so we are sure that we develop things which are not up in the air, and that this fills the needs of the data archives. In this case the project puts together the VO people in Europe, from France, Germany, Italy, Spain and the UK, very well-trained teams of people who develop the VO framework, and also the representatives of the large projects, not the archives but the projects themselves, and their pathfinders. Because the ESFRIs are not here yet, by definition, we need to have real data; they have pathfinders, and these people are included. It's not only astronomy: it's astronomy and astroparticle physics, which includes new messengers; I told you about neutrinos and gravitational waves. ESO, the European Southern Observatory, with the big telescopes in Chile, is associated. ESA, the European Space Agency, is not associated, but they are working in very close collaboration with us; the current chair of the IVOA is from ESA, and we work together.
So I am near the end. All this was about big data, and for me, I don't like big data, I like data. I really think that astronomical data is both big and smaller data. We have these big things from observatory archives and disciplinary data centres, but we also have all the data from publications. We began to put data from publications online in 1993 with the European journal. These are tables which used to be printed on paper, and we put them on the web so that the data can be reused, the numbers can be reused. There are also images, spectra, time series and so on. So you have a small table of 100 lines in the same service as a catalog from a sky survey with 2 billion rows, and they are accessible with homogeneous metadata. We have 14,000 of these data sets at CDS, in the service called VizieR, and what is important for us is that the data is validated. It is not just data which is on the computer of so-and-so; it's data validated by publication. It's fully discoverable and
usable in the Virtual Observatory, and it's together with the large surveys. This is a map of the sky with intensity depending on the number of points in the database for that point of the sky. And if you do your metadata well you can do extremely powerful things by extracting information from the whole collection. Here, this is a spectrum for an object, built from points extracted from the whole data set at different wavelengths, and each of these points is linked to the original, so you know everything about where it comes from and you can reuse the information.
So, the conclusions. I think that, compared to when I began, the way we do science in my discipline has changed completely, not a lot, completely, because data is available online and we have interoperable data and tools. I already told you, but I want to insist on it: in astronomy we are seen as one of the big guys, but the data includes big data and also smaller but useful and validated data; otherwise we lose all the results, and I think that would be too bad. The smaller data are research results, and they are at the same level as the big ones. To go back to the long tail, as it is called: it has a huge diversity, and that is supposed to be one of the characteristics of some of the things people call big data, so this is part of the game. To go back to something I said already, this is a user-centred approach. It is not to keep data, not to preserve data; that is a collateral consequence. The starting point is to do research. And it's not to do technology: neither to preserve nor to do technology is the point. The point is to enable science. So technology is a tool, not an aim. I should have added: preservation is a collateral consequence, not an aim, at least for us; the aim is sharing the science. We spend lots of time at CDS evaluating relevant new technologies to identify whether they fit our needs, but we know very well that we have to be very wary, because people will tell you, but why
don't you use that? Look, it's a miracle, it will solve all your problems. And two years later there is nothing left. When you have been here for 40 years, or more than 40 years, you cannot afford to follow the buzz. You really have to study what is going on and to choose what is useful for you and has a chance to remain, at least for some years. You understand very well that in 43 years we have changed technologies many, many, many times, so I'm not telling you that you have to stay the same; otherwise we would have been dead long ago. But you have to be reasonable, to make a serious assessment of the new technologies and to see if they fit your needs. Then, some elements of what we do are used by nearby disciplines. And again, as I already said, the generic elements are the key points: the registry of resources, the discovery and the vocabularies can be interfaced with generic data frameworks.
So now, if I go to a more general conclusion: sharing scientific data can now be called open science. It's very bad to speak about the G8 in this country, I know, because you are a powerful country but you are not a member of the G8, to my knowledge, and I am sorry to say that. So I should not speak about the G8, but it gives an idea of the level at which these things are taken up and tackled now. In June 2013 the G8 ministers of research made a very, very strong open data statement. You can find it if you search for G8 ministers, data, June 2013; it was in the UK, so you can find it on the British government site. There is also more and more demand from funding agencies, at least for data management plans and so on. The demand that the data from a project be deposited in trusted repositories is very strong. So this is moving towards the kind of thing we have been doing for a long time; everybody is moving towards that. I think we have really been pioneers in those endeavors, and we are also eager to share lessons, and one of the good vehicles for sharing is the
Research Data Alliance. This is my last paragraph. Maybe you all know about it, but it's good to recall that the Research Data Alliance was founded in March 2013 by Australia, the European Commission and, in the USA, the National Science Foundation and NIST. There are on this floor many people deeply involved in the Research Data Alliance. There are more than 3,000 members, which is a lot after such a short time, in more than 100 countries. It is fully bottom-up, to tackle all the aspects of scientific data sharing, technological as well as sociological. So maybe you have heard that before, but please have a look and join. Thank you.
Okay, thank you, Françoise. We now have time for questions. Let me start with a question, at Stephanie's prompting. One of the comments you made towards the end of your talk was about data that was validated by the publication. Do you mean by that that simply having data associated with a publication means the data is somehow more valid, or is there something else going on?
I think it's exactly what I said: the fact that the data is linked to a publication means someone at least looked at the research process, not at the data itself. You know, you can put up a catalogue which you have made in your own way, on your own computer and with your own telescope, and it's completely different from having a full publication. The publication does not validate the data itself, the bytes in the data, but at least it gives an idea of the method, and some assurance that the results are meaningful. So it's different from just getting a table from someone. When we get data into VizieR, you know, in this frame here, the data comes with a description, of course, and we check at least some of the values. Because we have a description, we can check some things the referee cannot check; if we have one of the sky coordinates which is out
of range, for instance, these things we check. And we also have checks because, when something is wrong, we have users who complain. So, you know, we have to say that we curate the data. Some people don't like it, but in my discipline people want the data to be as good as possible, so we curate the data and we keep track of the changes. I think that the fact that the research process is somehow validated by a referee is, at least, an important point.
But in this case the data is validated not just by the publication but also by your archive, because you're adding additional value?
We add additional value. The kind of description we attach to the data explains what each part is: if it is a table, for instance, each column has a unit, we know what each column is, and so on. So it's something in which we put added value, but the validation is the science validation, not the added value.
Thank you. We have a question about the 12-month embargo, asking: is that the rule, does everyone use it? Why not a 6-month embargo, why not an 18-month embargo? And what's the starting point: is it the date of observation or the date of lodgement?
In that case I think it's a pragmatic solution to the fact that we need people to make very good proposals for observation time on telescopes, because we need to get data which is as useful and as good as possible. So we need people to spend lots of time and effort on the proposals, and the selection process can be awful, you know; the pressure is extremely high in some cases. So we think it's fair to give people some time to have the data for themselves before it is made public, and it's also a way to keep people motivated to improve the quality of the data in the end. So this is a pragmatic solution. I know some people shout at me because it's not normal and so on, but I really think it's a
pragmatic solution to continue to improve the quality of the content of the data infrastructure, because you get extremely good data if the proposals were extremely good, and then you have to be fair to people. So it's not the default, but in general it's 12 months. In some cases there is no embargo: in some projects data is immediately available. Not everyone uses it. There are still places which don't want to share data because they have a feeling, but in many public places, when they work with public money, they are more or less obliged to share it. There are private telescopes in the US, and now they share data, but it has not been done from the very beginning. As for 6 or 18, I think it's simply pragmatic: it gives a chance to write a publication in one year; 18 is too much. And I am not sure about the start date, really. I think it's the date at which people receive the data, but I am not completely sure. And the pipelines are done, so people get the data quickly, so it's more or less from the date of observation.
We have another question asking how other disciplines can emulate the success that astronomy has had. Is there something magical about astronomy that makes this possible?
Well, I think that the fact that people decided to share the data from one radio telescope with people with another radio telescope 35 years ago or more was one of the keys. I think you need to find people who really want to share data for good reasons, and if you are lucky these are the good people and there will be a snowball effect. But you cannot... So I have to answer quickly because there are many questions. I really think it's up to each discipline to organize itself, I am sorry to say. In the talk I tried to show some of the handles we have used to make it, but I can only give advice when I am in front of people and discuss with them. But there is no magic.
So if I can be slightly naughty, I have heard that one of the
reasons that astronomers are so happy to share the data is that there is no commercial value in the data.
I think it helps a lot. Usually I write it on the view of the telescope. The fact that there is no commercial value means that we do what we wish for ourselves.
So, a question asking about the conventions for citing astronomy data. The point is that there is a strong culture of sharing the data and standards for doing it; what about citing the data?
I think that when the data is in VizieR, which is on the screen, people cite either the paper or VizieR, because VizieR is citable, not with a DOI but with a PID which is from the bibliographic database. If you use astronomy data from a telescope, people are supposed to cite the origin, to say that it was observation so-and-so from that instrument on that telescope. So we are currently working on that. At the IAU we were discussing DOIs; we had a very lively discussion on DOIs. But there are ways of citing which are simply to write that you have used the data from that telescope, observation number so-and-so, and this is something which is supposed to be done as good practice. That's human-readable, but we were discussing exactly that. There are observation IDs which are well defined, so there are at least PIDs, and I think everybody is willing to go to DOIs, so it will be done very soon.
A question about the main data standard that's used in the astronomical community: is that the standard that's used by astronomers in their day-to-day work, or is that a standard that's used primarily for sharing after conversion?
No, the data is produced in FITS, what we call FITS, and so everybody uses FITS. There are tools to visualize and to use FITS, so FITS is ubiquitous. Correct me if I'm wrong, there are different versions, but basically there is the core FITS for images and data cubes; it's less well developed for spectra and so on. But really, when you have a telescope, the data you receive is in general FITS. Some projects like the SKA are trying to use other kinds of formats, because it's a bit different with our data cubes, so we also have a very lively discussion on data formats. But at least in everyday life FITS is very useful and very present.
And for the benefit of those who have not looked at the FITS format, I encourage you to do so. It's very nice because it's self-documented, and it looks like a punch card because it was done at that time.
So there are lots of problems with being wider than 80 columns, which is very bad when we program in 4K.
Yes, exactly. The Riding the Wave report, which Françoise contributed to as a member of the High Level Expert Group, mentioned the need to develop and use new ways to measure data value and reward contributors. How does the IVOA engage with its users around the value of data?
The IVOA is not a political organization; the IVOA is just doing interoperability, so it's not one of the questions we tackle. I think that measuring data value with a kind of citation count can be done: we can count the number of times things are cited in publications. But it's not completely done for the moment, and I also have problems with citation because we all know it's not a good criterion, a good metric. So this is really an issue, and I agree with what we wrote. I think that for us, for instance, even for CDS, it's very important to be able to say that we had this number of queries I gave you. Every time there is a table in VizieR, people can know how many times it has been queried, so this is one of the things you can find on the website. And with full-text search I can also know that the CDS services were cited in 1000 papers last year, which is a lot, but it's because they are in the full text; people don't care to cite services in their reference list.
So if I can use the privilege of the facilitator to ask a follow-up question on that: I know that for the analysis you showed us of the Hubble Space Telescope, showing that more than half of the publications were from data reuse rather than from initial observation, collecting that information was actually quite painful. They had to do a combination of text mining and manual searching and all sorts of horrible things to work out which publications were associated with which data.
Which is what we do every day for SIMBAD. It's not what we do for HST, but there are variants of that. So it's one of the reasons we are discussing DOIs, but it's not because we will have DOIs that it will be easy to find the information, because the whole chain is not set up. So we can push to enable an external infrastructure to make citation, but as long as the publishers are not completely organized to do that, it's not enough.
So no magic answer there.
For us I think it's magic.
So, a question about the underlying data in a database that a paper points to, which may change as time goes by, particularly as you have more observations added. Do SIMBAD or VizieR make any particular attempt to handle this changing? And the question makes the point that in a DOI-based data publication this should not happen. I'm not convinced I agree with that.
I am not sure how you do it. In our case, for SIMBAD and VizieR, we are changing data all day, because we add things and we correct errors. We have in SIMBAD data coming from publications which is wrong. We have a way for people to put annotations on objects saying this is wrong, and we correct it, and we also find errors when we include other data. So we correct data, because people don't want something static, they want something which is as good as possible. So we do it. In VizieR we also correct things, and we keep a history of the changes. And about DOIs, we discussed this two days ago, and it's clear that some people think you never change anything when it's DOI-based, and there is another understanding of DOI which is that you have the current version. So this is something which is to be discussed. I know people shout at me when I say we change data, but it's not astronomers who shout, it's people from the outside, because they think something is here, we will checksum it and we won't change anything. But, well, this is a good topic for discussion. We follow user needs, you know. Too bad.
Okay, so pragmatism over purity. This is a good question. So, a question about data disposal: do astronomers routinely dispose of data, or do you keep everything just in case it would be useful?
I need an explanation about data disposal, so I'm not sure about...
Well, let's pick the easy example. We know that with some instruments like the SKA you can't keep everything; you have to throw most of it away. I don't think the question is talking about this. I think the question is talking about
once you have decided this is a useful observation of that part of the sky, do you keep that forever?
What happens is that the telescopes keep the data. You know, most, I cannot say all, but the agencies which built the telescopes feel that they have the mandate to keep the data and to make it available in the long term. For the moment it has never stopped. So you have all the history of data from the ESO telescopes, which is online. I know that for space it was probably not the case at the very beginning, but now there are archives nearly everywhere, and open archives. There were more difficulties, for instance, for planetary missions, for some of the early space missions, and now I think it's well established that data is kept by the agency, distributed by the agency, and open after some time.
So again, let me go back and ask a follow-up question. Do you keep the data forever, even where SIMBAD has looked at it and said it's wrong?
Of course. In that case it's the publication; you know, the rule for the publication is that you don't change your publication. In some cases, when we find an error in the publication, there can be a note from SIMBAD people attached to the object or to the paper in our database saying this is wrong, there is an error. So we try to keep track of these things, but the publications stay as they are, because it's the way it is. But data is not the publication.
Okay, I was going to say the last question, but that's in fact not true. When data is used from the IVOA, is there a process for allocating credit to the observer or the archive? And, following up on that, are there procedures for facilitating collaboration on new publications based on shared data? So maybe the first one first: is there a process for allocating credit?
So I think it's simply good, fair practice. When you use data, what comes from the IVOA is that you get data, but you know where it comes from, so you are supposed to follow good practice. And so I have no other answer, but when we send data, we
get data from the IVOA, it's written where it comes from; it's part of the data you get. When you have this photometric viewer here, each point has a link to the original paper. And facilitating collaboration on new publications based on shared data, I think this is part of the routine work of astronomers. There is no tool to do that, but for shared data you have all the tools which allow you to access the data which is in the archives and in added-value databases. So it's just that the data is here, and then you manage.
The person who asked the question about whether the underlying data would change was motivated by the problem of reproducibility.
But I think we are less obsessed by reproducibility than other disciplines.
Well, in part, I guess, because it's very hard to reproduce an observation.
No, but the observation... You can reprocess the observation, but the observation is the observation. I don't think we have the kind of problems found in some disciplines, where the original material is found to be fake, misinterpreted and so on. So again, I know that people say you have to be able to reproduce, as the starting point of open science, but not really for us. For us it's just to do new science. The core point is not reproducibility. But maybe we are naive.
So the last comment in my question box is: thank you, this is an excellent question and answer session. I would agree with that.
It's a good result. To speak to a computer... but this makes me feel you are real.
Yes, there really are real people there. We didn't queue the questions up. So thank you for the excellent questions, thank you as well for the excellent answers, and thanks again for listening. Thank you.