Okay, good afternoon everybody and welcome. Thank you for joining this webinar this afternoon. This is a partnership webinar between the ARDC and ADACS programs, and today we will be looking at the FAIR data principles in astronomy research. We have three speakers today, so I will introduce our first speaker, who is Keith Russell from the ARDC. He's our partnership programs manager and he will be talking about the FAIR data principles. So, Keith.

So you should now see a rather, perhaps slightly corny, slide about FAIR in front of the stars. Is that up and running? Okay. So today I'll kick off very briefly, talking about the FAIR data principles. I work for the Australian Research Data Commons, an organization that is not specialized in astronomy, so the perspective I'll bring to this is just a general introduction to the FAIR data principles. Later, Katrina and Luke will talk about how the FAIR data principles have been implemented in astronomy, and I'm really intrigued by what they have to say, so I'll try and keep it short. I'll advance to the next slide: the FAIR data principles.
So the principles were drafted at a workshop at the Lorentz Center in Leiden in the Netherlands, back in 2015, and they rapidly received more and more interest and attention from all around the world. First of all they were described in a Nature article and put up on the FORCE11 website, and slowly but surely organizations started to pick them up and recognize them, and say these are actually a really useful way of thinking about how you can maximize the reuse of research data.

I think there are a few elements which are important to keep in mind, that have been very helpful in making them so successful. One is that the principles are technology agnostic: they will work across all sorts of different technologies. Another is that they are discipline independent: they're not drafted from the perspective of one specific discipline, but can be applied across all sorts of different research disciplines. Another element in their success has been the fact that they address both aspects of the data, but also the metadata on top of the data, which enables making the data more reusable. And a final, very important point to keep in mind is that they look at this not only from the perspective of the human wanting to reuse the data, but also from the perspective of machines wanting to pick up and bring together huge amounts of data, analyzing that and enabling further research across a wide array of data assets.

So, why would you bother making your data FAIR? I think there are a number of things to keep in mind.
Well, first of all, making the data FAIR enables valuable reuse of research data outputs. It enables research to be more reproducible and verifiable. It also makes it possible to bring together and start building up a rich set of data assets: data assets that you yourself have control over, but that you can also share with others. That can form the basis for collaboration with research partners, both nationally and internationally.

A very important point, and it comes out of that point I made on the last slide about bringing together data sets, is the machine-readable aspect. In the current day, with the huge emphasis on data-intensive research, making data FAIR, and especially interoperable, enables novel and innovative research. A final point is the current emphasis on impact: making sure that data has impact, and that research outcomes can be translated and picked up by business and industry, but also by policy makers and the general public. FAIR is a very important aspect of making sure that data can enable that.

Since these principles were drafted, they've been picked up in a lot of different policies all around the world. For example, publishers have in the past already had data availability policies, and those policies are becoming stronger and stronger. Recently that has resulted in a coalition, COPDESS, the Coalition on Publishing Data in the Earth and Space Sciences, which has set up a statement of commitment in which they ask publishers and also data infrastructure organizations to commit to trying to make more data FAIR. That's received a lot of interest and has now been signed by a number of publishers, quite significant publishers too, including Elsevier, Wiley and Springer Nature.
So, not the smallest publishers. Here in Australia, funders have also picked this up and there's been a bit of interest in it; the funders are starting to talk about FAIR and how it could be incorporated into what they are doing. The Universities Australia DVCR committee set up a working group to look at what FAIR could mean, and that resulted in the FAIR Access Policy Statement, which is now available online and has received quite a lot of attention. Internationally, funders are also looking at it, and are requesting data sharing statements alongside grants, and the European Commission has set up an expert group on FAIR data, asking: what does it mean if we want to make more data FAIR? How would that happen? What are the actions that would need to be taken in that space? They've recently released an interim report for comment.

So what are the FAIR data principles? The four letters stand for findable, accessible, interoperable and reusable, and there's actually quite a bit of detail behind those four letters, which I'll very briefly try to unpack here.

Findable, in the context of the FAIR data principles, means that researchers should make their data well described and make sure it has a persistent, globally unique identifier. The thinking behind that is that data does not get lost: if you put your data somewhere, then even if the data moves, a researcher or somebody who wants to reuse the data can still find it. The data should also be findable through discipline-specific search routes as well as generic ones.

The A stands for accessible, and accessible does not necessarily mean open; that depends a little bit on the context. One way of describing it is to say data should be open where possible, but closed where required. In some disciplines that can be important, especially when you're thinking about sensitive data: culturally sensitive data, privacy issues, commercially sensitive data, et cetera. The data should be made accessible through appropriate routes, and one way to do that is to deposit it in a repository which provides those routes to access the data. One thing to keep in mind is that data sometimes needs different services over it. If it's a small data set you might make it downloadable, but if you're talking about a very large data set it can actually make more sense to provide the data as a service, so others can approach the data and pick up the specific bits they need for their specific purpose, rather than having to download huge data sets. Finally, if you are talking about closed data, at least provide information about how a reuser of the data can get access to it, and some background information so they know what they're actually looking at.

Interoperable comes back to that point about data-intensive research and how FAIR data can play a role in enabling it. If you are going to make your data available, make it available using a standard file format that other people can understand and use. For the content of the data, try to think about using a community-agreed vocabulary. Hopefully there is already one in your research community; if there's not, it actually makes sense to come together with others in your space to agree on a vocabulary, so that people know what the terms are that you're using and there is an agreement around that. That goes for the data and also for the metadata on top of your data. Finally, include links to relevant information: the different people involved in creating the data, the different projects involved, the publication that was involved. That sort of information is extremely valuable, and especially if you tie it together in an interoperable fashion, it becomes possible to find that information.

The last letter in the series is reusable.
I always say: don't think that F, A and I are not about reuse. Reusable is just the extra bit on top of F, A and I. So to make data reusable it has to be findable, accessible and interoperable, but there are a number of further aspects you need to think about. One is to include not only some information so that people can find it, but also richer information in the background describing the actual output; usually this is more discipline-specific metadata around the data. Include information on how the data was created, some provenance information. That's extremely useful for reusers of the data, to understand what the settings were, which instrument the data was collected by, the different analysis tools that were used to actually create the data, the settings on those tools, et cetera.

Finally, a very important point: assign a machine-readable license to the data, and we recommend a Creative Commons license. Machine readable, so that machines can also harvest the data, find it, pick it up and use it. And make sure it has a license, because data that doesn't have a license is actually really difficult for a reuser, because there's no clarity on what use you can make of the data.

So, wrapping up: if you want to learn more about the FAIR data principles and the different materials to support you, the ARDC, the Australian Research Data Commons, has over the past years collected all sorts of materials in that space. We have a whole series of training resources and materials, broken down by the FAIR data principles, so have a look at those; if you visit the URL there you'll be able to get access to these materials. We've also developed a FAIR self-assessment tool, which allows you to hold your own data set up against it and have a bit of a check: how FAIR is my data?
What sort of actions could I take to make the data more FAIR? It's an interpretation of the FAIR data principles, and it's sometimes a useful way of looking at how you can make your data more FAIR. So, thank you. This was just a very brief introduction; I hope it was useful, and now I'm really intrigued to hear what FAIR means in astronomy. So I'd like to hand over now.

Thank you, Keith. Next we have Dr Katrina Sealey from Macquarie University and the AAO, who will be talking to us today about the FAIR data principles in Australian astronomy. Thank you, Katrina.

Thank you so much. So I'm going to talk about the All-Sky Virtual Observatory, and ask: are we FAIR? Where FAIR in this context is exactly what Keith was just talking about. This is going to be a story of collaboration for a shared vision, where astronomical data is findable, accessible, interoperable and reusable. Within the Australian community this is something we've been thinking about for a very long time, and we'll get to that in a few more slides.

So, just starting off: yes, I am the head of research data and software at AAO Macquarie, but I am also the ASVO national coordinator. So I'm going to take a slight interlude at the moment and talk about Astronomy Australia Limited, and again, a lot of what I'm going to say ties back directly to Keith's presentation, particularly around funding bodies and governance around the FAIR data principles.
So Astronomy Australia Limited is a not-for-profit joint venture organization that has 16 astronomical member groups: groups that are very keen and very active in doing astronomical research. It has a board that consists of astronomers as well as some independent community members, and the board has two advisory committees, a scientific advisory committee and a project oversight committee. Astronomy Australia Limited works to ensure that the funding that comes into Australian astronomy meets the priorities and needs of the astronomical community. There's an astronomy decadal plan, and we also follow the NCRIS roadmap. So part of the governance around Astronomy Australia Limited is to make sure that the funding goes to the right priority areas. One of those areas is eResearch, and under that umbrella Astronomy Australia Limited has various groups working beneath it, one part of which is the ASVO coordination role.

So let's have a little bit of a chat about the All-Sky Virtual Observatory. This is the Australian part of a much larger international virtual observatory community. Going back to probably the early nineties, the astronomical community started talking about standardized formats for their data, and protocols to work under, and by 2002 the International Virtual Observatory Alliance, the IVOA (I'll mention that a few times), was born. The Australian community has always been part of this IVOA community. Now, there are no membership fees; it's just an understanding, a collaborative environment that all the astronomers want to work under, to ensure that our data does meet those goals of being findable, accessible, interoperable and reusable, remembering that we were trying to do this long before FAIR became FAIR. So we were originally Aus-VO, and probably four or five years ago there was a reboot, around the time that AAL became more involved in the eResearch part of the Australian community, and we rebranded and rebooted as the ASVO.

There are five Australian nodes underneath this banner, and we're all there because we have a shared vision and we want to be part of the All-Sky Virtual Observatory. So I'm going to have a quick look over each of the five nodes, so you can see that they're all run and organized by five very different organizations, for very different purposes.

First we'll have a look at the Murchison Widefield Array. This is an international consortium led by Curtin University. I love this radio telescope, and I love the little spider, so that's why you get two of those images there, and you can see how that ties into their logo. The MWA is one of the four SKA precursor telescopes, and they currently have 28 petabytes of publicly available data. What I find amazing is that each observation can be up to hundreds of gigabytes in size, so part of the MWA ASVO node's goal is to take that data and make it into smaller, more usable chunks for the astronomers.

The next node we're going to look at is completely different. This is the Theoretical Astrophysical Observatory, the TAO node, led by Swinburne University. TAO has within it data sets, simulations of cosmological and galaxy formation data, but you're also able to go there and generate your own virtual universes. You can see there, there's the box: you can basically set up what you want, put in the different sorts of filters that your telescope might have, and then produce the data. So theoreticians and observational astronomers alike can use the theoretical data to see if it matches the real-world data.

We then have an optical node, the SkyMapper node. It's again a consortium, this one led by the ANU, the Australian National University. It's a 1.3 metre telescope.
It's at Siding Spring Observatory, and what SkyMapper does is continually map the sky. It has different filters and it maps the sky continually through all those filters, which means you have a huge time base of different observations. So astronomers can go back and look at different times, to see if perhaps a star has exploded, or to see if there are very fast-moving objects that have changed position in the sky. So it's a very active virtual observatory, or database of objects. Optical data tends not to come in data sets as large as the radio telescopes'; the total survey size so far is one petabyte of data. But it's still very impressive that every second there are a hundred megabits of data being produced.

The fourth node is the CASDA node. This is a collaboration between CSIRO Astronomy and Space Science as well as the CSIRO Information Management and Technology and Pawsey supercomputing groups. This is the data archive for the Australian SKA Pathfinder; when it is in full operation it will produce five petabytes of data a year, and it's just in its testing, preliminary stages. You can see 36 of the antennas and some of the images that we get from the CASDA node.

Now I want to talk about a slightly different node; the last four nodes all hold data from one particular telescope. Data Central started off as the AAT node of the All-Sky Virtual Observatory, AAT standing for Anglo-Australian Telescope, and you can see a picture inside the dome there. It started off by taking some of the survey data from this telescope of national significance, but now we have over ten data sets within Data Central, optical and other wavelengths; more importantly, we house 44 years' worth of legacy data taken from this telescope. Very soon you'll be able to just download your pipeline-reduced data, and there's other data coming from other telescopes at Siding Spring Observatory that will do a live data feed-in. So Data Central is a repository of primarily optical data sets, but there will be others there as well.

Now I'm going to go forward, and the next few slides are nowhere near as pretty and graphic as these ones. I asked each of the five nodes to tell me how they found themselves to be findable, accessible, interoperable and reusable. This is not yet looking at the 15 FAIR points; we'll do that as well, but let's just have a bit of a walk through each of the nodes. Now, in these tables Y is a yes and P is in progress, so usually there's some degree of yes, and we're working towards the rest.

For each of the nodes, in terms of being findable: all the observations have a unique ID, and we are all working towards the International Virtual Observatory Alliance standards. We follow the TAP protocol. I've pulled that out separately; it's actually one of the IVOA protocols, the Table Access Protocol. It's particularly important because it's a way that you can query using SQL, or the astronomy version of that, ADQL. You can go and query any of the data sets that are registered with the IVOA, and that's not just our ASVO, that's all the international data sets as well. So it's already a way that you can go out and be findable, accessible and interoperable, and reuse data.

You'll see that most of the nodes are IVOA registered or have their DOIs, their digital object identifiers. The MWA is only just getting their first data releases together, so they're in the process of minting their DOIs, and Data Central has recently gone through a transition from the government sector into the university sector, so during that stage there was a URL change and some other details that needed to be worked out.
So we're in the process now of registering with the IVOA and working on the DOIs. So that covers off findable.

If we have a look at accessible now: again, a little bit as Keith said before, it's about whether your data is open or perhaps closed, depending on what you require. Astronomers have open data; there might often be a period of time where it's proprietary, typically 18 months, but generally astronomers will share their data openly. So all of the nodes are open. SkyMapper has initial Australian data releases before going global, but the astronomical data is there for use. So again, being accessible: we use the same standards, we're making our data more accessible by putting in the UIs, the user interfaces, and our data can all use standard data viewers and standard tools. If we have a look at interoperable: again, standard data, metadata and survey IDs, so yes, we think we're interoperable. And being reusable: as I've said, we believe we have an open policy, so please take our data, with stable data releases and assigned IDs and metadata.

So that was how we viewed the four terms. I then asked each of the nodes, and I'm very glad that Keith showed this, to go to the ARDC (formerly ANDS-Nectar-RDS) site, look at the FAIR tools, and actually use the self-assessment tool to find out where they sat against each of the FAIR principles, or guidelines, as per Keith's talk. I've just put below what each of the 15 principles are. Now, that doesn't look very easy to read, so I've popped this over the top. So yes, the Ys: they may not be as far along as we would like to take them, but we have them in place; and the Ps are areas that we know we need to improve, or that are in progress.

One of the first areas to have a look at is specifying the data identifier. Keith spoke about this, and it is something that we do. However, the TAP protocol, being quite an early developed IVOA protocol, does not transfer through the DOI.
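For readers who haven't met TAP: a synchronous TAP query is just an HTTP request carrying an ADQL statement. A minimal sketch of building such a request in Python follows; the service URL, table name and column names are placeholders for illustration, not any real node's schema.

```python
from urllib.parse import urlencode

def tap_sync_request(base_url: str, adql: str) -> str:
    """Build the URL for a synchronous TAP query, per the IVOA TAP spec."""
    params = {
        "REQUEST": "doQuery",  # required parameter in TAP 1.0
        "LANG": "ADQL",        # the query language being submitted
        "QUERY": adql,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)

# A simple ADQL cone search: sources within 0.1 degrees of (RA, Dec).
adql = (
    "SELECT TOP 10 * FROM survey.catalogue "
    "WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
    "CIRCLE('ICRS', 150.0, -30.0, 0.1))"
)

url = tap_sync_request("https://example.org/tap", adql)
print(url)
```

The response comes back as a VOTable; client libraries such as pyvo wrap all of this, but the request itself really is this simple, which is a large part of why TAP makes data findable by machines.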
So if you do a query, you won't get the DOI. Another example: Data Central has a user interface that is very configurable, so the user can actually choose which parameters get sent back and displayed on the screen. That means you can turn off the data identifier if you wish. So this is clearly an area we need to look at and improve.

If we then go across the As, we can see that we cover off accessible quite well. If we have a look at interoperable, there are a couple of places to improve: CASDA and TAO both acknowledge that they could improve their metadata to include more of the vocabulary that we use. There is clearly a lot in the metadata, but there can always be more. And if we have a look at I3, this is about having qualified references to other metadata, making sure that you link off to your definitions for your vocabulary and so forth. We all agreed that we could improve in that area as well.

If we go across to reusable, we can see what we probably need to do in terms of licenses. Two of the nodes have a Creative Commons license in place, but as astronomers we've always felt that our data is open; however, in the internet world that is not enough. You need to have a license that shows that your data is open, so we're working on putting the Creative Commons license in place on the other nodes. And the last area we really need to work on and tweak is improving some of the provenance information in the metadata. Taking some other examples: we might have some recent surveys within Data Central, and the data might be data products.
So the raw data has come from the telescope, it's then been calibrated, and then it's had a whole lot of manipulation done to it. All of that information needs to be within the metadata, machine readable, or otherwise providable, and that will be the case for a lot of the data. But if we have a survey that is from 10 to 15 years ago, it won't necessarily have all that information fed into the metadata. We might then need to go back and link to the papers that describe the data reduction or the production of the data products, or pop it into a readme file. Some of the nodes have videos as well, explaining how the data is taken, how you can reduce the data, and where you can get the software to reproduce it. So it's all about making sure that we have transparency and reproducibility across all the data, and that's something that as a community we know we need and are strongly working towards.

So, just to pop across to the last slide that we have here. We've talked about each of the five nodes, and we're all working towards being FAIR, or fairer, but we'd also like to see ourselves as one, and be ASVO FAIR. So we've got a whole lot of different activities going on at the moment as a community. We're working on a shared authorization and authentication mechanism, because the five nodes all belong to different organizations: for example, CSIRO needs its own authorization mechanism to log into its databases, and Data Central has its own. So we're putting a thin layer across the top, so that you can log into Data Central, and then if you do want to query across to some data in CASDA or the MWA, you can be authenticated.
So we're putting that in place at the moment. We're also working, and we've done some pilot testing, so that we can work with the distributed data sets that we've got. And now I'm talking about through the user interface, not through doing the SQL or ADQL querying. So we might be able to do an optical-to-optical query, find that there's data, bring it back, and open it up in a tool that allows you to overlay and visualize it. Within the next 12 months we're looking to do some pilot testing with optical-to-radio data as well, and then to spread that out across all the nodes that we have, and then across the international virtual observatory community as well. So not just with our Australian data, but also looking at incorporating other large data sets; we've also got a pilot test going on with ESO, the European Southern Observatory, so international data archives too.

As a community, as I said, it's a collaborative arrangement; we're all there because we want to be working together. We have monthly ASVO technical meetings, we have twice-yearly workshops, we're working on a coordinated vision that we're all pursuing together, unifying our websites into one entry portal for everyone, and of course, as a collective, we engage with the International Virtual Observatory Alliance community as well, and hopefully we'll be in a place to contribute some of the tools we're building back into the community.

So, just to finish up: are we FAIR? I would say yes, we are. But could we be fairer? Definitely. Or could I even say, we're working towards being the fairest of them all. Thank you.

Thank you, Katrina.
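The optical-to-radio cross-matching described above reduces, at its simplest, to computing angular separations between catalogued sky positions and keeping pairs that fall within a matching radius. A toy sketch of that step; the catalogue contents and IDs here are invented for illustration, and real services use spatial indexing rather than this brute-force loop.

```python
from math import radians, degrees, sin, cos, asin, sqrt

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in
    degrees (haversine formula, numerically stable for small separations)."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    h = sin((dec2 - dec1) / 2) ** 2 \
        + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
    return degrees(2 * asin(sqrt(h)))

def cross_match(cat_a, cat_b, radius_deg):
    """Return (id_a, id_b) pairs whose positions lie within radius_deg."""
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        for id_b, ra_b, dec_b in cat_b:
            if angular_separation(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                matches.append((id_a, id_b))
    return matches

# Invented example catalogues: (id, RA deg, Dec deg).
optical = [("opt1", 150.000, -30.000), ("opt2", 151.200, -30.500)]
radio   = [("rad1", 150.0005, -30.0003), ("rad2", 152.000, -31.000)]

print(cross_match(optical, radio, radius_deg=2 / 3600))  # 2 arcsec radius
```

Only the first optical source has a radio counterpart within two arcseconds here, which is the kind of decision a federated query service has to make at scale.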
Very interesting. Next up we have Luke Davies from UWA and ICRAR. He will be talking about hosting the next generation of galaxy evolution surveys at AAO Data Central. So thank you, Luke, over to you.

Hi. Okay, so I assume you can all hear me and see the screen now. This is a little bit of a change of pace from the other things that have been presented here, and I'm going to talk about this from an astronomer's point of view, as someone who actually goes and collects some of this data. I'm currently working on two large surveys which are being hosted by the Data Central node of the ASVO that Katrina mentioned just now. One of these is the DEVILS survey and one of them is GAMA. GAMA is a survey that finished a while ago, but we're in the process of bringing all the data over into Data Central, and DEVILS is a program that is ongoing at the moment, so we have some extra challenges that we're working through in terms of managing and running a survey with the help of Data Central. I'm not going to talk too much about FAIR data policies, because I don't know a huge amount about them, but I'm going to talk about some of the data that we take, how we'd like to use that data, and how Data Central is helping us do that.

So firstly, just as a bit of quick background on these surveys: GAMA is the Galaxy And Mass Assembly survey.
It's a survey designed to measure the evolution of energy and structure in the relatively local universe by measuring lots of galaxies. So basically we measure the properties of about 300,000 galaxies in lots of different ways. Don't worry too much about all of these numbers; all you really need to know is that we cover quite a large area of the sky, and we try and pull together a lot of different, disparate data types from lots of different surveys, put them into one large overarching database, and then link up the different properties of the galaxies.

Now, DEVILS is the new ongoing survey, and you can think of it as being very similar to GAMA in terms of the goals it's hoping to achieve, but it's hoping to do this in terms of evolution over roughly the last eight billion years of the universe. So GAMA looks at the very local universe, DEVILS then looks at the more distant universe, and we try and link those two things together. One of the quite important things about these surveys is that we try and use the same data analysis techniques and the same data types over an extensive redshift range in the universe, and piece everything together, so it's important that we are consistent across all of these surveys.

Just a bit of general background to frame how we might want to use some of this data. If you look at the research field of galaxy evolution, and you want to understand how galaxies have changed as the universe has evolved, there are tons of different things you'd like to measure about a galaxy. You can measure things like how many stars it has, how much dust it has, whether it has a central supermassive black hole, its dynamics, its star formation, where it lives in the universe. In order to form this complete picture of galaxy evolution we'd like to measure all of these things, but they come from very different observations. If you want to measure the stellar mass you want to look in the optical and the near-infrared, going through to things like the gas mass, which you measure in the radio, and you need spectroscopy to measure where the galaxy lives in the universe. So actually measuring all of these different things requires you to use different telescopes and different instruments.

So within GAMA, using the AAT that Katrina mentioned just now, and within DEVILS using the same instrument, we measure the spectroscopy. But then we also compile together all of the data from all of these other facilities and all of these other wavelengths, to try and build this complete picture of galaxy evolution. Now, these different facilities and telescopes have completely different data types that are stored in very different ways, with lots of information stored and catalogued in different ways, and it's very, very tricky to combine all of this. But if you want to do interesting things with science, you have to actually draw lines across this diagram and link up the different data types that we have. Only by being able to combine all of this in some massive database can we do the really interesting, cutting-edge science that we'd like to do.

So what are the kinds of data products that we measure within GAMA and DEVILS? Firstly, one of the primary things we have are imaging data sets. We basically have a series of very large FITS images coming from different facilities. On the right in this image is just a brief pictorial representation of some of the data that we have; each one of these images represents about an 80 gigabyte FITS file, showing you different wavelengths of light as you go from the UV through to the far infrared. Now, these images sometimes have very, very different FITS headers.
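Those header differences matter because pixel scale, rotation and sky position are carried in each image's World Coordinate System keywords (CRPIX, CRVAL and a CD matrix), and combining images means interpreting those consistently. A minimal sketch of just the linear part of that transform, with made-up header values; a real WCS library such as astropy.wcs also applies the spherical projection, which is ignored here.

```python
def pixel_to_world(header, x, y):
    """Apply the linear WCS step: sky offsets from pixel offsets.
    FITS pixel indices are 1-based; CD terms are in degrees per pixel."""
    dx = x - header["CRPIX1"]
    dy = y - header["CRPIX2"]
    ra = header["CRVAL1"] + header["CD1_1"] * dx + header["CD1_2"] * dy
    dec = header["CRVAL2"] + header["CD2_1"] * dx + header["CD2_2"] * dy
    return ra, dec

# Made-up header: ~0.34 arcsec/pixel, no rotation, for illustration only.
header = {
    "CRPIX1": 1024.0, "CRPIX2": 1024.0,    # reference pixel
    "CRVAL1": 150.0, "CRVAL2": -30.0,      # sky position at that pixel (deg)
    "CD1_1": -0.339 / 3600, "CD1_2": 0.0,  # RA decreases with x (east is left)
    "CD2_1": 0.0, "CD2_2": 0.339 / 3600,
}

print(pixel_to_world(header, 1024.0, 1024.0))  # reference pixel -> (150.0, -30.0)
```

Two surveys with different CD matrices map the same pixel offset to different sky offsets, which is exactly why stitching heterogeneous imaging into one database takes care.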
They have different pixel scales They can have different rotations and sizes and will coordinate systems applied to them as well We then also have a huge number of data tables So this is just a list of some of the tables that we currently have in gamma each one of these links on the left Is a different data table, but each one of these can contain sub tables as well So some of them have up to four or five tables that sit nested within them We also have non unique matching between these tables So some are matched on a base catalogue ID, but then everyone's have multiple entries for the same ID Which I can explain in detail if people are interested We then also have descriptive files that go with each of these tables And there's a minimum of two per table So one that describes all of the columns and the the ucd's and things and then one that is a general more basic Description of everything that that happened to generate the table Then we have spectroscopic fits data files as well So these are essentially 1d fits files which describe the spectroscopic data So these have very different headers and things as well to the fits imaging data They also come from many different facilities which have different resolutions and different headers and things So we have data that we compile from the at But also from the Sloan Digital Sky Survey and things like said cosmos, which is run on the vlt as well So we have to try and combine those into some consistent manner And then finally we also have lots of png images which display data products Such as sed fits like you have in the bottom right here But also ones that describe group diagnostics and stellar mass fitting and things like that So there's quite a lot of different data types and data products that we produce for both gamma and four devils So I'm just going to return to this slide a few times through this talk to talk about some of the problems that we face in Hosting and serving this data to the community so we can 
maximize the scientific output of these surveys. On the left here I've written out a few different things that we might want to have from our data products, purely focused on trying to extract the maximum science that we can. We really want things like easy access for team members, which is fine, but we also want things like being able to cross-match data products across these surveys and with other surveys, and being able to manage the documentation and new data products really easily. I'm then going to compare this to most surveys pre-GAMA, to what GAMA currently has before the move to Data Central, and to what we now have with Data Central at the bottom. The last column is also the stuff that we have for DEVILS, and we're building this as we go along.

So just in terms of most surveys: most surveys currently in astronomy are focused towards access for the team. They set up their surveys and their samples and their data access so that their team members can access it, but they don't go much further than that. Some do; SDSS is a good example where they've done this well. But I think this is something in GAMA we've tried to do, and tried to go further than previous surveys, in terms of hosting and serving the data. Most surveys don't have an easy, intuitive interface. Some do have restricted public access built into their databases, but a lot don't. And then pretty much everything else on this list, most surveys pre-GAMA don't have: there's no way to easily combine the data across different facilities and match up with other surveys.
There's no way to provide long-term, stable data access, and things like that.

So what we did with GAMA was we actually have our own bespoke data access portal that we set up. This was designed by Joe Liske, who's at Hamburg University now, and it basically allows you to access all of the GAMA data, but in a very isolated sense: there's no way to combine this data with any other facilities or any other surveys. It just allows you to pull out all of the things that we've measured from the GAMA database. This is great if you're trying to do a project just within GAMA, and it's also great if you're one of the team members who knows exactly what all of this means. But it's very difficult, as an independent user who's just turned up, to find the right things that you need to use; there's a lot of documentation to read and it's a pretty steep learning curve. We also have a big issue here: the documentation is provided as a text file and then ingested by a single person, and it's then completely uneditable by the person who created that documentation without going through quite a long, drawn-out process. And finally, and probably most important for us here, we have a single point of contact for this entire database. So if this person is very busy and can't do things on a short timescale, then things don't get updated on a short timescale. This has become a problem for us, mainly because Joe is incredibly good at running these things, but he set it up when he was a postdoc who didn't have much work to do, and now he's a professor at a university who doesn't have much time to do this, and nobody else understands how it works and can update it. That's actually caused a bit of a sticking point within GAMA in terms of getting our data out there.

We also have a cut-out service, which allows you to pull out individual images of sources from the large imaging data sets that we have. This was essentially designed by myself and Liz Manning, who now
works for Data Central. So it can kind of be thought of as a precursor for some of the things that Data Central is doing; it's basically the base level from where Data Central started in terms of their cut-out service. But once again, it's very isolated: it doesn't link up with any other surveys. It also has a single team login for everyone, so we have no idea who's using it and how they're using it, which is awkward, and it's very hard to go back and replicate anything that you've done afterwards. So it's useful as a base level, and I would say better than most survey teams produce by themselves, but it's way below where we should be in terms of FAIR data access and usage.

So just to return to that slide and say what we had before Data Central: we definitely had access for our team members, and we're getting better with this easy and intuitive interface. We do have restricted public access for parts of the database; you can actually start building up some quite large samples and looking at individual sources, and you can cross-match data between the GAMA data products, but not outside of them. We can use this cut-out tool to extract individual images from these large FITS images. But below that we're struggling, because we don't have easy-to-access and easy-to-manage data products.
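At its core, a cut-out service like the one described here is a windowed slice around a sky position. Here is a deliberately toy sketch, assuming a plain 2D pixel array and a simple linear mapping from sky to pixel coordinates; a real service would use proper FITS WCS handling (for example astropy's Cutout2D), and all the names and numbers below are illustrative only:

```python
def cutout(image, ra0, dec0, ra_ref, dec_ref, pix_scale, half_size):
    """Return a (2*half_size+1)-pixel square stamp centred on (ra0, dec0).

    image           : 2D list of pixel values
    ra_ref, dec_ref : sky coordinates of pixel (0, 0) -- toy linear WCS
    pix_scale       : degrees per pixel (assumed equal on both axes)
    """
    # Convert the sky position to pixel indices under the toy linear WCS
    x = round((ra0 - ra_ref) / pix_scale)
    y = round((dec0 - dec_ref) / pix_scale)
    # Slice out the postage stamp (no edge handling in this sketch)
    return [row[x - half_size : x + half_size + 1]
            for row in image[y - half_size : y + half_size + 1]]

# 10x10 toy image whose pixel value encodes its (y, x) position
img = [[10 * y + x for x in range(10)] for y in range(10)]
stamp = cutout(img, ra0=0.05, dec0=0.05, ra_ref=0.0, dec_ref=0.0,
               pix_scale=0.01, half_size=1)
```

The hard part in practice is everything this sketch ignores: rotations, differing pixel scales between facilities, and sources near image edges.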
We don't have stable long-term access: we had a problem where our data server went down. The data are stored in one place at the moment, and when it went down no one could access the data for a long time, so that was a big problem. And the bottom one, when I say latency for problems with people, I just mean Joe being busy and not being able to do stuff, which causes us issues.

So, since we've had that situation for a number of years, we're now moving all of the data and all of our data access in GAMA over to Data Central. Data Central gives us a lot of functionality to solve some of these issues. Firstly, as I mentioned before, the cut-out service that we had for GAMA has now been moved over to Data Central. It works in a much better way: it's quicker, you can run much larger queries, you can access the data more easily, and it's much more standardized than the one we previously had. We already have some of the DEVILS data in this cut-out service as well, which we don't have anywhere else. You also have a query form where you can query all of our catalogs, cross-match them all together, and find all the documentation for those catalogs, which works in a much easier way than the GAMA database worked previously. And then there's also a cone search, which allows you to pick a patch on the sky and find all of the GAMA objects that sit within that particular region of sky, which we don't currently have in GAMA at all.
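A cone search is conceptually simple: keep every catalogued source whose angular separation from a chosen centre is within some radius. Here is a minimal sketch using the haversine formula (the toy catalogue positions are made up, not real GAMA objects; a production service would also use spatial indexing rather than a linear scan):

```python
import math

def ang_sep(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two sky positions (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def cone_search(catalog, ra0, dec0, radius_deg):
    """Return all rows whose (ra, dec) fall within radius_deg of the centre."""
    return [row for row in catalog
            if ang_sep(ra0, dec0, row["ra"], row["dec"]) <= radius_deg]

# Toy catalogue with illustrative positions
cat = [{"id": 1, "ra": 180.00, "dec": 0.00},
       {"id": 2, "ra": 180.01, "dec": 0.01},
       {"id": 3, "ra": 181.00, "dec": 1.00}]
hits = cone_search(cat, ra0=180.0, dec0=0.0, radius_deg=0.1)
# hits contains ids 1 and 2; id 3 is about 1.4 degrees away
```

The same separation function is also the basis of positional cross-matching between surveys: for each source in one catalogue, find the nearest source in another within a matching tolerance.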
So these tools are basically replicating what the GAMA database has done for a few years, but doing it in a much better way and adding more functionality on top. The other thing is that you have user histories and logins and things like that, which we don't currently have, so it basically provides you with the ability to reproduce what you've done, and to actually check how people are using the database. And then finally, because it also has all of these other surveys linked into it, you can easily cross-match between the surveys to do interesting science cases. So we're no longer limited to just the GAMA data products when we want to identify interesting samples for doing our science and pull out interesting data products.

There's also Document Central within Data Central, which allows us to update all of our documentation for all of our data products. Whereas before we would have uploaded a text file and it would have stayed static forever, the person who actually created the data product can now log in and change documentation, update products, and things like that, which is super useful. It also has version control, so we can go back through and find out what people have done. And most importantly, we don't have this single point of contact anymore, because anyone who's allowed can log in and change the information themselves, which is great.

So, returning finally to this table: in terms of the things that we want on the left-hand side, in terms of serving and hosting our data in a useful way, I would say Data Central covers all of these in helping us to serve and support this data, which is great. So we're very happy with how things are going with this interaction with Data Central.

One other thing that I wanted to point out, in terms of how Data Central is helping us: they're also helping us with the observing that we're doing for DEVILS. We have a kind of weird mode of observing with DEVILS
that hasn't really been used before in astronomy, something called the nightly feedback observing mode. Traditionally in astronomy, people take their data, then they go away and reduce the data, they look at it, they analyze it, they get all of their results, and then they publish a paper, and that process can take anywhere from weeks to months to years. Within DEVILS, we actually operate in a way where we take the data on one night, then we reduce all of the data, analyze it all, measure everything we can from the spectra, update all of our catalogs, and then re-observe the next night. So we try and condense all of the process that we would traditionally do over probably a year into about 24 hours, and that causes us quite a few problems, which Data Central is helping us with.

In this process, we observe the galaxies at the AAT, then we reduce the data automatically and combine it with all of the previous data that we have in our archive. We measure redshifts for all of those galaxies, we work out which galaxies have secure redshifts, and we chuck out everything that has a secure redshift. We keep all of the sources that don't have a redshift yet, we rerun the software to decide which things we're going to observe the next night, we prepare for our observing, and then we observe the next night. Now, the complicated thing is that this part of the diagram happens at Siding Spring Observatory at the AAT, and this part of the diagram happens here where I am, in Perth at ICRAR, mainly because we need to run all of this software on the large machines that we have here. We need this process, like I said, to happen in about 24 hours.
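The nightly feedback cycle just described can be sketched as a loop: observe, reduce into the archive, measure redshifts, and keep only insecure sources for the next night. This is a structural sketch only; all the function names are placeholders standing in for the real DEVILS pipeline stages, not actual pipeline calls:

```python
def nightly_feedback_cycle(archive, targets, observe, reduce, measure_z):
    """One 24-hour DEVILS-style cycle: observe, reduce, re-prioritise.

    observe, reduce, and measure_z are stand-ins for real pipeline stages.
    Returns the target pool for the following night.
    """
    raw = observe(targets)                   # night N at the telescope
    archive.extend(reduce(raw))              # combine with all prior data
    still_needed = []
    for src in targets:
        z, secure = measure_z(src, archive)  # redshift from accumulated spectra
        if not secure:
            still_needed.append(src)         # keep re-observing this source
    return still_needed                      # becomes night N+1's target pool

# Toy stand-ins so the loop can be exercised end to end
archive = []
observe = lambda ts: ts                            # raw frames == target ids
reduce_stage = lambda raw: raw
measure_z = lambda src, arch: (0.1, src % 2 == 0)  # toy: even ids are "secure"
next_night = nightly_feedback_cycle(archive, [1, 2, 3, 4],
                                    observe, reduce_stage, measure_z)
# next_night == [1, 3]: only the insecure sources go back into the pool
```

The point of the structure is that each night's target list depends on the previous night's reductions, which is why the whole chain has to complete well inside 24 hours.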
So this causes some complications. Just to describe that in a bit more detail: when we have an observer at the telescope, they take the data that we record on the night, this gets passed to our server machines here in Perth, we run our pipeline to analyze and reduce all of the data, we then generate the files that we need to do the observing the next night, and they pass back to the telescope so the observer can use them. The problem with getting this done on such a short timescale is that there is a firewall between the AAT and the outside world, which makes this very difficult. There are ways to get around it that are slightly complicated, but I didn't know how to do it in an easy way. We needed to operate in this mode, and kindly the people from Data Central said: yes, we have a solution that allows you to get away from this, and it adds a load of other functionality as well.

So what we did was we stuck Data Central basically in the middle of this process, and linked up both the telescope and our database here to the Data Central functionality. What then happens is that you upload all of the data from the telescope through Data Central's ownCloud, which you can see on the bottom right here. This is then automatically synced to the database here, which runs the pipeline and makes the fibre configurations; they automatically get synced back to Data Central and then synced up to the telescope again, and this whole process takes about 12 hours. So they've basically put themselves in the middle and allowed us to easily transfer this data from the telescope. But this also gives us a load of extra functionality in terms of using the data: we can access it via mobile wherever we are, and we produce a lot of diagnostic plots to tell us how the observing went. So I can now log in on my phone and just see that all the data was reduced correctly and everything looks fine, from wherever I am in the world. And it also gives us a
fallback if things break locally. We had a problem on Christmas Day where our server machine here went down, and I was running around in the middle of the night trying to sort stuff out and fix everything. And then I realized that I could just download all the data that I needed from Data Central, run the pipeline, and upload it back up to Data Central, and that solved a massive problem we would have had in terms of breaking the observing over Christmas. So the functionality provided by Data Central, in terms of actually running the survey, basically saved us from losing a night's observing over Christmas, which is really useful.

So, just in summary: GAMA and DEVILS are two large surveys aimed at studying the evolution of galaxies and structure over the last seven billion years. To do this, we compile massively large data sets spanning loads of different data types, with loads of different intricacies and weird things we have to worry about. To maximize the scientific return of both of these surveys, we need to be able to combine it all in loads of interesting ways: we want to link up different data sets from different samples, we want to be able to update documentation and help run observations, and then, importantly, we want to be able to provide this data to the public in an easily usable and understandable way. Data Central is helping us with all of this functionality, and actually adding things on top which are better, which we haven't even asked for, which is great. And then finally, and most importantly, Data Central doing this actually frees up more time for people like me to do science. I've spent a lot of my time over the last few years writing stuff for data access and being able to provide data to the public and the team, and now we have a dedicated team of people who are doing this for me, which means I can go off and do more of my own work, which is always great.
And cool, that's about it, so I'll leave it there.

Thank you, Luke. There's a question for Luke. I was just intrigued: when you were talking about combining data from different instruments and different wavelengths and bringing that together, is there in astronomy a sort of standard where you say, well, that part of the sky, that section, actually has a unique, standard way of describing it, so that across all sorts of different data sets you can say, I actually want that little bit? Is there a standard around that?

Currently, it's what people like Katrina and her team are working on, and it's very difficult at the moment. Most surveys, in the way that they've worked in the past, have worked completely in isolation and then tried to post-engineer a way of combining their data, and they usually leave it open to individual people, doing individual science projects, to do it themselves. Now, what this does is bring in a massive problem, in that anyone who does anything slightly wrong gets the wrong answer, and even teams who are doing the right thing, but doing it slightly differently, get different answers as well. So it's incredibly tricky. Within GAMA,
we've actually tried to do this ourselves a bit, and I think we're one of the first surveys that's actually put a huge effort into trying to standardize data that's in the same area of sky, so that you can link it together in a nice way. But I think the problem's been that we're astronomers who don't know much about doing this type of stuff, and we're trying to do it ourselves. We're trying to make sensible choices, but adhering to things like the FAIR policies is probably not high on the agenda of an astronomer when they're trying to get some science papers out. So now people like the ASVO here and Katrina are trying to do this for us, and actually make sure that we do it in a standardized way, and it really takes these things off our plate so that they get done properly, and it allows us to do science much more easily. One of the benefits that came from GAMA was that you could just trust what had been done in a useful, sensible way, and then just do science really quickly. But we've got to a point where we can't do this anymore within GAMA: there aren't enough people working on it, and there's too much data and too many problems for combining stuff. So having people who are experts in doing this take it over for us is great.

Just to add to that: we're working with any new survey teams to make sure that they are trying to fit in and use the same methodologies, the same IDs, the same ways, so the surveys can all talk to each other. But there are other ways that you can search the sky. We have a standard coordinate system, so if you know you want to look in a certain part of the sky, you can actually cut out pieces across the different surveys. So there are different ways that you can build up or look for what you're after in the sky. But yes, going back to Luke's point, we are trying to get standardized IDs or names in certain areas.
I mean, there are some objects that have standard catalog numbers or names that you can search on, but there are so many different ways you can actually look for things in the sky.

It's kind of interesting as well, because GAMA and DEVILS, the two surveys I mentioned, are two very different use cases in terms of Data Central and how that whole process works. GAMA was done in the past, before we had all of this on our minds, and we did things how we thought was best for us within our survey, and that means Data Central now has to go back and post-engineer all the things that they want it to be within GAMA. Whereas with DEVILS, we're observing right now and taking the data, so I'm working with the team there so that I actually structure the data and provide all of the information that they need as the data is coming in. So we're kind of working in this transition, where we're focusing on making the data more usable right from the ground level with the new surveys that we're doing.

Absolutely, so we're very thankful to teams like Luke's that we can actually work with; it's much easier for us to work with everyone going forward than trying to retrofit, but we will do the best we can.

And is there an international standards body, the
IVOA? Is that the organization that will come up with community-agreed standards and approaches?

So yes, there is, but again it depends on the individual surveys as to how they talk about the sky. We can always find things in the sky from a standard coordinate system and so forth, so there are ways to make it easier.

I think one of the toughest things in astronomy is actually convincing people that you've done it correctly. I think that's the hardest thing, because people are in this mindset where they do something themselves; it's always been the case that you just download all the data and do it all yourself, and then you trust yourself to have done it right, whether you've done it right or not. You kind of believe yourself. And the problem is there's a lot of skepticism about these big archives and ASVO tools, that people haven't done things correctly, and you have to build that reputation. So it's quite tricky to convince the world that you've done the right thing. Coming up with some standardized way that you can show gets papers published, and works, and is the right thing to do, that's the tricky bit.

But you're right; ultimately we want to have the whole provenance within the data, so people can go back to the very beginning if they want to and say: oh yeah, they got it right. We want to have all of that in there.

Do you make your code publicly available as well, so that as well as the documentation and the data you can see the code?

Yeah, yes, everything's available. Well, within the survey teams,
we're actually moving towards that, but I'd say we're not there at the moment in terms of making everything publicly available. Within GAMA, there are a lot of things which are hidden, where people take our word that we've done it correctly, and I think we've built a reputation as a survey team such that people believe it. But I think we are moving into that realm where we try and make everything public as well now, so people can go back and redo things themselves if they want to.

Okay, well, I think that's time now. I'd like to thank all of our speakers again, thank you very much, and that concludes our webinar for this afternoon.