Good morning everyone, or afternoon, depending on where you are. My name is Matthias, and I'd like to thank you for attending this webinar. I would like to begin by acknowledging the traditional owners of the lands on which we all are today. For me, in Perth, that is the Whadjuk people of the Noongar Nation. I pay my respects to their Elders past and present.

The Australian Research Skilled Workforce Summit took place in Sydney earlier this year, on the 29th and 30th of July, and we are pleased to bring you a reprise of some of the discipline-specific sessions. Today that will be the sessions delivered by Lesley Wyborn and Robert Shen, on geoscience and astronomy respectively. After each of the presentations we'll have some time for questions, and you can use the question box in GoToWebinar to ask them.

First up, I'd like to introduce Lesley Wyborn. Lesley works collaboratively with the National Computational Infrastructure, the Australian Research Data Commons and AuScope. She is currently chair of the Australian Academy of Science National Committee for Data in Science, and is on the AGU Data Management Advisory Board. Over to you, Lesley.

Okay, so this talk is about meeting the FAIR data skills challenges in the geosciences. Just to give you an outline: as some of you are aware, the earth sciences have been leading a project with the publishers whereby you will no longer be able to put data in a supplement. It has to be in a repository, linked by a DOI, and have a landing page, and we're finding it's a bit of a challenge. I'm calling it the I and R challenge, for interoperability and reusability. And so I'm going to present some work.
I've been working on this for quite a few years, where we actually said, well, there's something called a hybrid: a person who has a foot in both the data science world and the domain, and understands both. To move forward on the I and R challenge, the question is: how do we identify and grow the hybrids? And I'm just warning you, each section is going to start with a trip down memory lane.

So, the biggest challenge for the geosciences is this AGU FAIR project, and I'll give you a bit of background to it. The driver for changing how the sciences manage their data actually came from a professional association, the American Geophysical Union, and it has a position statement that earth and space science data are a world heritage: properly documented and credited, they will help future scientists understand the earth, planetary and heliophysics systems.

If I go down memory lane, it's the centenary year for AGU, and when AGU started, computers and the internet had not been invented. Most data was analogue and was published in research papers. With the advent of computers in the 1940s, data generation started to become automated, and with each new generation of instruments, and I actually watched this happen, the resolution of data and the volumes grew exponentially. By 1970 it was hard to publish all scientific data in typeset tables. Data was accessed via the author, then in supplements, and we say this is the dark ages of scientific data, because we lost so much.

So, just to remind you, some of you probably don't know what a typewriter looks like. On the left you've got what was a typical typeset table in a volume, and this volume here had all Australian geochemistry back to the 1890s. I would argue that in probably six months a laboratory of today would generate the data that's in this, and you can see where the disconnect came.

So we had this FAIR data project, where we were going to align publishers and repositories to make this connection with the data, help them enable FAIR and open data, and create workflows so that researchers would have a common experience when submitting their paper to any leading earth and space science journal. As some of you know, every journal does it differently, so we're trying to get a bit of coherence there. What we wanted to do was accelerate scientific discovery and enhance the integrity, transparency and reproducibility of the scientific data that goes with publications and their results.

The FAIR principles, which also apply to software, are, as you all know: findable, you assign persistent identifiers, provide rich metadata and register in a searchable resource; accessible, the data is retrievable through the identifier using a protocol, and the metadata remains available even when the data are no longer available; interoperable, where you have to have the standard vocabularies relevant to your domain; and reusable, meaning the metadata has to be accurate, have provenance and use community standards. One of the things that people don't realize about FAIR is that it is emphasizing not just human-readable but machine-actionable data, and the machine-actionable data is where the training is desperately needed. So these are the groups that were involved in this project.
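The distinction between human-readable and machine-actionable metadata can be made concrete with a small sketch. This is purely illustrative: the field names loosely echo DataCite-style records but are assumptions, not any real repository's schema. The point is only that once metadata is structured data behind a persistent identifier, a machine can validate and exchange it automatically, which it cannot do with a table buried in a PDF supplement.

```python
import json

# Illustrative only: a minimal machine-actionable metadata record for a
# dataset. Field names are assumptions loosely based on DataCite-style
# records, not a real schema; the DOI is a placeholder.
record = {
    "identifier": "https://doi.org/10.xxxx/example",  # persistent identifier (F)
    "title": "Example geochemistry dataset",
    "creators": ["A. Researcher"],
    "publisher": "Example Repository",
    "publicationYear": 2019,
    "license": "CC-BY-4.0",   # explicit reuse conditions (R)
    "format": "text/csv",     # a standard format aids interoperability (I)
}

REQUIRED = {"identifier", "title", "creators", "publisher",
            "publicationYear", "license"}

def missing_fields(rec):
    """Return the required fields a record lacks — the kind of check a
    repository can run automatically on submission."""
    return sorted(REQUIRED - rec.keys())

print(missing_fields(record))            # prints [] — record is complete
print(json.dumps(record, indent=2))      # serializes for machine exchange
```

A human can read either a PDF table or this record; only the latter lets a repository reject an incomplete deposit, or a harvester aggregate thousands of datasets, without a person in the loop.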
You can see the communities on the left. This is an international project, and three Australian groups signed up for it: NCI, AuScope and the ARDC. As well, we had the publishers involved, and some 300 international repositories. What we wanted was for the publishers to adopt a common policy which agreed we're not going to do the supplements, and that the data is documented and preserved. The repositories also had to come on board and be able to support authors and researchers by providing services that ensure they can meet the requirements of the publishers. And then above all there were the researchers, because they needed to understand how to share, document and reference data, particularly data that supports their scholarly publications.

We then realized that training was a huge need in this project if it was to succeed. The first thing we found, and got involved with, was the Earth Science Information Partners Data Management Training Clearinghouse, which the ARDC, by the way, is now collaborating on. This is the home page, and on the right you can see, well, what do you want to train on? What we've done, globally, using crowdsourcing, is obtain links to, and then vet, training materials that are available online for all these activities related to FAIR. So we knew that there was sufficient training to help people get going.

The outputs from the project are here on this web page, which I've got at the top. The key thing is this Commitment Statement, which is where people are agreeing that this is what they are going to do, and you can see a lot of societies signed up for it. In my field, the USGS and the British Geological Survey have signed up, and in Australia we have those shown here, and Federation University, we're pleased to announce, has now joined in.
We're pleased to announce has now joined in and And here are the publishers and the publishers already starting to fight because they're actually knocking back papers From some of these journals because the researchers cannot put their data into a Requisite treat. I don't know how to I don't know where to start So this is our training gap And so what we thought of is this pushing impact onto the researchers. We did a survey and Asked what are the main problems and They all felt that again the interoperability and reusability because the requirements of having these people that are in both domains Was not very but they're just not there. Okay, and So what we were trying to do is that we wanted to get the researcher communities Get them into areas where they can make these and then they can make the data machine and human accessible It is the machine accessible that is causing the grief So whoops So what are the school sets for the fair data principles? We felt that the finable came mostly from the librarians and they're very good at assigning Identifiers providing metadata and registering in a research searchable resource Accessible a lot of that's just protocols and machine actionable and that's from the computer science But as I said, you need what we call the hybrids to be able to do both. And so the key question is How do we recognize the hybrids to help us get to the eye and the art? So a trip down memory lane and this is a wonderful publication If ever you want to dig yourself into it and it was Microsoft research towards 2020 and I remember when it first came out And it's amazing. Wow, what are they gonna say? We're gonna do in 2020 and you can see that they are saying Scientists need to be computationally mathematically literate and by 2020. 
It will simply not be possible to do science without this literacy. And I think this is actually changing, because I remember when we first started building our first virtual laboratory in the earth sciences, the skills just were not there. But they're coming on, and what we're trying to do now is harness people developing the right skills.

This hybrid was recognized in another seminal document, The Fourth Paradigm, which said we need two new branches of science in every discipline, computational-X versus X-informatics, known as the X factor. These people have domain expertise, but they also have sufficient computational skills to do this bridging, and we're going to need these hybrid functions to grow exponentially. Again, another 2020 prediction: working in multidisciplinary teams that combine the domain, computational and data expertise is going to come to the forefront.

This is how Gray portrayed it, as what we call the X factor, and it's not unique to the earth sciences: it's in every domain we work in. And so what I'm trying to do in this next figure is portray this idea that, you know, you have the pure scientists and you have the computational people, but these hybrids sit in the middle, and they're a bit of both. And how do you grow them? Because in Australia we don't tend to have courses that are trying to train this group of people. Again, this is just my trip down memory lane.
I can remember when I was doing this kind of work in the 80s and 90s. Yeah, well, you know, I was just the scientist who did a bit more on data than everybody else did. But by about the 2000s we were saying, oh well, you need to put them with computational people, but that kind of didn't work, because you just didn't get that crossover. And so in the 2010s we started to see people who had these dual skills coming through, and if those papers that I read were true, then there are going to be a lot more of them. But how do you grow them?

So this is just a piece of work I did in 2012 at Geoscience Australia. It was not related to the FAIR training, but it's equally applicable, because we were trying to understand what skills the hybrids have and how we can recognize them, to accelerate what we were doing. What we were doing was running a pilot to assess whether there were business benefits to GA in applying advanced ICT technologies, particularly HPC, to enhance scientific outcomes. The pilot was successful, we had a wonderful team, but we couldn't quantify the skills. So the professionals in GA's human resources team agreed to come and work with us and look at the skills of those involved in the pilot, to see if we could actually understand what we needed to put emphasis on in our training.

Really what we're saying is that we're at a time of change, and one of the things I argue is that tall ships failed because they couldn't get any bigger masts than the tallest tree on the planet. We've got oceans of data and we can no longer handcraft solutions; it has to be machine to machine. And so, without thinking about what we're doing, we're asking carpenters to become mechanics, and that's why I brought HR in.
We're asking carpenters to become mechanics and that's why I brought HR in so We the questions were designed by human resources and they generated Documented the data gathered and each participant was asked to define what they thought were the core elements of the research and Then we went through systematically and looked at the qualifications job experience skills knowledge behavioral attributes are more importantly We actually said to them if you're gonna keep doing this work or organizational support do you need? Now look, this is not a statistically viable sample, right? So I'm a scientist. This is not statistically viable We had 12 scientists in GA and 5 from the sorrow technical team But the results were wonderfully internally consistent So the core elements that both defined as being able to connect to electronic data Enable scientists to work on the data rather than on synthesis of the sub-sample data able to do probabilistic analysis with multiple scenarios and It's something where you can do technical innovation and I love this quote. It requires computers to enable it in humans to drive it We looked at the academic qualifications and both teams for BSE to PhD What we had in the science team we noticed was that compared to normal science graduates There was at the time there was much more qualifications in math supplied maths geophysics With some computational science modeling in numerical Meanwhile the technical team that implemented the solutions Had some scientific qualifications, but much more emphasis on computer science software engineering than in the science team So on the job training, what have they done since they did their skills? And they had done more computational programming skills in this GA science team Spatial skills data analysis The one thing that was really unique to them was that they were thinking on a bigger scale Which I guess when you're trying to introduce HPC to an organization. 
The technical team's experience was in information systems design, but they had also done some courses in physics to understand the problems we were trying to address. They had done spatial data design and engineering, and they were very experienced in developing research tools and applying new technology. That's probably why this went very fast.

The overall characteristics: the science team said they thought people needed to be intuitive, logical, non-linear thinkers, willing to try new things, early adopters. The technical team was really interesting, because they had those traits too, but they emphasized teamwork, and the ability to listen and communicate and actively be there with the scientists to understand what they were doing.

As for the organizational support desired by the teams, they both said you've got to have an organization that's willing to jump in and try new things. It has to be an organization that will foster the early adopters. You needed a specific research support team, which we had, a small specialist team supporting the scientists. And you need to recognize the high-level skills that the developers needed to do this.
It wasn't just any programmer. And the organization had to recognize that you do have people with a foot in both camps. I know we went through a phase in GA where you could be one or the other, and we weren't rewarding the people sitting in the middle, who were actually almost critical to getting this thing off the ground. And I love this quote from CSIRO: "Our team is built on ex-scientists or reformed software engineers who are trying to bridge that gap every day."

So, in conclusion: right now the geosciences have a huge challenge because of FAIR. We have trained for the F and the A, but the I and the R are a massive gap, because we just don't have the people: only a minority can develop the frameworks and understand the standards and community vocabularies that support machine-actionable, transdisciplinary research. And again, this was just an experiment we did, bringing in professional HR people to work with known successful teams. I know across the ARDC we have more than a few successful teams. Is it worth starting to interview them, the way we did seven years ago, to better understand what makes them tick, and how we can grow more of them? Thank you.

Thanks for that, Lesley. We'll hold questions until after Robert's presentation. So now I would like to introduce Robert Shen. Robert received his PhD in information technology from the University of Sydney in 2006 and joined Astronomy Australia Limited as a senior program manager in 2016. Before that he worked at the Australian National Data Service for seven and a half years. Over to you, Robert.

Today I'd like to talk about leveraging ADACS to support astronomy skills development. The acronym ADACS stands for Astronomy Data and Computing Services. Before I start talking about ADACS, I need to give you a quick rundown of AAL. AAL is one of the NCRIS facilities.
We are set up as a not-for-profit company. Our members are Australian universities and research institutes with significant astronomical research capabilities; so far we have 15 members, covering the whole range of astronomy research areas.

Before I talk about ADACS, I should perhaps talk about the decadal plan. In 2016 the Academy of Science released the second decadal plan, called Australia in the Era of Global Astronomy, which has five equally weighted priorities, and one of them is world-class HPC and software capability for large theoretical simulations, and the resources to enable processing and delivery of large data sets. For your information, this is also the first time a decadal plan for astronomy has had a priority related to data and computing.

To better address this priority, AAL commissioned a working group called the Computing Infrastructure Planning Working Group, drawing data and computing experts from research units and from the research sector. The working group created a report in late 2016, the Computing Infrastructure Planning Working Group report, which has a set of concrete ideas, suggestions and guidelines to guide AAL's future investment. In summary, I think the report can be distilled into two key recommendations. The first is to set up ADACS to provide astronomy-oriented training, support and expertise services. The second is to invest in hardware: seven to fifteen million dollars every five years towards data and computing resources.

With limited funding and budget, AAL set up ADACS through a two-stage process. Stage one was an EoI, an expression of interest. The general purpose of the EoI was to see how many research service providers were keen to partner with AAL to provide astronomy-oriented support and services. We received a large number of applications, and the panel reviewed them and selected a few to move to the next stage, which was a request for tender. The general purpose of the request for tender was to evaluate which research service provider is capable of providing astronomy-oriented services.

As a result, ADACS was officially set up in early 2017. ADACS currently has two nodes: one node in Melbourne, hosted at Swinburne University, where Professor Jarrod Hurley is the head of the Swinburne node; the second node is in Perth, jointly led by Jenny Harrison from Pawsey and Professor Andrew Rohl from Curtin University. The general aim of ADACS is to provide astronomy-oriented data services and to enable astronomers to maximize the data and computing investment.

Currently ADACS can be categorized into three service components. Service one is organizing and delivering training; ADACS offers a mix of face-to-face and online learning, hackathons and other styles of events. Before I talk about the detail, I should say why ADACS needs to deliver training, given that nowadays the universities deliver training and the ARDC also delivers training. The short answer is that we feel there's still a gap, based on our survey of the astronomy communities. Let me give a quick example. Python is very popular these days, so when institutions organize training, they will perhaps go for Python, or advertise Python training. But when ADACS offers training, ADACS will focus on Astropy, which is an astronomy-focused Python package for processing astronomy data sets. That is why we aim for ADACS to address specific gaps in the training area, not just to repeat things or reinvent the wheel. ADACS, as I said before, offers a set of training.
This is one of the face-to-face trainings, organized at an annual astronomy scientific meeting. The reason for me to use these pictures: I should say, when I first heard of this, I was a little bit worried, because the training time was on a Friday, five to seven p.m. I thought that after a week of meetings everyone would just want to go home instead of attending this training. When I received these pictures I was a bit relieved; it seems the community is quite passionate about this sort of face-to-face machine learning training.

This is a second type of training, where ADACS partners with industry to organize relevant training. In this one, ADACS co-hosted the event with NVIDIA and with the ARC Centre of Excellence for Gravitational Wave Discovery, to organize training on CUDA. For those of you running HPC simulations, CUDA is very popular for GPU cluster programming and optimization. This is an example of how we not only deliver training ourselves, but also partner with industry and other organizations to deliver it.

Here is another example: last year ADACS organized a hackathon event. Typically there is astronomy data from various sources available, enabling astronomers to work with data and computing experts, IT developers, whatever term you like, to squeeze value from astronomy data and make new discoveries.

Of course, ADACS also offers outreach events, together with Perth Observatory. This event is called Cloudy Skies.
If you want to know what Jupiter looks like, this is a really good opportunity: you get a set of images from NASA's Juno mission, available from the JunoCam, enabling you to apply machine learning or other algorithms to process the images. And at night you also have the opportunity to go to Perth Observatory to observe Jupiter live. Personally, this is one of my favourites; it mixes learning and observation.

Of course, we understand there is no one solution that fits all, so ADACS also offers an online learning system. If you log into this URL, you will see introductions to Astropy, to Python, to machine learning and to other parts of the training. If you are a big fan of YouTube, ADACS also has a YouTube channel, so you can watch some short clips when you are not that busy. And if you are a developer, ADACS also has GitHub pages, where all the relevant training materials are available as well.

The last bit under training is that ADACS also offers internships. This is great.
It might be another favourite of mine. Nowadays ADACS is moving more towards addressing big-science, big-data needs, which in astronomy are quite prominent, by having PhD students from the astronomy community work on real data and computing problems, and hopefully this can open new windows for their careers. In fact, we were really glad to hear that after a few rounds of internships there are a few PhD students from astronomy working in different big-science areas.

That's the training. The second ADACS component is called national support, which is one of the largest investments from ADACS's point of view. The idea is to embed data and computing experts into research teams, for the short term and also for the longer term. It could be, you know, that the data and computing expert simply starts by smoothing out your raw data or maintaining the devices for you, or even moves on to optimizing your data pipelines, so you have a seamless data pipeline, or optimizes your HPC code so you run efficient HPC simulations. As I mentioned, at this stage ADACS offers both short-term and long-term support: when I say short term, that's up to six months; when I say long term, that's embedded at least 12 months into a research team. At this stage, based on the overall subscription rate, it's a really popular service, both short term and long term.

ADACS also offers, and maintains, a data management and collaboration platform. The idea is that we are collaborating with the major astronomy publisher, PASA, to enable researchers to deposit their data sets into the platform; behind it are CKAN repositories. So you can deposit your data set and, in the future, simply publish your data once your journal article is published.

The last service component for ADACS is national infrastructure. The general aim of this one is to make sure there will be sufficient data storage and HPC resources available for the astronomy communities. Over the last few years, AAL has partnered to offer OzSTAR, the Swinburne HPC service, which is an HPC facility designed for astronomers, maintained by astronomers, and serviced mainly for the astronomy communities. For the last few years AAL was responsible for purchasing 7.4 million CPU hours for the astronomy communities. Recently we also partnered with NCI and negotiated an additional 10 million CPU hours for the astronomy communities. Of course, we also understand that cloud computing is quite popular, so in the near future we will also offer cloud computing resources to the communities.

Before I finish, let me talk about future work. In the next two years we'll invest a large amount, more than five million dollars, in data centres, and we're hoping ADACS can better support data centre development. For example, we are investing in the gravitational wave national data centre; ADACS becomes the body that hires the data and computing experts and runs this development under the ADACS banner, which has proved to be quite an efficient way of working. The second part is that we hope ADACS can develop a large pool of experts to address community needs. I'm sure many of you have felt this difficulty in the community.
You have a small project, three to six months, and it's difficult to get data and computing experts by running a recruitment round. ADACS now has a larger pool and can easily address this need by offering data and computing experts for the short term. ADACS has also started working with the research astronomy communities to underpin the data and computing needs of ARC Discovery projects. The last bit is that we hope ADACS can better support industry engagement: first, to leverage national and international commercial cloud computing resources to address community needs; secondly, to commercialize relevant astronomy technologies to industry. And this is the ADACS team; I should say that without them nothing is possible. I'd like to stop here, and I'm happy to take any questions.

Great, thanks for that, Robert. We don't have any questions in the question box just yet, so while we're waiting I will just share some information about the next webinar in this series. There will be a webinar on eResearch skills in humanities, arts and social sciences, happening on Wednesday the 30th of October, and registration will be available on the ARDC website shortly.

I guess while people are typing questions, let me ask Lesley a warm-up question; surely lots of people want to ask Lesley questions. Really great talk, Lesley. In order to achieve your FAIR principles, the I and the R, what's the biggest bottleneck at this stage: is it people, or infrastructure, or both?

Well, the biggest issue, no, hang on, I'm unmuted now, right, okay. The biggest issue is the ability to make the data machine actionable, which means you need the standards. Now some parts of the sciences, like seismology, are just so far ahead it's not funny. Other areas, like geochemistry, there's nothing.
There's just nothing there. And then you go into trying to find the people who've got the skills, who understand vocabularies, who understand what machine actionable even means; you know, you'll find spelling mistakes in even the most basic things. And then the reusability is also a challenge, because a lot of people, particularly in what we call the long-tail communities, have what I call the USB drive factor: they can fit a career's data on a USB drive, so that's how they transfer it, and then they explain it to the person as they hand it over. It's the ability to actually put the data somewhere that's compatible with what everybody else is doing that's really proving challenging.

So again, for example, with geochemistry, what I did was start working with the International Union of Pure and Applied Chemistry, encouraging them to get the periodic table machine readable, because that took a headache off getting the scientists to do it. I think in a lot of our science we've actually got the human-readable part of FAIR; it's the machine-actionable part that people can't get their heads around. And if they're in a domain where there is nothing, it's very hard to find the people who are willing to put the time and effort in to make it work. Is that the answer you wanted?

Thank you, that's really great.
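The periodic-table example is worth making concrete. The sketch below is purely illustrative, a two-element slice with made-up field names, not IUPAC's actual machine-readable format: the point is that once elemental data is data rather than a printed page, a program can compute with it directly.

```python
# Illustrative two-element slice of a machine-readable periodic table.
# Field names are invented for this sketch; IUPAC's real format differs.
PERIODIC_TABLE = {
    "Fe": {"name": "iron", "atomic_number": 26, "atomic_mass": 55.845},
    "Si": {"name": "silicon", "atomic_number": 14, "atomic_mass": 28.085},
}

def molar_mass(composition):
    """Sum atomic masses over a {symbol: count} mapping.

    Trivial once the table is machine readable; impossible to automate
    when the table exists only as a human-readable page."""
    return sum(PERIODIC_TABLE[sym]["atomic_mass"] * n
               for sym, n in composition.items())

print(round(molar_mass({"Fe": 1, "Si": 1}), 3))  # molar mass of FeSi
```

The same shift, from a page a human reads to a structure a machine queries, is what moving geochemistry data from PDF supplements into standard repositories is meant to achieve.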
And I share your pain; for us in astronomy, particularly with radio data, an SKA precursor generates 32 gigabytes per second, so we are also quite stressed about this one.

Yeah, and I've done a few studies on this. Normally, if it's a big-data community, they have actually got their act together, because they have to store it in a repository, and there are a few things that make big data reasonably conformant, except when it's super big. Whereas when you have communities whose most valuable data can be stored on a laptop, you know, you're pretty sunk. I think, again, the interesting community that's done it properly is crystallography. When they went to digital publishing in the 1990s, they took it the whole way, and they said it's not just your paper that has to be digital: your data also has to comply with this standard. And again, that's why I think the AGU project was interesting, because it was a professional society that brought this in; it wasn't government regulation or the NSF or something like that. It came from within the community. But as I said, getting people to actually do it is very hard, because they don't see the reward in it.

And there's also a bit of history here: the e-research, sorry, e-Science programme in the UK had trouble with, you know, impact factors, judging how good a scientist is by whether they're publishing in the high-impact journals. That caused people to split, either to computer science or to the domain science, because those journals had higher impact factors than the ones in the middle. So it's going to take a while; a community is either there now, or they're going to have to get there, and for some it's going to be a very long process.

Hmm. Thanks for that.
We have a comment, which says that even after the data is in good shape and ready to be used, education and training can be a missing part that helps change the paradigm of the current research style.

Yeah, I'd agree with that. I guess I was coming at it from the perspective that if you want to train people to use something, you've got to have somebody develop it, and right now people are recognizing that's a bit of a hole.

Yep, certainly. Okay, we have no further questions, so I'd like to thank Lesley and Robert for your time. I'd like to point out that Robert came in on his own time; thank you very much for that, and enjoy the rest of your leave. We look forward to seeing you at a future ARDC webinar. Thank you.

Thanks.