Hello everybody and welcome. I'm Julia Martin from the Australian Research Data Commons, and thank you for joining us today for the first in a series of webinars showcasing the outputs from the ARDC-funded RDC and DeVL projects, and how the outputs from one domain project have been repurposed by another. Today's webinar will focus on the EcoCloud project and how its outputs have been adopted and adapted by the HASS DeVL. Today's speakers from EcoCloud include Sarah Richmond, the project manager; Gerhard Weiss, the lead developer; Jonathan Yu from CSIRO, who developed Knowledge Network; Siddeswara Guru, who will talk about the CoESRA virtual desktop; and Nick Rosso, the HASS DeVL lead, who will give an overview of how the HASS DeVL will reuse the EcoCloud software stack and talk to the benefits and challenges. Just some housekeeping: please be aware that during this webinar you will be muted and that the webinar is being recorded. There will be a short amount of time for questions after each presentation, so if you do have a question please put it in the questions panel as we go. I'll now hand over to our first presenter, Sarah Richmond.

Thanks Julia. Here we go. Okay, so thanks for the opportunity to talk a little bit about the EcoCloud project. This was one of the RDC/DeVL programs funded last year, so I'm going to give a high-level overview of some of the work we did and some of the outcomes, not only around the technology side of things but also around some of the training and engagement activities.

I usually like to start by taking it back to why we're building what we're building and why it has an impact, and I'm going to talk to this around the environment, because that's the domain we're focused on. Dealing with a lot of researchers in this area, understanding and predicting changes in ecological systems is very complex, as it is in many domains, and you're often required to go out and find data from multiple different sources that might have been collected for different reasons, for different research questions, and from potentially different backgrounds. Quite often researchers then combine all of this data into some kind of analytical workflow. I've put that here under the broad banner of biodiversity modelling, and it's a complex and very time-consuming task.

So I mentioned that this requires a lot of data, and this is a quick slide I pulled together of a few logos from some of the data repositories that are quite prominent here in Australia for environmental and ecological datasets. I could probably make about 15 slides with logos of different repositories that researchers access. So it requires a lot of data and a lot of analytical capability to assess changes in environmental systems: not only what's happening now but also how it might change into the future. Some of these data look like occurrence records that might come out of places like the Atlas of Living Australia, citizen science programs or museums. There are spatial data collections that might come out of places like CSIRO or data.gov.au, and there's also a lot of data being generated by universities and other institutions. There is also a whole heap of tools and services that you can use to access this data and to do things with it.
Again, this is a very quick slide I threw together of some of the tools and services we started to look into last year and are continuing to engage with this year. Our main remit was to bring all these data services and tools together in one place, so that people are able not only to find and discover datasets but to access them and connect them to analytical workflows in a cloud environment.

That's where we got to at the end of the year: we launched the EcoCloud platform in October last year. It's available at ecocloud.org.au, and essentially what we tried to do was bring together all the different working parts that already existed. We didn't want to keep creating new things until we'd connected what was already out there and made it easier for researchers to do more end-to-end analysis, from data collection to access, to analysis, to outputs, and even to making decisions and putting things into policy. So we worked with a lot of our partners around this framework. I'm about to show a few screenshots from the platform, but essentially what we tried to do is bring these tools and data services together around a command-line application. In ecology there is already a virtual laboratory called the BCCVL, the Biodiversity and Climate Change Virtual Laboratory, which is essentially a point-and-click tool that enables researchers to design model workflows; in the back end they run off R scripts. While that suits a number of researchers, quite a few wanted access to command-line tools: direct access to R, and the ability to create and run R scripts rather than working through a point-and-click interface. And the more we spoke to the up-and-coming next generation of ecologists, the clearer it became that they were all learning R within undergraduate curriculums. We wanted to make sure the platforms were able to move with science priorities and the skills of researchers new and old.

Here are a few screenshots. EcoCloud essentially has a few core components. One is the workspace, which is persistent storage for each user: 10 gigabytes of persistent storage. Then there's the Explorer, which is the screenshot here, where we teamed up with the Knowledge Network team from CSIRO Land and Water and connect to their API service, which crawls a bunch of different online repositories and brings forward information about the datasets available there. It brings forward not only the metadata, such as the title, description, date it was updated, providers, licences and so on, but Knowledge Network also brings forward the actual data access link. In this case this is Melbourne water use by postcode, and it's a CSV file. So in EcoCloud we built what we call a data cart: you select the data or resource you want, it adds it to a cart, and then you can click through to a series of code snippets, in Python, R or bash, that you can copy and paste into your coding environment to download the data. We've also commented in a few components to enable better reuse, such as publisher information, the contact point and the licence, just to make sure that provenance data stays with the dataset as your code evolves.
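As an illustration, here is a minimal sketch of the kind of Python snippet the data cart hands you: a direct call on the dataset's access link, with the provenance fields kept alongside as comments. The URL and metadata values below are placeholders, not output copied from the EcoCloud data cart.

```python
# Illustrative only: the URL and metadata below are placeholders.
import pandas as pd

# Provenance carried with the data access call, as the data cart does:
#   publisher: Melbourne Water (example)
#   licence:   CC-BY 4.0 (example)
#   contact:   data@example.org (example)
DATA_URL = "https://example.org/datasets/water-use-by-postcode.csv"

df = pd.read_csv(DATA_URL)  # pandas can read a CSV straight from the access link
print(df.head())
```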
A few cool things about this: you no longer have to go out to 10 different data repositories, download data and upload it into a new system, and if you're working with collaborators, as researchers often do, you're not having to share the data along with all your code, since all the calls are available within the scripts. This year we're starting to advance these snippets further by making them more content-aware.

The next component of EcoCloud is our tools page, where users can, at the click of a button, run up either an R or Python server. They can also connect to TERN's virtual desktop service if they want to access software like QGIS. Each of these server environments is optimised for ecologists, so the R and Python environments have popular eco packages and libraries pre-installed, and researchers don't have to keep installing them themselves.

We also started to build out a whole heap of what we call microservices. We've got two at the moment, but we've got an increasingly long list of things we'd like to add. As an example, as part of the project the ANU Fenner School created daily weather grids for all of Australia from 1970 until last year. That's a grid covering all of Australia at one-kilometre resolution, for five different variables such as maximum temperature, rainfall and vapour pressure, for every single day, so it's a lot of data. Quite often researchers only want data for, say, Queensland, and they might only want it for two or three months of a given year. Going out to interrogate that data and find the files you want at just those points is quite difficult, so we built a service that queries this data: all the researchers have to do is submit a CSV with lat/longs and a date, and we'll fetch the exact values for those days at those locations and send them back in a CSV file. It can be a really powerful way to help researchers interact with large data stores and datasets.
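To make that workflow concrete, here is a hedged sketch of what calling such a weather-extraction microservice could look like from Python: post a small CSV of lat/longs and dates, get a CSV of values back. The endpoint, parameter names and variable names are assumptions for illustration, not the actual EcoCloud API.

```python
import requests

# A few points (lat, lon, date) to look up; illustrative values only.
points_csv = "lat,lon,date\n-27.47,153.02,2018-01-15\n-16.92,145.77,2018-02-03\n"

resp = requests.post(
    "https://api.example.org/weather/extract",             # assumed endpoint
    files={"points": ("points.csv", points_csv, "text/csv")},
    params={"variables": "tmax,rainfall"},                  # assumed parameter
)
resp.raise_for_status()

with open("extracted_values.csv", "wb") as f:
    f.write(resp.content)                                   # service returns a CSV of daily values
```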
This is just a little bit about what it looks like when you start it up. Users can start up notebooks such as this one, an example Peter Scarth set up using biomass data from TERN, or you can run up an RStudio instance straight in your browser. We've already had this used by a master's course over at UWA, a marine ecology master's course, where within a couple of minutes all students were working within the same environment. Previously, the professor said, it would take them one to two hours just to set working directories and install libraries, and he said it was a bit of a nightmare trying to get them to do things in R on the computer systems at the university.

Around the middle of last year, Tinker also went out and spoke to a lot of their users and found that they were using tools like Python and R to do a lot of their analysis. So we said, well, instead of building something from scratch, how about you take what we've built, rebrand it, trial it with your community and see how it goes. I'll leave Nick to talk about how that went, but essentially EcoCloud and Tinker are running the same baseline infrastructure, just with different branding and some domain-centric things applied.

Just to finish up, I wanted to touch a little on how training and engagement really underpin everything that happens within a virtual lab or a platform. Last year we ran some EcoScience Pathways events across Australia, really to get a pulse check on the community, and also as an opportunity to say we're starting to build something, we want you to come along for the ride and provide feedback as we go. This was wrapped into the EcoEd training program, which we've had running for about two years now. Last year we also built, alongside the ARDC team (or ANDS as it was then), 10 Eco Data Things, which are very much around data management, and we've now got five modules. We're working with the biosciences crew over at Galaxy to implement their environmental metagenomics module into the EcoEd program, because there's quite a lot of overlap in terms of the questions researchers might ask. We also worked with ABEZ on some multi-criteria decision analysis modules to help people put their outputs into policy and management actions. This program has been hugely successful, and I think it's a really great way of teaching science using innovative digital tools, while also ensuring that those who do these modules come out using real-world tools and understand what kind of infrastructure is available to them. We also trained a whole heap of what we call EcoEd champions, so we've now got champions from all of these universities in Australia plus three overseas universities as well: one from Azerbaijan, the University of Lincoln, and the University of Auckland. These representatives go out and repeat these modules across their universities, which is a fantastic way not only to get your tools used, but to make sure they're being used properly and for real science impact.

That's me, and I think I'm almost right on time. Julia, I'll hand back to you; I don't know if we are doing questions now or at the end. We'll do questions at the end, Sarah, so I'll swap over now. Thank you for that introduction, and I'll hand over now to Gerhard.
Thank you everyone, and thanks for joining. What I'm going to talk about is more the technical side, particularly reuse, which is the topic for today. We designed EcoCloud around reuse, mainly because, first of all, we are kind of lazy and want to reuse everything that already exists as much as possible, and because we also want to provide services to third-party applications, to make the platform more open, more reusable, and better value for everyone.

That starts at the platform level, where we mainly use standard software where possible and try not to get locked in by vendors, which means we look at standard protocol support and standard tools. Everything we've designed in EcoCloud is built around web services, which can easily be consumed by anyone, even by desktop applications if necessary, and our focus is on providing services that offer standard protocols. We've looked mostly at OGC web services, which are usually well supported by libraries, desktop applications and other web tools. For authentication we enabled everything with OpenID Connect and OAuth2, so every single service available in EcoCloud can be consumed via OAuth2-protected web services, and the other way around: every service we offer and integrate with is OAuth2-enabled as well. That makes it a lot easier for us and also a lot easier to build tools around it. The fourth part is the programming environment. We focused on R and Python: R is a very widely known environment for ecologists, and Python is a big and growing tool in the ecology sector as well. And the last part, probably the most important, is documentation: as soon as anything is documented well, it's also easier to reuse.

On the platform side we are running a Kubernetes cluster within Nectar, which means the deployment itself is totally abstracted from the infrastructure, so in theory we can run the same thing on AWS, Google, OpenStack or a private vSphere cloud; it's totally reusable and transferable to any other environment. One example of this, as Sarah mentioned earlier, was the HASS Tinker platform, which basically used our deployment, tacked on a few custom services and provides almost the same thing, customised for the HASS DeVL community. As mentioned, everything is a web service, which for instance made it easy for us to integrate CoESRA, which provides virtual desktop environments; from a virtual desktop within CoESRA you can easily consume services from EcoCloud. The same is true for the BCCVL, which is a point-and-click interface that can interact with EcoCloud, and the other way around. And the beauty of web services is that even tools like ArcGIS and QGIS can consume services deployed in EcoCloud directly. Additionally, we rely heavily on external cloud services, like Google Drive and Dropbox for data sharing, and AARNet's CloudStor as an Australian service. Knowledge Network is a data discovery service, basically a one-stop shop to find all sorts of available data within Australia and, most importantly, to find access to that data. We've provided examples of how to utilise object stores, whether Swift or AWS, and of how to access data from OPeNDAP services, which are becoming more and more widespread around the world.
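As a concrete example of the standard-protocol access just described, here is a hedged sketch of querying an OGC Web Feature Service from Python with a plain HTTP request. The service URL and layer name are placeholders, and for an OAuth2-protected EcoCloud service you would presumably add a bearer token to the request headers.

```python
import requests

WFS_URL = "https://example.org/geoserver/wfs"          # placeholder OGC endpoint

resp = requests.get(WFS_URL, params={
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "example:species_occurrences",        # placeholder layer name
    "outputFormat": "application/json",
    "count": 100,                                      # limit the number of features returned
})
resp.raise_for_status()

features = resp.json()["features"]                     # GeoJSON-style feature collection
print(f"Fetched {len(features)} features")
```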
For managing your code environment, GitHub is a widely used service as well. Within EcoCloud itself, as mentioned, it's all built around OGC standard web services. The reason is that they are well described, well standardised services with plenty of libraries available to consume them, so everything we build and integrate focuses on those protocols. Using those services, EcoCloud has a small web user interface, as Sarah showed in a couple of screenshots before, and it is equally easy to build third-party, customised, domain-specific interfaces that use the same services in the back end, transparently to the end user. The same services can be used from within your Jupyter notebooks or RStudio, which are tools researchers and developers are already familiar with, and there's plenty of online documentation and tutorials available for those same tools. One special thing we did in EcoCloud is that all the notebook and RStudio environments are backed by a fully customised conda package environment, meaning there's one simple way to install all sorts of additional software a user might want that isn't provided up front.

As for the major use cases: as Sarah mentioned before, we've successfully run a university master's degree course. The biggest advantage is that you get a class up and running within minutes: a lecturer can prepare the environment, give the users a notebook or a couple of simple commands, and everyone gets up and started. Another big use case is development: it's a perfect environment for prototyping, for developing a new algorithm or whatever, and then moving it later into a large-scale cluster if that scale of processing is required. It's also perfectly suited to improving collaboration between developers and researchers; for researchers it's often helpful to have a professional developer around to help with tools and best practices around utilising CPU resources. And the third bit is that we can allow users to develop and publish services which can then be reused again by whoever needs them. Researchers are our main customers, I guess, and what we offer them is data processing, visualisation and publishing in a reproducible environment.

As a last step, a quick glance at our tech stack; this is more a conceptual view of it, divided up into layers. There's the application layer where, for instance, the BCCVL sits; it provides various APIs for migratory modelling, traits modelling, or projection into the future if a researcher has a model already available. The EcoCloud platform itself offers a storage environment that can be shared and reused as well; the authorisation services sit there; R, Jupyter and the user interface environments are there as well; there's the integration with the virtual desktop environment; and the little web service, the microservice Sarah mentioned before, which would be the slice-and-dice API for daily weather extraction, with many more to come in the near future. Underneath that there's a management layer, which is there to ensure that everything is up and running, that the APIs are available, and that access control and security are enforced properly everywhere, because we are not dealing just with public data, we are dealing with users' data as well, which may not be publishable. And at the lower level, underneath and at the side, there are external services, which usually just provide similar APIs, and these are things we can easily integrate.
They use the same models to communicate: Knowledge Network as a data discovery API, various data services like the ALA and GBIF, various OPeNDAP services that are around, and other platform services. There's access to cloud storage APIs, so you can manage your big data or your private data whatever way you want. And there are also the WPS services, which are the execution environments for pre-canned processes that potentially process large data for you, drilling big chunks of data down to the small pieces the researcher is interested in and can use in their analysis. Underneath, the core infrastructure layer, as mentioned before, is a Kubernetes layer; everything sits on top of it, and underneath that is the actual Nectar infrastructure, the cloud. That yellow layer can easily be replaced by AWS, Google or whatever is required; it's really not difficult. I think that's everything I wanted to cover. Thank you, and I'm handing over to Jonathan.

Thanks. My name is Jonathan Yu, I'm a research data scientist at CSIRO Land and Water, and I'm here to talk about Knowledge Network, which has been mentioned a couple of times in this session already. The real aim of this project and the platform we're trying to build is to improve the data search and access experience for researchers, and really anyone who's dealing with data. If you haven't seen Knowledge Network before, this is how it looks; it's available at knowledgenet.co. This is the portal view, powered by some technology we're co-developing with Data61 called MAGDA. You may not have seen this view, but we've already seen a couple of views in the talks today: this one is the EcoCloud Explorer showing search results powered by Knowledge Network, and I think in a talk to come it's also used in Tinker for the Explorer; this is a search on population with results coming from Knowledge Network. So Knowledge Network really exists to power access to data and to allow comprehensive search of data within analytics platforms via the API. It's been designed with an API-first approach, cognisant that data work doesn't really happen in a portal: researchers are accessing data and using it in Jupyter notebooks, in Python or R, or in other workflows. What we try to do in this platform is reduce the uncertainty you get when you're dealing with data, looking for data and accessing data. For data search, we provide search at the dataset level and the file level, so we index the dataset metadata itself. Knowledge Network harmonises the dataset and file views; conceptually it uses the W3C DCAT schema to do that. You may access data from a range of different sources and catalogues, and they'll all have different views, but when you come to Knowledge Network there is a consistent view, so you don't have to wrangle that. We aim to have a comprehensive catalogue of research and government data in Australia. The reason for that is so we can help downstream users, in modelling environments or in tailored applications, use data in their environments, and the reason for that in turn is for those users to realise impacts, science impacts or policy impacts, through data-driven analysis.
Whether it's understanding the environment or informing policy development around urban settlement, that's really where the impact happens, but before that you need the data to drive those decisions.

Some statistics on Knowledge Network: we are currently indexing 28 data sources from a range of providers, from NCRIS facilities to open government. Across those there are about 1,000 publishers, and these publishers publish datasets: about 76,000-plus of them at the moment. Each dataset may have what's called a distribution, resource or file, and from an analysis we did there are over 166,000 of those available.

Just a little bit under the hood: as I mentioned earlier, a core part of our platform technology is something called MAGDA. If you go to magda.io you can get more details, but essentially it's the engine through which we connect with the different data catalogues, register them in the system, and expose Elasticsearch APIs as well as some custom search APIs that combine Elasticsearch and the registry for anyone who accesses our APIs. We have a very simple portal front end that we developed just so people can browse around, but really the red arrows are where we see the use of Knowledge Network: through the APIs. There's also a concept called minions; the minions at the bottom here can be defined by the MAGDA stack or by Knowledge Network, and what they try to do is add value to the records, by cleaning them up or by detecting things that you may not get straight from the catalogues, like broken links. So that's the architecture.

In terms of reuse opportunities, the search APIs are open, so anyone can come in and do searches by API, either via the Elasticsearch API or the MAGDA API. As you've seen in a couple of screenshots from Tinker and EcoCloud, it might seem domain-specific, but our search APIs are domain-agnostic; what we do is allow these platforms to customise search for their domain, and we've been working with EcoCloud on that specifically over the last few months. On the platform side it's also running on Kubernetes, currently deployed to Google Cloud, but any other cloud using Kubernetes could work. Our data catalogues are open, so if you want to add some datasets or suggest data catalogues, basically do a pull request in GitHub and we'll try our best to add them. That's the range of reuse opportunities; if you're keen on using the search APIs and need someone to help you through that, we're happy to talk you through it as well.
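For a sense of what that API-first use looks like, here is a hedged sketch of a dataset search against a MAGDA-style search endpoint like the one Knowledge Network exposes. The host, route and response field names are assumptions based on the open MAGDA project, so check the Knowledge Network API documentation for the real details.

```python
import requests

BASE = "https://example.knowledgenet.co"              # placeholder host

resp = requests.get(
    f"{BASE}/api/v0/search/datasets",                 # assumed MAGDA-style search route
    params={"query": "population", "limit": 5},
)
resp.raise_for_status()

# Field names below are assumptions about the response shape.
for ds in resp.json().get("dataSets", []):
    title = ds.get("title")
    publisher = (ds.get("publisher") or {}).get("name")
    print(f"{title} - {publisher}")
```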
I just wanted to highlight some upcoming features. We've been working on a version 2.1 release. As I alluded to earlier, the minions allow us to add value to the datasets we're indexing, and one of them is the broken link minion. In the upcoming release we'll be providing that information via the API: it goes to each one of those 166,000 resources or distributions and tests whether it's accessible. Often, as a data user, you don't actually know until you try to download it, and there's a chance it might not be available. This is just visualising what that might look like on the portal; it might look different on EcoCloud or other platforms, but essentially we've got a traffic light saying whether a resource is available, whether its availability is unknown, or whether it's broken, and in this case this particular dataset has a broken website, so you can immediately see that here. We're also working on a data preview API: if we know there is a file, we can hit that file and get some statistics and summaries out of it. This is showing a chart data preview for one of the datasets, which might be handy in your analysis platform, especially if users are trying to evaluate whether a dataset is the right one for them. The map preview is another feature we're rolling in from the MAGDA stack into Knowledge Network. While these are showing views from our portal, the aim is to allow them to appear in your analysis platform, so that your users can do this sort of thing there.
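As a conceptual illustration of what the broken-link minion described above does (this is the idea, not the MAGDA implementation), a link checker simply probes each distribution's access URL and classifies the result:

```python
import requests

def check_link(url: str) -> str:
    """Probe a distribution URL and classify it as active, broken or unknown."""
    try:
        r = requests.head(url, allow_redirects=True, timeout=10)
        return "active" if r.status_code < 400 else "broken"
    except requests.RequestException:
        return "unknown"          # e.g. timeout or DNS failure

# Placeholder URLs; a real minion would iterate over the indexed distributions.
for url in ["https://example.org/data.csv", "https://example.org/missing.zip"]:
    print(url, "->", check_link(url))
```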
Okay, just some of the challenges we faced in developing this platform. There is obviously varied quality in metadata and data standards in data publication, and we're encountering that in our platform. We're trying our best to deal with it, but some things can't be dealt with by technology, so we're also trying to have conversations around best practice and standards. Our aim is to minimise uncertainty for the user, so we're trying to deal with varying data quality, and sometimes the minions can play a part in improving the visibility of quality, as in the broken link example. We're also trying to address some of the issues data users have around datasets, to provide confidence around things like uptime, to do some analysis around the dataset itself, and to provide search that can recognise entities such as species or place names, which hopefully will help the search experience as well. And we're trying to work out ways to incentivise improved metadata practices; that's something we're looking at. I've already mentioned some of the future work, but we'd like to continue integrating those features into the platforms where Knowledge Network results appear, and we're also working on some new datasets from TERN and NCI to index.

Alright, so just to wrap up: Knowledge Network aims to provide comprehensive data search, focusing on research and government data in Australia at the individual resource, or distribution, level. We're aiming to improve the data search and access experience, as I mentioned at the start, and hopefully this platform enables that, minimises uncertainty for data users working with datasets online, and, by partnering with projects like EcoCloud, enables data to be used easily within analytics pipelines. We're always keen for suggestions or feedback, so please get back to us if you'd like to do that. Thank you.

Thank you Jonathan, lots of food for thought there and lots of reuse opportunities. We might move on now to Nick Rosso, who's going to talk about reuse of EcoCloud for HASS.

Great, thanks everyone. As was mentioned at the beginning, I'm Nick Rosso from the HASS DeVL project team. I'm actually not the lead of the project as was said; I'm leading work package one, which is the technology stream. A little bit of background first on the HASS DeVL project. The project kicked off in 2017 under the RDS banner as the Cultures and Communities project, which was largely focused on creating a data sharing model for the HASS research environment. It involved a bit of community building, linking researchers back into their data sources, and the primary data sources we looked at in that Cultures and Communities project were in the GLAM sector: galleries, libraries, archives and museums. The lessons we learned from that project were channelled into the HASS DeVL project for 2018, and we were really fortunate at our kickoff meeting for the HASS DeVL in 2018 to have invited along the EcoCloud project: Sarah and her team came along and gave us some of their lessons learned, because that project, or the scope of work within the eco environment, had been running for quite a few years previously, so it was a really good opportunity for them to let us know. The biggest lesson they were able to show us, and Sarah mentioned this in her talk, was that training and community underpinned everything they did within their project; Sarah has said in the past that without that community backing, the project wouldn't have been the great success it is today. I guess the IT adage of "build it and they'll come" doesn't really work, but the Tinker project has taken that adage and turned it around a little: the Tinker idea is to allow researchers to quickly try new things, to have a play, to not worry about trying to spin up this technology stack on their own, but to come and try new things out on our workbench, as we call it.

So today I'd like to talk about some of the opportunities we have seized by being able to reuse the EcoCloud platform, though it's not just the platform that we've reused: we've actually found the biggest benefit of teaming up with EcoCloud is the framework they developed over that period for community building, training and engagement, which they have been able to share with us. The scope of the HASS DeVL was laid out in 2018 in terms of three main technology components: transcription, geo, which is the geographical side, and text analysis. We saw a big opportunity there to lower the barrier around those three main components for researchers within the HASS environment. There was a clear understanding in the project that the HASS environment is such a huge environment that we couldn't really dive into specifics.
We didn't want to try to solve really specific problems for the HASS environment because it is so large. But we found the EcoCloud platform was an easy win for us: we still needed to deliver this workbench where researchers could come and try new things, and I guess that's why the word Tinker was used as our branding, because we wanted to give an air of ease and of trying out new things. It was all about lowering the barrier for our researchers, not only to try new things but to access data. We wanted to lower that barrier as well, and we saw the CSIRO Knowledge Network component of the EcoCloud platform as a great way to give researchers easy access to data.

On the challenges we faced, it's worth mentioning that the biggest challenges in the HASS DeVL project weren't technical. We've already mentioned that community building was the biggest thing we learned, so that's where we focused a lot of effort last year. It was not just about building the community, because the HASS community already existed; it was about joining the community. The project team had to become part of that community first, and then we could suggest, well, this is what we're doing to try to make it easier for the community to do research. Some of the framework we borrowed were the Pathways events and the champions program. Pathways is about getting out into the community, demonstrating what the community is doing and giving them an opportunity to share their work, but also sharing the work we're doing to build new platforms, tools and techniques; the champions program is a train-the-trainer type scenario, where we send experts out into the community to say these are some of the tools and techniques we found valuable, and to promote the Tinker workbench as a whole. And the Tinker workbench as a whole is not just the EcoCloud component; there are also linkages out to other tools such as Voyant for text analysis, as an example. So it's about trying to find out what the community really wants, and that's the big focus for the HASS project this year: focusing on what the community is telling us they want to be able to do more of. We're trying to get more academic input this year, and that will hopefully drive the technology stack we've deployed; we can turn off some components and turn on new ones, because, as you saw when Gerhard showed the conceptual layers of the EcoCloud platform, it's very easy to pick up new modules, install them, and take out modules that aren't being heavily utilised. That's all I really wanted to say today. Thanks very much for the opportunity.

Thank you Nick. We'll move on now to hear from Guru, who's going to talk about the CoESRA virtual desktop.

Thanks Julia, and thanks to all the previous speakers. I'll just give a quick overview of the CoESRA virtual desktop and a bit of the architecture: what it's running on, and a summary of the work TPAC has done in adopting what we have built. CoESRA provides a cloud-based virtual desktop environment which you can access from a web browser, and we provide a wide variety of tools.
For programming there are JupyterLab and RStudio, and we provide the Canopy IDE; for geospatial analysis we provide QGIS; there are a couple of scientific workflow tools like Kepler and KNIME; and then some ecology-related tools as well, like Biodiverse and Macroeco Desktop. We also provide a client where people can sync their Dropbox or CloudStor, so, for example, if they have data in a cloud store they can bring it to the virtual desktop and use it there.

It's quite simple to use. You go to coesra.tern.org.au and click on login; you can log in via the AAF or Google. Once you log in you get a launch desktop option, and if you click on the advanced features it gives you options to select whether you want Ubuntu or CentOS, how many CPUs you want, the memory, and how long you want the desktop for. If you don't go into the advanced options, the default is an Ubuntu desktop with two CPUs and 7 GB of memory. Once you customise your preferences, or even if you don't, you click on launch desktop, the desktop is launched, the running desktop window appears, and when you click on "go to desktop" you get a desktop with a menu in CoESRA where you can see all the tools available. There is a side panel with a file transfer function, which gives you the opportunity to transfer files from the remote desktop to your local desktop and vice versa; we have transferred fairly big files this way and it works fairly well. The other aspect is that we provide an option for people to create a collaborative space among themselves and work within this platform. This is just an example where some researchers created their own space and worked in their part of the platform. The main reason for this is so that they can bring their own data, which they don't want to share as public data, run their analysis and work closely together. We don't want to make this a default function, so we take a gatekeeper approach: users come and ask, and we set it up on request, partly because we don't want people to just use it as a closed working environment. We want this to be used as an open environment, but the option is there for users who need it.
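Returning to the launch flow described above: CoESRA also exposes an API for launching desktops (covered a little later in this talk), so, as a hedged sketch, a programmatic launch request mirroring those options might look something like the following. The endpoint, parameter names and token handling are assumptions for illustration, not the actual CoESRA API.

```python
import requests

ACCESS_TOKEN = "..."   # token obtained after logging in (AAF or Google)

resp = requests.post(
    "https://coesra.example.org/api/desktops",              # placeholder endpoint
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"image": "ubuntu", "cpus": 2, "memory_gb": 7, "hours": 4},
)
resp.raise_for_status()
print("Desktop URL:", resp.json().get("url"))               # open this in the browser
```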
Quickly, to provide an overview of how everything is set up: the website you saw is what we call the client, and for authentication we use Keycloak, so we support the AAF and Google. In the back end we run LDAP, so when a user logs in for the first time we check whether the user is already registered; if not, it creates an entry in LDAP, creates their home folder, and creates an account in the Slurm cluster for that user. Once that's done, the user gets an access token, and using that access token the web portal communicates with what we call the resource server, a web service that talks to the Slurm cluster to create and delete desktops. I've shown the Slurm cluster at a very high level; within it there's a head node, the compute nodes and so on, which is probably missing from the picture. The communication between the desktop and the client is where Guacamole comes in: we run a Guacamole client on the portal and there is one Guacamole server, which creates an SSH tunnel, and that's how the desktop is rendered in the browser, via Guacamole. We use RDS storage to store all the users' home folders and any files or data the user brings into the environment.

Just on the technical side, as I explained, at a high level it's a virtual cluster. All of this is built on a cloud, the QRIScloud environment, not on HPC, which is why we had to create a Slurm virtual cluster instead of a physical cluster. The cluster is created using a Heat orchestration template, which is OpenStack's orchestration template format, and all the storage, as I said, uses RDS. For deploying everything, all the applications and software, we use Ansible, and each virtual desktop is basically a Singularity container: each desktop is a container we run. In Slurm we generally run big nodes of eight cores, and multiple virtual desktops run on each of those nodes. As part of the environment you can access the EcoCloud platform notebook environment, and from the EcoCloud platform you can also come across to the CoESRA platform. We have provided an API to invoke a desktop, so potentially you can build your own client to get a desktop from the CoESRA platform.
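To tie those pieces together, here is a conceptual sketch (not the real CoESRA code) of how a resource-server style service could turn a launch request into a Slurm job that runs the desktop as a Singularity container, as described above. The image path and start-up script are illustrative placeholders.

```python
import subprocess
import textwrap

def launch_desktop(username: str, cpus: int = 2, mem_gb: int = 7, hours: int = 4) -> str:
    """Submit a Slurm job that starts a containerised desktop for one user."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=desktop-{username}
        #SBATCH --cpus-per-task={cpus}
        #SBATCH --mem={mem_gb}G
        #SBATCH --time={hours}:00:00
        singularity exec /images/ubuntu-desktop.sif /opt/start-desktop.sh
    """)
    # sbatch reads the batch script from stdin when no file is given.
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()   # e.g. "Submitted batch job 12345"
```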
What users get is basically flexibility: you can do whatever you want, use it like any Linux desktop, and the one big advantage is that you have a desktop available wherever you are, just in the browser. There's a fair bit of use where people come and use it intermittently and then go back, and a fair bit where people use it as a collaborative space, where three or four people run some experiments and work together; it's fairly well used. We have supported CloudStor and Dropbox; technically we can support more as well. One of the things we do is keep whatever data the user brings in persistent storage, and that is one of the restrictions: we don't want people to just come and use a massive amount of storage on our platform, which is why we limited it to CloudStor and Dropbox, partly because those were the two most popular clients people asked for.

I'll just quickly give an overview of what TPAC has done. About a year and a half back, TPAC were also looking for a web platform to do analysis, especially for the marine community, and when they looked at CoESRA most of it mapped onto their requirements, so they took the complete source code and deployed it on the TPAC cloud. There was a fair bit of trouble when we were setting it up because of hardware issues, and initially we ran it as a local, more internal platform, so it took a couple of weeks for us to deploy it on their platform. After that, based on their requirements, they customised the complete platform. By default they provide Ubuntu as the desktop; for the software application stack, where we use Ansible, they use SaltStack for everything. The other thing they do is mount all the RDS collections as part of the virtual desktop, which accounts for a fair bit of their usage. They use this platform especially for training, as well as a collaborative tool, particularly for students from the University of Tasmania to come and use the data and play around, and they provide multiple images as well. One of the things they have done is support MATLAB, but it is accessible only to UTAS users, purely because of licensing issues; we are not directly connected to UQ per se, so it's hard for us to provide MATLAB. TPAC has also fine-tuned the web interface: ours runs on AngularJS, and very recently they moved to TypeScript, partly because they want to align closely with how the Nectar dashboard looks. They also run multiple clusters, whereas we run only one, so that they can connect to those. They have added a fair bit of administration access exposed through the portal, so there's an administrative account from which you can delete and reassign resources and so on, and they've added some functionality around system monitoring as well.

One thing to mention is that in our previous version, CoESRA was quite closely aligned with Strudel Web, developed by the Characterisation Virtual Laboratory. Last year we moved towards Keycloak, partly because we wanted single sign-on functionality within TERN, and that enables us to work seamlessly with the EcoCloud platform and access this as well. If you are interested in Strudel Web, you could take this part out and put Strudel Web in as the replacement, and from there you don't have to change anything; the rest will work as it is.
Currently, I think TPAC were saying they have had a bit of an issue with NFS mounts when a lot of people are using the infrastructure. We are not sure why, because we haven't had any issues so far, even when a lot of people are accessing it, so we don't know whether it's their infrastructure or ours, but that is one thing we need to look at. Apart from that, over the last year, as part of the EcoCloud project, we have streamlined the complete architecture and it has become fairly lightweight for anybody to deploy. Except for the front-end portal, everything is Ansible-ised and container-driven. There hasn't yet been a use case to take the front-end portal code, along with the complete infrastructure, to a full infrastructure-as-code model, but that is the one thing we are working towards, so that the complete platform can be deployed as infrastructure as code. Before I finish, I just want to quickly acknowledge the people who have been working on CoESRA and the other EcoCloud team members as well, like Gerhard and Jonathan and everybody; this has been a collaborative project we've been working on for the past couple of years. I also want to acknowledge Deepak, QRIScloud, the UQ RCC and Monash eResearch as well. Thanks everybody.

Thank you Guru. That's the end of our presentations for this session. I've got a question for everybody, if you wouldn't mind. I'm conscious of the fact that people are able to take an instance of your products and reuse them, but do you have any thoughts on how you might want to be informed about any extensions or adaptations that they might make to your tools or platforms?

I can probably have a high-level crack at that question. I think from our perspective it's something we've recently been discussing around general software citation, how researchers use it, and those kinds of attributions from research projects. But it is another thing, because these are all open source technologies, if someone wants to just take bits and pieces of the overarching platform, and that's why we designed it the way we did, so that can happen. I don't quite know how you would then manage attribution from subsequent platforms that might be built using components, other than in terms of the microservices, the kinds of workflows. Let's say someone taps into, say, the species distribution modelling workflow and puts that on their platform: you'd want that service to be attributed to the BCCVL, who was the creator; same with things within the virtual desktop, you'd want that attributed to the TERN virtual desktop. How we actually manage that and how we bring that information forward for developers who do use those services, I think we're probably still yet to come up with a nice neat workflow for. Does anyone else have something to add to that?

Yeah, I'll probably add something, and that's from a reuser's point of view: we see the biggest benefit in working closely with the teams that built the platforms we're reusing, because without their input it's still quite technically difficult to pull them apart and put in new modules and things like that.
We see the benefit of working with them, so that if we do build a new module to go into the platform it then gets redistributed to everyone else and makes it easier to use as well.

I think that is one thing we are doing in working with TPAC: we have a common GitHub, where for the back end we decide what the common artefacts are that we want to keep, so the back end is similar, and because they wanted their own branding, the portal side is a bit different. Moving forward, GitHub is the right way to do this, probably with, as Sarah said, the right attribution information. One thing we are a bit slack on is that we haven't even provided a licence; we have just made the code available because we're working together as friends, so we need to formalise that and have proper licensing and attribution. GitHub is then the point of collaboration for everything: the communication happens at the GitHub level, and people can join the project, get access and merge their contributions, so it becomes more of a community-driven project. We may lose a bit of control, which is good; it's not on us to manage everything. We have started something, but what we really want is for the community to take it up, build it and own it, and we're happy to just step aside.

Jonathan, what about you for Knowledge Network? I know there's a bit of uptake within CSIRO itself. So it is open source, and the underlying technologies are open source as well, so anyone could theoretically spin it up. I think the benefit, as with all the other open source projects, comes from contributing back to the community to add features, so that whatever deployments exist get those features too; it's a rising-tide-lifts-all-boats kind of thing. And for the Knowledge Network service itself, we're just trying to build something that's valuable to analysis platforms, and while it remains valuable to them it will continue to exist; if there's something else that's better, then it will be deprecated down the track. It just needs the community to see the value and use it, and while that happens we'll continue to build that code base with those features.

One of the challenges the EcoCloud and Tinker platforms have this year, as part of their roadmaps, is to work out a framework for code development and what that actually looks like, and that'll be something we're working together to produce this year. It was a really big thing for us: it's all very well to replicate an application, rebrand it and build in some bespoke tools, but we wanted to make sure that these two platforms didn't completely drift apart or run in parallel without touching base. So coming up with this co-development architecture means that if we push a feature update or do something cool in EcoCloud, then Tinker has the opportunity to pull that straight into their application, and vice versa.
I think that helps with skill sharing across institutions and developers, which builds a more resilient development ecosystem, you could say, and also skill sharing across domains, which means we've got greater coordination and less repetition. It also reflects the understanding that researchers often work across domains: when you're looking at something environmental, you often need to take social and economic things into consideration, so building platforms that allow for that a little better, particularly in the underlying infrastructure, means we can collectively work towards that as well.

Thanks, that's fantastic. I was just going to add that we talk about data ecosystems, but I think we also need to talk about, as Sarah mentioned, development ecosystems, and promote a community around that in the research space so that we don't duplicate effort and we maximise the infrastructure that we build. Definitely, and one of the things you've done is demonstrate how it's domain-agnostic but also transdisciplinary. Sarah, wearing my skilled workforce hat for the ARDC, I'm really interested in that co-development architecture and the sort of framework that you and Nick, or your teams, might come up with, particularly the skill-sharing aspect; I think that's really important, along with the sharing across domains, so I may well get in touch with you offline about that. Great to hear about it.

Yeah, sure. For me, one of the really cool things to come out of last year was the few development sprints we did, where we physically put developers from the Griffith team, the TERN team and the CSIRO team in a room together and spent a couple of days just working, and not only on our own systems. Jonathan and people from Jonathan's team were doing stuff in EcoCloud, and likewise, I left the room to go to another meeting and came back in and all the developers were talking: things that went over my head, but ideas, things you could do, or barriers they might have had, and the other developers were weighing in. For me, looking in from the outside, it was a really nice way to see skill sharing across institutions, even while working on totally different applications and tools, so I think that was a really nice thing to come out of the program as well.

Excellent, thank you. So I have one more: we heard from Tim from UWA that he was taking a group of students, who'd never coded before, across to Rottnest Island. I was wondering if you might want to share that story. Yeah, sure. As part of the Pathways series we did last year, one of those events was in Western Australia, in Perth, and on the side my colleague Chantal and I organised a few information seminars at the universities; we just contacted the heads of schools in the ecology departments and said, hey, we're in the area, we can do a lunchtime seminar or something. So we went to UWA, and there were probably about 25 or 30 researchers in the room. We presented on the platforms that were available, like the BCCVL, EcoCloud and CoESRA, and asked for comments, and there was a teacher in there, Tim, who saw great potential for it in his course and said, hey, could I use this? I said sure, just get in contact with us because it's new, so we could make sure we had the resources available for him.
Fast forward to last month, and he had 22 master's students over on Rottnest Island. Previously they would have had to go and collect all the data, head back over to the mainland, and set up a computer lab a week or two later for a couple of hours to do the coding component of their research projects. Instead, they took their laptops over to Rottnest and logged on to EcoCloud, and within a couple of minutes Tim said all his students were working in a standardised environment. He contacted us beforehand and we gave him a few tips and tricks about loading all his data into GitHub; we've got an automated GitHub connector in EcoCloud, so all the students needed to do was type in the name of Tim's GitHub repository and they had access to all the data and the code. There was no need to hand around USB sticks with versions of the data to all the students, and we also pre-installed a bunch of different packages for them so they were there natively on the images as well. He said in previous years, when they've had to go to a computer lab, there have been issues with different RStudio versions, package versions, updating versions, even setting working directories: he said you'll easily waste an hour just getting everyone working in the same environment. So within a couple of minutes the students not only had access to all the data, they had all the right libraries installed, working directories were handled, and they were doing science, which is the main point, I guess. As a marine ecologist myself I was very interested: they were doing work around western rock lobsters in and out of no-take zones in sanctuaries over there, and then they had to write up a report based on all their analysis, which was all done in RStudio using EcoCloud. They even did a Facebook post about how stoked they were, so it was really nice to get that user story and to learn a little about how it might be used in undergraduate curriculums. Now Chantal, who leads the EcoEd program, is putting together a module on using EcoCloud and R for ecology, with a few really nice use cases using things like TERN data and potentially some work around IMOS data as well.

Fantastic, a really nice story. All right, well, in closing I just want to thank all of our speakers today, it's been absolutely fantastic, and I want to thank everyone for attending. I know the majority of you are from other RDC and DeVL projects, so if you have a reuse story that you'd like to share, please be in touch, because we'd love to showcase it in a future webinar. Bye for now everybody, and thank you so much for your time.