Hi everyone, thanks for joining us, and welcome to this webinar introducing the first round of Platforms projects, presented by the Australian Research Data Commons. My name is Kerry Levett; I'm the Platforms Program Manager here. I'd like to start by acknowledging the traditional owners of the lands that we are on today across the country. I'm on Kaurna land, and I respect the Kaurna people's spiritual relationship with their country. I pay my respects to their Elders past, present and emerging, and I extend that respect to Aboriginal and Torres Strait Islander people who are joining us today.

Before we get into the project presentations, I'd like to talk a little about the Platforms program. The Platforms program sits within the ARDC's Platforms and Software theme, and what we're trying to do with it is enable the development of e-research platforms that are transformative: platforms that improve the way research is conducted, or that accelerate research. We're looking to enable sustainable e-research platforms, ones with strong community support that will continue after the initial project investment. We're also looking to expand the pool of Australian researchers who have access to platform technologies, both across disciplines and in overall numbers, and to bring together the community of Australian platform developers and operators, supporting best practices and having the community support each other. Many of you are probably here because you know about the Platforms open call, so I won't talk about that. We ran an overview webinar with a question-and-answer session last week, and it has been recorded. This webinar is also being recorded, and I will send these slides out to all registrants. This slide is here to check that you can follow the links.
On the open call page we have added all the answers that we gave in that webinar; we've transcribed them and added them to the frequently asked questions, and you can also find the request-for-proposal documentation there. So I'd like to get started, and to invite our first presenter today, Shawn Ross from Macquarie University, who is going to talk to us about FAIMS. Thanks, Shawn.

Thanks, Kerry. FAIMS, which we're now calling Electronic Field Notebooks, is essentially this: if your organisation uses LabArchives for laboratory work, we're doing something analogous to that, but for field data collection in offline or network-degraded environments. You can see here our partners and contact information for the project. I won't read the text here; I'll give you a few seconds to read it, and then I'll give you some more context. Essentially, we're trying to recognise the fact that field research, whether in disciplines like archaeology, history or ethnography, or in field sciences like ecology or geology, falls into a category described in the literature as "small science" or "small data" research. It may not get the press that big data does, but it is characterised by its own difficulties, mostly around heterogeneity and diversity of data.
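To make that heterogeneity concrete, here is a small illustrative sketch; the field names and document structure are entirely hypothetical, not the actual FAIMS definition format. The idea is that a researcher-authored definition document declares the data model for one field workflow, and generic platform code validates records against it:

```python
import json

# Hypothetical project definition: a researcher-authored document that
# declares the data model for one field workflow. The real FAIMS format
# differs; this only illustrates the schema-driven idea.
definition = json.loads("""
{
  "project": "coastal-midden-survey",
  "record_types": {
    "find": {
      "fields": {
        "identifier": {"type": "string", "required": true},
        "material":   {"type": "string", "required": true},
        "depth_cm":   {"type": "number", "required": false}
      }
    }
  }
}
""")

def validate(record_type, record, definition):
    """Presence-based validation only: report missing required fields
    and fields the definition does not declare."""
    spec = definition["record_types"][record_type]["fields"]
    problems = [f"missing required field: {name}"
                for name, rule in spec.items()
                if rule["required"] and name not in record]
    problems += [f"unknown field: {name}"
                 for name in record if name not in spec]
    return problems

print(validate("find", {"identifier": "F-001", "material": "shell"}, definition))
# []
print(validate("find", {"material": "shell", "site": "A"}, definition))
# ['missing required field: identifier', 'unknown field: site']
```

The point is that the platform code stays generic; only the definition document changes between, say, an archaeology project and an ecology project.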
When we started this project in 2012 we thought we'd make data loggers or something like that, but that ultimately turned out not to be suitable for our community, so we built the much more generalised platform that I'll talk about in a minute. That was really necessary because of the nature of research in these disciplines: its characteristic heterogeneity, and also the fact, which again comes up in the literature about this kind of research, that the field approaches, methods and digital tools are often emergent from the fieldwork, that is, from the research itself. It's often quite hard to have a top-down designed system ready to go before you head out to the field, although we do try to work with our partners on data modelling to achieve that as much as we can. So, on to the solution we're looking at, which is based on the experience we've had since then. The key thing to understand about our platform is that this isn't an app that you just download and start using in the field for data collection. What we have done, and what we hope to continue to do, is produce a platform that lets you essentially mint your own custom app, and do it in a way that supports sharing and transparency, in the sense that an encapsulation of your methodology is available in a more or less human-readable form that can be interrogated by an outsider. Researchers produce a document, currently an XML file, and in the next version probably something more like JSON, that instantiates a particular data model and a particular workflow out in the field, so that
you have the advantage of a bespoke app, but instead of spending six figures to have a bespoke app developed, once the data modelling and workflow modelling are complete it can be done by a student programmer in a week, at a cost of perhaps five or ten thousand dollars. At the same time, the core technology underlying each of these apps does a lot of the heavy lifting around sensor support, synchronisation, providing an API or export facility, and so on. We are now doing a total technological rebuild from the bottom up, because the stack we're using, chosen in 2012-2013, has really reached the end of its useful life. It's stable and in use, deployed for over 60 workflows at about 35 projects, but it's time for something new. What we're looking to do is maintain the kind of flexibility I've described, but in a way that is cross-platform (right now we're Android only) and more performant than it is now. Currently we use a relational database in quite a generalised way, so we hit the kinds of performance bottlenecks you'd expect from that, and we're looking at non-relational alternatives. We're also eliminating dependencies to the extent that we can, improving security, and doing a number of other things, mostly around scalability and usability. And we're looking at producing a web application, along the lines of Qualtrics or LimeSurvey if you're familiar with those, that provides a GUI for your customisations and then generates the JSON file I mentioned previously. So that's a basic rundown, and I think I've used up my three minutes. Thank you.

Excellent, thanks so much, Shawn. Next up we have Wojtek. I
forgot to mention at the start: participants, please type your questions into the question box now, and we will try to answer them at the end; if you enter them in the order of the speakers, that would be really handy. So I'll hand over now to Wojtek; his webcam is not working, but he is online.

Thanks, Kerry. My name is Wojtek Goscinski, and I'm here to present the Australian Characterisation Commons at Scale. Characterisation is the process of probing and measuring the structure and properties of materials at the micro, nano and atomic scale. It's essential across the natural, agricultural, physical, life and biomedical sciences and engineering, across a wide range of fields and domains, and it is represented in NCRIS through Microscopy Australia and the National Imaging Facility, with significant investments, of course, through universities and other institutions as well. We have a strategy that we've used to inform this project, and you can download it via the link in the bottom right-hand corner, but the three major aspects are: scaling is complex, working with digital objects is challenging, and expertise is rare. The project is addressing these through a national infrastructure program, by helping to make characterisation digital objects FAIR, or more FAIR, and through a national program to spread knowledge and underpin change. The program itself is quite broad, spanning many different locations across Australia, including three NCRIS facilities, and it is broad because characterisation instruments are very distributed across Australia; they are typically in laboratories. What we're doing is deploying what we call the Characterisation Commons: a suite, or ecosystem, of computing systems, data repositories, workflows and services, all connected with instruments. It is not one thing; it is a wide range of capabilities deployed across four
locations in Australia. Alongside that, we have three specialised programs: in big-data electron microscopy and correlative microscopy, in biomedical imaging collections, and in national tools for scattering and beyond. The things we're actually doing are on the left-hand side here, and some of the techniques and technologies we're using are on the right-hand side; I'll go over the items on the right, which are certainly things we'd be very interested in talking to people about. We're doing a lot of work with instruments: we have been underpinning and integrating instruments for many years, and we have various tools, tricks and environments that we use for that. We provide much of our software through remote desktop environments available to researchers on the cloud, and we develop environments to provision remote desktops. We're planning to do a lot more CI/CD work and containerisation, to make what we do higher quality, more redeployable, and easier to share across other environments and landscapes. We also plan to do a lot of data movement: because we do so much instrument-based work, we need to move data from the instrument to somewhere better, so we will be working with a lot of data-movement tools. And we're looking to integrate with institutional research data management systems and repositories, because we consider that essential to our sustainability; we want to integrate with the institutions that are supporting the instruments locally. Thank you, and I'm happy to take any questions at the end.

Thank you very much, Wojtek. And now we have Louisa Jorm.

Thank you very much for the invitation to present the e-Research Institutional Cloud Architecture, otherwise known as the ERICA project. The problem that ERICA addresses is one we're all pretty familiar with, I think. There are numerous
examples of concerns around data breaches relating to personal data. The original idea for ERICA arose out of the issues that health researchers in particular face in trying to use large-scale health records, such as MBS or PBS claims or, increasingly, electronic medical records coming out of hospitals. Clearly there's a massive amount of research value locked up in those data, but traditionally there have been many barriers to actually gaining access to them, rightly, because of the need to protect privacy. Increasingly, though, there are also issues relating to the size of the data, and to the computing capacity required to apply techniques like machine learning to these data, which exceeds the capabilities of existing secure platforms. ERICA implements one component of the Five Safes approach to protecting the privacy of sensitive data, which that diagram shows: the "safe settings" component is directly addressed by ERICA, but it also provides mechanisms for implementing the other four safes; for example, we have a safe-researcher training program. I'll just note that I think Steve McEachern is speaking later about the CADRE project, which is a sister project of ERICA and also relates to setting up infrastructure to operationalise those Five Safes principles. So what is ERICA? ERICA is basically an orchestration framework; it's infrastructure as code, completely virtualised. As far as we know, it's the first facility internationally that utilises public cloud computing in a way that is secure enough to meet the requirements of Australian Government data custodians. It uses Amazon Web Services (AWS), and as you'd all be aware, AWS offers a huge number of features and immense scalability, with a whole range of new products and services continually being released which we can take advantage of. Currently we're already offering different operating system and workspace configurations, a range of different options for
high-performance computing, and multiple storage and pricing options. Very quickly, there are four major components: project workspaces; virtual desktops, through which researchers access those workspaces; ingress and egress portals, which, importantly for sensitive data, provide a complete copy audit trail and customisable permissions governing who is allowed to perform various activities relating to file upload and download; and a system administration application. Because ERICA is entirely virtualised, there can be multiple instances of ERICA, each of which can host more than a hundred projects. Currently there are three active instances. One is at UNSW Sydney, where we maintain the code base for all of the instances, which are basically clones of one another. The other two current instances are at the Australian Institute of Health and Welfare, whose instance is called the Secure Remote Access Environment (SRAE), and, recently, the New South Wales Data Analytics Centre, a New South Wales government agency, which has also established an ERICA instance. So what are we doing with our ARDC Platforms-funded project? We're basically growing the national ERICA network. Funded as part of the Platforms project, we will be establishing three further ERICA instances: one in South Australia at the South Australian Health and Medical Research Institute (SAHMRI), one at the University of Melbourne, and one at the University of Western Australia. It currently takes about six person-days to establish an entire ERICA instance, and we hope to have those three new instances up and running either late this year or early next year, noting that COVID has set our plans back slightly in terms of being able to employ staff to work on the project. We'll also be exploring porting ERICA to run on Nectar cloud nodes, which may reduce costs for some projects. Then there's a range of enhancements to the software aimed at increasing automation and reducing costs. In particular,
custom configurations for machine learning are becoming increasingly sought after by researchers using ERICA, and at the moment we don't have highly automated processes for project archiving and restoration; we'll need those. We're also working on developing Five-Safes-enabled project governance pipelines, policies and procedures that are specific to ERICA but will also be harmonised with the work that the CADRE platform is doing, which you'll hear about later. In particular, in our governance work we're going to be focusing on research using cross-jurisdictional data. It's a big issue in Australia that different jurisdictions have different privacy legislation, for example, so we need to develop governance and policies that will allow us to bring together data from the Commonwealth and all of the state jurisdictions. The other thing we're interested in is using ERICA for international collaborations: what are the legislative and other requirements for researchers based in other countries to access Australian data? And we'll also be exploring the potential to set up ERICA instances in other countries, because this is possible wherever AWS is operating. At the bottom of that slide you can see the partners in the project. Thanks so much, and I'm happy to answer any questions.

Thank you very much, Louisa. Okay, so now we have Ryan with the Australian Imaging Service.

I'll be talking about the Australian Imaging Service which, as the name implies, focuses on image characterisation, but, a bit differently to the ACCS, it focuses much more on clinical dark data. Roughly one third of the Australian population gets radiology images every year, and about $3.3 billion is spent on obtaining these images; to put that in perspective, that's about four years' worth of cumulative NHMRC funding for the entire country. Now, wouldn't it be spectacular if there were an easier process to actually use some of that clinical data for research purposes?
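One prerequisite for that kind of reuse is stripping patient-identifying metadata before images leave the clinical site. As a purely illustrative sketch (the tag names, the keep/pseudonymise lists and the hashing scheme here are hypothetical, not the AIS de-identification profile), profile-driven de-identification looks roughly like this:

```python
import hashlib

# Illustrative image-header metadata (DICOM-like key/value tags).
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "MRN-0042",
    "StudyDate": "20200817",
    "Modality": "MR",
    "StudyDescription": "Brain MRI",
}

# A hypothetical de-identification profile: which tags survive, and which
# are replaced by a stable pseudonym so longitudinal scans can still link.
KEEP = {"Modality", "StudyDescription", "StudyDate"}
PSEUDONYMISE = {"PatientID"}

def deidentify(header, salt):
    """Drop identifying tags; replace linkable IDs with a salted hash."""
    out = {}
    for tag, value in header.items():
        if tag in KEEP:
            out[tag] = value
        elif tag in PSEUDONYMISE:
            out[tag] = hashlib.sha256((salt + value).encode()).hexdigest()[:12]
        # everything else (e.g. PatientName) is removed entirely
    return out

safe = deidentify(header, salt="project-secret")
print(sorted(safe))  # ['Modality', 'PatientID', 'StudyDate', 'StudyDescription']
```

A real profile would also handle dates (which can themselves be identifying) and the hundreds of other header tags; the point is that a shared, auditable profile, not ad hoc per-project code, decides what survives.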
And that's part of what this project aims to facilitate. One of the issues with using that data, though, is that the tools and processes for de-identifying it and moving it from a clinical site to a safe research site are very ad hoc. They're almost always built from scratch by the research group, which means they're very prone to errors, let alone the ones using Dropbox. Beyond the clinical formats, there is also a large variety of data formats, with each vendor having a different one, which makes the data very hard to work with. And of course there are a lot of modalities related to an imaging session, electrophysiology, blood work, etc., that you would also want to handle in the same way. Lastly, there's a lack of consistent quality assurance and provenance on the metadata, which makes secondary reuse very hard because you don't know how trustworthy the data is. So our approach is to build a distributed national federation of research repositories that are linked directly to clinical instruments and analysis tools. There are four main streams. The first is that we want to de-risk the data transfer: by using standard technologies to bridge the clinical and research sites, we can have much better control over the de-identification, access controls and, importantly, auditing, so the clinical sites can actually see not just the first researcher but all subsequent reuses of that patient's data. The second is making all of these nodes trusted data repositories, with National Imaging Facility personnel approving quality assurance and quality control processes, reproducible analysis pipelines, and, importantly, in-depth reporting to see who is actually doing quality control, who is doing the proper analysis, and so on. Third, the technology we're using, XNAT, has historically come batteries-included for DICOM, so it's more or less plug and play in clinical settings, but as I mentioned there are a lot of modalities that aren't included out of the box. So what we want to do
is expand the support for arbitrary file types. This includes cutting-edge imaging modalities, where the researcher may be creating their own file type, and adopting the open formats the community has decided are the way forward. And lastly, no platform should stand on its own, so we want to integrate with the ecosystem of tools used around imaging and radiology research. It's cut off a bit, but this is in two categories: one internal, so everything can be initiated from the browser within the Australian Imaging Service, and the second external, where we provide an API and authentication mechanism to integrate with other platforms and services. So, lots of colours: the Australian Imaging Service is all the capability inside the dotted blue line. We deploy upload tools and de-identification tools at university sites and clinical sites, using a standard configuration and de-identification profiles for the entire federation. Data is then pushed to individual nodes running the XNAT platform, which houses internal viewers for viewing the data; a pipeline engine which leverages Docker, Kubernetes, Gadgetron and Clara for image reconstruction, machine learning and any prepackaged pipelines that might be appropriate; and, importantly, an API that allows us to integrate with other platforms such as REDCap, the Characterisation Virtual Laboratory, and SeRP, which is, I guess, an equivalent of ERICA used over in Wales and the UK, and which uses XNAT as its main radiology component. Importantly, and part of this talk is about how we can integrate, there's a whole API through which we would basically want to act as a national service, connecting to the instruments and data in a secure way so they're available inside other platforms. And that's it.

Excellent, thank you so much, Ryan. And now we've got Professor Stuart Barth.
Thank you, Kerry. I'd just like to spend a few minutes talking about the Australian Transport Research Cloud. This is a platform grant run out of AURIN, the Australian Urban Research Infrastructure Network, an NCRIS facility, but it includes a multitude of collaborations with other universities across Australia. What we're trying to achieve in the ATRC project is two things. One is to start to develop and build a generic capability for modelling and simulation across cities and urban areas within Australia, one which is scalable and interoperable and develops an ecosystem approach to model coupling and analytics; and, in this particular platform grant, to do that via a demonstration of how we can start to bring together different transport models, analytic capability and relevant datasets into a scalable cloud environment. Specifically on the ATRC project: we're focusing on transport because we recognise it's a challenging area with demanding infrastructure requirements. We require cost-effective, reliable, resilient, sustainable multimodal systems to reduce emissions and congestion, and it's only by being able to generate evidence about the performance of our transport systems that we can start to develop coherent policy and investment plans for the future. All of this relies essentially on high-quality, timely data, and on the ability to use that data to parameterise, calibrate and validate interoperable spatio-temporal models and analytics that allow us to understand the performance of our transport systems. So what we plan to do in the ATRC project is this. We will be focusing on providing new, high-quality, multidisciplinary datasets for transport research across Australia, ensuring that the data is discoverable and accessible and, critically for the modelling and analytics, actually presented in a manageable and ingestible way. We will, on the analytics and modelling
side, be developing approaches for easy access to open-source transport analytics and modelling tools from across the different transport research groups in Australia, and developing easy-to-use interfaces between the data and the models and analytics, so that data can be ingested easily and straightforwardly into the models. To do this, the platform and infrastructure we'll be developing will be based around easy access to the data, so we'll be focusing on accessible data built on the FAIR data principles, and on the development of cloud-enabled transport analytics and modelling that can leverage those datasets. In a little more detail on how we're going to do this: AURIN has an existing metadata catalogue which is extremely rich in terms of the disparate datasets it has access to across Australia's urban areas and cities. However, we intend to extend that to the DCAT data catalogue vocabulary standard from the W3C, which will increase the accessibility and discoverability of federated transport data across multiple sites. We're also going to extend AURIN so that it can handle the emerging data and metadata standards around transport data, such as GTFS, as well as data coming out of the ASGS around statistical geography for Australia, and take those datasets and meld them into manageable data queues served from APIs that allow us to present them to the models and analytics. We'll then be containerising the transport network analysis and modelling tools and deploying them as cloud services for the research community. The other area we're very keen to look at is interoperability of these models and of the analytical workflow, so we'll be looking at adapting Jupyter notebooks, and also using approaches like Argo to allow event-driven workflow containerisation of the models in
order to make them interoperable with each other. So the overarching ambition of the ATRC project is to develop this cloud-enabled modelling and simulation framework around urban data science, focusing in this piece of work primarily on how we do that for transport research, so that the tools and models are made more widely available to the research community across Australia. If you'd like to find out more about the ATRC project, please feel free to contact either myself or Michael Rigby, the ATRC technical manager.

Excellent, thank you very much, Stuart. Next we have Nigel Ward, who's going to talk about BioCommons.

Thanks, Kerry. I'm going to spend the next three minutes doing two things: one is decoding the jumble of words in the title of our project, and the second is identifying some areas of potential collaboration with other partners. I'll start by defining BioCommons. It's a multi-year, multi-partner, multi-million-dollar initiative building digital infrastructure capability for the life sciences. It has a variety of partners, a subset of whom are on this project, and you can see their logos below. Why does BioCommons exist? Well, like many domains, the life sciences are experiencing a data explosion. Rapid advances in the speed, capability and cheapness of sensing technologies mean we're seeing a data explosion like the one in the graph on the right, which shows exponential growth in life sciences data being contributed to international repositories. BioCommons was created in reaction to that. The Platforms project I'm talking about today is addressing three problems that arise from that data explosion (BioCommons is addressing many more). The first is around providing researchers with access to the
methods, tools and techniques that can help them analyse the data I just described, and deploying them on a variety of different infrastructures. The second challenge is to help researchers actually access those data. We've discovered that the data is very distributed: researchers might have it in their own lab, it might be in their institution, it might be generated by a national collaboration like the Bioplatforms Australia Oz Mammals initiative, or it might exist in those international repositories. We want all of that data to be accessible from our platform. The third challenge we're addressing in the Platforms project is providing compute to underpin those analysis tools on that data. The Australian computing environment is quite complex, and there's no computational environment particularly tailored for bioinformatics and the life sciences, so we aim to provide access to institutional, national, commercial, and even someone's-laptop compute underpinning this platform. So those are the three things we're trying to do. As for how we're doing it, and here I'm decoding two other words in the title: we're building what we're calling a "bring your own data" environment, where researchers can bring their own data and connect it with reference data, with those tools and techniques, and with the compute infrastructure. We're doing that through three work programs, which build on existing services, hence the "expansion" in the title of the project. We're building a set of GUI (graphical user interface) tools on the web to access commonly used tools like the Galaxy workflow engine; we're building a companion command-line interface to that graphical interface, for researchers who want to tinker with tools a bit more or perhaps extend the sort of analysis that can happen; and we're making a major investment in getting data into this platform from all of the instruments distributed around the nation through national
investments. So those are the three activities we're undertaking. How are we doing it? We have a principle that we're not building anything new: we're going to adopt existing technologies that are being created internationally, perhaps influence how they look internationally, but adopt them and deploy them here rather than build anything new. Rather than go through all of that word jumble here, I'll just identify two areas of potential collaboration. The Galaxy workflow engine I mentioned earlier is a generic workflow engine that was built for bioinformatics but can be, and has been, used by other communities; we're really interested in looking at how we could redeploy it for other communities. We're working with AARNet, as a number of you are as well, using their CloudStor program as a way of letting researchers bring their own data in and out of our platform, and of getting data from those instruments into the platform; it will be interesting to collaborate there. We're working with a number of international projects, such as the CERN Virtual Machine File System and the Global Alliance for Genomics and Health data repository service, on distributing reference data, tools and workflows internationally. And we're using a set of container technologies to provide access to the tools and make sure they can be deployed on a variety of back-end research infrastructures: things like BioContainers, and workflow engines like Snakemake and Nextflow that sit on top of Kubernetes. So hopefully I've de-jumbled the acronyms in our project title and identified a few areas where we could potentially collaborate. I think that's three minutes, Kerry, over to you.

Thank you very much, Nigel. Okay, so next we have Elisa from EcoCommons. Actually, we might go on to the next one, because I think Elisa may have dropped off, so I'm going to skip ahead and
we'll come back to her when she can rejoin. So, on to the next one.

Hi Kerry, thanks. I'm Kumati, from Monash University, and I'm going to speak about Environments to Accelerate Machine Learning-Based Discovery. This is a collaborative project between Monash University, the Monash Data Science Platform, MASSIVE, and the University of Queensland, and it largely addresses the challenges of researchers applying machine learning, or developing machine learning techniques, as part of their research. As we are all aware, machine learning techniques are used widely across a range of domains, from neuroscience and the economic sciences to art and business. This started with the ARDC discovery activities: as part of those activities we conducted a survey across multiple institutions within Australia and New Zealand. We surveyed 128 research groups, of which 68 responded. The survey largely focused on understanding the challenges researchers face in applying data science and AI/machine learning to their research, and the graph summarises those challenges. The key challenges identified are, obviously, compute capacity and environments in which to do the research; data accessibility, availability and sensitivity; and skills and expertise in applying machine learning. Other challenges identified are knowledge of techniques, and the communities themselves. Next slide please. Sorry, if I can just add one more thing there: the report from the survey can be found at the link provided, and it can be downloaded. Based on the challenges identified, we put together a program of work for the Platforms project, which primarily aims, first, at accelerating research by providing an ML environment in which researchers can quickly get on with
their research or possibly fail as well and and understand how how we can accelerate that part also promote the interdisciplinary research basically around the tools and the techniques in the libraries which researchers from different domains use and also to see how we can scale this across multiple national partners like NCI and POSC so the program of work involves around four major activities one is developing that integrated development environment itself which has all the required tools probably some reference data SDKs and libraries um two is also improving the end of the day the the the the underlying infrastructure are these expensive high performance computing so it's more about now utilizing the HPC itself in the most efficient way and also having that knowledge available across the multiple HPC centers and administrators three is about providing targeted training to the research community more around tools and techniques and also around um and also around developing these um developing these tools and techniques uh appropriate for their research forth is building the communities of practice um either around the domain of research or around around the tools and techniques they apply in in their in their particular research so we created three work packages as part of this platform project the first one focuses on developing the integrated development environment which first is providing an interactive desktop developing analysis workflow suitable for every kind of task the researcher perform three is improving the link to file systems on HPC environment and also container containerization the work package two focuses more towards efficient use of tools and and and um and knowledge around the tools and how do we provide support around these tools um so um which which evolves around creating documentation and user guides for researchers to use um the this work package also involves creating data catalogs and right libraries which can be readily available 
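To make the containerisation idea above a little more concrete, here is a minimal sketch of assembling the command that would launch a containerised JupyterLab session on an HPC node with the cluster's file systems mounted in. It assumes Apptainer (formerly Singularity) as the container runtime; the image name, mount paths and port are purely illustrative, not details from the project.

```python
# Hypothetical sketch: build an Apptainer command that runs JupyterLab
# from a container image, bind-mounting HPC storage (e.g. scratch and
# project directories) so the environment sees the cluster file systems.

def build_container_cmd(image, bind_dirs, port=8888):
    """Return an argv list that runs JupyterLab inside a container,
    binding each host directory to the same path inside the container."""
    cmd = ["apptainer", "exec"]
    for host_dir in bind_dirs:
        cmd += ["--bind", f"{host_dir}:{host_dir}"]
    cmd += [
        image,
        "jupyter", "lab",
        "--no-browser",       # the user tunnels in rather than opening a browser on the node
        f"--port={port}",
    ]
    return cmd


# Illustrative image name and mount points (assumptions, not project details)
cmd = build_container_cmd(
    "ml-environment.sif",
    ["/scratch/project01", "/g/data/project01"],
)
print(" ".join(cmd))
```

Packaging the whole toolchain in one image like this is one common way to keep the same environment reproducible across different HPC centres, which is in the spirit of the cross-partner scaling described above.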
These include reference data sets which could be used across multiple research areas and domains. The work package also focuses on identifying tools which are not currently available within these environments, creating a catalogue of tools, and supporting them. The third work package is largely around upskilling the community itself, based on the survey outcomes, which identified a specific set of tools. One interesting outcome from the survey was that 80 per cent of the researchers indicated there are five common tools or software packages that they use, and they also indicated that there are challenges in applying or using those tools and techniques. So the focus is largely on those identified tools, five machine learning and five data analysis packages, ten in all. The training programs are tailored around that identified set of tools, but they also come along with workshops and other collaborative events through which researchers and users can come together, identify common challenges, and solve them together, and that leads to the development of communities of practice. Some of the outcomes from these activities are largely about understanding, for example around computer vision, what the common set of tools and challenges are, and how that can be aggregated and provided as beginner training, with follow-on intermediate and advanced courses. That's it for me, thank you.

Thank you so much, Kumati. Okay, so, hi Lisa. Yes, I will go back. We technically have nine minutes left, but I'm hoping that we can just keep going, so take it away. Oh, we can't hear you... okay, now we can.

Hi everyone, and thanks Kerry. With EcoCommons we're developing the
go-to platform for researchers who are looking for solutions to environmental and ecological challenges, and hopefully I'll be able to convince you why this is needed. The current environmental challenges that our society has to deal with are all incredibly complex, and these are all examples where people have to take biodiversity data and put it into analytical workflows to derive solutions. Data exists in many different forms, shapes and formats, and is offered by a whole variety of providers, and choosing the right data set, and the data quality you need, really depends on the problem you're dealing with. So imagine you're a researcher: how can you make the most of this very data-rich world that we live in? This is where virtual laboratories come into play, to make your life easier, so you spend less time on data wrangling and configuring models and more time actually solving these environmental challenges. In EcoCommons we build virtual laboratories to bring all this great data that already exists into one place, to connect it with published methods, tools and analytical workflows, and to back this all up with high-performance compute capacity and cloud storage.

This slide shows how we do that. In EcoCommons we have three different streams that the team has been working on for a couple of years now. The first stream is where we develop and test new models. It is a command-line environment linked to high-performance computation and cloud storage; we use it to develop new models and turn them into microservices, and our users can also propose new models that they would like turned into microservices. Then we have trusted, domain-focused models, and platforms that already utilise those models, that have been peer reviewed
and adopted by the scientific community, and we have developed several virtual laboratories where users get a point-and-click interface that is super easy to use but still scientifically rigorous. Finally, we are developing custom-made decision-support portals for specialised users. Currently our specialised users are from government, so it's a very policy-driven component that EcoCommons is dealing with: we are developing specialised virtual laboratories for users who would like to use the models and outputs that already exist to support their decisions. So this is EcoCommons in a nutshell. Thank you.

Thank you very much, Lisa. Now I think we have Steve Kinnett with the Australian Scalable Drone Cloud.

Thank you, yes. My name is Steve Kinnett, I'm the chair of the Australian Scalable Drone Cloud steering committee and the CI from the host organisation. I say the host organisation because this project is really about its five partners and their journey to digitise and make things more FAIR, and across those five organisations we cover a diverse and disruptive national research community; I'll show you some of that in a tick. Drones are really interesting and exciting because they're sensing at a critical scale gap, between what we can do from planes and satellites and what we can do physically on the ground, and because of that they're becoming quite pervasive and involved in many disruptive research agendas. We don't want to see all the organisations repeat the same work, so we've chosen to work together to build this common framework.

We're approaching it along two axes. On one level it's all about best practice in drone data analytics and processing, driven by five use cases from the five partners that straddle fundamental research applications, industry-based applications and national applications, and what I mean by that are the
ones identified by the national increased-capability areas and things along those lines. At the same time we're dealing with another axis. Some of those use cases are building new pipelines, taking in new sensors and new tools, so that's the peak end: doing something really bespoke that you can't buy or get through open source. We're not writing the open-source codes ourselves; it's about tying them together. At the other end, some of the research agendas are about enabling the Australian public to contribute into commons and pipelines, so it has to really scale out. We've got this peak-and-long-tail kind of ecosystem that we're going to provide the technology for, and a common basis that enables FAIRness and best practice to straddle those two axes.

Our approach takes these five use cases from three national initiatives: the APPF, the Australian Plant Phenomics Facility; AuScope, which is, if you like, the geoscience community; and TERN in the terrestrial ecosystem space (we just heard about EcoCommons and clouds, and that's really part of a lot of what TERN does). Similarly, we have facilities that build drones and actually run the campaigns for you, like the Monash Drone Discovery Platform, and we have government agencies like CSIRO and major research labs with a lot of industry links, and we're working together along these pipelines.

Just to wrap up and give you a feel for the architecture: I'm not expecting you to be able to read this, it's just there to give you an idea. There's a map that takes all of the adopt, adapt and develop technologies the partners have suggested we'll trial, to create this underpinning pipelining environment. It allows, at one end, the peak VDI-based advanced tools that push things onto the HPC
and at the moment the pilot example of that was done by AuScope and exists on the research cloud right now. Down at the other end it would be heavily Kubernetes-based and cloud-native, leveraging things like EcoCloud or other Jupyter-as-a-service initiatives, and things which are specific to drones, such as OpenDroneMap. Those will be tested and trialled against the five primary pipeline use cases, and against whatever else the community pushes over time. To get through that journey, the project plan on the right-hand side is really a sequence of five to six "uber sprints", and we're coordinating the national community involved into two-week sprints within those uber packages to deliver the whole program. What's really important is that at the end of each uber sprint we re-engage with the user community in those pipelines, at the steering committee, to ensure that the technologies are appropriate and play a valuable role in the pipelines. So that's our project, thank you.

Thanks very much, Steve. And now we have the CADRE project, so we have another team presenting.

Hi Kerry, it's Dr.
Steve McEachern here, from the Australian Data Archive. I'm presenting on CADRE, Coordinated Access for Data, Researchers and Environments. Our project is a little bit different to some of the other platforms: what we're interested in is establishing a platform for coordinating access to data for researchers, basically an authorisation and approval domain for that. The Office of the National Data Commissioner is establishing a new legislative framework and foundation for access to government data in Australia, but there isn't really an implementation framework for it: while we might have the principles in place, the framework to actually implement them, either procedurally or technically, isn't available. So the sorts of challenges we're looking to address are really about access procedures, how to tie those into technologies and access controls, and mechanisms for storing and analysing data. Our argument is that in the absence of those, the expected value from improved access will not be realised, and you potentially further undermine trust, because you've promised, and are then unable to deliver, access to the content that's available.

The aim of the project, as a basic set of outcomes, is to bring this together for the social sciences, humanities and related disciplines: governance, creation, management and sharing of sensitive data. We're trying to produce a conceptual framework, and that's really what I'll talk to in terms of our solution, connected to the Five Safes, which is now an established framework for understanding the different elements of the design of access to sensitive data systems. The Five Safes are people, projects, data, settings and outputs. What we're looking to do is establish a mechanism for agreed
identifiers for each of those elements of the Five Safes, accreditation protocols for each of the five safes, and then information-exchange protocols for those Five Safes indicators as well, plus an access-management platform and pilot integrations across a set of secure access settings. Some of those are already on this call: ERICA and Oren, as well as AARNet, the Data Co-Op group at Swinburne, and the Australian Data Archive, where I'm based. The project has about 10 or 12 partners involved, so I haven't brought all the logos in here, but we're across a mix of e-research providers, government agencies and universities.

This isn't a technical architecture; what we have here is an architecture for thinking about the information exchange we're interested in, and some of the standards and technology we might use to do it. We have the AAF involved as one of the core partners as well; we're trying to leverage some of the principles established through the establishment of the AAF, and see if we can extend them to the types of identifiers we might apply to the Five Safes program. This is some of our initial thinking on where those identifier systems might come from. There are three basic elements to each of the five domains (people, projects, outputs, settings and data): the safe itself, indicated in the circle; the type of identifier; and provisional ideas for the identifiers we might use, namely ORCIDs for people, DOIs for data, DOIs for outputs, and potentially RAiDs for projects. We're still looking for an identifier system to apply to safe settings, that is, the stack of hardware and software associated with the delivery of a service within a domain. The other element to this is the attributes that you associate with those
identifiers: what accreditation systems can we use, and how can we transfer that accreditation information across organisations? We have some models in place, things like grant applications in the project domain in particular. We're also picking up on some work done in the US, the researcher passport system established by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, as a model adopted in the US domain, and we're looking for similar frameworks across the other domains. That's it for me.

Thank you very much, Steve, and thank you to all of our presenters today. We have three questions that have come through; I know we've run out of time, but if you can bear with me, we'll get to them. I'll stop sharing my screen. So the first question: if data curation is a second priority to publication, and data is often non-reusable, does that impact the capture of data provenance information in the platform?

I'll try to unpick this a little. I don't know whether data curation is a second priority, but it's probably worth saying that data capture and data analysis are primary priorities of the project. We consider it really important to capture data at the point of the experiment, and by doing that we achieve a lot of things: we're able to capture metadata while the researcher is actually performing the experiment and is in a good position to record the metadata about it, and we're also able to move that data pretty much immediately to somewhere more useful, which alleviates the mechanics of some of the challenges you get with data curation. Data curation can mean a number of things, and the way we approach it in this particular project is to simplify some of the steps and mechanics you then need to do good data curation. I wouldn't
call it a second priority; I'd just say that our first priority is on some of these mechanisms, which are still a challenge when you're talking about very large data sets.

Okay, thank you. The second question is for Lisa: what is the significance of the US CLOUD Act for the sovereignty of Australian sensitive data stored with AWS or other American-owned service providers? Do we still have Lisa on? No, we don't; she's had to go, sorry, so we'll have to take that one on notice.

Okay, so, Kumati, if you're still there: can you elaborate a little more on the community of practice for machine learning for HASS research, please?

Yeah, sure. Within HASS, for example, some faculties like Law are interested in applying natural language processing in their research. They've got volumes and volumes of data which needs to be analysed, or which first needs to be converted into a structured format before any sort of machine learning can be applied to it. There are similar interests from other communities within HASS looking at converting unstructured data into structured data, so one area for a community of practice would be around natural language processing and the relevant machine learning techniques that can be applied to that data. That's one example, but as we go through I think we'll keep learning more from the HASS community.

Okay, thank you very much. Well, thank you so much to all of our presenters; that was really interesting. As I said at the start, this was recorded, and all the slides will be made available; I'll send a link out to everyone who registered. As this last slide shows, if you would like to read a little more about the projects, they are all on our website at the link there. So thank you again, thank you for your interest in the program, and have a great day.