Thank you everyone for joining us for the final Projects Exchange session. This is the second session for the platforms. My name is Kerry Leavitt; I think I've met most of you. I'm the Platforms Program Manager, so we'll get right into it. Our first presenter today is Bernie Pope.

Kerry, do you want me to share my screen? Oh, sorry, am I not sharing? I thought I was. Great, thank you. So I'm going to be talking about global technologies and standards for sharing human genomics research data; the short title is the Human Genomes Platform. My name is Bernie Pope, and I work with the Australian BioCommons, and I'm also at the University of Melbourne. You can see our partners there. There are many partners in this project, and I'm not going to list them all in the interest of time, but you can see their logos. Next slide, please.

So the problem that we are solving is that tens of thousands of human genomes have been sequenced in Australia to date, and this is predicted to grow rapidly in the next few years, tenfold by the year 2025. Global projections estimate that 50 million human genomes will be sequenced by the end of this year. Large cohorts are often needed for statistical power in research, especially in rare diseases and cancer, and this necessitates national and international data sharing, because individual research groups rarely have sufficient cohorts on their own and need to share data with others. Current systems in Australia for storing, analysing, describing and sharing human genome data are bespoke in-house solutions. They rely on manual, laborious processes for granting bona fide researchers access to the data for their own analyses, and they generally do not conform well to the FAIR principles. Human genome sharing requires high levels of assurance around data access controls because of the sensitive nature of the data. Current inefficiencies are hampering research progress and reducing the value gained from nationally funded genomics research projects. Next slide, please.

So in this project we propose a solution that we describe as a services toolbox, based on emerging global standards and comprising five features. The first feature is a system for identifying virtual cohorts of genomes nationally. Individual research institutes make the metadata and data in their genome cohorts available, and we propose a mechanism to make those cohorts searchable by other researchers in the country as one virtual cohort. The second component is semi-automation of data access request approval. At the moment, data access requests are handled by data access committees located at the institutes that hold the data. These processes are generally laborious, manual and time-consuming, and we aim to introduce systems that improve them and reduce the cost to the people involved. As a third component, we are investigating authorisation and authentication systems that are appropriate for human genome data sharing and that address the sensitive-data requirements. In the fourth component, we are developing streamlined methods for uploading metadata and data to established international genome repositories, so that research data held by Australian research institutes can be uploaded to the commonly used international repositories for sharing with the world.
And the fifth component is documentation and training.

The outcomes, benefits and impact of this project are threefold. First, the project will vastly improve the FAIRness of Australian human genome data. Second, genomic data from thousands of Australians will be shared securely and responsibly, nationally and internationally, ensuring their full research value can be realised. And third, the platform will provide a working template that any institution in Australia can adopt and deploy. Next slide, please.

So how will we build this, and what technical architecture are we using? I'll start with the right-hand side: wherever possible we will adopt technologies that adhere to standards being established by the Global Alliance for Genomics and Health, otherwise known as GA4GH. The GA4GH is a policy-framing and technical standards-setting organisation that seeks to enable responsible genomic data sharing within a human rights framework. It consists of more than 600 member organisations across more than 90 countries, and several of the partners involved in this project are GA4GH members. Across the sub-parts of the project we'll be deploying various technologies that have been developed nationally and internationally. These include the Gen3 system for data sharing and developing virtual cohorts, developed by the National Institutes of Health in the USA, and also Vectis, a similar system that has been developed in-house at Garvan. For data access approval automation, we'll be considering the resource entitlement management system known as REMS, developed by CSC, the IT Centre for Science in Finland, and also the data use oversight system called DUOS from the Broad Institute in the USA. For authentication and authorisation, we'll be considering Passports and the authentication and authorisation infrastructure from the GA4GH; we'll also consider the NIH's Researcher Auth Service, RAS, and lastly CILogon from the National Center for Supercomputing Applications in the USA. Finally, for the component on streamlining data upload to international repositories, we'll be considering the EMBL-EBI European Genome-phenome Archive. Thank you for your time; that gives you an overview of the project.
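For readers unfamiliar with the Passports standard mentioned above: a GA4GH Passport is a signed JWT whose claims carry a list of visas, each asserting something like a controlled-access grant over a specific dataset. The following is a minimal, illustrative sketch, not part of the project's codebase, of reading those claims with the PyJWT library; the function name is invented, and signature verification, which any real deployment must perform, is deliberately skipped.

```python
# Minimal sketch only: inspecting a GA4GH Passport for ControlledAccessGrants visas.
# Assumes `passport_jwt` was issued by an identity broker. A real implementation
# must verify the signature of the passport and of every visa against the
# issuer's published keys; verification is skipped here purely for illustration.
import jwt  # PyJWT

def granted_datasets(passport_jwt: str) -> list[str]:
    passport = jwt.decode(passport_jwt, options={"verify_signature": False})
    datasets = []
    for visa_jwt in passport.get("ga4gh_passport_v1", []):
        visa = jwt.decode(visa_jwt, options={"verify_signature": False})
        claim = visa.get("ga4gh_visa_v1", {})
        if claim.get("type") == "ControlledAccessGrants":
            datasets.append(claim["value"])  # identifier of the approved dataset
    return datasets
```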
Thanks very much, Bernie. Next we have Tim Churches.

Thanks, Kerry. Excuse me. Yes, so the Australian Cancer Data Network is a project with many players. The three main players are the Cancer Alliance Queensland, the AusCAT project at Liverpool Hospital in the South Western Sydney Local Health District, and the Savard project, also based there. We have partners in CSIRO, the University of Melbourne, the University of Western Australia, and many others too numerous to note. You'll notice that we're missing a logo at the moment; we have a competition in progress and expect to announce a winner in about three weeks. Next slide, please.

So the problem is that we have so much clinical data these days, but so little of it is actually used for research. As most of you probably know, over the last decade or so clinical information systems have largely, and very belatedly compared to other industries, replaced paper-based medical records in most healthcare settings, and particularly in cancer care. But despite the vast amount of really detailed clinical care and outcomes data that's now captured in those systems in machine-readable form, there's little use of those data for research purposes. The data is certainly being used for quality assurance and healthcare management, but for world-class research there's been very little use, and the speed and extent of that use has really been hampered by lack of access to the data in an analysable, research-ready form. The back-end systems are vast, sprawling and complex, and not something that most researchers can grapple with directly within the lifespan of an NHMRC- or ARC-funded project. And also, very importantly, research really needs to involve multiple sites: research results, as opposed to quality assurance or health management results, based only on data from one site are almost certainly biased and irreproducible, and may actually do active harm to patients if used to inform treatment. So we need to be able to use these data to answer questions such as determining the risk of cardiac toxicity for lung and breast cancer patients following radiotherapy and chemotherapy, and that question itself unpacks into hundreds of sub-questions. What sort of variation is there in practice across patient cohorts and treatment modalities, in Australia and internationally? We know that cancer care is protocol-driven, but there is a huge amount of variation in what patients actually receive. And there are more technical questions, such as: can radiomics, that is, features extracted from MRI and CT imaging, be used to help predict radiotherapy outcomes in specific cancer patients? Next slide, please.

So what we're proposing to do in this project, and this is really just a starting point, is to harmonise three leading Australian cancer data initiatives with each other and also with similar projects internationally. The first, and the initiator of the project, is the Australian Computer Assisted Theranostics network, AusCAT; theranostics is a portmanteau of therapeutics and diagnostics. They've been running for about ten years at Liverpool Hospital, led by Lois Holloway, who is the chief investigator on this project. Lois isn't presenting because she is ill at the moment. They have established a platform for distributed machine learning across data drawn from quite a few different oncology information systems around Australia, which is completely interoperable with the EuroCAT initiative based in the Netherlands, which draws similar data from hospitals across Europe and a few in the US. That's been running for about ten years, and they do machine learning on distributed data routinely, but it focuses very much on radiomics and the radiotherapy aspects of cancer treatment. The second one, which is really the elephant in this project, is the Cancer Alliance Queensland, which is a clinician-led consortium that operates the Queensland Cancer Registry on behalf of the Department of Health. It collects core data on every single cancer diagnosed in Queensland residents, as well as operating additional databases which link those data to lots of other core clinical, diagnostic and outcome databases. So it's a huge database which has been running since 2004.
And finally, the newcomer is the cancer variation Savard project, which I'm involved with. It's developing next-generation, machine-learning-powered pipelines to streamline the extraction and loading from the source clinical systems directly into common data models, in particular the OMOP common data model and its associated vocabularies, and, while doing that, also developing oncology extensions to that model in conjunction with key players overseas. So the goal of all of this is to create a national resource, interoperable internationally, of research-ready, syntactically and semantically interoperable cancer data that works with both the AusCAT and EuroCAT projects and also plugs into and operates with the very large OMOP community, which covers not just cancer care but all sorts of health care internationally. Last slide, please.

So how are we going to do this? It isn't completely set in stone yet; in fact, we have a technical workshop in the fourth week of March, but we have some fair ideas. Basically we're trying to use open source software stacks everywhere. There's a lot of legacy software in some of the systems that may not be replaced, but everything new will be open source, focusing on the usual suspects: Python for all the data wrangling, using things like SQL object-relational mapping, natural language processing with the fabulous spaCy library (which is written by an Australian), and we're largely SQL-backend agnostic. But very much the focus is on code sustainability, because it's very easy to set up data extraction and transformation systems that involve lots of different transformation statements, and very, very hard, time-consuming and labour-intensive to maintain them. So wherever possible we're aiming to use both simple and somewhat advanced machine learning methods to transform the clinical data into our target common data models. The reason for doing that is that it enables the subject matter experts, that is the clinicians, data managers and so on, to participate in an ongoing fashion in the task of data transformation, rather than having to engage an endless stream of contract programmers to write yet more hard-to-maintain code. That's our aim, and we're making some progress on that already. And finally, we're very much focusing on making data analysis easy, because common data models are great, but they have a little secret, and that is that they can be hard for researchers to use, harder than just a usual data table. So we're developing open source libraries for both R and Python which address that issue with respect to the cancer data, but will probably have uses elsewhere. I've already mentioned the various standards: we also use the EuroCAT interoperability specifications for the distributed machine learning, based on RDF triples and SPARQL queries, and the OMOP common data model and vocabularies, which subsume very large sets of standard vocabularies, so SNOMED CT, ICD, RxNorm, LOINC and so on and so forth. So it's very much an international vocabulary set. And then finally we're leveraging other ARDC-funded projects. We'll be using the cloud-hosted, secure, remote-access health data analysis environments that are being set up under a platforms project funded last year by the ARDC, where we can actually pool the data and do collaborative analysis on it. We're working with the CADRE project around Five Safes governance and workflow streamlining. All of this will be RAiD-enabled. I won't explain what RAiD is, but it's another ARDC initiative for identifying research projects. And in the second phase of HeSANDA, which is another ARDC project to make better use of clinical trial data, we intend to participate as well. I think that's probably my time.
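As a rough illustration of the rule-assisted, NLP-driven transformation described above, the sketch below uses spaCy's EntityRuler to pull drug mentions out of free text and map them to OMOP-style concept identifiers. It is not the project's actual pipeline: the patterns and concept IDs are invented for the example, and a production system would combine rules like these with trained models and the full OMOP vocabularies.

```python
# Illustrative sketch of rule-assisted extraction from clinical free text,
# mapped to made-up OMOP-style concept IDs.
import spacy

FAKE_OMOP_CONCEPTS = {"cisplatin": 1111111, "paclitaxel": 2222222}  # not real IDs

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "DRUG", "pattern": term} for term in FAKE_OMOP_CONCEPTS])

def drug_exposures(note: str) -> list[dict]:
    """Return drug mentions found in a clinical note with their concept IDs."""
    doc = nlp(note.lower())
    return [{"term": ent.text, "drug_concept_id": FAKE_OMOP_CONCEPTS[ent.text]}
            for ent in doc.ents if ent.label_ == "DRUG"]

print(drug_exposures("Commenced Cisplatin and paclitaxel, cycle 1 of 4."))
```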
Thanks very much, Tim. It's great to see all the linkages between the different projects. So now we have Anitha Cannon.

Thanks, Kerry. Is that better? Okay, sorry about that. Thank you, everyone. I'm Anitha Cannon, the Director for Research Platforms and Data Strategy at Monash University. First, I want to thank the ARDC for co-investing and partnering with us on this project, because it reflects a real commitment to data governance. We've got several partners up there, and thanks to all the partners for getting involved with this project as well. On to the next slide, please, Kerry. Thank you.

Before I get into the problem itself, I want to quickly touch on what data governance covers. Data governance is a term that includes the policies, roles and responsibilities around data collection, data management, use and data protection, and this is more important than ever for sensitive data. Now, in terms of the problems that we are trying to address here: data governance is critically important, but it's also quite resource-intensive and time-consuming. It's critically important because there are a lot of shared risks between the researcher, the research project and the institution, but it's a challenge to address at scale for an institution. And scale is an issue from a few perspectives. One of them is the variety of data sources that support research projects, including highly sensitive data sources. Collaboration across multi-institutional and multi-jurisdictional research projects can be challenging. Research infrastructure capacity for secure platforms can be a huge overhead. And finally, big data and sophisticated computation requirements, for projects like AI and machine learning over genomics and imaging data sets, can be challenging as well. So scale is definitely an issue on several levels. But if we don't actually lower the barriers for data governance, the data gets locked away in silos, research groups don't really make data FAIR, and without access to data, research becomes significantly constrained. At the moment there aren't many commercial solutions or national platforms that provide that secure, trusted environment using existing institutional research infrastructure and building on infrastructure like the Nectar Research Cloud. So, on to the next slide, and I'll talk about what we are looking at doing.

This project is going to enable the partner institutions to establish the Secure eResearch Platform, or SeRP for short, which is a comprehensive platform that automates data governance in a way that can scale. In our experience it's a combination of technology and processes that reduces the burden of data governance on research groups, and it also de-risks the institution, particularly given our obligations around data protection and data breaches. In fact, some of our key research users now tell us that they are able to sleep at night knowing that their data is protected from misuse; they had been managing this in quite a resource-intensive way for many years before they had access to SeRP. So there are two parts to SeRP; it's in that box within the dotted lines there.
There's a data custodian component, which provides data governance and management. It allows data to be brought in and linked, data catalogues to be published, and data extracts to be shared very securely with research partners. For a research user, the second part is the data analysis environment: a remote environment configured with tools for analysis, and that environment is also controlled, monitored and fully audited. So as a data custodian you have full control over exactly what data goes into that analysis environment and what data is taken out, and every action within that analysis environment is fully audited. There is potential to integrate with other platforms as well, and this is something that we would love to explore further, to see if we can enable collaborations, particularly across jurisdictions and across projects nationally.

We've got some key work packages that are going to be delivered as part of this project, and I've highlighted them in blue there. The first one is the deployment of SeRP for partners, the second is onboarding of research projects, and the third is establishing a community of practice. The community of practice, I think, is going to be really useful for sharing best practices around data governance, but also tools, techniques, templates and standards. The second diagram there, it's not very clear, shows the impact: a pipeline of projects we plan to onboard during the course of this ARDC program and across all the partners. One of the things I want to highlight in that, if you can see it when it's shared, is the data coming in: this data is coming from university partners, government partners and industry, as well as health services and government departments. So we see this being useful for research across multiple domains.

This slide is really about the background; I won't go into the details, but SeRP was developed at Swansea University and is the outcome of millions of pounds of infrastructure investment over a decade. It is very mature and has quite advanced capabilities. At Monash we've had a research collaboration with Swansea over the last two and a half years to adapt SeRP for the Australian context, particularly the Nectar Research Cloud, and we've demonstrated that it can scale across the entire university. We've got over 400 users across domains, not just health but all kinds of sensitive data, looking at using this platform. And our partner Curtin University has also developed the LinkSmart component, a data linkage capability that sits within SeRP, so we hope to adapt this further to the Australian context and share it as a national capability. That's it for me, thank you.
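To make the "controlled, monitored and fully audited" idea a little more concrete, here is a small illustrative sketch, not SeRP code, of how an analysis environment might record every access to a governed dataset; the function, dataset name and approval reference are hypothetical.

```python
# Illustrative only, not SeRP itself: a tiny audit trail around dataset access,
# of the kind a governed analysis environment records automatically for every action.
import json
import logging
from contextlib import contextmanager
from datetime import datetime, timezone

logging.basicConfig(filename="audit.log", level=logging.INFO, format="%(message)s")

@contextmanager
def audited_access(user: str, dataset: str, approval_ref: str):
    """Write an audit record when a governed dataset is opened and released."""
    event = {"user": user, "dataset": dataset, "approval": approval_ref}
    logging.info(json.dumps({"action": "open",
                             "at": datetime.now(timezone.utc).isoformat(), **event}))
    try:
        yield
    finally:
        logging.info(json.dumps({"action": "close",
                                 "at": datetime.now(timezone.utc).isoformat(), **event}))

# Hypothetical use inside the analysis environment:
with audited_access("researcher_a", "linked_admissions_extract", "HREC-2021-042"):
    pass  # the approved analysis would run here
```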
Great, thank you very much, Anitha. Next we have Michael Haugh.

Hi, so I'm here to talk about the Australian Text Analytics Platform, and thanks to the ARDC for supporting our project. It's us at UQ working with people at the University of Sydney and AARNet. The problem we're dealing with is, at a very high level, a problem of connecting up the bits and getting people to use things that are already out there. In many research disciplines, humanities, social sciences, natural sciences, we rely on text data of some sort: spoken, written, signed and multimodal forms of language and research communication. Examples could be language corpora in linguistics, which are actually fairly structured; grey literature, policy and government reports in the social sciences; oral history interviews, which are used in the humanities; and field notes, which are used in all sorts of disciplines, geosciences, environmental sciences and so on. When people talk about this data it's often called unstructured data. As a linguist I rail against that description, because actually it is structured, just not in the way that a computer scientist would think, so we have to match up these different ideas of what counts as structure. But what we can say is that across many of these disciplines, researchers experience bottlenecks in text data processing, getting text data sets into forms they can work with in a computational environment, and also in text data mining, getting information out of them. There are lots of solutions out there, and I often get sent links to them, but one of the issues is that they're scattered across many different providers; many of them are commercial or generic, so they're not readily adaptable by researchers. Another part of the issue is that amongst Australian researchers themselves there's a need for accessible training in text analytics: researchers might have an interest in it, but they're just not sure where to start. So the idea is to bring together users and providers. Perhaps the next slide would be good.

So our solution, or the beginnings of a solution (one shouldn't over-promise), is a platform whose idea is to bring together users and providers of text analytics. By providers we mean researchers who are quite good at developing scripts for processing and text data mining, and there are quite a few people working in the open access, open source world who do this. It's difficult for us to work with commercial providers, in fact we can't, because our commitment is to open science, but there are a lot of people working as providers, and there's obviously a whole bunch of users; how do you connect these people up? Our aim is to develop an integrated, collaborative, cloud-based environment. We want to use integrated notebook platforms for the actual processing and mining of text data: the idea is that each notebook does a particular task, and you string together a bunch of notebooks into different workflows when you're trying to solve a particular problem. Also, and I think importantly, we want to be developing training resources on how to use text analytics at the same time. From the very beginning, if people with different degrees of skill in computational methods can't engage with it, then we're not really doing our job, so we're aiming for quite a wide scope of people. We're not really aiming for the professional already working in this area; they should be able to figure it out themselves. Our aim here is that broader range of people in the humanities and social sciences, and also out into the natural sciences where there are folk working with text. We want to increase accessibility and transparency. The way we describe our notebooks is that it's like a car: you get in and you drive it; if you want to lift the hood and see how it's working, you can do that, and if you'd rather just drive the car, you can do that too. So that's the transparency bit, and it also means that the research you're doing can be replicated by others.
So the idea is really to increase capacity amongst our researchers to use text analytics. The final slide is actually fairly simple, I notice, compared to everyone else's slides, which is because we're aiming for a fairly simple technological environment; it's really about transforming research by getting people to use these techniques. We know their promise, but they just aren't used enough by researchers in many disciplines. The technology is really just Jupyter notebooks which incorporate these open source scripts for cleaning, transforming, visualising and mining text data. We have a data sandbox where researchers can import their own text data sets, which remain secure there. We'd also like to start working on how we might access national collections through AIF, which we can do already in some cases. And the last piece, which is actually very simple technology, is a web-based online training environment which connects in. So the idea is that people are brought in through a web-based training environment into this mysterious world of Jupyter notebooks, mysterious, let's say, for lots of people in the humanities and social sciences, and you lead into it progressively. That's our conceptual structure at this point. Thanks.
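As a flavour of what a single notebook in such a workflow might do, here is a small, self-contained sketch that cleans a toy corpus and counts its most frequent content words; the corpus and stop-word list are invented, and this is not one of the platform's actual notebooks.

```python
# Sketch of the kind of task one notebook might perform: lightly clean a small
# corpus and surface its most frequent content words.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "that", "it"}

def top_terms(documents, n=10):
    """Count content words across documents after crude lowercasing and tokenising."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())  # deliberately simple tokeniser
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

corpus = ["The committee noted that flood mitigation funding was delayed.",
          "Interviewees described the flood and the delayed response."]
print(top_terms(corpus, 5))
```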
Thanks very much, Michael. Okay, now we have Bill Pascoe.

Hi, hello everyone. I'm Bill Pascoe from TLCMap at the University of Newcastle; I'm the system architect. TLCMap is a digital humanities platform for spatial and temporal mapping. So one of the problems we have in the humanities is that individual researchers often have very different and idiosyncratic projects and needs, across disciplines and individuals, and also that although in the humanities we can use scientific and mathematical methods, we're not limited to them; there are some pretty fundamental metaphysical and epistemological differences from STEM fields, which I won't go into in only three minutes. But because GIS tends to have a STEM focus or a commercial focus, we often find that there are gaps between what we'd really like to do and what is possible to do. You would normally find that because needs are idiosyncratic you require a bespoke solution, but that's really expensive, and in the humanities we don't have very big budgets. Another issue is that there tends to be a lack of cultural layers in data sources. For example, if you go to NationalMap there's lots of great information about science and various areas within science, but not much culture, and if you go to Google you'll find lots of commercial layers but not so much in the way of culture. And another issue is that in Australia there's a general lack of meaning of place compared to other countries: we grow up in homogenous suburbs and go to homogenous offices, we don't really know much about the history of a place and its importance, and that affects how we value things and how we understand our place in the world. It's kind of ironic that that's the situation, since the idea of Country and connection to it is so important in Indigenous culture, and so in trying to address that we do prioritise Aboriginal and Torres Strait Islander and First Nations projects. Next slide: solutions.

So these researchers need easy-to-use tools; they usually don't have the time or inclination to learn a whole GIS system or to get deep into mathematics, so if we could automate some basic common things that would probably have a great impact for the humanities. But there remains that problem of idiosyncratic needs; it's hard to find things that are common. So what we did was look at the different types of digital mapping activity in the humanities, from natural language processing to virtual reality, all kinds of different things, not just putting dots on a map, try to identify what the next step would be in each area, and then prioritise. We don't want to build one thing that tries to do everything, or to make a new GIS system and repeat functionality; we try to add to or modify existing systems where possible and build new systems where there's a lack. By looking at this as a software ecosystem we're able to let each system do what it does well, while ensuring there's a wide range of functionality for different areas. That means we have to focus on interoperability, and produce some relatively simple criteria for TLCMap compliance, so that a user can take data from one system, process it in another system, and then take it from there and put it into yet another system depending on what their needs are; you might go from text analytics all the way over to virtual reality, for example. We also try to ensure that our developments are practical solutions by prototyping, and by ensuring we don't develop any platform that doesn't have a project as a proof of concept to put it to a real-world test, and we make sure that any project we do is done in such a way that the software can be reused for other similar projects. In terms of impact, we hope this means that humanities spatio-temporal research, and the general public, will benefit from being able to easily see, assess and appreciate that research. Next slide.

In terms of the technical architecture, there's probably not too much that's strange or unexpected for people working in research software development at universities: MariaDB, MySQL, Postgres, that sort of thing. Probably the things worth mentioning that are a little unusual are that we contribute some development to allow other systems that are not under our control to be interoperable, and enhance some of their features, rather than trying to do everything ourselves. It's also worth mentioning that the strategy of working with this sort of ecosystem approach does have the drawback of complexity, in that we have to learn and support so many different technologies. And I added people there alongside the machines, because you do need to consider who's going to support all of that.
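As an illustration of the kind of simple, portable record that makes this sort of system-to-system interoperability possible, here is a sketch of a single spatio-temporal place record expressed as GeoJSON; the temporal property names are hypothetical and are not TLCMap's actual compliance criteria.

```python
# Illustrative only: one place record as GeoJSON, the sort of small, portable
# unit that lets a dataset move between mapping tools. The "datestart"/"dateend"
# properties are hypothetical, not TLCMap's published criteria.
import json

place = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [151.78, -32.93]},  # lon, lat (Newcastle)
    "properties": {
        "name": "Newcastle foreshore",
        "description": "Site mentioned in an oral history interview",
        "datestart": "1942-06-01",  # hypothetical temporal extent fields
        "dateend": "1942-06-30",
    },
}
print(json.dumps({"type": "FeatureCollection", "features": [place]}, indent=2))
```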
Thanks, Bill. Sorry, you're breaking up a bit, so I'm just checking you're finished. Yeah, sorry, I hope I just wasn't talking. No, no, it might be me; the wifi can be a bit flaky in here even though I'm in the office. Thanks very much. Thanks. Okay, now we have Marisa Takahashi.

Hi, can you hear me? Yes, we can. Hello everyone, Marisa Takahashi from the Queensland University of Technology, or QUT. QUT is the project lead for the Australian Digital Observatory, and our partners are the University of Melbourne, the University of New South Wales, the Queensland Cyber Infrastructure Foundation and Google Cloud. I would like to thank the ARDC for co-investing in this project. Next slide please, Kerry.

Okay, so in recent years researchers from both the social and natural sciences have started to incorporate dynamic digital data into their research. We define dynamic digital data as continuously streamed data from technical platforms, such as social media platforms like Twitter and Facebook, gaming platforms such as Steam, or other continuously updated data on the world wide web. There are many challenges facing researchers who want to use this type of data. For instance, data collection can be costly, which is a big barrier for researchers with limited funds, and researchers who are less technically savvy face issues ranging from data preprocessing and storage to extraction of the relevant dataset. Even when they have their data, they still face the issue of analysing it: they may need support in the selection and execution of appropriate data analytics tools. Further challenges involve data governance, which covers the ethics of using and sharing this data. And finally we have researchers doing ad hoc collections, and with that type of collection there is the issue of data quality and replicability of research. To determine the demand for the type of service that we plan to offer in this platform, we sought expressions of interest from researchers at various universities across Australia, and we received overwhelming support for the establishment of this type of platform service, as you can see from the user base list. Next slide please, Kerry.

So to solve these challenges, as mentioned, our project aims to establish the Australian Digital Observatory, which will be a national platform providing a data bank and data services to researchers seeking to work with dynamic digital data. The benefits gained from establishing an Australia-wide platform are reasonable costs and availability of the datasets; technical and analytical data services that let researchers focus on their research; and the new methods and insights that come from incorporating dynamic digital data, enabling new forms of research. There is also cost-effectiveness to be achieved by pooling resources across various universities, and finally, achieving a critical concentration of skills and expertise will enable us to establish a community of practice. These benefits will create impact by transforming and accelerating research projects that use and analyse dynamic digital datasets. Another impact would be the creation of an interdisciplinary team of researchers and support staff at key locations in Australia who can provide transferable skill sets to support innovative projects across many research locations. Next slide please.

So this is not the technical but rather the functional architecture of the Australian Digital Observatory. As you can see, there will be four components: the data sources, from which we will be ingesting the dynamic digital data; the digital observatory platform, which will be the back-end technical implementation of the functions described there; and then the front end, consisting of the data bank and the data services, which we will then offer to the research communities.
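As a rough sketch of the ingestion end of that pipeline, the example below polls a placeholder JSON endpoint and appends new records to a local store standing in for the data bank; the endpoint, parameters and fields are invented, and real collection would go through the relevant platform APIs under their terms of service.

```python
# Sketch of the ingestion step only: poll a placeholder JSON endpoint and append
# new records to a newline-delimited JSON file standing in for the data bank.
import json
import requests

def harvest(endpoint: str, since_id: str, out_path: str = "databank.jsonl") -> str:
    """Fetch records newer than `since_id` and append them to the local store."""
    resp = requests.get(endpoint, params={"since_id": since_id}, timeout=30)
    resp.raise_for_status()
    records = resp.json().get("items", [])
    with open(out_path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    # return the last id seen so the next poll can resume from there
    return records[-1]["id"] if records else since_id
```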
So the data will be collected from various data sources, as shown, depending on the demand from researchers. Data can be collected via APIs and web archiving, again depending on the relevant terms of service. We envision three distributed data nodes, in Queensland, New South Wales and Victoria, and accordingly the data bank will be distributed across each of these nodes, but we will operate with harmonised data governance which will provide oversight on ethics and data sharing protocols. Researchers can then avail themselves of the data services, from data engineering through to data analytics, depending on their needs. As the interface to researchers there will be a formal process of project onboarding as well as designated access protocols, and of course training and support will also be provided. Finally, we will be designing and implementing a business model to ensure the sustainability of the platform beyond the current ARDC funding cycle. Thank you.

Thank you very much, Marisa. And finally we have Tom Johnstone.

How is this any better, am I coming through or not? Yes. I'm not actually seeing any share. I guess you're restarting it. Yep, that comes up on my screen now. No problem. Sorry, I was muted, that's why you couldn't hear me. There you go, these are the latest slides. No problem, thanks a lot. I should mention to everybody that this is my fault for getting the slides done so late, so it's certainly not Kerry's. Anyway, I won't waste any time, I'll get started just talking.

I'm Tom Johnstone, I'm a professor of cognitive neuroscience at Swinburne University, and at Swinburne we're also a node of the National Imaging Facility. Our focus at Swinburne is really on MEG and EEG data, that's magnetoencephalography and electroencephalography data and related techniques using electrophysiology. The platform that we've established is really addressing the need across the electrophysiology, EEG and MEG research communities for an advanced analysis platform or solution, because currently there simply isn't one, unlike in many other domains of research. Partly this probably reflects a bit of the history of EEG research, which is where things really started. The research groups themselves are very diverse: they span industry and clinical settings within the medical field and outside of it, and there are researchers spanning from artificial intelligence and engineering, looking at robotics and brain-computer interfaces, to many, many researchers in fields of psychology such as social psychology and sports psychology. The challenge there, of course, is that their academic backgrounds and their level of technical expertise and experience are vastly different. Many of the researchers come from relatively small labs spread out across the country, and they have limited access to and experience with Linux systems and supercomputers, and most of them, especially on the smaller psychology side, are locked into proprietary software that was provided with the hardware they used to acquire the data. And whilst that worked okay in the early days, that sort of proprietary software hasn't moved with the times: it often doesn't encompass the most advanced techniques that are available for analysis, and it certainly doesn't allow for much interchange of data, sharing of data, or sharing of analysis pipelines. Reproducibility is pretty awful, because once somebody's software licence goes out of date, or the software they have a licence for only runs on a Windows 95 computer and nothing beyond that, there's absolutely no hope of actually reproducing the sorts of analyses that they've done.
And that's really the existing state in many, many places. In stark contrast to that, there is a lot of open source software that wasn't available ten years ago and is now widely available for the analysis of these types of data, and it's rapidly expanding. It's largely based on Python and various Python toolboxes, and it increasingly focuses on combining EEG, MEG and electrophysiology data with other types of data, such as imaging data from MRI and the peripheral measures of physiology that are commonly acquired. So really the challenge here is to address this need for a powerful, open analysis platform that allows for reproducible and shareable analyses, but one that is easy, accessible and portable, and can be used by people in these small labs with very few technical resources.

Okay, our approach: simplicity underlies everything here; everything has to be made simple for people to use. We want a portable working environment that can be installed on somebody's own PC, on a workstation in their lab, or on a virtual machine in the cloud such as a Nectar virtual machine, but that can also run on a supercomputer, giving them the ability to scale their analyses from small data sets up to large data sets and from simple analyses up to more complex ones. We want to containerise the open source software that is available, and we want to provide automated container building and sharing for new analytic pipelines, so that when people develop a pipeline they don't have to worry about the technicalities of building containers using Docker or Singularity; we want something far simpler than that. More than anything else, we want something they can install with basically one command on their computer, and then they just launch into it and make it work. Through this we're hoping we can foster the creation of far more findable, structured data in EEG and MEG and reproducible analysis pipelines, which really lag behind, for example, neuroimaging, and create a platform that is accessible to researchers from a wide diversity of settings, including regional universities and small labs with less technical staff. We've built in interoperability with other analysis platforms such as the Australian Imaging Service, the Characterisation Virtual Laboratory, and also BrainLife, which is an effort supported by the NIH in the United States and based in Indiana. Through this we hope the analysis can be scaled up to address some major challenges, because the data itself is extremely valuable in fields such as epilepsy, stroke, traumatic brain injury and dementia.
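As a small illustration of the "findable, structured data" goal, here is a sketch of laying out an EEG recording in a BIDS-style folder structure (BIDS comes up again at the end of this talk); the labels and sidecar metadata are made up, and real conversions would normally use dedicated tooling such as MNE-BIDS rather than hand-built paths.

```python
# Sketch: laying out an EEG recording in a BIDS-style folder structure by hand,
# just to show the shape of it. Labels and metadata are illustrative only.
import json
from pathlib import Path

def bids_eeg_stub(root: str, sub: str, ses: str, task: str) -> Path:
    """Create the BIDS-style directory and JSON sidecar, returning the recording path."""
    eeg_dir = Path(root) / f"sub-{sub}" / f"ses-{ses}" / "eeg"
    eeg_dir.mkdir(parents=True, exist_ok=True)
    stem = f"sub-{sub}_ses-{ses}_task-{task}_eeg"
    sidecar = {"TaskName": task, "SamplingFrequency": 1000,
               "EEGReference": "average", "PowerLineFrequency": 50}
    (eeg_dir / f"{stem}.json").write_text(json.dumps(sidecar, indent=2))
    return eeg_dir / f"{stem}.edf"  # where the raw recording itself would be written

print(bids_eeg_stub("my_study", sub="01", ses="01", task="rest"))
# -> my_study/sub-01/ses-01/eeg/sub-01_ses-01_task-rest_eeg.edf
```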
This is a little bit of the architectural diagram, and you can see here what we're trying to do is cater for different ways of working: the columns basically represent different types of analysis approaches. We have interactive processing in the middle column there, where there's a requirement for low-latency visualisation of fairly complex data; the data sets, however, might be quite small, and this is the sort of thing that people can typically do on their own PC or on their own workstation in the lab. But we also want to provide the ability to have interactive processing for more computationally intensive small and large data sets, and here we might need things such as GPU-enabled virtual machines. We also want to integrate what we're doing with the Characterisation Virtual Laboratory, which can sit on top of our high-performance computing systems, giving them all the benefits that provides. On the other side, we also want to provide an easy transition to batch processing for computationally intensive and repetitive data set analysis, and here is where we are integrating with the Australian Imaging Service but also BrainLife.io, the US-based platform. The vertical stream you see down the middle gives you some idea of our approach here. It starts off with a very lightweight Linux desktop, which can be installed pretty much anywhere, as I said, with one command; that's based upon an open platform architecture called NeuroDesk, which has been developed for brain imaging, and we're expanding it into this domain of analysis. And then we want an easy containerisation system where people can check in containers or download containers easily from the registry, and know that they are going to function because they're running on the same basic platform as everybody else. With that I've swept through this very, very quickly, but hopefully that gives you a picture of what we're doing. You can see on the right there some of the standards we're adopting: BIDS especially, the Brain Imaging Data Structure format, has now been extended to electrophysiology, psychophysiology, MEG and EEG, and it is built into our platform by default, so all the pipelines that we're going to provide in containerised form will be BIDS-compliant, which we think will make them reproducible, open and accessible with the platform. Thank you very much.

Thank you very much, Tom, and thank you to everyone who's presented today, and thanks to all of you who have come along to hear about these fantastic platforms. Some really exciting projects here.