Hi everybody, welcome back to another OpenShift Commons briefing. This time we have some of my very favorite Canadians on the line to talk about the work being done by a group of volunteer data scientists in Ontario around COVID: driving data transparency and some really wonderful data analytics to help fight the battle with COVID-19 up here in Canada. I'm going to let Faribod and Guillaume introduce themselves, and we'll have live Q&A at the end. If you're watching this via any of the streaming outlets, we will aggregate those questions and pose them to the speakers at the end. So without any further ado, Faribod and Guillaume, please take it away.

Amazing, thanks Diane. I'll introduce myself first. Hi everyone, my name is Faribod Ohasani and I'm one of the website development leads for the How's My Flattening project. I'll let Guillaume introduce himself.

Hi everyone, I'm Guillaume Moutier. I'm a senior principal technical evangelist at Red Hat in the Cloud Storage and Data Services business unit, working mostly on data science platforms, data engineering, and everything that relates to data. Back to you, Faribod.

Amazing. Here's how we're going to do this: I'll have a quick slide show to tell you about the work we've been doing, and then Guillaume will take over with a couple more slides and a demo. Without further ado, let's talk about How's My Flattening. To give you some context: we're here in Ontario, but depending on where everyone's from, it may be a similar situation. This all started back in March, with the whole pandemic situation on the rise. We were looking across to Europe and seeing all the issues they were having, and back here nothing had changed; some of our team are physicians in hospitals.
We've got quite a big and varied team that I'll talk about, but things were operating as normal, and there was this big concern about how things were going and how we were actually going to get action. So let's go to the problem statement. We knew COVID was coming; we could see that from the European example. The normal state of things here in Ontario, and I imagine in many healthcare systems, is that there's lots of data but it's not in one place. Some comes in PDFs; the government had started reporting some stats on a website, but it's frozen, so you'd have to get out your spreadsheet and start writing numbers down every day if you wanted a sense of what was happening. Any analysis that happens in healthcare tends to be one-off analysis, and speed is a huge issue: things take weeks if not months. Quite honestly, given what was happening in Europe, we were scared of what was going to happen here if we couldn't do something about it. So we put together a team. Like Diane mentioned, it's largely volunteer-driven, a community approach of "okay, let's see what we can do with technology to help and contribute in some way." We got together quite a diverse set of people, a group of over a hundred now, with different backgrounds: engineers and computer scientists together with clinicians and epidemiologists, all working to collect data, analyze it, and make it available to the public, and to do analysis that informs decision-making and better guides the response to COVID-19.
Some of the problems we're solving, from the previous slide, are around a real-time database: having a proper data pipeline, not just someone in Excel; doing proper live visualizations connected to a database; doing this all in an agile manner that doesn't take months, because we didn't have months back in March; and doing it all under a scientific process, in a peer-reviewed way where experts in epidemiology, medicine, and data science sign off on everything we do. So we're not just some guy on the internet; you can trust what we produce. Let's talk about what we actually did. Part of the challenge I mentioned is that the data wasn't in one place, so the first thing we did is put it all in one place. We're gathering data from lots of different sources: Ontario-specific data, Canadian data, and international data. That data falls into different categories, covering everything from your typical case data to more unique things like mobility, and some economics data we're adding. The goal is one site where researchers and others who want to do analysis can find the data they need, so you don't have to go all over the internet, and it's all vetted: we trust the data we put there, so you can trust the process.
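To make the shape of that aggregation concrete, here is a rough sketch of a "many vetted sources, one database" cron cycle. Everything in it is illustrative: the source names, the table layout, and the use of SQLite as a stand-in for the project's actual backend database.

```python
import sqlite3

# Illustrative sketch of the "many vetted sources, one database" cron cycle.
# Source names, table layout, and SQLite itself are assumptions for the
# example; the real project runs its pipelines against its own backend.

def fetch_ontario_cases():
    # Stand-in for a real scraper or downloader.
    return [("ontario", "2020-05-01", 412), ("ontario", "2020-05-02", 370)]

def fetch_canada_open_data():
    return [("canada", "2020-05-01", 1650)]

# Only vetted sources are registered, so everything landing in the
# database has gone through the review process first.
VETTED_SOURCES = [fetch_ontario_cases, fetch_canada_open_data]

def run_pipelines(conn):
    """One cron cycle: pull every vetted source and upsert into the DB."""
    conn.execute("""CREATE TABLE IF NOT EXISTS cases (
                        source TEXT, day TEXT, n INTEGER,
                        PRIMARY KEY (source, day))""")
    for fetch in VETTED_SOURCES:
        for source, day, n in fetch():
            # INSERT OR REPLACE keeps the job idempotent: re-running a
            # cron cycle after a partial failure is safe.
            conn.execute("INSERT OR REPLACE INTO cases VALUES (?, ?, ?)",
                         (source, day, n))
    conn.commit()

conn = sqlite3.connect(":memory:")
run_pipelines(conn)
run_pipelines(conn)   # idempotent: the second run changes nothing
```

The upsert-keyed-on-(source, day) design is what lets many independent cron jobs write to the same table without stepping on each other.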
What came from that data is, of course, the analysis. One way we present it is through 37 regional dashboards: one looks at all the KPIs for Ontario as a whole, and then within Ontario we have regional health authorities called public health units, so we needed a dashboard for each public health unit. Before, each public health unit was trying to make its own dashboard, they all looked different, and if you lived in Ontario you had to figure out where to go; now there's one resource where you come and find the data relevant to you. On top of the dashboards, we do deeper-dive analyses as well; we have 14-plus analyses on the website. Going back to how we solved for it: these are not static images, they're live interactive visualizations that are rerun, connected to a database. I'll talk about how that works in a bit, but that's how we're different; this is not an Excel shop, we're doing things at a low-to-medium technical sophistication, if you will. This slide just highlights one of our analyses. The point is to be able to monitor different things happening in the pandemic. One thing we recently caught is that there has been a shift in the trend in the age distribution of cases. This would not be possible if someone had to run the analysis by hand every day, because we have 15 or 16 different things going and we don't have the resources for that, so it's all automated to the extent it can be. Honestly, I imagine most of the people on this call are fairly technical; this is not a hard technical problem we're solving, it's just unheard of in the space we're doing it in, so it's really about catching up with the industry, if you will.
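As an illustration of the kind of automated check that could flag a shift like that, here is a hedged sketch in Python. The under-40 split, the 3-day windows, and the 10-point threshold are all assumptions for the example, not the project's actual analysis.

```python
# Hedged sketch of an automated check that could flag a shift in the age
# distribution of cases. The under-40 split, the 3-day windows, and the
# 10-point threshold are assumptions, not the project's actual analysis.

def under40_share(daily_cases):
    """daily_cases: list of (day, under40_count, total_count) tuples.
    Returns (day, share) pairs, skipping days with no cases."""
    return [(day, u / t) for day, u, t in daily_cases if t]

def shifted(shares, window=3, threshold=0.10):
    """Flag a shift when the mean share over the last `window` days moves
    more than `threshold` away from the previous `window` days."""
    if len(shares) < 2 * window:
        return False
    vals = [s for _, s in shares]
    recent = sum(vals[-window:]) / window
    before = sum(vals[-2 * window:-window]) / window
    return abs(recent - before) > threshold

shares = under40_share([
    ("d1", 20, 100), ("d2", 22, 100), ("d3", 21, 100),  # stable period
    ("d4", 45, 100), ("d5", 50, 100), ("d6", 48, 100),  # younger cases rising
])
```

Because a check like this runs on every refresh of the database, it can fire the day the trend moves, instead of waiting for someone to re-run the analysis by hand.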
I'll talk a bit about the how. We use two different kinds of visualization software for our website. The analysis page is mainly in Tableau; the reason, quite honestly, is that we had people who were Tableau people, and we didn't want the burden of coding to be on the engineering team. How it actually works: we had to jump through a lot of hoops to make Tableau update live. Basically, the only way you can do it in a reasonable manner, for it to also be public, is to use Google Sheets. So right now our backend updates Google Sheets from the database every day at different intervals, and Tableau is connected to that, so Tableau does a refresh from Google Sheets. Believe it or not, there's no direct way of connecting public Tableau to a database of any sort, or an API, or anything; Google Sheets is your way if you want a public thing. But it works, and that's how we're operating. The other bit is Plotly. The reason is that it's the standard for data science folks, it's open source so there's no licensing cost, it's quite simple, and everyone knows how to use it. So our dashboarding is mainly in Plotly, whereas the analysis page is made by the viz team in Tableau. And of course there's OpenShift, which I'll get to in a second. Our architecture, thanks to the team over at OpenShift, has now been containerized and moved over. We have your standard pods running, with separate development, staging, and production environments, and then cron jobs that run the data pipelines, going out to all the different data sources we have and updating the database, which in turn updates the visualizations, whether through Plotly, which interacts directly with the database, or through the Google Sheets workaround. The other thing I wanted to mention is, again, thanks to the
amazing engineers from Red Hat. Shoutout to Olu and Steve, who made this happen. We moved our entire code base: we started in GitHub, we moved it to GitLab, and now we have GitLab runners that integrate with OpenShift. When code is pushed, we have a fully connected CI/CD pipeline that builds the image, and OpenShift applies a template to build the image, get all the resources, and do all the fun Kubernetes stuff to make sure it gets deployed. So literally, you create a new branch, it gets deployed to a development environment, you can play around with it, you merge it into staging, staging gets updated, all the good stuff. That's been quite amazing. I know I'm talking more about the open data side, the how behind it, but it's been amazing to move to OpenShift and take advantage of all the great benefits of Kubernetes and OpenShift.

I'll talk a bit about our team. I mentioned we've got a diverse group, and it's largely volunteer-driven. This is very new for me as well, in the sense that we're trying to do an open source, open community thing with a lot of folks who are not software folks. We have a lot of clinicians, researchers, and grad students, and we're trying to onboard them into this whole GitLab open source approach and teach them a lot of the things that are normal in the open source community. It's been quite amazing, but it comes with challenges. One challenge we've been experiencing is that Ontario has quite disordered data, so when you build all these automated pipelines and cron jobs that update the database, things break quite often; some things we can predict and some we can't, so we have to keep building on top of it. Here's an example: you're looking at the accurate episode date column in the government releases, and all they
put in that column often doesn't make sense. There are values like 2002 (obviously no one got COVID in 2002; it's most likely 2020), or an accurate episode date of "12 AM". Really basic stuff like that which, if we don't catch it, breaks things, and then our visualizations aren't updated, with all the implications of that. So we've been iterating and building in basic additional logic to filter some of this out. Again, not a technical challenge, more a maturity challenge; we've been building the bicycle while riding it, so we're growing and there are some growing pains.

I'll briefly talk about what's next. We've got two things that I'm personally really excited about. One is the scorecard: I mentioned we have 37-plus dashboards and a lot of data, and we're working on a scorecard that translates that data for the public, simple red light / yellow light / green light, picking five key metrics we think are really important to focus on and giving an easy breakdown of them. This is going to be our new homepage, and it's what we're actively working on. The other bit, which Guillaume is going to go into, is Open Data Hub, which is going to be amazing; I know the whole team is excited about it. As we get more secure data that not everyone can have access to, with contractual obligations attached, this will be our solution for figuring out access management, how different people access different groups of files, and building in all our pipelines in something that is easy to use. As I mentioned, a lot of our users are not extremely technical folks, so this removes a lot of the complexity, which we're really excited about, and I'm excited to talk more about it as well.

I'll briefly talk about the response. As I mentioned, this is largely
volunteer-driven, and we've been getting picked up in the news a few times, which has been some validation. For those of you who are probably not familiar with these articles, the Star, the Toronto Star, is a big deal here; it's roughly our equivalent of the New York Times. We got picked up by them, and there have been some university-specific publications too. But honestly, the big thing for us has been this slide; I think this is what drives the whole community. On our website we have a chat bot, and through the chat anyone can tell us how they feel about the work we're doing. There's a five-star rating and review, and people can submit specific feedback. I'm just leaving it on the screen for a couple of seconds while I'm talking; I hope you're able to read through a few of them. Honestly, what's been driving the passion of the volunteers is the positive feedback we've gotten from the public: making the data more accessible, allowing people to know what's going on, people feeling they need access to transparent data that may be lacking, and us just providing them with a resource.

We're almost at the end of the presentation here. The last two slides are a call to action: we really need more help. So if you're on this call, if you're watching this today or sometime in the future, please go to our GitLab; the link is at the bottom of every slide, gitlab.com/howsmyflattening. Please do go to our GitLab and get involved. We can use any help we can get, and everyone can contribute; we have lots of different things you can contribute to, so please do get involved. I usually have a slide here, because I mentioned most of our group is non-technical, that explains what GitLab is
and what I'm talking about, but I imagine most of you are familiar with it. From the repo you can figure out which working group you'd like to get involved with. That's it for me. We're a small organization, largely volunteer-driven, and our goal is to show the value of open data and analytics, and of an open community, open source approach: to move quickly, generate health insights, and put them all in one place to help inform both decision makers and the public. I'll do a bit of a transition here. I think what we do, and our vision for how analytics should be done, aligns quite nicely with Red Hat as a company and what their mission is about, so it's been amazing to collaborate with some of the folks on this call, Guillaume and other folks from Red Hat, Diane included. With that, I'll hand it off to Guillaume to talk about Open Data Hub, which is quite exciting.

Thanks a lot, Faribod. As you've all seen, the effort made by this community of researchers, data scientists, students, and everyone involved is really, really important, and that's exactly why at Red Hat, and in the Canadian Red Hat community, we wanted to help; we wanted to do our share in helping this community build and grow. What I'll talk about now is the next step, I would say, where we can help this community with the data science platform they want to build. As you've seen, as the number of people helping on this website, and more importantly on the data analysis and all the work around it, keeps growing, they have more need for working environments, especially shared environments where it's easy for them to collaborate, to exchange ideas, data, and other resources. And as Faribod mentioned, they also have this need to manage different access levels, because they
have some more restricted datasets, of course; some may contain personal or sensitive information. So those are the requirements we talked about with the team, and we discussed how we could help them get to this next level in terms of platform. Our answer was to leverage Open Data Hub. Open Data Hub, for those who don't know, is a project at Red Hat; it's not a product, it's a project that showcases very well how you can deploy a data science platform on top of OpenShift. It's kind of a meta-operator that allows you to very easily deploy some of the commonly used data science tools out there: Jupyter notebooks, Kubeflow pipelines, Seldon for serving models. That's all the stuff we can deploy right now through Open Data Hub. Of course, not everything is deployed right now for How's My Flattening; at the moment it's mostly Jupyter notebooks and shared databases. But in summary, it's about getting from one world, where each data scientist has their own installation of Jupyter and the libraries and datasets they need, to another, where on top of OpenShift you run JupyterHub, which gives you access to notebooks, easy-to-use environments, and on top of that shared environments, with very efficient use of resources and all the management that goes with it. What I also used in this setup is Ceph; it's Rook Ceph that is deployed in this installation, because it allows us to have the different types of storage we need in our environments. We needed block storage to create PVs for the users, so that each user has their own environment and can store their own data and files. But at the same time we needed shared files, shared data, and Rook Ceph and OCS are
providing shared file storage through CephFS, which allows us to have RWX PVs that can be mounted inside all the pods; we'll show you in a few minutes how that works. So that's the environment. I'll close this and show you how it works. In the project, I have created different persistent volume claims for shared data, with different access levels: private, public, secret. I have another one here for public notebooks, and all those PVCs are using this storage class, Rook Ceph (no, that's not the right one selected, sorry). For example, for these shared-data private PVCs, it allows us to use the ReadWriteMany access mode, which means several containers or pods can mount this share simultaneously. What is also interesting is that, as you see, the new way to deploy Open Data Hub is using the Kubeflow definition files, and what's nice about that is it's very easy to deploy overlays. In this case I have created an overlay which I called shared-pvc; an overlay basically says "in this specific configuration of the deployment, you will do a replace with this value." Here I have the code with more readable highlighting. This code basically says: whenever you want to spawn a new pod, look into the user lists (private users and secret users); if the user is in them, they also get read-write access on the shared folders, and you append some configuration so you're able to mount the persistent volume claim, for example the shared-data public one, directly inside the pod. OK, that's the technical part; I'll do a special session on this at some point, but let's see how it works. Let's say I'm a data scientist who's part of the How's My Flattening initiative. I have access to my environment through this launch board, and I will just launch my Jupyter environment. It will require me to
sign in; here I have some credentials I can use. In this example I will launch a notebook image that comes with the Python and SciPy libraries, and also comes with the R engine, because one of the requirements from the team was to be able to use R notebooks as well. Again, this is a customized version that I created, because that's quite easy to do: from Open Data Hub you can create your own images. Here I can select my deployment size, meaning the size of the pod I will create; I could add some more environment variables, and then I just spawn it. It will take a few seconds, and meanwhile we can see what's happening behind the curtain: here you can see a JupyterHub notebook pod with my login name, with the container creating. And I guess it's almost running... yes, it has pulled the image, and now I am in my environment. You can see there is some stuff I created myself: I have this folder here with demos, and R packages that I installed. Let's say I open this notebook; that's my environment where I'm able to work, and that's a simulation I have run based on a notebook the team provided to me. But on top of this personal environment, I now have access to a shared data folder. This shared data folder is the same one that everyone using the environment will be able to see, and here, as I have special access, I can see the private and secret folders; normally, standard people in the initiative will only see the public folder. The public folder is where they will find datasets they can directly use with their notebooks. So that's a very neat way to provide people with shared data they can use directly inside their environments. We have also created a shared notebooks environment, and Faribod yesterday created this welcome notebook, just to explain how it
works, and to give a demonstration on the shared data that is accessible by everyone. So that's about it for this presentation. Again, it's a standard Open Data Hub deployment, the first phase with JupyterHub; maybe we will deploy other components in the coming weeks. But it's a very interesting implementation, especially with the shared data and shared notebooks that are very easily done with the RWX capability we have in Rook Ceph for OpenShift Container Storage. And that's all for me.

All right. As a Canadian, and I think both Faribod and Guillaume as well, we're all very excited about this project and how it's unfolded so rapidly. There's one question that gives me a good segue into how to get involved. I'm going to share my screen for a second so I can put up the GitLab page for How's My Flattening; tell me if you can see my screen, that would be great. The one question that came in from Twitch while you were all talking was about why we decided to use GitLab. Faribod, maybe you have some insights into this. While we're talking about it, if you're looking here, the URL for How's My Flattening is here, along with how to get involved and get started with some of the projects. Faribod, do you have insights (I have tons as well) into why you decided to use GitLab, and how?

So when we started the project, like I mentioned, we have a bit of a different kind of open source community, in that it's not all the technical folks who are usually involved with an open source community; it's not the typical environment. When we started, we were all over the place in terms of project management: we had code sitting in GitHub, we had lots of Google Docs for documentation that we'd send people, and we used Trello for project
management and keeping track of tasks. We were all over the place in that sense. When we started working with Red Hat, with the Red Hat engineers I mentioned, Olu and Steve, who were helping us move over to OpenShift, it became the perfect opportunity for us to switch to GitLab. First of all, we wanted to use GitLab runners to do the OpenShift integration; I know there are different ways of doing it, but that was one way. The other bit was that the nice thing about GitLab, besides it being an open source project you can deploy yourself (that's a whole other thing), is that it does everything in one place for us, which is nice. I know some of those features are also available in GitHub; at that point it was almost a preference thing. We've got the wikis, we can use issue boards to keep track of everything, and then there are all the CI/CD parts that come with GitLab, which were nice. Honestly, because of how small an organization we are, we don't do a proper assessment when picking a tool, like "let's survey what's out there in the field and come up with criteria." Going back to the whole building-the-bicycle-while-riding-it thing: it seemed like a good way of doing everything in one place, so that was the big decision and the reason for the move.

Yeah. I've been involved, I don't know, maybe a month? I don't even know; maybe it's been 60 days for the whole project. I don't know when I first twigged to what you were doing and when somebody reached out for some help. But the other piece of the puzzle is that this group of folks weren't really savvy about open source practices. You were savvy about open data and data privacy and all of those things, you had a lot of experience, and some of the data scientists and analysts were
really up on that side of the coin. But creating a community around this, the different approaches to project management, how to make things open (whether using GitLab or GitHub or anything) and putting all of this out in the open, was part of the learning curve too, I think. GitLab gave us a very nice way to do that publicly, and to start writing the onboarding documentation as a group, collaborating on it and getting it up and running. It was an interesting thing to watch, the phenomenon of everybody piling on with thumbs-ups and "here's a link to another data source," all the coordination that had to get assembled and take place, and figuring out a way to make it all public and secure at the same time. Some pieces of the project, like Tableau, are not open source software and needed licenses and such, so there were a lot of pieces along that line, logistics-wise, that we had to figure out, as well as what Guillaume has done with some of the Red Hat team, getting it up and running in a scalable way.

Yeah, we had to build on top of that. That's actually a really good point that I forgot to mention in the presentation. We have a very small technical team, and quite honestly none of us had any experience with open source, like absolutely none. So that's another thing: I mentioned Olu and Steve, and obviously Guillaume, who helped us move some of the technical infrastructure and set it up on OpenShift, but another really big part, like was just brought up, is the whole community aspect. That's what I think makes us different from just a website on the internet where one guy does some analysis. And we didn't know how to do that, honestly. We operated on Slack at first, and as more people came on board we ran into this issue that the communication was getting very siloed, so a lot of conversations were happening in
private chats between folks, and to catch someone up we would literally screenshot conversations with other people, just so we wouldn't have to rephrase it: "just read the conversation I had with this other person." That was a whole challenge, and that's another aspect Red Hat helped us with. Diane was one of the leads on this, along with, the name I'm thinking of here, Brian Proffitt, right? Yes, Brian Proffitt, who helped us figure out, like, open source 101: how do we even do this? So that was another reason we wanted to move to GitLab. I guess we're still building on top of it, so it's becoming more and more of a community thing, but anyone can go right now to our GitLab and see all the issues and all the discussions; they're not in private Slack channels anymore. All the meetings, when they happen, are documented, and there are wikis for things now. For example, how does an analysis go up on the website? Before, it was "I talked to three other people and that's how it happened," and someone just joining the community would have no idea, but now we publicize that. So it's not a black box of decisions; everyone gets transparency into it, which I think is part of what we're trying to do.

The other interesting thing, once we were digging in, was how experienced some of the clinicians are; they're basically clinician data scientists. And even your role, Faribod: you and, like, Dr.
Ben Fine and a whole bunch of other folks have this deep, deep knowledge of the data, the data sources, and how to do the analysis. So there are these data scientist doctors and engineers who were volunteering their time, and the question of how to take advantage of that. And then there are all these other people, who I kind of coined a word for, data scouts: people who were finding data sources and just posting them in Slack, which has all kinds of issues. So could you talk a little bit about how, with all these disparate data sources, you're aggregating them and bringing them together, and how you're combining them into decent analysis?

For sure. The way the project is structured, you'll see on the right-hand side of that GitLab there are four working groups. There's the website development group, the more technical folks. There's the data collection group, which includes some of the data scouting group Diane is referring to, and that group has honestly been amazing. We've done a couple of projects with them, and I'll talk about one: there are some 60 medical students, other grad students, and quite literally just volunteers from the community who are collecting a brand new dataset, one that does not exist anywhere, around non-pharmaceutical interventions. What I mean by that is the government announces different policies in response to COVID, so things were shut down, schools were closed, and there's no data source for this; if you want to find out when these things happened, you would struggle. You'd basically have to Google it, read news articles, and note the date and time. So what they're doing is actively monitoring, across the country, down to the regional level (city A versus city B), when these announcements are being made, and there's
a whole kind of review process around it I mean they built this whole group right just kind of collecting these this data set around non-pharmaceutical so that's been amazing maybe I'll speak a bit broadly outside of what we do just outside of the NPI which is you know one whole project like Diane alluded to there is lots of data sources that they're kind of non-traditional on traditional so we have to come up with what I say creative way of dealing with this so some is no I mentioned the beginning the government's very simple posts they have one page where every day there is an HTML table that they update with new cases so we have like web scrapers that go every day look through cron jobs throughout different points of the day and read off that table and capture it the timestamp there's other data sources that come to us in more creative ways there's some reports that are PDF reports so again another convert this PDF into actually a usable format that people can just get a time stamp like a time trend and a proper kind of actionable again also an Excel CSV right like a CSV file and keeping historical trends of that so there's been a little mini pipelines built for each of these words a this is our report PDF to database pipeline this is our convert the government website to database pipeline there is the there's also a few other kind of open data groups that we leverage there is this group called Canada open working data I'm not butchering that name led by Isha Berry who's a and a few other others out of U of T and that's another data source which we can use their CSV daily and update our databases so there's basically lots of data pipeline sitting and running throughout the day now on a friendship that just go scour the internet if you will from all these different sources and update the database which in turn updates that Google sheets work around that I told everyone about that then updates the visualizations right so if you go on our website you'll see there's 
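A cron-driven scraper like the one described here, one that reads a daily-updated HTML table and stamps each capture, can be sketched roughly as follows. This is a minimal illustration only: the sample page, table layout, and column names are assumptions for the sketch, not the project's actual code, and the real jobs fetch the live government page.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

# Illustrative stand-in for the daily-updated government page
# (the real page's layout and columns are assumptions here).
SAMPLE_PAGE = """
<table>
  <tr><th>Region</th><th>New cases</th></tr>
  <tr><td>Toronto</td><td>120</td></tr>
  <tr><td>Ottawa</td><td>45</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
        elif tag == "tr":
            self._row = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

def scrape_daily_table(page_html):
    """Parse the table and stamp every data row with the capture time."""
    parser = TableScraper()
    parser.feed(page_html)
    header, *data = parser.rows
    stamp = datetime.now(timezone.utc).isoformat()
    return [dict(zip(["captured_at"] + header, [stamp] + row)) for row in data]

records = scrape_daily_table(SAMPLE_PAGE)
# In the real setup, a cron job would fetch the live page several times
# a day and append these timestamped records to the historical database.
```

In the pipeline described, each of these "mini pipelines" is just a variation on the same read-and-timestamp pattern, with a different parser (HTML table, PDF report, or a partner's CSV) feeding the same database.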
If you go to our website, you'll see visualizations on capacity, trends in cases, and Rt. Those are all populated by those pipelines feeding the database, and then daily cron jobs run that update either the Google Sheets, which is our intermediary, or in some cases run the analysis straight from the database and update it directly. So that's, at a high level, how everything works and how it's connected to the data.

Yeah, and Guillaume and the team, and this is me as a Canadian saying thank you to Guillaume and the other folks, set all of that up and made the automation happen to collect and clean that data. It's been pretty amazing, so I'm wondering...

If I may, then: what Faribod explained is the perfect demonstration of the struggle that data scientists and researchers face with data, and that's why I like this project, because what we helped do is exactly what technology and open source can bring to the research world. For us it's pretty standard to have cron jobs running in containers, connecting to the data, moving it here and there, but those are the kinds of tools researchers often don't even know about. So bringing these technology skill sets directly to this research is the perfect demonstration of how IT tools, the standard DevOps practices and the technology we've been using for the past few years, definitely have a place in modern research, especially when it's so tied to data. I'm pretty sure that in the coming months and years we'll see more and more of these techniques applied directly in other research projects, because it helps a lot.

Yeah, so we're kind of learning from each other and hoping to move forward in the battle against COVID. It's pretty amazing.
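Those daily cron jobs turn a page that only ever shows "today" into a historical time series by appending each day's snapshot to a growing table. A rough sketch of that idea, made idempotent because the jobs fire several times a day; the CSV schema and field names are invented for illustration:

```python
import csv
import io
from datetime import date

def append_snapshot(history_csv, snapshot, day):
    """Append one day's region->count snapshot to a historical CSV,
    skipping the write if that day was already captured (cron jobs
    re-run during the day, so re-runs must change nothing)."""
    rows = list(csv.DictReader(io.StringIO(history_csv)))
    if any(r["date"] == day.isoformat() for r in rows):
        return history_csv  # already recorded for this day
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["date", "region", "new_cases"])
    writer.writeheader()
    for r in rows:
        writer.writerow(r)
    for region, count in sorted(snapshot.items()):
        writer.writerow(
            {"date": day.isoformat(), "region": region, "new_cases": count}
        )
    return out.getvalue()

history = "date,region,new_cases\n2020-06-01,Toronto,100\n"
history = append_snapshot(history, {"Toronto": 120, "Ottawa": 45}, date(2020, 6, 2))
# A second run on the same day is a no-op:
history = append_snapshot(history, {"Toronto": 120, "Ottawa": 45}, date(2020, 6, 2))
```

The same shape works whether the destination is a CSV, a database table, or the Google Sheets intermediary mentioned above: read what's there, check whether today is already captured, append if not.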
And a lot of the lessons that we're learning, and the work we've done in the opendatahub.io project, sort of set us up nicely to be able to help Faribod and the team at How's My Flattening as well. If any good outcome comes from COVID and all of this, it's the ability to respond quickly to pandemics and other emergencies like this using DevOps, open source, these models, and the pre-existing stuff. What's also been really interesting to me is that, prior to all the COVID stuff, I'd had a lot of conversations in the community with data scientists about how much they love their Jupyter notebooks. They love that; they don't love dealing with the infrastructure. So the more we can just enable them to have JupyterHub and their notebooks and their pipelines all hooked up, the better. The last thing we want them doing is, and we'll say wasting, but wasting their time setting up CI/CD pipelines, deploying Kubernetes clusters, and all of that. So the more we can automate that and make it, I hate the word turnkey but I'm going to use it, a solution we can just drop in and integrate for whatever the training models or the data pipeline may be, the better off we all will be.

In fact, it's a little bit worse than that, because without things like Open Data Hub they won't even do the CI/CD and the pipelines themselves; they'll do everything manually, like Faribod described at the beginning, grabbing the Google Sheets and transforming them by hand. So it's not even about getting to a very integrated setup; even the first step of using modern tools is a challenge for most people. So if we can bring them directly, I would say, to the top with this kind of automated tooling, that's for the best, I
guess.

Yeah, I think we saw the same thing. There was another COVID project in Finland called Fever Map, fevermap.net, and the good thing about it was that it came out of a hackathon, most of the team were developers, and they were able to leverage the technology very quickly to move forward with their fever-mapping project. So I think that's really one of the keys here: we, the technology folks who all want to contribute and fight the battle against COVID, can actually contribute to this project. So if you're out there listening and want to get involved, again, go to the howsmyflattening.ca home page or their GitLab page, where there are links, and jump in.

I also wanted to ask Faribod and Guillaume: where do you see this going? What is the life of this project, How's My Flattening, and where do the learnings go once we're beyond this, for the wider impact on the province of Ontario? And I'm over in British Columbia, over in BC, so a shout out to Dr.
Bonnie Henry for keeping us safe out here in BC. So where do you think this is all going, and where do these learnings go? Over to you, Faribod.

That's a good question, and I guess it goes back to our mission: what we want to do is show the value of transparent data and community-driven analytics. What we're really pushing for is to change the norm of how things are done. Going back to how things are done right now, healthcare analytics is very siloed. Different institutions hold different pieces, and it's extremely difficult to share data, because every institution is responsible for its own data, and if something goes wrong in sharing it, it's on them. So data sharing is quite hard in Ontario, and the people who do have access to the data have had to jump through a ridiculous number of hoops, even for very, very basic data. It's not very open, and that's honestly what we're trying to change; that's part of our mission here, and part of what I'm hoping Open Data Hub will help us all with, so let me talk about it a bit. Like you mentioned, there are three buckets of access right now: the super-secret bucket, the sensitive bucket, and the public bucket. By default everyone has access to the public data, and I think we should be able to put any data that doesn't compromise patient information at all into that bucket. But then there's the sensitive bucket: there may be, for example, contractual obligations or terms of use, so it's important that only certain people have access; but that doesn't mean there shouldn't be a process set up for anyone to get access, as long as they follow the terms of use. That's the transparency we're pushing for: getting access to healthcare data should not require being a member of a private club.
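The three buckets described here can be thought of as an ordered set of tiers, where a reader cleared for one tier can also see everything more open than it. A toy sketch of that policy; the tier names, datasets, and clearances are illustrative assumptions, not How's My Flattening's actual access control:

```python
# Ordered access tiers, most open first. "restricted" stands in for the
# "super secret" bucket; every name in this sketch is hypothetical.
TIERS = {"public": 0, "sensitive": 1, "restricted": 2}

# Hypothetical mapping of datasets to the bucket they live in.
DATASETS = {
    "daily_case_counts": "public",
    "hospital_capacity": "sensitive",       # e.g. bound by terms of use
    "patient_level_records": "restricted",  # never leaves the top tier
}

def can_access(dataset, clearance):
    """A reader may see any dataset at or below their clearance tier."""
    return TIERS[DATASETS[dataset]] <= TIERS[clearance]

# By default everyone holds "public" clearance; completing a
# terms-of-use process would bump a reader up to "sensitive".
```

The point of the process being argued for is exactly the bump in the last comment: moving from "public" to "sensitive" should be a documented procedure anyone can follow, not club membership.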
Part of what we've been trying to show is that a lot of the analysis on our site is done by what we like to call citizen scientists: non-traditional folks who would honestly probably not have access to the data otherwise. So a big thing we're pushing for, outside of COVID, outside of the pandemic bubble, outside of the lifetime of this project, is making data more accessible, more transparent, and more community-driven, so people can get access to these things without having to be a member of a specific club or group. How Open Data Hub is enabling that is that now we have this secured-access piece we can use, and we can have pipelines flowing from one bucket into another. There may be a data source that only a few people can access, but that doesn't mean aggregated data derived from it can't be accessed by more people. Now that we have this infrastructure, we can use Open Data Hub to give people access to aggregated data that no longer has those privacy issues around it, but is still transparent and still useful, because honestly, I think what we're all trying to get to here is access to aggregated data. Going back to the 80/20 bit, the thing that usually makes the difference is aggregated data; we don't need patient information for 99% of these things. So that aggregated data is the challenge, and that's what we're hoping to solve for with Open Data Hub.

Yeah, I was in Slack the other day having a conversation about this, because with all the work going on around diversity and race, and trying to track COVID based on ethnicity, one of the issues that was flagged for me was the lack of metadata, even in the aggregated data.
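The bucket-to-bucket pipeline described above, where row-level records go in and privacy-safe regional counts come out, boils down to a group-by that drops every identifying field along the way. A minimal sketch; the field names and rows are invented for illustration:

```python
from collections import Counter

def aggregate_by_region(patient_rows):
    """Collapse patient-level rows into per-region daily counts,
    dropping every identifying field in the process."""
    counts = Counter((r["report_date"], r["region"]) for r in patient_rows)
    return [
        {"report_date": d, "region": reg, "cases": n}
        for (d, reg), n in sorted(counts.items())
    ]

# Hypothetical row-level input (this stays in the restricted bucket);
# only the aggregate below would be published.
rows = [
    {"patient_id": "a1", "age": 34, "report_date": "2020-06-02", "region": "Toronto"},
    {"patient_id": "b7", "age": 61, "report_date": "2020-06-02", "region": "Toronto"},
    {"patient_id": "c3", "age": 48, "report_date": "2020-06-02", "region": "Ottawa"},
]
public_view = aggregate_by_region(rows)
```

A real pipeline would add safeguards this sketch omits, such as suppressing small cell counts, but the shape is the same: the sensitive source feeds the computation, and only the aggregate crosses into the public bucket.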
It was pretty lacking, at least from what I could see and what we were chatting about in Slack. The way it was put was that whether a record was tied to an ethnicity was inferred from the location, as if an entire city were one ethnicity, and it was, no, no, we need to break it down better than that. So I think there's a lot of room for improvement in the cross-mapping of different data sources too: we have Stats Canada data on the population, and then we have the health data, and there's a whole lot of work that still needs to be done in that space as well. We have a lot of work ahead of us, but I think the thing you've shone a light on is how these citizen scientists can help bring that to the front. What's also been very encouraging is seeing the province of Ontario give thumbs-ups and thank-yous and respond positively to the work you're doing, as opposed to trying to shut you down or anything like that. A lot of that, I think, is a shout out to Dr.
Ben Fine and his fine diplomacy skills and his influence with the different groups; he's done some amazing work getting us a little further in. It's going to be interesting to see how all of these learnings and the collaboration between all these different parties play out, because there are U of T people, and folks from at least a dozen other companies and partnerships, well beyond the Tableau and Plotly and GitLab stuff; there are at least 10 or 20 different companies putting their resources, time, and effort into this project, beyond the army of students who have been coming in. I have seen so many U of T (University of Toronto, if you're not Canadian) folks on this that it's just amazing, and hopefully this inspires a whole other generation of people to collaborate in the open and to help with efforts and initiatives like this. This is pretty huge stuff that you're doing, and it hasn't stopped yet. If you're out there listening, or watching this later on, you can see that Faribod is a monster when it comes to this: he's online all the time, and everything you can see on this page was last edited by Faribod a couple of hours or twenty hours ago. And there are tons of folks: Spencer has been doing the community project management, Marlies, a lot of people are helping to make this thing happen, so it's certainly not a party of one.

Yeah, and honestly I think the next step is to onboard the Ontario government and collaborate with them on this too; I think that'll be the fun part. They're coming, they're coming to the party. And this is another adventure in open data, too. As we go through this, we see that screen scraping from government sites is one thing, but to actually have those data
sources directly in the pipeline, so you're not screen scraping, would be awesome too.

Yeah, I think that goes to what you were saying about how we're introducing technology that's standard elsewhere into a field that doesn't use it. I think that will hopefully also make the parties sharing data with us feel more comfortable. I briefly talked about the tiered access and using aggregation pipelines to make data more public: there should be no reason why some form of almost all data can't be public. It doesn't have to be the row level, it doesn't even have to be elaborate aggregates, but some form of it should be there for everyone to access, because, as we're trying to show here, amazing analysis and insight get generated precisely because citizen scientists can access it.

Yeah, so I think one of the lessons learned here is figuring out how to build trust in the citizen scientists, so the government can trust that we've got the right processes in place, we have the right security and privacy, we're being compliant, we're not sharing or accessing data that we shouldn't, and we're ensuring everybody's privacy is respected; but also to see the power of the community to help find the anomalies, find the trends, and do the modeling. We've always heard about people looking at data coming in from the universe and finding new stars and new planets and all kinds of interesting things, but this is really pragmatic. Not that the space stuff isn't, but this is stuff that is life-changing and can really help figure out where we need more ICU beds and ventilators, where the curve is not flattening, where it's recurring. The pieces here really are life-saving efforts, and to be able to bring to bear all of these
resources is pretty amazing, and so is doing it in a way that everyone trusts the outcome. I think that's the thing: doing it in the open helps. As we say in open source, a little bit of sunlight goes a long way. Open data is great, but it's also a bit of a two-way street: you have to make sure you're compliant, you're respecting privacy, and all of that, and you've done an amazing job working through all of it. I think that's in part because of the many clinicians who are on here. I'm completely impressed by the roster; if someone goes to the website and sees who's weighing in on this, there are a lot of students, but it's not a student-driven effort. There are some pretty senior people in the Ontario health system working on this, doctors and clinicians who have really been helpful and key to making sure we do everything on the up and up and don't break any rules or regulations.

Maybe I'll comment on that last bit. I think that's a credibility challenge I hope we're also solving for, because everything we put on the site, whether it's a new data source or a new analysis, is signed off on by experts in the field. It's not some random person looking at this; it's all in the open, and you can see all the comments that get generated. But our leadership: you mentioned Dr. Ben Fine, who's a radiologist and a clinician, also with an engineering background, an amazing guy; and then Dr.
Laura Rosella and Tralee Vedas-Ottenoy, who have different backgrounds and together bring the whole package, if you will. Ben has the clinical background, Ali has the data science and analytics background, and Laura has the epidemiology and scientific expertise, and that trio signs off on everything. Part of what we're trying to do is not be part of the hype: we want our analysis to be actually scientifically sound. That's why, on top of the open community piece, there's a rigorous process we go through before anything gets added to the site, because we do want to make sure the science is there and it's not just hype on the internet.

That's perfect. So, Guillaume, we're almost at the end of the hour, or close to it, or at it, one of those things. Any last words about what you're thinking of doing next? What's on the roadmap for Open Data Hub and the How's My Flattening project?

I guess the next thing we'll look into is maybe the Argo pipelines, because Faribod and the team are doing a lot of things related to streams of workflows, automated workflows, and treatment of data, so we'll look at whether that fits their needs. There are also the Kubeflow pipelines that we want to look into, to again automate a little bit more of what's being done, going from notebook to fully automated data pipelines. That's what we'll look into in the coming weeks.

With that, I'm just going to thank both of you very much for all of your efforts, and the entire How's My Flattening team, which is getting bigger every day. Hopefully we'll get a little more awareness out of doing this briefing, a few more resources, and more eyeballs on the project, and continue to help the
province of Ontario make their decisions based on good, real data, as close to real time as possible. So thank you again for all of your efforts, Faribod and Guillaume; thank you for stepping up and taking this on. Really appreciate it. Thank you. All right, take care, guys.