So, this first lecture is an introduction to cloud computing and virtual machines. It was interesting to hear everybody's background, because this is the first time we're giving this workshop and we weren't quite sure what kind of students we were going to get. It's very interesting that we have a lot of very computationally aware biologists, which is how I'd describe the group as a whole, and I think you will get a lot out of this class. A disclaimer up front: I may mention a number of companies, and I have no association with any of them. I did realize, though, thinking back over my slides, that some of them mention the Collaboratory, which we'll be talking about throughout; Andrew talked about it, and many of the speakers today and tomorrow will talk about it. I am actually a PI on the Collaboratory, so I did receive grant money from it, though not to me personally, just to the group. That's my email address, my Twitter handle, and CBW Big Data 16 is the hashtag I invented for this workshop.

As Anne mentioned, this is part of the bioinformatics.ca workshop series, and this is the website you all of course know from having registered and found this course. Our flagship workshop, I would say, is the Cancer Genomics Workshop, the one some of you have taken. It's a five-day workshop, not our longest but a long one, where we cover many of the things that are near and dear to the OICR. Bioinformatics.ca is a pan-Canadian bioinformatics portal, but it is hosted here at the OICR. The workshops we're planning for 2017, pending approval by the committee that oversees the bioinformatics workshops, which is meeting next week, will be basically the same as what we gave this year, plus probably one new workshop that's in development. By mid-to-late November you should be able to go to bioinformatics.ca and see the offering of all the workshops, so if there are workshops you haven't taken that you might be interested in, or that you'd like to tell your friends about, please feel free to do so. For the workshops that have already been given, you can have a sneak peek at what was covered by looking at last year's course material: the portal currently has the 2015 materials (the 2016 materials aren't there yet), but the GitHub, which is also open, has all the workshops and all the material given so far, including the lab exercises, so you can have a good look and see whether a given workshop is one you might be interested in. There's also the course-info email address, the website, and a mailing list you can subscribe to that announces all upcoming workshops; you're encouraged to do that.
For just one second I'm going to stand on my soapbox and advocate for the open movement. Open source: many of the software packages we're going to talk about are open source. Open access: many of the publications we cite and use in our teaching are openly available to the community. Open data: the sharing of data, making it available and making science reproducible, with the caveats that Mark and I will talk about with respect to human data and the challenges that presents for sharing. And open courseware: the CBW shares all its course material and makes it available for others to teach. Bioinformatics.ca is part of GOBLET, a global alliance for bioinformatics learning, education and training, where we share all our material and best practices for teaching with the world on their portal. So we're very much involved in this open movement.

I'm going to start in Washington. I was at the NCBI for five years, from 1993 to 1998, so a long time ago, and during that time I was in charge of GenBank, the DNA sequence database that has been growing since it was started in the late 1980s. Those five years were a very active period; we were always behind the eight ball, dealing with an inundation of data, and we even started panicking. After I left I remember still getting emails from people saying, oh my god, we're never going to make it. It was overwhelming. Then a few years later I was invited back to NIH to chair a session for the 25th anniversary of GenBank, and on the growth curve shown for that anniversary, my period was the little white sliver at the bottom. What we refer to as big data is always changing: we thought we were dealing with big data in the '90s, we thought we were dealing with big data in the 2000s, and of course now it's even bigger.

Some of the learning objectives for this module are to understand the scope of big data in human genomics and how to deal with those challenges. I'm going to talk a little bit about the cloud, the concerns we have about using it, and how we can work on the cloud and with virtual machines. My lecture uses a lot of terminology that all the other faculty will use throughout the workshop, so it's really important; these are basic concepts, but if you don't get them, it's definitely time to ask questions, and we'll have lots of time.

Like I mentioned, big data is a relative term. The first picture is a five-megabyte hard drive being forklifted into a plane; the bottom picture next to it is a five-terabyte drive, a million times more storage, in a fraction of the space. We're talking terabytes, we're talking petabytes, and soon we'll be talking exabytes, so it's useful to understand this terminology and the orders of magnitude it represents.
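Since these prefixes come up constantly over the next two days, here is a tiny Python sanity check of the magnitudes; the values are just the standard decimal SI prefixes, nothing specific to the slides.

```python
# Quick sanity check on byte-prefix orders of magnitude (decimal SI prefixes).
PREFIX = {
    "mega": 10**6, "giga": 10**9, "tera": 10**12,
    "peta": 10**15, "exa": 10**18, "zetta": 10**21,
}

# The forklifted 5 MB drive versus a modern 5 TB drive:
print((5 * PREFIX["tera"]) / (5 * PREFIX["mega"]))  # 1000000.0 -- a millionfold

# Each step up the ladder is a factor of a thousand:
print(PREFIX["peta"] / PREFIX["tera"])   # 1000.0 -- a petabyte is a thousand terabytes
print(PREFIX["zetta"] / PREFIX["peta"])  # 1000000.0 -- a zettabyte is a million petabytes
```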
Now, this figure runs to 2016, at the end of the graph. The red line at the top is the doubling time under the current projection. The yellow line is Illumina's estimate, and Illumina being a sequencing company, they're even more conservative than what we've seen so far; I think that's their projection of sequences generated on their own machines, which is just a fraction of the total. Moore's law, the rule of thumb for how quickly processor speeds double, sits below both, so data growth is actually surpassing our capacity to process the data. The same article in PLOS had an interesting table comparing genomics to physics with respect to the accumulation of data and the bandwidth each field has to deal with, and we're now entering the zetta space. I've copied my little cheat sheet from Wikipedia to help you with zetta, exa, and so forth. A petabyte is a thousand terabytes. Everybody's familiar with the terabyte order of magnitude, right? If you buy a hard disk, you can get a terabyte on your desktop pretty easily; at a petabyte, a thousand terabytes, you start needing a data centre. An exabyte is a thousand petabytes, and a zettabyte is a million petabytes. It's now estimated that sequencing will generate one zettabase per year: a zettabase is a sextillion bases, 10 to the 21, a million times a petabase, per year across the planet.

The challenge then becomes moving these files around and making them available to people, and this is where cloud computing comes in: you don't move the files around, you leave the data where it is and you move yourself; that is, you move the software, which is a much smaller set of files. That's the whole rationale behind what we're going to be doing in this workshop.

What's driving this data growth? The technology. On the left is the Whitehead, now the Broad, in Boston. For the 2001 paper reporting the first sequenced human genome, it took about 10 years to do one genome. There were 50-plus sequencers in that lab, and probably another few hundred in labs across the world, and all of them together did it with so-called capillary, pre-next-gen sequencing technology, the old technology. The equivalent of those 50-plus capillary sequencers is now basically one Illumina machine, and some labs today have a hundred Illumina machines, so you can imagine how much they can generate; that explains some of the graphs I showed you earlier. Illumina also has what they call the HiSeq X sequencing system, which is basically ten Illumina-type sequencers that you have to buy all together. They sold a few; Canada got an award for one of these and split it across three cities, so Montreal, Toronto, and Vancouver each got part of the ten, since the parts can be shipped to different places. These were initially only allowed to be used for whole human genome sequencing, although I think that's changing now and they allow other mammals; the machine of course doesn't know whether it's sequencing a human or a chimpanzee or a mouse, so that restriction comes from their pipelines and licensing. That said, on one of these systems you can sequence 18,000 whole genomes per year, which is a lot of genomes for any given centre, but it would still take you 1,800 years to sequence everybody in Canada. Later on we'll talk about the PCAWG project, where we're doing an analysis of 2,500 whole genomes; all of those would have taken about a month and a half to sequence with one X Ten. And some groups have multiple X Tens, their capacity is growing, and of course you can distribute the work.
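A quick back-of-the-envelope check on those throughput numbers; the population figure is my assumption, chosen to match the 1,800-year figure quoted above, and all numbers are rough.

```python
# Back-of-the-envelope arithmetic for the HiSeq X Ten numbers above.
genomes_per_year = 18_000        # whole genomes per year on one X Ten system
canada_pop = 32_400_000          # assumed population, picked to match the quote

print(canada_pop / genomes_per_year)   # 1800.0 years to sequence everyone in Canada
print(2_500 / genomes_per_year * 12)   # ~1.7 months for PCAWG's 2,500 genomes,
                                       # the same ballpark as the quoted month and a half
```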
With cloud computing there's basically a new software paradigm, a paradigm shift. Instead of getting data, downloading it, computing on it, reading the file and writing the output, the shift is that the data no longer moves, because it's so big; you move your software to where the data is. Some algorithms also work on a data stream now: the data coming out of the machine gets processed while it's still in RAM, without having to be written first, and only the output is written to disk, so there's no initial read-the-file step, you just read from the stream. It's a whole different way of doing things. It's not all software; there's still lots of software that reads files and writes files, but the shift definitely started being talked about a few years ago at AGBT, which stands for Advances in Genome Biology and Technology. AGBT is basically the trade show for Illumina and all the other vendors in the sequencing space, and it's a very interesting meeting to go to to find out what the latest developments are.

A few years ago Lincoln Stein, who is our director of informatics and biocomputing here, wrote a paper making the case for moving to the cloud, to convince people they needed to do it. It's the sort of paper that would not be written today, because today it's obvious, but at the time it was less obvious and definitely an interesting challenge. In that paper he also brought up some challenges we still haven't quite figured out, and I'll go through the main points here. The first is the cost of storage, which is doubling about every 14 months; doubling in the sense that you can store more megabases per dollar over time, in other words the price keeps dropping. Before next-gen, sequencing had a doubling time of about 19 months; after next-gen sequencing was introduced, around 2002 or 2003, that dropped to a doubling time of about every four months. What happens as a result is that the cost of storing a nucleotide is on track to become more expensive than the cost of sequencing that nucleotide. With the X Ten, and the 10X as well, the so-called thousand-dollar genome is here: it's $1,200, or it could be a couple of thousand depending on how it's done, but it's that order of magnitude.
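To see why that storage-versus-sequencing crossover happens, here is the doubling-time arithmetic as a minimal sketch; the doubling periods are the ones quoted above, and the five-year window is just an illustrative choice.

```python
# Storage capacity per dollar doubles ~every 14 months; post-next-gen,
# sequencing per dollar doubles ~every 4 months (figures from the lecture).
storage_doubling = 14    # months
sequencing_doubling = 4  # months
months = 5 * 12          # compare gains over an arbitrary five-year window

print(2 ** (months / storage_doubling))     # ~19x more bytes per dollar
print(2 ** (months / sequencing_doubling))  # ~32768x more bases per dollar
# Sequencing cost falls orders of magnitude faster, which is why storing
# a nucleotide eventually costs more than sequencing it again.
```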
The first genome I mentioned, the 2001 human genome, took 10 years to do and cost about a billion dollars; now we're at about a thousand dollars, so it's quite a reduction in price. But with the cost of sequencing falling faster than the cost of storing, it's becoming cheaper to resequence a genome than to store it, which, if you think about it, doesn't make sense and is actually not practical at all. Although storing the DNA sequence might be expensive and sequencing it less so, once you factor in the analysis needed to figure out what each nucleotide is, that adds quite a bit to the price. There are also samples, especially the ones we deal with here in cancer, that you don't have the ability to resequence at will, because you have limited amounts of DNA. And the biggest issue is actually bandwidth: we don't have the sequencing bandwidth to resequence everything all the time, so we do need to store the bytes and not just the DNA at minus 80, even though DNA in a minus-80 freezer is probably the most compact form of storing sequence. Those are all things to consider.

So we have lots of data, and in many places we have inadequate IT infrastructure. Some labs are better endowed than others, but if you look across the world at all the biologists and what they have access to, quite a few places don't have what's needed. We could write more grants, we could buy more hardware, or we could look to the sky, and a number of companies have gone that way. Illumina and other sequencing companies will take your DNA, sequence it, and ship hard drives to Amazon, or get the data to Amazon one way or another, and then you can do the analysis on Amazon, in the cloud. On the shipping part, I remember speaking to bioinformatics people at Amazon, and they said, oh, we're really good at shipping stuff; shipping hard drives is not a problem for them and they do it all the time. Shipping your hard drive to Amazon is of course the fastest bandwidth for getting data onto the cloud. That said, uploading data to Amazon is free, so if you have the time, you can do it that way as well. A number of companies already use Amazon: Illumina; Nanopore, which is another sequencing technology; Dropbox; Twitter; Netflix; and so forth. They all use Amazon and the elastic cloud capabilities it makes possible.

Likewise, a number of companies have humongous data centres all over the world. I think I spotted a typo on this slide: the Apple data centre in Ireland, listed in the millions, should probably be in the billions, since the other entries are all in billions. Apple, Microsoft, and a number of other companies have large data centres, cloud infrastructures, that they make available. We call them clouds, but they are physically somewhere on the planet, on the ground, not in the air; they're just accessible from almost everywhere on the planet.
So, Amazon Web Services. It was developed in part to deal with the shipping-and-handling side of Amazon's business; they built all this infrastructure, realized they had some extra capacity available, and thought, maybe we can sell this as a service. It turned out to be one of the first and one of the biggest service providers. They've developed a method where they encase large data-centre units in containers: when they want to add one of these things, which might hold a hundred racks of storage and CPU, they get a truck, bring it in, and hook it up to the data centre. They have data centres larger than multiple football fields made of these containers, each of which just hooks up to the others and makes its resources available.

One of the concepts I think I first heard from Amazon, although I'm not sure they were the first, I'm assuming they were one of the early ones, is elastic computing. It's the ability to use as much as you need: stretch and stretch and stretch to get more if you need it, and if you don't need as much, let the rubber band come back together and use a smaller amount, or turn it off altogether. You come in, use the amount of computing infrastructure you need to do the job at hand, and then close it up afterwards. From a user's point of view that can be a lot more convenient, and cheaper, than maintaining your own data centre year-round when your usage goes up and down through the year.
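In AWS terms, that elasticity is literally an API call away. The sketch below uses boto3, the standard AWS SDK for Python; the AMI ID, key pair name, and region are placeholders you would substitute, and it assumes you have AWS credentials configured.

```python
# Minimal sketch of "stretching and releasing the rubber band" on EC2
# with boto3 (pip install boto3). AMI ID and key pair are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Stretch: ask for four worker instances.
resp = ec2.run_instances(
    ImageId="ami-12345678",   # placeholder image
    InstanceType="m4.large",
    KeyName="my-keypair",     # placeholder key pair
    MinCount=4, MaxCount=4,
)
ids = [inst["InstanceId"] for inst in resp["Instances"]]
ec2.get_waiter("instance_running").wait(InstanceIds=ids)

# ... run your jobs on the instances ...

# Release: let the rubber band come back; billing for these nodes stops.
ec2.terminate_instances(InstanceIds=ids)
```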
That said, OICR, for example, has three separate machine rooms with compute, and we use Amazon as well, plus other external infrastructure all over the world; Christina will talk about that more tomorrow. There will always be a place like OICR, with 130 to 150-plus bioinformatics people, that uses all the compute infrastructure it has in-house and still needs more at different times, so for different projects we go out and use external resources. In the same vein, the Cancer Genome Collaboratory is an example of a cloud infrastructure that's academic in nature but that we're going to run on a cost-recovery basis. We have cancer genomic data on it, and the plan, though we haven't started yet, is to eventually charge people for cycles on the Collaboratory: we'll handle all the permissions you need to access the human cancer genome data, and then sell the compute cycles. Why would somebody use that instead of Amazon? Because we already have the data here. Some of it is on Amazon, but we have a bigger data set available here, and if you want to compute on that data set it makes sense to go where the data is. Likewise, Amazon might be better if you want to run jobs bigger than the number of CPUs we have; they're a much bigger organization, their elastic band can stretch a lot further than ours, so they can handle much larger jobs. But what we have available should satisfy most people who want to do cancer research.

Amazon is not cheap, and you can go look at the pricing. Moving files to Amazon is free, as I mentioned, but downloading them is not. So you can move your BAM files in, run your computes, get charged for the computes, and then when you want to download the resulting VCF files, which are much smaller, you pay for the download bandwidth. Downloading files is always going to cost you at a place like Amazon, so it may not be the best place for everybody; the Collaboratory may be better for you, or some other academic cloud, or if you have collaborators in Europe there might be clouds there that are easier to work with.
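That pricing asymmetry is easy to put in numbers. The egress price below is an assumption for illustration only (AWS pricing changes, so check the current rate sheet), and the file sizes are ballpark figures, not measurements.

```python
# Why you upload BAMs, compute in place, and download only the VCFs.
# Prices and file sizes are illustrative assumptions.
upload_per_gb = 0.00    # uploading into AWS is free
egress_per_gb = 0.09    # assumed download (egress) price, USD/GB

bam_gb = 100            # ballpark whole-genome BAM
vcf_gb = 0.5            # ballpark VCF of called variants

print(f"upload BAM:   ${bam_gb * upload_per_gb:6.2f}")
print(f"download VCF: ${vcf_gb * egress_per_gb:6.2f}")  # pennies
print(f"download BAM: ${bam_gb * egress_per_gb:6.2f}")  # why the BAM stays in the cloud
```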
We'll also talk about standardization. One of the reasons things have worked out so well for us on Amazon is that the hardware and software involved are pretty standard and there are no surprises. On an academic cloud you might get surprises, and not all clouds are alike. I'm not sure whether Christina is going to talk about this, but when we started the PCAWG project we invited all the participating groups around the world to contribute cycles, saying, if you have cloud infrastructure available to you, please let us know. It turned out that not everybody has the same definition of a cloud infrastructure: some places will ask you which command you want to run and then run it for you, what do you want me to type and they'll type it for you, without actually letting you log into their system and do it yourself. So there are differences, but with Amazon things are a lot more clear and standardized.

Then there's PHI. Mark is going to talk a lot about dealing with security concerns and personal health information. The challenge there is the variation in those laws across the world. For the ICGC, the International Cancer Genome Consortium, we deal with cancer data from many countries, each with its own rules and realities we have to check against. Amazon is a US-based company; it's a multinational, but it's still US-based, so it's subject to the Patriot Act, which allows law enforcement, with minimal suspicion of terrorist activity and so forth, to reach through servers and data, even an Amazon server based in Ireland, for example. Because Amazon is a US company, if you have German data on Amazon in Europe and the Americans suspected the Germans of being involved in some activity, the Patriot Act would let them go in. The Germans don't like that, so they don't put their data on Amazon. That's part of the reality of dealing with this kind of data.

Compute Canada is responsible for overseeing all the large compute infrastructures across Canada, and their aim and vision is basically to make the best resources available to Canadians. A challenge for us in using Compute Canada infrastructure in a workshop, though I don't think it applies in this class, is that not everybody is based in Canada. The Collaboratory is part of Compute Canada, but it's going to work differently, because we presented them with this different business model and they accepted it. Other Compute Canada resources have been used in other workshops: I think it was the metabolomics, no, the epigenomics workshop, at Génome Québec in Montreal, that used a Compute Canada infrastructure, and there the American students could use the infrastructure during the workshop but couldn't use it after they went home. That's a bit of a problem given the international nature of our student population: over the last 17 years the CBW has taught about 2,200 to 2,300 people, and maybe three or four hundred of those were international, so we're very aware of it. The Collaboratory will not have that issue. What we're going to be using over the next couple of days is Compute Canada infrastructure, funded by Canadian dollars, that will not have this restriction, which is going to be very useful: as I mentioned, it's available and usable in a workshop, and later on the Collaboratory will charge for cycles.

So, how do you interact with the cloud? One way of thinking about the cloud is as a high-performance computing (HPC) system that somebody else is taking care of; it's not local. It's a bit awkward to say that here at OICR, because we take care of our own cloud here, so it doesn't quite apply to us.
But in general, when you use a cloud infrastructure you don't need to know where it's physically located. It's somewhere on the internet; you have an IP address, you have a domain name, you know how to reach it. It's as if you were buying, leasing, or borrowing a slice of the infrastructure for yourself and calling it your own: as if you had root access to your own system, protected from everybody else's activity on the shared infrastructure. The concept of elasticity I mentioned earlier is a really useful way of thinking about this: you take as much as you need, and when you're not using it, you let it go. That matters when you consider small clouds versus large clouds. Amazon is a very, very big cloud and the Collaboratory is a small one, so it's very important to let go of the elastic once you've finished, to make the resource available again. None of these, even Amazon, is infinite, although Amazon has so much capacity, and they want you to spend, that they make it very available.

Another way of thinking about it: a traditional computer, whether your laptop or a high-performance computer, has a hardware layer, an operating system layer, and then your software. If you put all of that in a bag, you need security to get into the bag: security to get into the machine room, or a password to log into your desktop computer before you can do anything. In the virtual machine concept, the operating system layer instead allows multiple little machines, if you want to think of it that way, to spin up, each enclosed as if it were your own laptop, or a thousand laptops you had access to, that you can launch jobs on. The skill and the challenge in using this kind of hardware is, first, where's the front door, how do I get into the system, and second, how do I manage ten copies of a pipeline versus a hundred versus a thousand. Some systems let you do that and some don't, and learning the differences between the various systems is really important.

An additional kind of container we're going to talk about over the next two days is Docker. Basically, Docker is a container that works within a VM. In a Docker container you have all the nuts and bolts you need to run a pipeline or, in the simplest way of thinking about it, to run one tool, though of course you can run multiple tools in a chain, or chain Dockers together. It's a container holding all the libraries, all the software, everything you need to put together to run an application. It's also referred to as lightweight, in the sense that it's not a very big file, so you can store that file when you're not using it and you can share it with other people, and they can reproduce and run the same application you just ran, with the same libraries, the same software, the same code, so that it's totally reproducible.
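Here is a minimal sketch of that idea using docker-py, the Python client for Docker (pip install docker); the image name and command are placeholders, and it assumes a local Docker daemon is running.

```python
# Run one tool in a container, then save the image as a single file you
# can store or hand to a colleague. Image and command are placeholders.
import docker

client = docker.from_env()  # talks to the local Docker daemon

# Everything the tool needs (libraries, software, code) is baked into the
# image, so anyone running the same image reproduces the same environment.
logs = client.containers.run("ubuntu:16.04", "echo hello from a container",
                             remove=True)
print(logs.decode())

# "Lightweight and shareable": serialize the image to a tar file.
image = client.images.get("ubuntu:16.04")
with open("my_tool_image.tar", "wb") as f:
    for chunk in image.save():
        f.write(chunk)
```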
The big concept behind all of this is to enable science to be reproducible, to make things easy to share, to make things open. The Docker project itself is an open source project. The virtual machine space has both many open source projects and many commercial ones; VMware is the first commercial one that comes to mind. VMware is a tool for running virtual machines that lets you run, for example, a Linux environment with Linux tools on your Mac or PC, because the VMware layer abstracts the operating system and runs it separately. Docker, then, is a way of putting things together that makes it easier for you to share, store, and reproduce. We'll talk about ways of distributing and sharing Docker containers, and you'll get lots of practice with them later on.

The flip side of this workshop is that we're dealing with human data, so there are a lot of caveats, restrictions, and things you have to worry about. Personal health information obviously includes clinical information, which is important and which many people consider private, and a genome sequence is the ultimate personal health information in the sense that it's totally identifiable to you and has great prognostic capacity: you can learn a lot about a person by looking at their genome. So how come we have so many genomes available for research? Mark will talk about that later, but a lot of it has to do with the fact that our society has entrusted scientists to do good things with this information, which allows us to make discoveries about cures for cancer and many other genetic diseases. That whole balance, and the responsibility of scientists to use this information appropriately, to not try to re-identify people, and all the other things Mark is going to talk about, is really critical to our use of human genome data.
I explored with Bartha Knoppers, who is a PI on the Collaboratory grant and also a bioethicist, the possibility of getting DACO-style controlled access to data for this class. It was entertained for a few minutes and then discarded as not feasible, although it's been done in other courses with other data sets. That said, it's a very serious and important thing that people need to consider carefully when they require and are granted access to this kind of data: what that means from the point of view of the patients who made the data available, and what it means from the point of view of society and the research community.

[Audience question: will we eventually be able to share everything everywhere?] That's a very interesting question; you may want to save it for Mark. Right now, for example, we share what we look like; our faces are essentially public domain. I don't know how many billions of people are on the planet now, but we can all look at every face. There are some laws that prevent you from taking pictures and publishing them without permission, but I think we could have a kind of security by obscurity: if there were six billion genomes available right now, every genome available, that would be a very interesting possibility. And with all the clinical information that goes with it, that would be the ultimate data set: we'd be able to deconvolute environment, genome, phenotype, and outcome. In our lifetime I'm not sure, but maybe in the next 50 years; I'm older than you, so you'll still be alive in 50 years, and we can revisit it then.

So, in this workshop you're going to learn about the ethics and the rules allowing you to use human data, and you're going to learn about VMs, Docker, the Collaboratory, and PCAWG. PCAWG is the Pan-Cancer Analysis of Whole Genomes, a project where we've made extensive use of Docker and of cloud computing across the world; Christina is going to talk about that. And that's it for me, a bit early I think. Any questions?

[Audience question about whether, on a cloud, you can see what other users are doing, as on a shared HPC system.] So again, yes and yes: it can be made secure enough that you don't necessarily see what other people are doing, unlike on a shared machine where you can run a ps command and see what other processes are running. I compare HPC facilities to a big laptop, or lots of laptops: if you have a thousand laptops and a hundred people using them, you can see people moving around and sharing. Cloud computing, academic or commercial, encapsulates the environment in a way that you really don't see outside the resources you asked for: you don't see who has access to the machine next to yours, and others don't see who has access to your machine, which only you can see. So it's a bit of what you said at first, in the sense that you don't see what the others are doing. There were a lot of concerns at the beginning of cloud computing about security.
In fact, places like Amazon, and I'm assuming the Collaboratory too as a newer infrastructure, are a lot more secure than a lot of laptops, a lot of computer infrastructure, and a lot of academic clouds; Amazon is probably more secure than most computer systems you've ever come across. They have encryption in transit, encryption once you're on the system, you need a key file, you need two-factor authentication to get on; there are layers of security available on Amazon that are not available in many academic clouds or institutions, where you get a window, you type in the password, and you're in. For example, one system we work with prides itself on being very secure; they don't have two-factor authentication, but they do a pretty rigorous password check, and they monitor a lot, because systems like these get poked at by the rest of the world extensively.

[Audience question: can you put several Dockers together to make a pipeline, for example one program for one processing step and another program for the next?] So, the chain of Dockers needs to be in the same operating-system space; you can't mix and match across that. It's funny, talking to Brian O'Connor, who is one of our Docker masters and used to work at the OICR, he prefers everything to be put in one Docker container. That being said, technically you can use multiple: the output of one Docker can be the input of the next. One issue with Docker is that, because it's such a lightweight infrastructure, it's less secure than a VM, for example. But if you own the VM and run your Docker inside it, there's no concern: nobody is going to get into your Docker, because the space your Docker container lives in is owned by you and only you. So from that point of view Docker is less secure, but it doesn't matter, because you're the master of the very big ship that your whale is riding in.
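As a footnote to the pipeline question above, here is a hypothetical docker-py sketch of that output-of-one-into-the-next pattern: two placeholder images hand a file off through a shared host directory. Neither the image names nor the commands refer to real tools.

```python
# Chaining two containers into a mini-pipeline via a shared volume.
# toola/toolb images and commands are entirely hypothetical.
import docker

client = docker.from_env()
shared = {"/tmp/pipeline": {"bind": "/data", "mode": "rw"}}

# Step 1: the first container writes its output into the shared volume.
client.containers.run("toola:latest",
                      "sh -c 'tool_a > /data/step1.out'",
                      volumes=shared, remove=True)

# Step 2: the second container consumes step 1's output.
client.containers.run("toolb:latest",
                      "sh -c 'tool_b < /data/step1.out > /data/final.out'",
                      volumes=shared, remove=True)
```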