Okay, so we'll get started with module one of this workshop in New York. So, the usual Creative Commons slide, like Michelle mentioned, and just to stress the point that it's a BY and Share-Alike license. There are different versions of the Creative Commons license. With the one we're using, you can use the material and you can modify it, which means you can take, let's say, a slide out of a PowerPoint and put it into your own lecture. That's fine. But you have to give credit to where you got it from. And this is the little bad thing for some people, but I think it's a really useful thing: if you use one of our slides, you have to share your slides. So it's sort of an infectious agent. If you get one of our slides into your talk, then your whole talk becomes free to the world. So it's really a BY and Share-Alike: you have to share also. And so I encourage blogging, tweeting, video, taking notes and everything. I'm the only one that uses this slide, I think, but I'm definitely a big, big proponent of making the material available.

My work here this week is on behalf of my institution. My institution is paying me to be here this week. I'm not getting paid by the workshop per se; I'm being paid by the OICR to deliver this. So it's part of my job to give you these lectures.

Right away I'm going to put in this slide to say that the slide deck you have in your book is different from the slide deck I'm going to present to you. We had to submit our slides a week ago, a lot has happened in the last week, and we're always thinking about our lectures and updating our slides and whatnot. For this morning's lecture there are actually not that many changes, so it's not a big deal.
But my lecture tomorrow is going to have a few more slides that I've thought about in the last week and that I needed to add after I sent in my slides. So all the slides that have this little white dot on the bottom left here are slides which are not in your deck. Don't look for these pages: if you see a white dot, that means you don't have it. But it is on the wiki. Actually, I was having some problems with the wiki this morning, so the PDF version is there and not the PowerPoint, but I'll fix that later today, so you'll have both the PDF and the PowerPoint available from the wiki, and they will have this slide.

This is a different slide. That's my email, that's my Twitter account, and the hashtag I was going to use for this workshop turns out to be not so good, so this is the one we're going to have to use if you want to use hashtags. If you don't know about hashtags, you can ask me later. We're going to mention a few companies, and we already have, like Amazon and things like that, so I just want to declare that I don't have any stock options in Amazon or Oxford Nanopore or Illumina or any of these companies, and I will not profit in any way, shape or form from mentioning any of them.

So what we're going to do in this first workshop, like Michelle said, is a very easy warm-up to start you off. I don't mind laptops being open for this one; you can take notes on your laptop and so forth. It's going to be an overview of these workshops, and I'm also going to introduce you to some of the cloud computing work that we're going to be doing this week. So we'll give you some intro about bioinformatics, the history of bioinformatics.ca, cloud computing, and getting on Amazon Web Services.

So what do biologists do? They like to make observations.
They like to make hypotheses, test them, challenge them, conclude things, and then write papers. This is really what lab biologists like to do, but it's also what bioinformaticians like to do. We basically do the same thing. What you do on a laptop is really a bioinformatics experiment: you're testing a hypothesis, you're looking for something, you're making discoveries, you're interpreting data, and so forth. So although bioinformaticians don't necessarily think of themselves as experimentalists, we actually are, very much so. My background is in biology, but the various instructors you're going to have this week have backgrounds in computer science, math, physics and so forth. Really, we're all testing things and looking at various things.

So this week we'll be looking at RNA-Seq. We're not going to be doing any protein mass spec work, but it's part of the field of computational biology. Later in the week we'll be doing interactions and pathway analysis. This is a diagram, a very clear diagram, of work done several years ago integrating all the interactions in the yeast cell. So this is all the known interactions put into one big network diagram. The dots, the nodes, are genes, and the edges are interactions between these genes. The different colors of the dots represent the different Gene Ontology (GO) categories, so you can see things that are related to each other by their color.

So the central dogma, which I'm assuming all of you know, is DNA makes RNA makes protein. Then we have what I call the NCBI version of the central dogma: DNA makes RNA makes protein, and then you write a paper about it. Linking DNA, RNA, proteins and publications is a key thing to computational biology and a key thing to the way we do work here.
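As a small illustration of the central dogma mentioned above, here is a minimal Python sketch that transcribes a DNA coding strand to RNA and translates it to protein. The function names (`transcribe`, `translate`) and the example sequence are made up for illustration and were not part of the lecture; the codon table is the standard genetic code, and the sketch ignores reading frames, start-codon detection and the reverse strand.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(dna):
    """DNA coding strand -> mRNA (replace T with U)."""
    return dna.upper().replace("T", "U")

def translate(rna):
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)

dna = "ATGGCCTGGTAA"  # hypothetical example sequence
rna = transcribe(dna)
print(rna)             # AUGGCCUGGUAA
print(translate(rna))  # MAW
```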
So when we try to understand the cell, we do experiments, and some of these are bioinformatics experiments, like I mentioned. We want these to be reproducible. We want people to be able to find our data and our methods, and we want them to be able to rerun these experiments, validate them, and move the science forward. The classic bioinformatics experiment would be to do a BLAST search. How many of you have done a BLAST search before? I hope most of you. Many of you, yes, good. We're not going to do BLAST this week, that's somebody else's lecture, but it's good.

So think of the sequence as the reagent: the sequence, and the database you're using. Think of the BLAST search itself as the method: which one are you doing, a protein-protein BLAST, or nucleotide-protein, or protein-nucleotide? And think of the alignment as the interpretation: looking at similarities and hypothesis testing. So you have to know your reagents, you have to know your methods, and you have to do your controls. So what would be an example of a control in a BLAST experiment? Anybody? Yes, that's one example. Another is a sequence you know is in the database, to see if you're able to find it. If you search with a sequence you know is in the database and you can't find it, you're obviously using the wrong parameters. So you have to have both negative and positive controls when doing your experiments, and when you do bioinformatics experiments you think about it that way as well: things you're expecting to see, you do see, and things you don't want to see, you don't see.
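To make the idea of positive and negative controls concrete, here is a minimal Python sketch. It does not use BLAST: `find_hits` is a toy ungapped comparison written for this illustration, and the two-sequence "database", the sequence names and the `min_identity` parameter are all assumptions made up for the example. The point is only the shape of the experiment: a query known to be in the database should come back as a hit, and a query known to be absent should not.

```python
def find_hits(query, database, min_identity=0.9):
    """Toy ungapped search (not BLAST): return the names of database
    sequences that contain a window matching the query at >= min_identity."""
    hits = []
    for name, seq in database.items():
        best = 0.0
        for start in range(len(seq) - len(query) + 1):
            window = seq[start:start + len(query)]
            matches = sum(a == b for a, b in zip(query, window))
            best = max(best, matches / len(query))
        if best >= min_identity:
            hits.append(name)
    return hits

# Hypothetical two-sequence "database" standing in for a real one.
database = {
    "seqA": "ATGGCGTACGTTAGCCGATAGGCTAACGT",
    "seqB": "TTGACCGGTATCGATCGGATCCGATTAGC",
}

# Positive control: a query cut out of seqA must be found.
positive_control = database["seqA"][5:25]
# Negative control: a sequence known to be absent must not be found.
negative_control = "A" * 20

positive_hits = find_hits(positive_control, database)
negative_hits = find_hits(negative_control, database)

if "seqA" not in positive_hits:
    print("Positive control failed: check the search parameters.")
if negative_hits:
    print("Negative control failed: hits are not specific.")
print("positive:", positive_hits, "negative:", negative_hits)
```

If the positive control fails, the parameters (here `min_identity`) are the first thing to question, which is the situation described above.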
So what is bioinformatics? I mean, we all came to learn some bioinformatics this week. I'm going to give you my definition. My definition is covered on your slide, so you can't see it right now, but I'll show it to you in a second, and if you download the PowerPoint you can see it. Right now I'm going to ask you to team up with the person next to you, two by two, and discuss and write down one definition of bioinformatics. Start now. Write down something, and I'll be asking you in a minute or so. Okay, don't spend too much time on this. Does anybody have an answer they want to share with the class?

Basically, a summary of everybody's answers is that bioinformatics is about integrating biological themes together with the help of computer tools and biological databases, and gaining new knowledge about the system under study. So it's really all of the answers. And of course if you put 10 bioinformaticians in a room you'll get 10 different answers, which is quite appropriate. But I think it's important to think about computational biology or bioinformatics, and there are lots of debates: is it bioinformatics, is it computational biology, is it biocomputing, and so forth. We were actually involved in writing a bioinformatics book, is it 20 years already? Almost 20 years ago now, 15, 17 years ago, and we wanted to call it computational biology. The publisher back then told us, "Over our dead body. You're calling it bioinformatics." Bioinformatics was a much trendier word then than it is now. I said, okay, we'll call it bioinformatics, and so we did. But there are a lot of people that say, oh, it's computational biology, or it's bioinformatics. As far as I'm concerned, both of those cover this space. People that like to do more algorithm development and so forth like to think of themselves more as computational biologists, and people that practice computational biology
think of themselves more as bioinformaticians. It's really just semantics, and not really useful to separate those two. I think all of those people are welcome in this field and contribute to this field. You can be a practitioner, or a developer developing databases, developing ontologies and so forth, and you're part of the community. It's a very inclusive community, so I think it's not really useful to separate it with those terms.

So the Canadian Bioinformatics Workshops series was started in 1998. I used to be at the University of British Columbia in Vancouver, at the Centre for Molecular Medicine and Therapeutics, and also part of the Canadian Genetic Diseases Network, which doesn't exist anymore but was basically a network of geneticists from across Canada; about 50 groups across Canada were part of it. One of the things they realized back in '97, when we started planning this workshop idea, is that we needed training in computational biology, slash bioinformatics, slash whatever your favorite name is for it. So back then, through the CGDN, the Canadian Genetic Diseases Network, we started the Canadian Bioinformatics Workshops series. Actually, one of the TAs in that first year of the workshop realized that the bioinformatics.ca domain name wasn't taken yet and said, "Oh, you should register that right away." I said, "Oh my God, that's a good idea." So thanks to that TA, who's now a faculty member, we got that domain name. In 1999 we had our first workshop, in Calgary. Back then there were four different workshops we taught, and the bioinformatics workshop was a two-week workshop. So if you think you have it long this week, imagine being here for two weeks. So it was two weeks, and we covered all the basics, BLAST and Entrez and structure databases and various nucleotide databases, GenBank and so forth, and we covered some basic
genomics. We did some HDB back then, we did a bunch of things. So that was the two-week workshop, and that two-week workshop was a prerequisite for the one-week workshops of the other types. For the other one-week workshops, we had one on proteomics, one on genomics, and one on tool development. The tool development workshop was the least popular of our workshops, and we've only offered it three or four times over the years. One year it also turned into basically an NCBI toolkit development workshop. It turned out not to be the big one. But even for the others, bioinformatics, proteomics and genomics, around 2006, 2007 we were getting fewer and fewer people coming to these workshops. So we thought, are we becoming bad teachers? What's happening? There were several issues. One is that with these longer workshops, as you probably know, going away for a week is difficult. Being away from your lab and so forth for a whole week is a bit challenging, and going away for two weeks is even more difficult. But the other very big change that happened in those seven, eight years is that universities started offering these introductory-type workshops, and they were now being offered throughout the world, basically. So our introductory workshops of one and two weeks didn't quite fit the model anymore. They are still offered: similar two-week workshops are offered at Cold Spring Harbor, at the EBI in Europe, and at several other places. But it didn't quite work out as well for the Canadian Bioinformatics Workshops.

So what we started in 2008, which actually was coincidental with Michelle's and my move from Vancouver to Toronto, is a new series of workshops which were short and more about leading-edge technology, the new things that are happening in the field, and
having them as two- and three-day workshops allowed us to be more flexible and more to the point about what needed to be delivered. So we started those in 2008, and now we have, as Michelle mentioned, a number of workshops that we're offering across the year, mostly in the summertime, although it's not quite summer yet and we've started already. In sort of alphabetical order, we have a metagenomics workshop, which is a three-day workshop; cancer genomics, which is five days, one of our longer ones; analysis and exploration, an R workshop, two days; and so forth. Are we okay? Yeah, so we're just powering people up who need some power. So we have High-Throughput Biology: From Sequence to Networks, which is this one this week, which is our first time offering it; informatics and stats for metabolomics, two days; an RNA-seq workshop, two days; and informatics on high-throughput sequencing data, which is a two-day workshop. When you think about it, giving a two-day workshop on next-gen sequencing bioinformatics is a little surreal. There's a lot of material to cover. But, taking that one as an example, we teach where to find things and how to do things, and hopefully give you the starting ideas and vocabulary and insights into what you need to do.

So this is a modified version of my talk. Oh yes, so this week's workshop is actually a mixture of three of our existing workshops. It's the informatics on high-throughput sequencing workshop, this workshop here, which is two days; it's the RNA-seq workshop, so two plus two makes four; and it's the network and pathway workshop, which is here, and is three. So that gives you your seven days. You're getting seven days' worth crammed into one week, which we haven't done before. Like I said before, we sometimes schedule workshops back to back, so people register for them
separately, but this is the first one we've given as a package. As Michelle said, all the information about the workshops is on bioinformatics.ca, and I invite you to visit that website. All our material from previous years is available online, as PowerPoint or PDFs and also movies: we have movie files made from a voice-over PowerPoint, and that works most of the time. So this is the course info URL, and the workshop announcement mailing list, if you want to be on the mailing list for that. It's a separate list from the one Michelle mentioned, and I invite you to subscribe to this one as well.

So this is standing-on-a-soapbox time, a little bit. Open access, open data and open source are, I think, essential for science, essential for computational biology, and essential for us to deliver this workshop. It's not only a responsibility and an obligation, but I think it's something that comes with the privilege of doing publicly funded work. So I strongly encourage you to think in this space, and think about how you yourself can contribute to this information space. Take BLAST, which we talked about earlier. It is a software package which is more than open source; it's actually part of the public domain. BLAST was developed by the NCBI, and it's so open that you can download it and you don't have to do anything. You can sell it if you want, if you can find somebody who will buy it; you're allowed to sell it. Companies have done that, of course: they've repackaged BLAST into their own package and sell the whole package, which includes BLAST. It's the work of the US government, and that's what made that possible. But BLAST would not exist, or would not be
necessary, if it wasn't for GenBank. GenBank is the DNA sequence database of all publicly available sequences, and if we didn't have GenBank then we wouldn't need BLAST. So GenBank, which is an open source, sorry, open access database, made it necessary to have a tool like BLAST, and allowed BLAST to be developed and perfected and improved and so forth. All of these things are connected to each other and are really critical for the work we do.

That said, I don't want to be against commercial tools per se. There is a niche for commercial activities in computational biology, of course. In my mind, they can develop their own software, and if they want to sell it and people want to buy it, that's great. But I think where they serve an even more important niche is in the support and service sector. When people buy certain software packages, what you get is a help desk: somebody who will help you with using the software in their package and so forth. I think that's great for companies to do. Academics work in the free space, in a sense, but nothing's free, right? My salary is paid for by somebody. My time and everybody else's time here is paid for by somebody. The internet is not free, because somebody paid for that as well. So all these things are free in the sense that they don't cost you anything to use, but there are a number of organizations and groups paying for it in the background, to make things better, to improve health, to improve crop productivity, to improve connectivity between people, to improve science and so forth. So all these things are critical to what we do.

At the same time, I wrote a letter to Nature a few years back saying that it is critical that if you find something wrong in a database, if you find an error, and I swear to you there are some errors there, I know it's hard to believe there are errors
in the public databases, but there are some, and if you come across one of these, where you see that a sequence doesn't make sense, it's contaminated with vector or whatever, there's something wrong with it, it's really your responsibility as a user of these resources to let the resource know: there appears to be an error here, maybe you guys want to have a look at it. I did that once, several years ago. I was doing a search for mitochondrial sequence, using BLAST and so forth, and I came across a record which obviously was contaminated with vector sequence. So I informed the database that had that sequence, and they fixed it. A few weeks later the sequence had shrunk by half. My search was not confusing anymore: I redid my search, and now the results I got made sense, and I didn't hit that sequence anymore. So it was really useful to move things forward. This letter was about GenBank, but this could be any database that you work with, or any software. If there's a software package that's misbehaving, I'm sure the people behind that software would love to hear back from you. So it really becomes, again, a responsibility of the community to be able to do that. Michelle, are you pointing at me for something? No? Okay. Oh yeah, well, we'll get there.

So why do we have bioinformatics? Because of open data from genomics and proteomics technologies. The reason we're here this week is because there's lots of data coming in. A lot of it is publicly available; obviously not all of it is. At the same time, many of the tools we're going to use this week are publicly available. They're open source software packages, mostly if not entirely, and that's really key to what we're doing this week. So Michelle is just wanting me to jump along to get to this
slide. So we're on this slide, and what we're going to do now is switch gears a little bit. I'm going to talk about cloud computing and why we're using cloud computing this week. Before I start this section, any questions, comments, concerns? Are things okay? Good. And feel free to raise your hand throughout. When I talk I tend to ramble sometimes, so trust me, interrupting is good, I don't mind. Let me know. Okay, so we're okay so far. We haven't done anything hard yet, so it's good that you're okay.

So, cloud computing and the new software paradigm. One of the big things that's happening in our world now, in computational biology and biology in general, is that, whoops, what happened here? Let me make sure that's still running. Okay, that's still running, that's good. Okay, so it's that the data space we're working with is reaching petabyte scale, and soon probably exabyte scale. As an example, one of the projects I'm involved with is the International Cancer Genome Consortium. This is not just us at OICR; we're part of a group worldwide, sequencing on a scale of about 25,000 human genomes from tumor samples paired with their normal DNA. So that's actually 50,000 genomes, plus transcriptomes and epigenomes and clinical data. We're about three quarters of the way through the project, and at the end of the project we think that the whole data set will be about 10 petabytes, and compressed it will be about three petabytes. So we're really starting to talk about this large scale. This will not be a data set that you can download to your laptop. Don't think about it. But it will be a data set that will be available somewhere, and you'll be able to go there and do your work on that data set. That's the way we're thinking about this space right now. So there are more and more large data sets where downloading to your computer is not really possible, but
moving your compute to where that data exists is possible, and this is where the rationale behind cloud computing comes from. The other thing, as you probably know about human data, is that often you need special permission to look at this data, because human data is, not always but most of the time, protected by a body; there are several such bodies across the world. They make sure that if you're a scientist looking at this data, you're going to do good things with this data. You're not going to try, for example, to re-identify the person whose DNA sequence you may be looking at, and you're not going to try to exploit the fact that you know who it is, or the fact that they have this mutation or this disease. That's why you need to have special permission, and the special permission you're asking for is also signed by somebody who can fire you should you deviate from what you said you were going to do. So your institution usually backs you up on your application for this kind of data.

The other thing that's changing in the software development world, although most of the things we're going to be doing this week are still in that space, is that it's becoming less about reading a file into memory, doing some compute, and then writing an output file, and more and more about working on streaming data. Oxford Nanopore, for example, is thinking about the data that way: the data is streaming through the machine, and as it comes through some analysis is done on it and some files are produced, and then at the end, when the thing is finished streaming, you write the output file. So all of these things are changing the paradigms in which we're thinking. My colleague, and the person I report to, Lincoln Stein, wrote a piece in Genome Biology a few years back looking
at the growth of the data. He looked at the next-gen sequencing growth curve, so how much it was costing to generate data, and at hard disk storage, so how much that was going to cost, and at how, with the new next-gen sequencing, that curve is changing. Basically, what was going to happen is that it would become more expensive to store a nucleotide than to sequence a nucleotide. Okay, so think about that. It's going to become easier to generate the data every time you need it than to store it on a computer and go to that computer to get that nucleotide. We're not there yet, although one could argue we're going to be there in a few years. What this equation doesn't take into account is the machine bandwidth, basically, which is not there: we don't have enough machines to resequence everything every time we need it. But in the end, the best and most secure and probably most compact way of storing a DNA sequence is probably in the minus 80 freezer, in a vial. If you need that sequence again, that should be your backup: the DNA in the freezer. Sometimes it's possible to do that. Sometimes you're sequencing samples from a cell line, for example, so you have lots of DNA. But sometimes you're dealing with scant tumor samples, or rare environmental samples and whatnot, and you don't have access to the DNA all the time. So you do have to store the DNA sequences and keep them around and so forth. But it's something to think about.

So now we're almost at the thousand-dollar genome, which has been talked about for so many years. Depending on which company you speak to it's a thousand or two thousand, and depending on which currency, Canadian dollars, US dollars and so forth, it varies a little bit. But the joke is, it's a thousand-dollar genome, but it costs a
million to analyze it, right? So even though it's cheap to sequence, it continues to be very difficult to analyze. One of the reasons why we have to keep all these FASTQ files around right now is that we actually don't trust any of our software to generate the right answer. So we want to keep the FASTQ, the raw data, around so that when new software comes along we can reanalyze and remap and realign the whole thing, because nobody's got it right yet. That's why, if you read bioinformatics and computational biology journals, you'll find many, many new tools to align and to map and so forth, be it for DNA or RNA. There are a hundred-plus RNA-seq aligners. Why they haven't figured it out yet is maybe because the reads are too short, and that's why the long reads are bringing new insights, and we'll talk about that a bit more this week. The technology is changing all the time, and the software obviously has to try to keep up.

So what are we to do? There's too much data and not enough compute infrastructure for most labs. So where do we go? Write more grants, buy more hardware, or we look at the sky. Obviously many genomics companies have done this already: basically you send them your DNA, they sequence it, and they ship it. Sometimes the fastest way to get data from one place to another is by truck. So you ship your DNA sequence on hard drives to Amazon. I've spoken to Amazon people many times, and they say, "Oh yeah, we're really good at shipping stuff." I said, "Oh yeah, you are." So if you want to ship a drive to Amazon, they're more than happy to deal with it, load the data from that drive onto the cloud, and then make that data available to you and your colleagues and whoever you want to give access to. And so that's a really sort
of simple reason for using the cloud. And of course many companies are there already: Google, Dropbox, Twitter, Netflix and so forth are all using the cloud right now. Amazon Web Services has football fields upon football fields' worth of these containers that they just bring in and plug in, and then all of a sudden they have hundreds more racks of storage and compute available to them, at multiple untold places across the world. One of the challenges we currently have with Amazon, with respect to data permissions and dealing with human data and so forth, which is now allowed but wasn't allowed for many years, is that Amazon won't tell you where they are. They're somewhere on the planet. It could be on a barge in Thailand, or it could be in northern Canada, although I don't think they're in Canada; maybe they are, but they haven't told me. Or it is in Virginia, or Ireland, and so forth. So you know geographically which region they are in, but you don't know exactly where they are. Google is the same, and is actually much more hidden than Amazon; with Amazon you actually know the zone. But of course people have flown over all these places and they see these large spaces and so forth, so they're sort of easy to find.

So, some of the challenges with cloud computing. It's not always cheap, so you can make expensive mistakes. I remember in the first workshop we did, we actually left the workshop machines running, and we spent several thousand dollars on unused machines, or maybe some students were still using them after the workshop, I don't know. But anyway, they gave us our money back, so that was very nice. Getting files in and out is sometimes difficult. You have to deal with the slowest point of connectivity, and that's not necessarily at Amazon: it could be that your institute has a bad bandwidth connection to the rest of the world, and
so that becomes the rate-limiting step. It's not the best solution all the time, and actually this week we'll be using Amazon for some of the labs but not all of the labs, so you'll see that some experiments are better suited to it than others. And I've talked about personal health information and security concerns. Some people say, well, if I have personal health information on Amazon, is everybody going to have access to it, and so forth. It actually turns out that Amazon is one of the most secure compute infrastructures. It's more secure than most of your universities' infrastructure. Maybe not the DOD, but it is very secure, and you can have double encryption, and you can have fobs and all sorts of security protocols available to do work on Amazon. It's actually used by the US government for lots of very sensitive data. And NIH has just given approval to use commercial clouds for human genome data, as of two, three weeks ago. So you're now allowed to use dbGaP-restricted human genomic data sets on Amazon. You have to get special permission from dbGaP, but they've just opened the gates, just a few weeks ago. Sorry? Yeah, well, I would say it probably took three years for NIH to get there, so it was a long process. So I know DOD will be three years from now. Yes, I totally appreciate and understand the challenges there.

So, some of the advantages of cloud computing. One is that for this workshop we actually wrote a grant to Amazon, so the Amazon dollars we're going to be spending this week are free dollars, in the sense that it didn't cost me anything and didn't cost Michelle anything. So we got Amazon to give us free dollars. It's a bad joke, but it's a bit like giving crack to kids: they enjoy it, they have a great time this week, and now they go
home and they want to have their own, and then they have to pay for it. So this week it's free, but once you go home and start doing the same thing on Amazon you'll have to pay for it, and so you have to keep that in mind. But it's really useful for our teaching, because we're able to have really high-quality compute infrastructure, with the high bandwidth and, sorry, the high compute power that we need, and have the same available to every student in the class. That would be very hard to have even in this building. In the first years, actually in 2007, when we started doing the next-gen workshops at OICR in Toronto, we actually got the systems group to carve out part of the rack for the workshop, but even that was not enough to handle the load from this class. And so from that point of view, a spike is sort of the classic Amazon usage: basically, if you need a lot of compute for a short amount of time, then going to Amazon is really the best use example. It's getting easier to transfer large data sets to Amazon, and actually what's happening is that Amazon is loading a lot of public data sets and making them available worldwide, so the data that you want to compute on is already there. That really makes it useful, like the Thousand Genomes data for example. There are AMIs, so Amazon Machine Images, which are basically the virtual machines that we're working on, and they already have all the tools that we need for this workshop. So we're going to have one that we're going to use this week, but there are others that are made available by multiple organizations. Galaxy, for example, which we're going to use as well, is available: there are quite a few AMIs that have Galaxy installed on them, but I'll talk about that more tomorrow. And then
you should also keep in mind that we're working with Amazon this week but there are other solutions. Google has a genomics space, and there are institutions that have what they would refer to as cloud infrastructure as well, so there are lots of solutions. What we call academic clouds are also available, so you should look for these. So this week some of the tools and data you're going to be working with are going to be on your computer, some of it's going to be on the web somewhere, and other things will be in the cloud. Traversing these three spaces is going to be one of the things you learn this week: you'll see and take advantage of what's easier done in one place or another, and you're going to become efficient at moving things around. There are different ways of using the cloud. Most of the things we're going to do are going to be command-line, so you'll be typing commands at the prompt, some computing is going to happen, and then you'll have files, and you can view these files in a number of ways, some of which may be with viewers that you have on your laptop, or a browser, or something like that. So, yeah, big data, which there was some allusion to. This is a 5 meg hard disk from several years ago. 5 meg now is, I don't know, an email attachment, right? So things have changed a lot over the years. Also, this is a sequencer from one company that's available now. You basically drop in your DNA and it goes into the Oxford Nanopore, and it reads DNA directly into your laptop, from a USB key into your laptop. So, things that we've set up for you. We've loaded all the data files on AWS this week. We've brought up a Linux instance with lots of software for NGS analysis. We then cloned it and made separate instances that everybody will be using in this class, and we've simplified security so that basically all have the same login and password, and this is because we're going to be
using data that's freely available; there are no controlled access permissions required. But the way things are set up this week on Amazon is not necessarily the way you'd want to have it set up for your own instance, and so getting some familiarity with making sure things are secure and so forth will be very important. So as Michelle mentioned, everything's on the wiki, and all the updates are on the wiki, so all the changes and the actual instructions on how to get on Amazon are also on the wiki. It's probably best to follow those instructions; they're the most up to date. And so this is from this workshop, and basically these are screenshots from a Macintosh, and I'll have some things for Windows. I'll declare my Apple bias upfront, so sorry for the PCs. One of the tools we're going to be using is the terminal, so one of the things you should do is put the Terminal on your bar. The Terminal is available from Utilities, if you go to the Applications directory, and once you run it you should save it there. And if you start the Terminal application, that's what it would look like. So just, how many of you here, and don't be ashamed or afraid of sharing, are relatively new to Unix? Good, it's about half the class. So the Macintosh is really a front end to what is actually a Unix box, with lots of graphical user interface up front, but the back end is basically the Berkeley operating system, and so it's a classic sort of Unix box. What the Terminal application does is give you access to this space, and this is also going to be the interface that we're going to use to access the Amazon cloud. And so this is actually from the wiki, and it gives you instructions, which I'm going to ask you to look at on the wiki, not really from my page; on the wiki there are detailed instructions on how to do things. And so what we're doing here is we're copying
this file, the cwny.pem, to your machine. So you're downloading it from the wiki, and on the Mac you do a control-click and then you save the file, "Save Link As" and so on like that. Is it a control-click also on the PC? Sorry? Yeah. So, yeah, we can all do it together. And so now, according to the flavor of your machine, yes, there are two sets of instructions: the Macintosh and Linux instructions, or the PC instructions. So the goal in the next five minutes is to get yourself onto the Amazon cloud, okay? And so please use your stickers, but the instructions are on the wiki, so we'll follow through the wiki. Yeah, actually, so you put it on your machine. You download it to the desktop, or maybe you create a folder where you put all the stuff from your workshop, maybe one per day if you want to be super organized, but at least maybe you can make a cw folder in your directory or on your desktop, so that it isn't mixed in with everything else and you know where to go when you're looking for those files, so they're all in this one place. So if you look at this file, this .pem file, it's very friendly, it's all text, but basically this is a key that allows you to log in to the computer. Amazon looks for this key. Because what happens is, once you're on Amazon, as you know, how many of you have done online shopping on Amazon? Yes. So one of the things that an Amazon account requires is a credit card. We're not going to ask for your credit card, but my credit card is there. And so, to make sure that not everybody in the world can log into this instance of the workshop, you need this key. So I'm giving you the key, and the key is only available to the people in this room right now, so that not everybody can go crazy. And this is the account, of course, that the grant we received from Amazon is applied to as well, so we're not
spending money, but it is backed by a credit card; as you know, every Amazon account is backed by a credit card. This is good, we need some security, so don't share this with everybody. This is a key we generated to access these images. Yes? Yes. So if you follow this URL, one of the things you have to do once you've downloaded this file is change the permissions of the file. There are several ways of doing this in Unix, and a quick way is to do this command: chmod 600 and then the file name. Where the 600 comes from is by adding 4 for read and 2 for write, so six for the owner and zero for everybody else. So just so you know, everybody's name badge has a two-digit number on the back. Oh yeah, two digits for you. This is the number of your Amazon instance. You have to be careful, because we've had it before where two people have been on the same one and they're overwriting each other's files. So, very clearly again, you need two digits. Yeah. Okay. So does anybody have a number that's larger than 35? We have one. Okay, so you should not get... Okay, anybody else? Does anybody have a number larger than 35? No, you have a number. I'm here in part. Oh, sorry. Any other questions? If you have logged into Amazon, it's coffee break time; see you back here at 11 o'clock. If you leave the room because you know you're okay, you're still here and we're going to talk to you. And take a deep breath because the rest won't bring you your eyes. So there's two things. And then at the integrated assignment, you can ignore that if you know you're okay. So I have both ways. For the people who are computationally oriented, you can do it on your own. But if you are not, I have copy and paste. But then at some point, you need to try on your own. There's a time where you must do it on your own. So copy and paste now. So if you're down the place, even more. Michelle, I'm going to turn off the recording.
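
As an illustration of where the 600 in the permission command comes from, here is a minimal Python sketch. It is not part of the workshop material. It assumes a Unix-like system (Mac or Linux), the helper names `octal_mode` and `restrict_key_file` are hypothetical, and a temporary file stands in for the downloaded key.

```python
import os
import stat
import tempfile

# Permission values for one class of user: read = 4, write = 2, execute = 1.
READ, WRITE, EXECUTE = 4, 2, 1


def octal_mode(owner: int, group: int, other: int) -> int:
    """Combine one digit each for owner, group and everybody else (hypothetical helper)."""
    return (owner << 6) | (group << 3) | other


def restrict_key_file(path: str) -> int:
    """Apply the equivalent of `chmod 600` and return the resulting permission bits."""
    mode = octal_mode(READ | WRITE, 0, 0)  # 4 + 2 = 6 for the owner, 0 for the rest
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


# Throwaway file used in place of the real key, which is not touched here.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    demo_key = handle.name

final_mode = restrict_key_file(demo_key)
print(oct(final_mode))
os.remove(demo_key)
```

The owner digit is the sum of read (4) and write (2); the two zeros mean that the group and everybody else get no access, which is what a login key file generally needs.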