So I will go through a bit of an introduction, but hopefully by now you have a folder on your computer that has the learner.rmd file and the data sets that were posted. If you still need to grab those, please put them in a folder together, and we'll get started with that right after this PowerPoint. I already gave a bit of an introduction about myself, but this slide is just to reiterate the background from which I view science, and my training in it. I did my undergrad at the University of Alberta in the Department of Physiology, and this is where I first got exposed to data, working with human data sets, and I got interested in all the emerging technology. So I jumped across to the other side of Canada, here to Toronto, just across the street, to do my PhD in physiology. It was a very interesting time, because I was working with first-trimester human placentas, which at week five can be about the size of a pea. When we started off, they were very difficult to study, because we just didn't have the technology to gather information from such a small sample. Slowly our sequencing technology got better. We were starting off with microarrays at that point, coming out of an era when people were finishing a PhD thesis with one microarray; we were bulking up to multiple microarrays but still having issues getting enough sensitivity. Near the end we were starting to get into single-cell sequencing, which was very exciting for us. So here we actually have an image of a blastocyst, which is very early human development. The little cluster of cells is the inner cell mass that will develop to become the baby, and the single layer of cells on the outside is the trophectoderm, which becomes the placenta that supports the baby during development. And this is an image of 45,000 single cells.
Each dot is a single cell that we sequenced, and then we colored the plot according to which cells express transcripts that are similar to each other. This simply is not possible to do without bioinformatics. I got interested in teaching and in trying to help people manage their data at the end of my PhD, and I've continued with teaching. I took a different direction afterwards and jumped across the street again to the School of Public Health to do a master's in bioethics, because I was working with a lot of human data, and we were becoming more interested in machine learning and in thinking about how to shepherd these technologies forward. I'm currently teaching at the University of Toronto Mississauga in the Department of Biology, specifically helping to integrate more computational skills into biology courses, so that you don't have to take a bioinformatics course: you can just learn biology and integrate some of these concepts into your regular learning. We already had our introduction with Vicki and Zoe, and they will be with us for this two-day workshop as well. If you have any questions, you can put your sticky note on the corner of your laptop, and they'll be looking out for these stickies and can walk over and help out. So feel free to put up a sticky while I'm going through my worksheets in R, even while I'm talking. I'd rather you stay with the lesson, so don't wait for a break in order to ask for help. We will be circulating, and I don't have to stop talking only at a break; we can take small breaks along the way. We already did our introductions about you. I also wanted to talk briefly about why we are interested in bioinformatics, and I'm sure each of you has your reasons for being here, as you recognize that this is a very important skill.
Maybe you have been interacting with some bioinformaticians or statisticians to work on your data, and you want to have a hand in that yourself. I think it's a very good literacy skill to have as well, seeing the current landscape of how science is moving forward. Basically, we are increasing the amount of data that we can acquire, the amount of data that we can store, and the amount of data that we can handle with our analysis. Think even about patient records. Maybe some of you can sympathize: 15 years ago, when you went to see a doctor and then went somewhere else, they wouldn't have your files, or you would have to take another blood test or another X-ray. Now we are storing all of that information centrally, so our patient files are getting really big, and in order for us to make sense of them we also need this increasing technology. We used to be very focused on numbers, on sequencing and microarrays, quantifying things, but now we're even working with photos and images, which are very computationally expensive to go through as well. I work a lot with genomes, and I find the Human Genome Project just fascinating. It began in 1990, took 13 years to complete, and cost billions of dollars. It was an international effort, because we knew that one place wasn't going to have enough storage and analytical power to go through the whole genome. Back in 1990, they were also setting aside 3% of the annual budget to study the ethical, legal, and social issues of resolving the human genome, knowing the large impact that resolving the genome would have on society. This is also one of the rare cases in science in which the project finished ahead of schedule, and they were able to declare it complete in 2003. So: billions of dollars, an international effort, and more than a decade to do.
But now you can send a sample across the street and get back an answer within hours, if not days. You can even order a 23andMe kit online for $99; it's not exactly sequencing technology, but it is a genome-wide technology. So the accessibility of this technology is increasing, and we also have to ramp up our analytical power to be able to handle this data. I went through the pre-workshop surveys, and I saw that some of you do have some experience with R, but for many of you this might be the first time you're encountering it. Don't worry, you're in the right place. So, as a brief introduction: R is an open-source programming language. It was developed with a statistical foundation, with a focus on high-quality, reproducible statistics and visualizations. That's why, for this introductory workshop, we are going to focus on managing our data and visualizing it with graphs, to show some of R's flexibility and reproducibility, and then move on to linear models, because I imagine many of you will be working with continuous, quantitative variables. R was started by Robert Gentleman and Ross Ihaka, and Robert Gentleman actually worked at 23andMe for a while, which really shows the overlap between these spheres of programming and biology. R is generally a beginner-friendly language, as it has very clear grammar rules and is a little more human-readable than some other programming languages out there. It's object-oriented, meaning that we're going to store values in virtual objects that we can then transform or recall. It's customizable: all of the pipelines out there are open source, meaning that when a function does an action, you can actually look under the hood and see how it is doing that computation, in case you want to customize it for your use.
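To make the "look under the hood" point concrete, here is a minimal sketch using base R's `sd()` (standard deviation) as the example function. `my_sd` is a hypothetical name for a customised wrapper, not something that ships with R.

```r
# Typing a function's name without round brackets prints its source code.
sd

# args() and body() show the pieces separately.
args(sd)
body(sd)

# A hypothetical customisation: a wrapper that always drops missing values.
my_sd <- function(x) {
  sd(x, na.rm = TRUE)
}
my_sd(c(2, 4, 4, 4, 5, 5, 7, 9, NA))
```

Wrapping an existing function, rather than copying and editing its source, is usually the gentler way to customise it.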
That transparency is different from, for example, MATLAB, which is proprietary, so for many of its functions, when you make a call and it gives you an output, you can't see the steps in between of how it brought your data to that final output, whereas R is transparent. When we get into complicated stuff, you don't really have to go line by line and understand everything, but at least it is transparent, and you can customize the steps in between should you need to. One of the best things about R is that there's a very large user community. Especially if this is your first time interacting with programming or with R, don't worry; try it out. I know it's very intimidating to see all these warning and error messages as you are starting out, but I can promise you that for anything you do today, you are not the first one to try it. Any error message you get, somebody else has gotten before. It feels a little bit silly at first, but there's so much documentation on R online that you can find when you're working through it. I generally tell my first-year students: you won't break R in a way that hasn't been broken before. These messages will be scary and intimidating, but there is a lot of help available on the internet, and even locally in Toronto or at your own institutions for learning bioinformatics. Something else that we're using today is RStudio, and I know the relationship between the two can be a little bit confusing: RStudio is actually developed by a different group from R itself. RStudio is an integrated development environment, meaning it is how we're going to interact with the R language. It comes from the company Posit, which develops a whole bunch of functions and packages for R, including the tidyverse. The company used to be called RStudio, but it is going through a transition to more universal branding.
RStudio can execute both R and Python, which is another reason why the company is moving away from the R moniker. It has built-in syntax highlighting and code completion, meaning that it will help you write your code and find errors before you try to execute it. You can also manage multiple projects, which is really handy: if I'm working on my own project but collaborating with you, we can keep those projects separate so that the objects don't interact with each other. Another popular environment is Jupyter notebooks, which, like RStudio, can run both R and Python code. I get a lot of questions when I'm teaching intro classes about which is better, R or Python, and both have their own strengths. Depending on what you're trying to do, you might find that one language has more packages that support it. I am more familiar with R, and I know other people are more familiar with Python. Basically, anything you do can be achieved in both languages, but it's much quicker for me to write a few extra lines of code in R to achieve a task than to learn how to do it in Python and then do it in fewer lines. So what I would suggest is to pick one language, whichever one feels more comfortable, or the one that more people in your lab are using so you can get more support, and get comfortable with that language first. You can always learn another one afterwards, and it will be a little bit easier. Question: what is Posit? It's just another name: the company used to be called RStudio, and it develops the tidyverse packages, which you might hear about and which we will be touching on today. The reason the tidyverse is a little bit distinct from base R packages is that it plays by its own rules a little bit: the way its functions interact with other functions within its own packages is a little bit different.
We'll highlight that when we interact with them. Whatever is developed within the tidyverse plays well together, and those rules might not apply to functions that aren't developed by this group of people. All right, so let's bring it together. What do these two have to do with each other? R is the language that we are trying to learn, and RStudio is like a smart notebook for interacting with R itself. The analogy I use is that even when you're learning English, you still need a way of writing English, so you might use Microsoft Word, for example, or a text pad. So we are going to be learning the language R, and we're going to be writing our code in RStudio. You can still take what we're learning in the R language and write code in a Jupyter notebook; it's just another smart notebook. The fundamental point is that we're learning R, and the tool we're using is RStudio. Do we have any questions about how they interact? Okay, so we have a few workshop objectives. This is what we're going to be going over in the next two days. We're going to get used to how we talk in R and how we interact with the RStudio interface, getting comfortable with the syntax, or grammar, of the language. So it is essentially learning a new language. And at this point I would also say: as somebody who knows nothing about Spanish, if I took a two-day workshop on Spanish, I wouldn't get very far trying to converse with a Spanish speaker, right? So be kind to yourselves; you are learning a new language in two days. You won't be very fluent at the end, but what I want you to be is comfortable and willing to try out new code.
We are going to cover data management, with some tips on how you might want to collect your data and bring it into R. We will be going over some visualization, and carrying out repetitive tasks: a strength of programming is being able to do these repetitive tasks robustly and efficiently while minimizing the chance of typos. We'll be working with some continuous data for linear regressions tomorrow, and in general talking about principles of good code, or at least better code. The objectives in green are the more official ones. But what I want you to do, and what is going to be a little bit difficult at the start, is get used to typing with 100% accuracy. In casual conversation, or when we're messaging with family, we're used to having autocorrect support us, and even when you're writing your thesis, you can go back afterwards to make the sentences more fluent and correct any typos or capitalization. If this is your first time programming, it will be frustrating that every mistake you make stops you immediately, and you're going to have to find that error before you move on. It can be very intimidating getting a lot of these red messages, and it will take a little bit of time to get used to scanning through your code and noticing the common errors you might make. We are going to be learning how to interpret warning messages and error messages. As a heads-up, when you see a warning message telling you that the code might not have run properly, it's saying: I've done what you told me to do, but this might not be what you intended, so double-check that it worked. So warning messages mean that R actually did something, whereas error messages mean: no, I can't proceed, I don't know what to do with this.
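A minimal sketch of the difference, using two base R calls chosen for illustration: `as.numeric()` on text that isn't a number gives a warning but still returns a result, while `sqrt()` on text gives an error. `tryCatch()` is only used here so the script keeps going after the error.

```r
# Warning: R does the job, returns a value, and flags that something looks off.
result <- as.numeric(c("1", "2", "three"))   # Warning: NAs introduced by coercion
result                                       # [1]  1  2 NA

# Error: R cannot proceed and returns no result.
# sqrt("four") on its own would stop with:
#   Error in sqrt("four") : non-numeric argument to mathematical function
err <- tryCatch(sqrt("four"), error = function(e) conditionMessage(e))
err
```

The warning case is the one worth double-checking: the `NA` in `result` is R telling you it did something, just maybe not what you meant.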
So if R gives you a warning message, that might be okay; it's just telling you that you might want to double-check what it has done and see whether that is what you intended. So when you get red messages in R, don't freak out. We can work through them, and sometimes we might not even need to resolve them; R is just trying to tell you something. The last objective is to feel comfortable and encouraged to explore R programming on your own. If this is your first time programming, again, it is a little bit of an uphill climb, learning something very, very different. Feel free to tell me to slow down, or to repeat a concept or show it another way. Also keep an open mind: what I'm showing you is one way to solve a problem. Code is very redundant, and there are many ways to do the same task. If you think of another way to do it, you can share it with me, and we can talk about whether that would work well, or about some other way of handling it. As I mentioned before, this is going to be a very hands-on coding workshop. If you are following along fine, you can keep the green sticky on the top corner of your laptop, and that will let me know that you are doing okay. If you have any questions, or you're encountering an error message that you can't resolve, you can put your red sticky on the top corner instead. Again, feel free to do that at any point throughout the workshop; you don't need to wait for a break, as Vicki and Zoe will be monitoring and scanning the room for these flags. When we're doing exercises, we will also explicitly tell you to put your flags down and then put up the green one when you're finished, so that I can get a sense of how you are progressing through it. This is a bit of an introduction to our first module, which is getting to know R. Here's a list of the learning objectives. So at this time we can check in.
How are you feeling about learning R? Maybe give me a thumbs up if you're excited, or if you're a little bit nervous. All right. At its foundation, R is a very powerful calculator: you can use it to do simple calculations and mathematical operations. What I'm showing you right now is actually some output from R, and what I have circled in blue are called prompts. The prompt, the `>` symbol, indicates a line that's ready to accept code. When you're writing code, you always want to start at a prompt; if you don't see a prompt, that might mean you're in the middle of a multi-line command and we need to fix that. When you're writing multi-line code, the prompt changes into a plus sign, so if your code's not executing, one of the things to look for is that prompt. So what we have here is the prompt, indicating that we're writing code. If we write 2 + 2 and press Enter, the output comes out underneath. The [1] says this is the first item of your output, and the result is 4; it's just running a mathematical operation. As I've mentioned, R is an object-oriented language, and it actually comes pre-installed with some common objects. Here we again have a prompt, so it's accepting a new line of code. If I type pi and press Enter, it'll give me the value of pi, which is stored in an object called pi. This is something to keep in mind, because you don't want to save over these common names: don't save your own object as pi, because then it will mask the original one. It's not the end of the world if you do, but get out of the habit of saving over these common names. The same goes for functions; for example, R has the sine and cosine functions, sin() and cos(), built in.
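The calculator and built-in object ideas look like this in a script, written without the `>` prompt, which R adds itself in the console. Re-assigning `pi` here is deliberately the bad habit described above, followed by `rm()` to undo it.

```r
2 + 2          # [1] 4
pi             # [1] 3.141593
sin(pi / 2)    # [1] 1
cos(0)         # [1] 1

# Saving over a built-in name masks it...
pi <- 3
pi             # [1] 3
# ...but it is not the end of the world: remove your copy and R's pi is back.
rm(pi)
pi             # [1] 3.141593
```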
So when you're making a new function, try not to use these commonly used names. The strength of these objects is that we can actually operate on them. If I do pi * 2, with a little star in between, it will give me the value in the object multiplied by two. And we don't only have to use pre-installed objects; we can create our own, and that definitely is a strength of R. Here we have another prompt. The object that we're trying to create is a single letter, x, and this is called the put arrow, because I'm putting the value 123 into the object called x. You'll hear me refer to this put arrow: you type it with two keys, the less-than sign and the hyphen. It's not a single character itself; it's two characters that make this little put arrow. You'll notice that after you do this call, no output is displayed. All you're asking in this first line of code is to put this value in this object, and R does it but doesn't return it to you, so you have to make another call afterwards: you give it the object name, and it returns what is stored within. It's always a good habit to check what is in an object after you store something in it, or after a transformation that acts on the object. Never assume that R is thinking the same way that you're thinking; it's a good habit to double-check and print it out. Once we've made our custom objects, we can act on them as well: x - 2 will take two away from that value. Oh, and the [1] is marking the first line of the output. I can ask R to do multiple things at once: for example, if I store the values 1, 2, and 3 in an object and then multiply it by two, it's going to give me three outputs, one for each of those values, but here it is a simple, single one. Good question.
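A minimal sketch of acting on a built-in object, the put arrow (`<-`), printing an object, and acting on it. `x` and `y` are illustrative names.

```r
pi * 2             # [1] 6.283185

x <- 123           # put 123 into x; nothing is printed
x                  # [1] 123
x - 2              # [1] 121
x                  # [1] 123 (x - 2 did not change what is stored in x)

# Several values at once: one output per value.
y <- c(1, 2, 3) * 2
y                  # [1] 2 4 6
```

To keep a result, store it again with the put arrow, for example `x <- x - 2`.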
To answer that question (I'm saying this a little bit early), anytime you see these square brackets, that's an indication of position. So it's saying that the value at the start of this line is in the first position of our output. You might have heard me talk about functions before: functions carry out actions on these objects, and they are characterized by round brackets. Here we're going to make an object called primeNumbers, using the put arrow again, so we're going to put everything on the right side of the arrow into the object on the left side. The left is the object we make, and the right is the value that's stored within it. c is a very basic function that stands for combine, or concatenate, so c is going to take everything within its brackets and store it in this object called primeNumbers. When we're making a new object, we don't have to create it first: since the object primeNumbers doesn't exist yet, R will automatically create it at the same time as it stores the value. So in this first line we have our function name, c, then the opening round bracket, and then a bunch of values. Every time you see round brackets, the name of the function comes before the bracket, and anything the function acts on is within the brackets. The items the function works on are separated by commas; that's how it knows that I'm trying to combine the values 1 and 3, not 13. You'll notice that I put spaces between my values here: I have 1, comma, space, 3. R is actually very forgiving of spaces compared to some other languages, so if you didn't have a space between the comma and the 3, that's okay; it's a bit of a personal preference. By convention, though, you shouldn't have a space between the c and the bracket.
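Commas, spacing, and square-bracket positions in a short sketch. The values stored in `primeNumbers` here are illustrative and may not match the slide.

```r
# Commas separate the items a function works on.
c(1, 3)                      # [1] 1 3  (two values)
c(13)                        # [1] 13   (one value)

# Spaces after commas are optional; both calls build the same vector.
identical(c(1,3), c(1, 3))   # [1] TRUE

# The object is created at the same time as the values are stored in it.
primeNumbers <- c(2, 3, 5, 7, 11)
primeNumbers[3]              # [1] 5  (square brackets pick out a position)
length(primeNumbers)         # [1] 5
```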
So it's not preferred to have a space between the function name and the opening round bracket, since they go together. But beyond that, how you space your code is a bit of a personal preference, and if you are working on code with multiple collaborators, it is worth sitting down and talking about coding practices before you get started, or you might drive each other crazy correcting each other's personal preferences. So just as a review of what we're looking at here: we have a put arrow, so we're going to store the value on the right in the object on the left. c stands for combine, so I'm going to combine all of these numbers, separated by commas, into this object called primeNumbers. It doesn't give me an output, because I'm just storing the value. Afterwards, typing primeNumbers returns all of the values stored in the object. Here they all fit on one line, so the [1] is telling us that the first value shown is in position 1. If the output were long enough to wrap onto more lines, each new line would start with a different number in square brackets, giving the position of the first value on that line. Some other very basic functions come pre-installed in R, such as class. If I run the class function on the object primeNumbers, it will tell me it's numeric, because it's holding numbers. If you run class on a vector that stores words, that will come out as character. We can also do very basic statistics here with a function called mean. We know that mean is a function because it comes right before a round bracket, and it's acting on the object primeNumbers within it. A common mistake here: make sure that you've closed your round brackets; they should always come in pairs. If you forget to close one of the round brackets and press Enter, you won't see the prompt arrow, so that's an indication. Other common problems are just simple typos.
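The review above, plus `class()`, `mean()`, and what a simple typo produces, in one sketch (again with illustrative values). The typo is wrapped in `tryCatch()` only so the example runs to the end.

```r
primeNumbers <- c(2, 3, 5, 7, 11)

class(primeNumbers)                  # [1] "numeric"
class(c("placenta", "blastocyst"))   # [1] "character"
mean(primeNumbers)                   # [1] 5.6

# A lowercase n makes a different, non-existent name.
msg <- tryCatch(mean(primenumbers), error = function(e) conditionMessage(e))
msg                                  # [1] "object 'primenumbers' not found"
```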
For example, here we have an object called primeNumbers. If you try to run mean on an object called primeNumber, without the s, or if you didn't capitalize the N, that would not go through. R is case-sensitive, and your object names need to be exact. Watch out, because this comes back with an error saying "object not found"; double-check that you've typed the name correctly. Be careful, too, because if you make a mistake and end up with one object with the s and one without it, it can get very confusing in your workflow. All right, do we have any questions about this slide, starting out with functions? Okay, we will do all of this code again once we get into RStudio; I'm just giving a little bit of theory before we get started. I mentioned before that R has a very large user base, meaning that there are a lot of tools already available out there. Consider this: if you have a brand-new phone and you want to play Minesweeper, you wouldn't get on your computer and start coding up a game of Minesweeper. You know that somebody else has already done that; it's not something unique to you. So you might open up the app store and download the Minesweeper app, but every time you use it, you would still need to open it and bring it up to the surface. It's the same for R. We have CRAN, the Comprehensive R Archive Network. This is where R lives as a language: it's maintained by a group of researchers, and updates are published through CRAN. It is a big library of functions and packages that are available to you. And as many of you are working with biological data sets, you might also encounter Bioconductor. This is a separate repository; you can think of it as a separate app store that focuses on tools for biological data, and it's where you would find those packages. Installing them is a little bit different.
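Installing versus loading, as a sketch. Installing happens once per computer and needs an internet connection, so those lines are commented out; `ggplot2` (from CRAN) and `limma` (from Bioconductor) are only example package names. `splines` ships with R but isn't loaded at start-up, which makes it a safe package for demonstrating `library()`.

```r
# Install once per computer (uncomment to run; needs internet):
# install.packages("ggplot2")          # from CRAN
# install.packages("BiocManager")      # CRAN package that installs from Bioconductor
# BiocManager::install("limma")        # from Bioconductor

# Load once per R session:
"package:splines" %in% search()        # usually FALSE in a fresh session
library(splines)
"package:splines" %in% search()        # TRUE once the library is loaded
```

Like the phone app, installing puts the package on your device; `library()` is opening it, and that has to happen again in every new session.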
There are many different repositories out there, but they're all open source and free for you to use. So if you were trying to, say, analyze microarray data, don't start from scratch writing code for importing the data, making a plot, and so on; these packages exist already. You'll have to access one of these repositories, download the package onto your device, and then load the library every time you're working with it. This step is commonly missed when you're just starting out: every time you open RStudio, it is a new environment. It doesn't really have a long-term memory unless you save things explicitly. When you save, you're saving the code that you've written. Going back to what we were talking about earlier, if I make an object called x and an object called primeNumbers, and then I close RStudio and open it up again, the code to create those objects is still going to be there if you saved it correctly, but the objects won't be; you'll have to rerun your code to generate them again. "Object not found" is a very frustrating error message when you get started. There are many ways it can occur, but it is definitely troubleshootable. All right, we're about to get into RStudio itself. RStudio has four different panels. This, I think, is the default layout, but once you get settled in, you can go in and customize it to your liking: you can change it to a dark background, you can move these panels around, and you don't always need all of them open. RStudio is great in that it has integrated some of the things we need to do as clickable buttons, so rather than having to type out file paths, in case you're not used to writing terminal commands, you can click through your folder directories. We will go over this again when we all move into RStudio, but this console is typically the brain.
This is where the computations are being made. If you open R by itself, for example by starting R from a terminal, you will only have the console. The other three panels are kind of a bonus from RStudio to help you write your code more easily, but fundamentally the computations happen in the console. However, if you are typing only in the console, say you wrote 50 lines of code, and the next day you open up the console again, that code is not saved there. That's why we have the script and the notebook. We're going to be crafting our code in a script, and we're going to be using an R Markdown notebook today, but either way this is our record: we write our code here and then send it to the console to run, and we can save this document so that we have a nice workflow. When we revisit it tomorrow, we can go back and continue from where we left off. This panel down here has Help, Plots, and more; it is more supporting documentation. If you write code in the script to generate a plot and press Run, RStudio will send it over to the console, run the code, and your plot will pop up in this other panel. The Help tab is linked up to R's built-in documentation, so if you have any questions about how any of the functions work, you can look them up right in RStudio; it's very handy, and I use it all the time. The Environment pane is your bird's-eye view of what R has in its memory right now and what it's working on.
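A small sketch of inspecting and saving what is in the environment. Rerunning your saved script is the usual way to get objects back; `saveRDS()` and `readRDS()` are an alternative for a single object. The file name is hypothetical and is written to a temporary folder.

```r
x <- 123
primeNumbers <- c(2, 3, 5, 7, 11)
ls()                                   # names of the objects R currently holds

# Save one object to a file so it can outlive this session.
path <- file.path(tempdir(), "primeNumbers.rds")
saveRDS(primeNumbers, path)

rm(primeNumbers)                       # remove it from the environment
exists("primeNumbers")                 # [1] FALSE
primeNumbers <- readRDS(path)          # bring it back from disk
primeNumbers
```

`ls()` lists the same objects that the Environment pane shows.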