So all I'm going to show you before lunch is very simple things: how you read in your FCS files, what they look like inside R, and some simple visualizations of your data. I'm going to try to take it slow for this first hour and a half.

I already have everything open here, but if you go to File, Open File, navigate to Documents, then workshop, and open up the file. Did you get module 2? Open up RStudio. Yes, just leave the code as it is. No, the files are already in the virtual machine. Okay, so now open up this one, and you should see something like this. Does everybody have this? Yes, good.

Okay, so some of you have used R before and some of you have only used it a little. R is really just a ginormous calculator; it's nothing to be afraid of. It can be a programming language, but you don't need to be a programmer to use it. I just want you to get a little bit comfortable with how things are done.

First of all, this is how you run code without copying and pasting it: put your cursor on the first line of code, like this, and press Ctrl+Enter. Does that work? Great. Now see here at the bottom: this is the console, where the code actually gets executed. RStudio basically copied and pasted the line there for you. So you can go line by line with Ctrl+Enter; that's how we execute lines.

The first thing here should be familiar to everybody: a variable, and assigning a value to it. Now x is a variable with the value 5, y is a variable with the value 10, and you can do any kind of calculator-type operation with them. Everybody comfortable so far? Good. Now, what we're going to be working with is vectors and matrices.
So I'm just going to give you a little example of what a vector is. It's basically a bunch of values concatenated together. x is now a vector; if you type x and press Ctrl+Enter to execute it, R prints out the values that x contains.

Here's another way of defining a vector. One way is to type in the values you want the vector to have. But let's say you want a really long vector with the values from 1 to 100; you're not going to want to type 1, 2, 3, 4. So you can use seq, for sequence. When you print y now, it's the sequence from 1 to 3 by 1. If you press the up arrow in the console, it shows you the previous commands you've entered. You can change this to go to 100, and now when you print y, there are all those numbers. This is going to come in handy in our programming, so I just want you to be comfortable with what it is. It's nothing scary. So far so good?

Now, when you encounter a term you haven't seen before, there's really good help in R. You get it by putting a question mark in front of the term you're searching for and pressing Ctrl+Enter on that line. Over here the help pane opens, and you can read about what a matrix object is and how to create one. The help page typically starts with a description of what you asked about, then the usage, which shows how to call the function, then details about the arguments. Probably the thing you're most interested in is the examples, down here. Here's a really fancy matrix someone made; don't worry about understanding it.

Here's one way to build a matrix. You give it all of the values you want in it, then nrow, the number of rows you want the matrix to have, and ncol, the number of columns.
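A minimal sketch of the vector and matrix constructors just described, in base R. The values are illustrative, and byrow = TRUE is an assumption about how the workshop's matrix was filled: R fills a matrix column by column unless you ask otherwise.

```r
# c() concatenates values into a vector
x <- c(1, 2, 3)
print(x)

# seq(from, to, by) builds a long regular vector without typing every value
y <- seq(1, 100, by = 1)
length(y)

# matrix(): the values, then nrow and ncol for the shape.
# byrow = TRUE fills row by row; the default fills column by column.
a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
print(a)
```

Typing ?seq or ?matrix in the console brings up the help page for each function used here.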
So let's print out a and see what it looks like. Does that make sense to everybody? Yes.

Another way to create a matrix: if you have two vectors of the same length, you can put them together into a matrix. rbind means row bind, so rbind(x, y) binds x and y row-wise. I guess I didn't execute these lines properly: it gave me a warning, which you may not have gotten, because I made y go up to 100 instead of 3. There you go. So you bind the two vectors row by row and you get this matrix.

Now you can perform operations on matrices. Notice how a is a 2-by-3 matrix, and so is this new matrix I made. If I add the two together, R just adds each pair of entries: the first entry of one plus the first entry of the other is 2. So it does element-wise operations.

We're going to be using this kind of notation a lot, so I want to go over it quickly. You can index the matrix to get specific entries: the first value inside the square brackets is the row number and the second is the column number. No, the space is just for readability; this isn't Python, so there's no whitespace rule. So here's a, first row, third column: the number 3. That's what I got, right? If I want the whole row, I just leave the column number empty, and there's no need for a space there. This is the whole second row of a. Here's the whole first column of a. Notice that it prints horizontally, but it is the first column. Is everybody comfortable with a matrix?
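A small base-R sketch of row binding, element-wise addition and matrix indexing as described above. The vectors are illustrative. When the two vectors passed to rbind() have different lengths, R recycles the shorter one and warns if the lengths don't divide evenly, which is likely the warning that came up with the length-100 y.

```r
a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
x <- c(1, 2, 3)
y <- c(4, 5, 6)   # same length as x, so rbind() has nothing to recycle

b <- rbind(x, y)  # row bind: x becomes row 1, y becomes row 2
s <- a + b        # element-wise: s[i, j] is a[i, j] + b[i, j]
print(s)

a[1, 3]   # row 1, column 3
a[2, ]    # empty column index: the whole second row
a[, 1]    # empty row index: the whole first column, printed horizontally
```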
Yes, I know it seems like extremely simple, useless stuff, but we'll be using it so much that if you're not comfortable with it you're going to get extremely confused. That's why I'm going over it.

Another thing we'll be using is lists, and they're exactly what they sound like: a list of things. A matrix is sort of like a list, except that for it to make sense you can only put numbers in it; in a list you can put anything you want. So here I define this silly object called my list, and you can give names to the objects in it. The first entry is the vector x from before, and I gave another name to the second entry, which is my original vector y.

Now, if I want to access only the first entry of my list: remember how with a vector you subset with single square brackets to get the first or the second entry? With a list it's double square brackets, just because it's like Inception, going a little bit deeper.

But what if I have a list of a hundred things, maybe all FCS samples, and they all have names, and I want Bob's FCS file but I don't remember if it's the 50th or the 51st? You can also retrieve an entry from a list by its name. That's where, as Ryan was saying, annotation comes in very useful: if it's all correct, you could have a list of FCS files where some are named control and some are named stained, and it's easier to access things by name than by index number.

What if you try to access the third entry of the list? Error: it doesn't exist. That's right. R gives you fairly decent error messages, like "subscript out of bounds" here. This is a simple example.
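A base-R sketch of building a named list and pulling entries out by position and by name. The entry names first and second are hypothetical; any names work. tryCatch() is used only so the out-of-bounds error can be shown without stopping the script.

```r
x <- c(1, 2, 3)
y <- seq(1, 100, by = 1)

# Hypothetical names for the two entries
my_list <- list(first = x, second = y)

my_list[[1]]          # first entry by position: double square brackets
my_list[["second"]]   # the same kind of access, by name
my_list$second        # $ is shorthand for [["second"]]
my_list[1]            # single brackets return a smaller list, not the vector

# Asking for an entry that doesn't exist is an error
msg <- tryCatch(my_list[[3]], error = function(e) conditionMessage(e))
print(msg)
```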
It's clear that we only put two things in it and we're trying to get the third one. There is no third one, so there's an error. That's what Ryan was saying: there's going to be a bug, but you can debug it because it's right there in front of you. It may be less obvious in other cases.

What else can you do with a list? You can get its length. It's intuitive that this function exists, but I'm showing you that it's there. You can also get the length of a vector: this vector has three entries. A matrix has two dimensions, so you can get its dimensions with dim. What happens if you type length(a)? It has six entries in total. That's not very informative; I would rather know the dimensions of the matrix. Is everybody okay with this, subsetting by index or by name? Okay.

So that was stuff you're probably already familiar with. I didn't know exactly what you've been doing with R before, so I wanted to make sure we have the basics.

Here's some more advanced functionality. I can generate 20 random numbers using the function rnorm. If you're reading one of those Bioconductor package vignettes, where they say this is how you use the package, first you do this and then you do that, there's going to be code with functions they don't describe in detail. They'll never explain what the length of a matrix is, for example; they assume you have some understanding. So if you see a function and it really doesn't make sense what it's doing and why, just type ?rnorm. What is that? What are they doing?
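A quick base-R comparison of length() and dim() on the three kinds of objects covered so far. The values are illustrative.

```r
my_list <- list(first = c(1, 2, 3), second = seq(1, 100, by = 1))
x <- c(1, 2, 3)
a <- matrix(1:6, nrow = 2, ncol = 3)

length(my_list)  # 2 entries in the list
length(x)        # 3 entries in the vector
dim(a)           # 2 3: rows, then columns
length(a)        # 6: total entries, which says nothing about the shape
```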
Go to the details and read: it's basically giving you random numbers generated from a normal distribution with mean zero and standard deviation one. I actually didn't know this before; I had to look it up myself to find out that the r is for random.

So let's print a. There are 20 random numbers generated from a normal distribution. Now, what we're going to be working with quite a bit is which. How many people here have used which before? Excellent, very good.

This statement asks which entries of a are greater than zero. What is this 3 here? Does anyone know? It's the position in a: the third entry is greater than zero. That's what you asked, which entries of a are bigger than zero: the third one, the fourth one, and so on. You'll have different random numbers, so you will not see exactly the same thing.

We can assign this to a variable. I want to double-check that it's correct, so I'm going to subset the third entry. Yes, it's positive. I don't want to subset each one individually; I can subset all of them at once like this. Does this make sense, or is it too fast? Good.

Instead of assigning it to a new variable, you can also do it directly in one line: a[which(a > 0)]. That's a way to get all the positive entries of a. You can do the same thing to check which entries are less than negative one: the 7th, 13th, 15th and 19th. And yes, here we're subsetting a, which is a vector.
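A base-R sketch of rnorm() plus which(), separating positions from values. The set.seed() call is an addition so the draw is reproducible; the variable names pos_idx and neg_idx are hypothetical.

```r
set.seed(2024)       # hypothetical seed, added so the draw is reproducible
a <- rnorm(20)       # 20 draws from a normal distribution, mean 0, sd 1
print(a)

pos_idx <- which(a > 0)   # positions of the positive entries, not the values
a[pos_idx]                # the positive values themselves
a[which(a > 0)]           # the same thing in one line

neg_idx <- which(a < -1)  # positions of entries below -1
a[neg_idx]
```

a[a > 0], indexing with the logical vector directly, returns the same values; which() is useful when you want the positions themselves, for example to count or combine them.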
So to get the third entry of a, I use square brackets. If you use round brackets, a(3), R complains that a is not a function. Remember how length is a function: round brackets are for calling a function, square brackets are for subsetting, or subscripting, or whatever you want to call it.

So let's say that, for whatever reason, you're interested in all the positive entries and all the entries less than negative one. How would that ever relate to anything we do? Let's say a is actually the expression values of the CD3 stain, and I want all of the entries greater than the threshold value for my gate. I'm going to be doing something very similar, which is why I'm going over this example.

If I intersect those two sets, I get this integer(0) thing, which means their intersection is empty. Here's the set of indices of the positive entries and the set of indices of the entries less than negative one. What is their intersection? Nothing, right: you can't have a positive number that's less than negative one.

So I'm not going to intersect them; I'm going to take their union instead. You just type union. It's a function, and you can look it up: union(x, y), where x and y are vectors of the same type. Here they're both vectors of numbers. One holds the indices of the entries of a that are positive, the other the indices of the entries less than negative one, and I combine them. Now when you print combined, these were all my positive entries and these are all the ones less than negative one. How many do I have? 13.
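A base-R sketch of combining two index sets with intersect() and union(). It regenerates a with a fixed seed (an addition) so the block runs on its own.

```r
set.seed(2024)
a <- rnorm(20)
pos_idx <- which(a > 0)
neg_idx <- which(a < -1)

intersect(pos_idx, neg_idx)          # integer(0): nothing is both > 0 and < -1
combined <- union(pos_idx, neg_idx)  # every index in either set, no duplicates
print(combined)                      # order follows pos_idx then neg_idx, not sorted
length(combined)
```

Because the two sets can never overlap here, the length of the union is simply the sum of the two lengths.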
Later, when we're gating, imagine a is a vector of expression values for the CD3 stain and you want all the entries greater than five because your gate is at five. You can do this and see what the length is. Let's say 110 cells are greater than five, and the total number of cells you started with is, say, 200. Then my proportion of CD3-positive cells is 110 divided by 200. That's where we're going with this. Is that good so far? Okay.

Let's plot a. Notice that here I set the color to red. This is just one of the many, many plotting options R gives you, and I'll showcase a few of them as we go, because it can get quite complicated. R actually has amazing visualization capabilities. If you ever want a really good-looking graphic for a paper, just Google R plots: there's a website where people have submitted lots of really good-looking plots along with the code for them.

This is just a simple dot plot. On the x-axis is the index into a, so 1 is the first entry of a. Mine happened to be negative; let's see, -0.8, there it is. Nothing fancy.

Now let's plot the density of a. Remember we drew a from a normal distribution. This sort of looks like a normal distribution, with a little kink here; yours will look different because you have different numbers. With just 20 numbers you can't really reproduce the whole shape of a normal distribution.

hist(a) produces a histogram. You'll obviously have seen these in FCS Express or FlowJo, or I guess even Excel.
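A base-R sketch of the gating arithmetic and the three plots just described. The CD3 values are simulated stand-ins and the gate of 5 is the hypothetical threshold from the example; with real data you would use the channel's expression values instead.

```r
# Hypothetical stand-in for CD3 expression values of 200 cells
set.seed(7)
cd3 <- rnorm(200, mean = 5, sd = 2)
gate <- 5                           # hypothetical gate threshold

n_pos <- length(which(cd3 > gate))  # number of cells above the gate
prop_pos <- n_pos / length(cd3)     # e.g. 110 cells out of 200 would give 0.55
prop_pos

plot(cd3, col = "red")   # dot plot: index on the x-axis, value on the y-axis
plot(density(cd3))       # smoothed density estimate
hist(cd3)                # histogram
```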
You can do that in R too. So you're familiar with what this plot looks like. What if you do ?hist and see what hist actually does? Look at all the possible arguments you can add to the call. This is a lot of information and it would take a long time to read through and understand, but one thing you can do is add a number after a, which sets the breaks: with 15, R tries to break your data into about 15 bars. You only have 20 numbers, so you can't really go much higher.

So these are just some of the very basic plotting functions. Let's take a look at the plotting region here. By the way, I don't know if you noticed, but I have this tab closed so that my plots are a little bigger; you can close it and open it as you need. It stores a history of all the commands you've entered in your RStudio session.

In the plot pane you can go back (it may take a second on some computers) and see what you plotted before, in case you forgot or plotted something over the one that was there. You can also go to Export and save your plot as an image or as a PDF, so if you make one that's really pretty, you can save it. Does everybody have that? Good.

So we got through a dot plot, a density plot and a histogram. You can also plot something directly: this creates 1,000 random draws from a normal distribution, takes their density, and plots it. You can nest functions within functions; R evaluates the innermost thing first. It's just like brackets in math: it does what's inside the innermost brackets first and then works outward. I guess it is a lot like Inception, as you mentioned.
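A base-R sketch of the breaks argument and of nested calls. hist() returns its bin counts invisibly, which the block keeps in h so you can inspect them; the seed is an addition for reproducibility.

```r
set.seed(2024)
a <- rnorm(20)

# breaks is hist()'s second argument; a single number is only a suggestion,
# and R chooses "pretty" cut points close to it
h <- hist(a, breaks = 15)

# Nested calls run inside out: rnorm() first, then density(), then plot()
d <- density(rnorm(1000))
plot(d)
```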
Oh my gosh. Okay, there you go. Are we okay on this stuff? Everybody fairly comfortable, nothing too insane? Good.

Now let's go to flow cytometry stuff, the stuff you care about. First of all, how do you read in an FCS file? ?read.FCS says there's no documentation. That's weird. What if you use two question marks, ??read.FCS, like "really, really, no documentation?" Suddenly it brings something up.

So, like Ryan was saying, because there are so many packages out there, R doesn't load all of them at once; your computer would be really slow. You load the functionality you need as you go. The first time I asked about read.FCS, R didn't find it because I hadn't loaded the relevant package. But because the package is installed on my computer, R was able to say, okay, let me look into it further for you: it went through the help files of all the installed packages I haven't loaded yet, and it found this.

The part before the two colons is the name of the package. So it's bringing up matches for read.FCS, and there's also this read.FCSheader. I don't know what that is; I just want to read an FCS file, so I click on read.FCS. Look up here: it tells you the package name, flowCore. That's what you need to load into R to access this functionality. Yes, exactly, the function is in that package, and only after you load the package can you use the function. Other functions, such as length, are part of base R; they're already loaded because R has to be able to start somewhere.

So let's do that. This is how you load a package into R: library(flowCore). From now on I want you to load this, because I'll be using it. And some red text appears.
You can ignore it. If it were a real problem, you would see a big error somewhere here and you would notice it. Let's say you do get an error: what would you do? Google it. Yes, very good. So many people have run into the same errors; you're not that special, so someone has already answered it online.

Okay, so now we have the package loaded. We're going to read in an FCS file stored on the computer, so R needs to know what directory it's in. getwd stands for get working directory: where does R think we are right now, which folder is it looking at by default? On a Windows machine this would be something like your user folder, Users/Armstrong or whatever your username is, not My Documents or anything like that. So let's tell R where we want it to be. setwd, set working directory, is the function, and you give it the exact path on your machine where your data is stored. It didn't print anything; it just did it. Now if I do getwd again, we're there. dir lists all the things in that folder.

Why don't we open the folder and make sure we're on the same page? Click the folder icon and go to Documents, workshop, data. You can see down here a little of what's in there. This one starts with a dot, so it's a hidden file. The first three files we're going to work with are in the full fcs folder; let's check what's in there: those three FCS files. These are the kinds of files you would be reading in.

This is how you read a file: read.FCS, and you give it the path to the file relative to the directory you're in right now. I'm in the data folder, and within it I have the full fcs folder, slash the file name.

So execute that. It's going to take a moment. How's it going? Good. Is it loaded? Okay, excellent.

There's a warning message there. As long as it's not an error, it's safe to continue. Read it, and if it's something obvious you should do something about, then do something about it. In this case, some people didn't save their FCS files exactly the way the package expects. Maybe some of the metadata Ryan was talking about is missing, like the name of the cytometer used to acquire these files. So it's okay, we can still work with the data; it's just letting you know this isn't a perfect file.

Let's print f. This is what an FCS file looks like inside R: the object f, which is called a flowFrame. Remember how before we were working with vectors, matrices and lists? Those are objects too. This is a flowFrame object, just a different type of object.

What do we know about the file just from looking at this? Do we know how many cells were acquired? Yes, 65,000 cells. We know what channels were recorded: forward scatter area and height, side scatter area, and all of these others. Someone did a good job of annotating them with the antibody names and whatever else they're called in biology. I'm not a biologist or a mathematician, so if I say something that sounds funny to you, that's why. So this gives an overview of what's in the FCS file. What kinds of things can we do with this object?
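A sketch of the read-in workflow with flowCore. The workshop's actual file names aren't given, so, as an assumption, this writes a tiny simulated FCS file (10 cells, 3 hypothetical channels) into a temporary full_fcs folder and reads it back; with the real data you would setwd() to your data folder and point read.FCS() at your own file. It also previews the nrow, colnames and exprs calls discussed next.

```r
library(flowCore)  # read.FCS(), write.FCS(), flowFrame(), exprs()

# Hypothetical stand-in for the workshop data: a tiny FCS file with 10 cells
set.seed(3)
m <- matrix(runif(30, min = 0, max = 1000), ncol = 3,
            dimnames = list(NULL, c("FSC-A", "SSC-A", "R780-A")))
data_dir <- file.path(tempdir(), "full_fcs")
dir.create(data_dir, showWarnings = FALSE)
write.FCS(flowFrame(m), file.path(data_dir, "example.fcs"))

old_wd <- setwd(tempdir())  # setwd() returns the previous directory
getwd()                     # where R is looking now
dir("full_fcs")             # what is in the folder

# The path is relative to the working directory
f <- read.FCS(file.path("full_fcs", "example.fcs"))
setwd(old_wd)

f              # printing a flowFrame shows a summary, not every value
nrow(f)        # number of cells (events)
colnames(f)    # channel names
e <- exprs(f)  # the expression values as a plain matrix
dim(e)
e[1:5, ]       # first five cells
```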
Remember how with a vector we could get its length, the number of entries? With this, it's nrow, the number of rows. It treats every single cell that passes through the flow cytometer as one row of data.

Let's see what length gives us: 1. That's because there's a way to save two flowFrames within one object; it's silly. It looks fairly useless, so you shouldn't see any reason to ever use length on a flowFrame. nrow is the one you should be using.

Let's see what else we can do: colnames, the column names. So it's almost like a matrix, with rows and columns. There are 65,000 rows, one per cell, and the columns are the channels we're measuring. So each cell is one row, and each entry in that row is the measurement of one channel for that cell.

It looks like a matrix, but how do we actually get that matrix? Through the function exprs, for the expression values stored in it. I put that into a matrix e and take its dimensions, and sure enough it has 65,000 rows and 16 columns. I'm not going to print e because it's really, really long; it would take up your whole console and you wouldn't be able to see what's in it. But remember how we can subset matrices, like the first row and third column? And remember what 1:10 does? It gives you all the numbers from one to ten. So let's look at the first ten cells of this matrix. Does it show you the same thing? Yes.

Why extract it? Because now I can work with it as a matrix: I can add another matrix to it, and I can subset certain rows and columns. Whereas f itself is not a matrix.
It's a flowFrame. It just so happens that the package authors created functions for the flowFrame, like nrow and colnames, that work the same way as for a matrix. Someone was very smart in writing this package. Is it you? No, no, no. This is an extremely involved package.

If you want to see the code behind a function like read.FCS, don't put brackets after it; just type the name and press Enter. This is the code. I did not write this, and I don't think it's something you need to go through. This is the effort people put in when they write a package, just so that you can call dim and get the number of rows and parameters. But there are some functions you can print out like this, and maybe there's some functionality in there you want to copy and change a little for your case. That's a totally great idea. A lot of them are kind of hidden, though; they don't show you every little thing they do, because it would be too confusing. But you're welcome to give it a try.

So let's look at the very first row of the matrix e, which contains our expression values. What does this mean? The very first cell went through the cytometer, the lasers hit it, the detectors did their thing, and some stuff was measured. Here's what was measured: 27,700 units, I'll call them, for forward scatter area, then this is the height, and the side scatter. I had the value 1,984 for the B515 channel, and 625 for the R780 channel, which was the one measuring CD3, and so on. Does this make sense?

So how would you get the first cell's side scatter area measurement?
Yes, or if you don't want to count one, two, three columns, you can use the name, like this, in quotation marks. Single or double quotes? In some programming languages the difference matters, but in R not so much. So why do we need the quotation marks at all? Because if I type x, that's my variable x, but if I type "x" in quotes, that's just the letter x. Without the quotation marks, R would look for some variable SSC minus some other variable A, which doesn't make sense. So we need the quotation marks. The hyphen in the name isn't a problem, as long as everything is inside the quotation marks, so that R knows not to try to access it as a variable.

Okay, so now we know how to access the measurement for a single cell, like the first one. Why don't we look at the side scatter area measurements of the first 10 cells? This is what they look like.

Remember how we used which to see which of our normal random numbers were positive and which were negative? We can do that now with this. What does this line say? Which cells have side scatter area greater than 500. And this is how many cells I have in total, right? So if I divide one by the other, it tells me (here's a really complicated one) that 10% of my cells have side scatter greater than 500. So there: now we've used the thing we learned earlier. Finally. Yes, those could be debris or something; I'm not sure.

So e is basically the matrix we'll be working with for most of what we do, because it has the really important information: all the channel measurements for the cells we care about. But, like Ryan said, annotation of the data is hugely important. So here in the console, type f, then the @ sign, and press the Tab key. f is a flowFrame object.
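A base-R sketch of subsetting the expression matrix by quoted column name and using which() as a simple threshold on side scatter. The matrix is a simulated, hypothetical stand-in for e <- exprs(f), with made-up distributions, so the proportion it prints will not match the 10% from the workshop file.

```r
# Hypothetical stand-in for e <- exprs(f): 1,000 simulated cells, 3 channels
set.seed(11)
e <- matrix(c(runif(1000, 20000, 262000),       # forward scatter area
              rexp(1000, rate = 1 / 250),       # side scatter area
              rnorm(1000, mean = 600, sd = 300)),
            ncol = 3,
            dimnames = list(NULL, c("FSC-A", "SSC-A", "R780-A")))

e[1:10, "SSC-A"]   # side scatter area of the first 10 cells
e[1, "SSC-A"]      # quoted name: without quotes R would compute SSC minus A
e[1, 2]            # the same value, by column number

high_ssc <- which(e[, "SSC-A"] > 500)  # cells with side scatter area above 500
prop_high <- length(high_ssc) / nrow(e)
prop_high
```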
"Object" is very computer-science speak. What it means is that this flowFrame object has those expression values you saw, the ones with all the numbers we actually care about, and it also has these parameters, whatever those are. Let's see. Okay, that wasn't super informative; it says it's some other object. When you try to access something in an object and it tells you it's another object, that's Inception again. Oh my god. Do another @ and another Tab.

Play around with these things, print them out, see what they look like, and then tell me which one you think is the most informative. If you do this in a plain terminal instead of RStudio, it won't bring down a nice menu like this, but it will print out all of the available things you could access. RStudio just makes it a little more user friendly. Did you find something?

The @ is an accessor symbol. It's just like the square brackets we use to subset a vector, to take the third entry, except this isn't a vector; it's way more complicated, so we use a fancier symbol than the square bracket. It doesn't do anything by itself; it's just a way to access a thing that's stored inside the object, like using square brackets for a matrix.

So this looks a little bit interesting, doesn't it: the parameters, and then data. It's pretty much what's shown when you print f. Let me go back to the code I've already written. Actually, let's assign this to d. This is a data frame, so you can index it like a matrix: d[1, 1] is the very first entry of d. Let's take only the first row of d. This is the parameter data, right?
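A sketch of drilling into a flowFrame with @, using flowCore. As an assumption it builds a small in-memory flowFrame from a simulated matrix rather than the workshop file, so the ranges it shows are made up. flowCore also provides parameters(f) as an accessor, so parameters(f)@data reaches the same table without the first @.

```r
library(flowCore)

# Hypothetical in-memory flowFrame standing in for the workshop's f
set.seed(5)
m <- matrix(rnorm(40, mean = 500, sd = 200), ncol = 2,
            dimnames = list(NULL, c("FSC-A", "R780-A")))
f <- flowFrame(m)

slotNames(f)             # what @ can reach: "exprs", "parameters", "description"
class(f@parameters)      # another object (an AnnotatedDataFrame), so go one level deeper
d <- f@parameters@data   # a data frame with one row per channel
d[1, ]                   # name, desc, range, minRange, maxRange of the first channel
d[nrow(d), ]             # the last channel
```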
So it's information about the parameters we're working with. It tells me the name of this one is forward scatter area. This is the range of the cytometer, what kind of values it can measure. And these are the actual minimum and maximum values in the data: 23,406 is the minimum forward scatter any cell had in this data set. To me, that says someone probably removed the debris on the fly when they acquired the file. The maximum forward scatter value is 262,206. So when you plot this, you can expect a significant number of cells chopped off in the lower range, below 23,000 forward scatter, then a bunch of cells, and some flyaway cells up to 262,000.

What if I get the last one? This is the channel name, what it's measuring: CD127 expression. This is the range of the values, and the minimum value is this. It's negative. This channel is meant to be viewed on a logarithmic-type scale, so values like this will need some handling.

This is information you want to know if someone gives you a file, because some files only go up to 1,024 in forward scatter and some go up to 262,000. We're not there yet; it's coming up, though. Yes, there is a way to tell.

Here's something else: f@description. You can scroll through all these things. These are all metadata.
These are all metadata That's attached to the fcs file when you're you know running it on the spectrometer and saving it as an fcs file All these things automatically get saved So it seems like a lot of gibberish And a lot of gibberish in fact a lot of it, but there's actually some very useful things for example this Keyword is called fill really it's like file name And this is the file name if you recall of the file that we just read in So that's nice that it's saved somewhere date July 17 2007 So that's useful to know You know when was this file acquired if you do do any kind of analysis and you find that There's a big difference between the two groups, but then you look at the dates and see okay Well, these were all done before 2008 these were all done after 2008 they probably changed their laser or something So that's one One way we could we could try to address that issue quality control The op the operator was administrators on the person didn't really enter their name We can't go after them for doing something wrong if they did Something about Yeah, yeah So all these keywords we're finding are Keywords There's also something interesting here survival time from whatever serial conversion whatever that is 63 That could be useful Maybe so this That's what Ryan said this is a custom keyword this the Your computer when you're acquiring your your cells is not going to ask you to enter this by default But you can add this like Entry that you require people when they're doing this analysis to always make sure they fully annotate the data You know at acquisition So sometimes there's gonna be in your fcs there Most of a lot of time has to be in an excel spreadsheet or something Yeah, that's absolutely valid. 
So actually, what I've done a few times... like Ryan said, annotation is so important, and so many issues arise from poor annotation, to the point where I'm not aware that there's anything wrong with the annotation. The person giving me the data didn't really know that they made a typo somewhere in the Excel spreadsheet they've given me. So then I can't see anything interesting in their data, and I can't figure out why. Let's say they give me an Excel sheet for this data with the file name and the survival time. So I'm reading in the files, and then I'm looking at the Excel spreadsheet for the survival time, but it's also inside the FCS file. So that's a way I can do quality checking: I can check this file against itself. It says 63 days survival, but in the Excel spreadsheet it says 75 days. That's happened to me: it was 10,000 FCS files of mouse data collected over seven years, I think, and they gave me an Excel spreadsheet with 10,000 lines in it. What happened was that different people were pasting different parts of the Excel spreadsheet, somewhere a row got shifted down by one, and so everything from some point on was incorrect, off by one row. But it's not obvious. Something was off, so I had to go into the FCS file description and try to find something I could do a quality check with. For example, this would be a good quality check, because I expect it to be slightly different between files, and it's something I can double-check against the Excel spreadsheet I've been given. I think an off-by-one error was probably the most common. Yeah, and everything's wrong, right?
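The cross-check described above can be sketched in base R: line the spreadsheet up with the files by file name, compare a keyword value that should vary between files, and look for the signature of a shifted row. Both tables here are hypothetical; in practice the first would be filled from each file's keywords.

```r
# Hypothetical values read from each file's own keywords
from_fcs <- data.frame(
  file     = c("s1.fcs", "s2.fcs", "s3.fcs", "s4.fcs"),
  survival = c(63, 75, 40, 12),
  stringsAsFactors = FALSE
)
# Hypothetical spreadsheet in which every row after s1 slipped down by one
sheet <- data.frame(
  file     = c("s1.fcs", "s2.fcs", "s3.fcs", "s4.fcs"),
  survival = c(63, 63, 75, 40),
  stringsAsFactors = FALSE
)

# Align spreadsheet rows to files by name, not by row position
idx <- match(from_fcs$file, sheet$file)
aligned <- sheet$survival[idx]
mismatch <- from_fcs$file[from_fcs$survival != aligned]
print(mismatch)  # "s2.fcs" "s3.fcs" "s4.fcs"

# Off-by-one signature: each sheet row holds the previous file's value
n <- nrow(from_fcs)
shifted <- all(aligned[-1] == from_fcs$survival[-n])
```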
Just being off by one means everything's incorrect. As much as you can, yeah, because you don't want to be wasting your time analyzing data that was annotated wrong. At the time of the experiment, during acquisition, it would be very time-consuming. I mean, especially when you have to do panels, it would be 10 times more time-consuming. After the fact, right? If you make a mistake in the Excel spreadsheet... So they were really careful with every FCS file during that acquisition. That's not the most fun time. So there's an error there as well. Yeah, so as long as you... it's better than never. But Excel is fine. You don't necessarily have to do that. That's fine. This is just trying to get information on some of the FCS files there. Yeah, actually, most of these things are automatically entered by the computer; you're not entering all of this. The only thing you are entering is maybe your name when you're acquiring the files, maybe this number, and maybe one other thing in here that's custom. But all of these are otherwise automatic: the cytometer sends that information to the computer, and it knows how to save it for you. It's like, I don't know if you've ever seen this, when you take a picture with a really fancy camera, then you upload it, and it tells you it was taken at this shutter speed, with this camera and that lens. Somehow it knows, right? You're not entering that information. It's the same here, but you can enter additional information that will help you. Really, the best place to do this is somewhere else.
Yeah, it's much easier to check and much easier to fix. So I'll do all that explanation; I'll be talking tomorrow about where you might want to put that, and there's a much better place than what you just had. Yeah, I guess that's true, but the problem that could happen is when someone's entering the other information into the Excel spreadsheet and shifts it by one row, you know; then that doesn't really help. So you always try to read through the description that you have. You may be able to find some keyword that you could match it up with for quality purposes when you have really large sets of data. Definitely, yes, absolutely. Yeah, it's just a matter of catching that. On annotation, if I could get one change, it's this: don't make your file names meaningful. Yeah, we've run into problems again and again because people try to make their file names mean something. Here's an example. Sure: if you have, you know, file1_Bcell_S.fcs, file2_Tcell_blah blah blah, then suddenly the T cell has a dash in the middle, it's capitalized, it's not capitalized, there's a comma I don't know where. So I'm searching for the word B cell within the file name, and I'm getting some but not others because they were misspelled. There's B cell spelled B-C-L-E-L; someone accidentally misspelled it. It's those kinds of things. We've seen this a thousand times. You think you can tell people, but the biggest problem, absolutely, full stop, the biggest problem we've had is people not following rules. So you think you're going to be okay because we make up these rules, and it doesn't work. And it's very hard to fix when it's in the file name; it's much easier to fix when it's in some metadata description. Our experience has taught us that annotation is best done separately from the data.
So one place for your data, which is the FCS values, and separately the metadata, which is everything else. That's just based on experience. If it works for you, that's great. So working with people like... and this works great. So this is a different way. We're trying to do a lot of stuff, right? And I'm trying to teach you things that can help you do it in an ideal way. Some things kind of work in a folder world that don't work in the other; it really doesn't work in the ideal world, if you're going to go down this road after these two days. Um, yeah, it's kind of been the biggest problem, at least it's been... We've done this for lots of different labs, really, really good labs, fantastic ones: Harvard, MIT, Stanford. You tell these people, and this is a full consortium; we got together and said this is the way to do things, and they don't follow the rules. They think there's a better way. Yeah. As long as they have another... That's just... Yeah, it was the same at the other workshop I was at a couple of weeks ago. It's very difficult. So what's the particular problem? For example, this is the sample, this is the patient sample number 26, and this is the... That would be technically okay, I could work with that, but the problem is it's very prone to typos, right? It's very difficult to check that kind of thing when it's in the FCS files one by one. Yeah, it's very hard to go back. You can't. So one thing about data is: data doesn't change. An FCS file is a data file; once it's written, you don't touch it again, right? That's kind of the rule. You guys make one rule: we never touch a data file, an FCS file, because that could be clinical data. You're right. I want to start on that. So the problem is, if you start including this metadata inside your FCS file, information about stuff other than this cell and the rest of this measurement,
and that's really the only thing you want to get out of the FCS file, you're making this other metadata... metadata changes, right? You got the diagnosis wrong, you went back to the patient, you know, some other information that should have... maybe they had a sex change, or now there's a treatment, right? But the data, the FCS file, those were measurements; they shouldn't change. Metadata can change. So you want to keep things that change separate from things that don't change. So minimal annotation on the FCS file is okay; maximal annotation in some other type of file. That's good. Yeah. You're like, my cells are on ice and I've got to run them right now, and you're checking and stuffing things in, and you've got somebody talking. Well, I have an example of that, actually. What also happened in that mouse data set with 10,000 files is that there was a keyword that was very important for them to enter manually: one of the markers was a lineage marker. So it actually changed from file to file, but the channel names stayed the same. So you had to actually change, in here, for example, what this says, you know, CD14; you should have changed it to B cell, T cell, whatever lineage you're going for. And what happened was, someone had entered B cell for this file, they ran it, and then they left. And another guy's like, oh, don't worry, I'll take over for you. He started running all the next samples and left that keyword static. So all the next samples were labeled as B cell, but they were not B cells. So if you don't want to spend time adding those keywords, totally great; just double-check that you don't have a misleading one left in there that gets saved for all the next samples. That's all. All right. Okay. Let's move on.
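A small base-R illustration of the two points above: searching meaningful file names only finds the spelling you guessed, while a separate annotation table keyed by file name can be corrected without ever touching the FCS files. All file names and labels here are hypothetical.

```r
# Hypothetical file names that were meant to encode the cell type
files <- c("file1_Bcell_S.fcs", "file2_B-cell_S.fcs",
           "file3_bcell_S.fcs", "file4_Bcel_S.fcs")

# Searching the names misses every variant spelling
found <- grepl("Bcell", files)  # TRUE FALSE FALSE FALSE

# The same information kept in a separate annotation table, one row per file
meta <- data.frame(
  file      = files,
  cell_type = c("B cell", "T cell", "B cell", "B cell"),  # file2 mislabelled
  stringsAsFactors = FALSE
)

# Fixing a label means editing the table; the FCS files stay untouched
meta$cell_type[meta$file == "file2_B-cell_S.fcs"] <- "B cell"
b_files <- meta$file[meta$cell_type == "B cell"]
```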
This is a very passionate issue. Yeah. Yeah. In the real world, yeah. If you have 10 samples, it's okay to go back and be like, oh, that was a typo here, but if you have 10,000, believe me... Um, so those are the keywords. Here's a useful one, for example. Well, maybe that's useful: 225. Maybe you are given an Excel spreadsheet with not the file names and the diagnoses, but the tube number. Or maybe this is an extra type of quality check. This one was also automatically generated, so it's not like you're sitting there entering it, but it is available for some cytometers and some acquisition software. Okay, so we looked at this. Let's move on from that. Let's plot stuff. How do you plot FCS files? Well, let's try the obvious thing. What happens now? What do we do now? Oh, right. We haven't loaded the package that brings in the capability of plotting flowFrame objects. So let's load it. It even gives you a hint of what it is, because it recognizes you tried to plot a flowFrame, but that's such a complex object that the basic plotting functionality of R doesn't know what to do with it. So we load this, and now the way you plot is you give it the flowFrame object f, the one that contains everything about our FCS file, you give it two channel names, and that's what it looks like. I have added some extra options here just to make the plot better-looking. Why don't you try taking out this smooth = FALSE? Yeah, that's awesome, isn't it? But I mean, I don't know, I can't really see what's going on there. It's funny explaining how the computer collects the numbers from each cell. Yeah, I don't know, let's see. Yep. No. So flowViz gives you a bit of a weird error when it can't do something, because it's seeing a flowFrame object and then it's seeing numbers, and it's like, that's not what I want.
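A sketch of the flowViz call, assuming flowCore and flowViz are installed (the block skips the plot if they are not). Because no FCS file is bundled here, it builds a small synthetic flowFrame in memory with flowFrame(exprs = ...); with your own data, f would come from read.FCS(). The colnames(f)[c(1, 3)] lookup is one way around the error above: flowViz wants channel names as strings, so look them up from the column positions.

```r
# Synthetic expression matrix; flowFrame() needs named columns
set.seed(3)
m <- cbind("FSC-A" = rnorm(2000, 80000, 20000),
           "FSC-H" = rnorm(2000, 70000, 15000),
           "SSC-A" = rlnorm(2000, meanlog = 8, sdlog = 0.7))

# Channel names as strings, looked up from column positions
chans <- colnames(m)[c(1, 3)]

if (requireNamespace("flowCore", quietly = TRUE) &&
    requireNamespace("flowViz", quietly = TRUE)) {
  library(flowCore)
  library(flowViz)
  f <- flowFrame(exprs = m)
  plot(f, colnames(f)[c(1, 3)], smooth = FALSE)  # names, not numbers
}
```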
It wanted a flowFrame object and strings, things in quotes. However, I will show you a different way of doing what you want in just... I don't know when. Maybe I'll show it to you now; I'm not sure where it's coming up. Remember matrix a? So what does plotting it do: the first column against the second column? So it actually plotted (1, 11) and (2, 12)... or, I don't know actually, (2, 1) and (11, 12)? Yeah, okay, that's right. Well, it's, yeah, sorted. You can actually plot... remember our matrix e, which has all the expression values? These are the first two, forward scatter area and side scatter area. That's going to give us basically the same plot we had before. And here, I will tell you in one second. Now you can do 1, 3 instead of forward scatter, side scatter. So pch is the point character. It means what character I'm going to use to draw the dots, and I selected a dot. If you don't specify this, by default it draws this circle thing. Notice how it's taking a while, because there are so many points and they're going over top of each other. And there: it doesn't look very nice. So the plotting character is the character I'm going to use to plot. Every time you're plotting flow cytometry data, you probably want to use a dot, because there are so many cells that otherwise you're not going to be able to see anything with these huge characters. Before, we were actually using the flowViz package, and we were giving it a flowFrame, and it was smart enough to figure out what to do. It knew to use dots for flowFrame objects; that was hidden from you, but it was going on in the background. Now we're not relying on that, because we're not using flowViz in this line here; we're just plotting some numbers in a matrix. Yeah. Exactly. Does this make sense? So that was a great question, because it is very tedious to type out all the channel names and stuff all the time, but if you're working with this data, you know that it's always the first
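A base-R sketch of plotting two columns of an expression matrix with one-pixel dots. The matrix is synthetic, standing in for the real e; only the column order (FSC-A first, SSC-A third) mirrors the talk.

```r
set.seed(4)
e <- cbind("FSC-A" = rnorm(20000, 80000, 20000),
           "FSC-H" = rnorm(20000, 70000, 15000),
           "SSC-A" = rlnorm(20000, meanlog = 8, sdlog = 0.7))

# A two-column matrix: plot() uses column 1 as x and column 2 as y
xy <- e[, c(1, 3)]
plot(xy, pch = ".")  # one-pixel dots; the default pch = 1 draws open circles
```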
and the third channel, forward scatter area and side scatter area. Yeah. Yeah. Um, another thing I would do, which we were actually going to do tomorrow, I think: instead of that, you could just say fsc = ... Can you do that? Assign these two variables here. Every time you want to type "FSC-A" in quotes, you just type fsc. So what for what? Right, so for some functions, if you can't remember what it was, you can press Tab when you're partway through typing the function. So I was like this, I was like l-e-n-g, and then, oh, how do I spell length? I forget. Tab brings up all the things that start with length. Oh, in fact, it even has a shorthand for if you want to assign the length of a variable or something. So then if you tab again, it's going to select that first one. So you don't have to always type things. That's a personal preference thing. I never use it, because it bothers me a lot. Yeah, so I'm the absolute... Yeah, okay. Oh my god, it bothers me so much. Anyways, like I said, personal preference. Uh, you might notice this looks a little more squished than it did before when we plotted it. That's because, you can hardly see it, but there are these little dots, you know, here, there's one here, there's one... I don't really want to waste my entire plotting region just so I can see that dot. So how can I make it look a little bit better? You can add another parameter, another piece of information, to the plotting procedure to tell it how you want it to look, and that's ylim. You know how this is your x-axis and this is your y-axis? ylim means the y-axis limits: what values do you want to be looking at? And that's a vector of length two. So I want values between 0 and 5,000. As far as I can see, this is 10,000 and this is 5,000, and that looks like all I really care about. Does that make sense?
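A sketch of storing the channel names in variables once and passing a length-two ylim. The data are synthetic; fsc and ssc are just variable names chosen for this example.

```r
set.seed(5)
e <- cbind("FSC-A" = rnorm(20000, 80000, 20000),
           "FSC-H" = rnorm(20000, 70000, 15000),
           "SSC-A" = rlnorm(20000, meanlog = 7, sdlog = 0.8))

# Type the channel names once, then reuse the variables
fsc <- "FSC-A"
ssc <- "SSC-A"

# ylim is a length-two vector: lower and upper y-axis limits
plot(e[, c(fsc, ssc)], pch = ".", ylim = c(0, 5000))
```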
So this plot thing is fairly straightforward: you say plot(my thing), but then you can specify exactly how you want your plot to look. For example, with this point character thing, you can try to make it look a little bit better, more understandable, by specifying it. Now, when you look at this, if I hadn't explained what any of this is in the very beginning, you might have been a little put off by this line, right? What is this: e, bracket, comma, c, bracket, pch? c stands for concatenate, and it puts these numbers together into one vector. It's a computer thing; it's a little abstract. Yeah, yeah, that's what it usually is. Here, c, I don't know. Yeah, exactly. This line here is something you'll typically see when you're reading one of those package vignettes where they go over an example: this is how you use the package, you do this and then you do that, and then you do plot(e[, c(...)]), all these things. So I just want you to keep in mind that it may look more complicated than it actually is. Um, I mean, I guess it is a little complicated, but plotting specifically has so many possibilities for making it look better. I'm trying to stay away from that right now because I don't want to confuse you too much. But it was necessary for us to plot only up to 5,000; otherwise it would look like crap and we wouldn't know what we're looking at. So far we've learned two ways to plot flow cytometry data. One was using the flowViz package, where you just give it the flowFrame, this line: give it the flowFrame and the channel names. Another way is a little bit more customized (oh, I guess I don't have it up there), where you just plot the matrix of values. You can use the indices of the columns, which column numbers you want. You can assign variables to these channel names
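A tiny example of what c() and the e[, ...] syntax do: c() builds a vector, and a vector of column positions or a vector of column names selects the same columns. The 2 x 3 matrix is made up for illustration.

```r
# c() concatenates values into one vector
idx   <- c(1, 3)
chans <- c("FSC-A", "SSC-A")

# Small matrix with named columns: 2 rows (cells) x 3 columns (channels)
e <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("FSC-A", "FSC-H", "SSC-A")))

# e[rows, columns]: leaving the row slot empty keeps every row
by_position <- e[, idx]
by_name     <- e[, chans]
identical(by_position, by_name)  # TRUE
```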
so you don't have to type them over and over. You can set your y-axis limits to something more suitable, and your x-axis too; why don't we do that as well? How about you do that as an exercise: set the x-axis limits to be between whatever you think is suitable by looking at it. Go. Okay, good, excellent. So what values did you choose? A little better? Yeah. Now, why is it that for side scatter I had to cut it off at 5,000? You know, it went up a lot. Why did I have to cut it off so much? This is one of the things stored in those keywords, and it's automatically stored by the computer when you acquire the FCS files. Remember that colnames(f) gave us all the parameters, right, all of the channel names. I'm subsetting it to the third one only. Here are all of them; here's the third one only: side scatter area. If you read through the description carefully, you would see this keyword, P3DISPLAY. P3 stands for parameter 3, or something like that; I assume that's what it means. Parameter 3 display: what scale should this be displayed on? And it says LOG. So we should actually be taking the log of side scatter before we plot it, and then it will look nicer. That's why it doesn't look so great right now. What is the display scale for forward scatter area? How would you find that? Sure, replace the 3 with a 1: LIN, linear scale. Sometimes you may have parameters in your data that are supposed to be on a log scale or on a linear scale, and if you're not a biologist and don't automatically know these things, this is one way to check. Okay, so let's stop there with the plotting for now, and let's read a... yeah, we can get through this, and kind of repeat it a little bit. Let's read a flowSet: not just one FCS file, but the whole set of FCS files. Remember, in this folder, full fcs, I actually have three FCS files. You can imagine you might have a hundred.
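A sketch of using a display keyword to decide whether to log-transform side scatter before plotting, plus an xlim for the exercise. The display vector is hypothetical; with flowCore, a value like this might come from keyword(f, "P3DISPLAY") in files that carry that keyword, as the one above does. The data are synthetic.

```r
set.seed(6)
e <- cbind("FSC-A" = rnorm(20000, 80000, 20000),
           "FSC-H" = rnorm(20000, 70000, 15000),
           "SSC-A" = rlnorm(20000, meanlog = 7, sdlog = 0.8))

# Hypothetical display settings, as if read from the file's keywords
display <- c("FSC-A" = "LIN", "SSC-A" = "LOG")
ssc_is_log <- display[["SSC-A"]] == "LOG"

keep <- e[, "SSC-A"] > 0  # log10 is only finite for positive values
fsc  <- e[keep, "FSC-A"]
ssc  <- e[keep, "SSC-A"]
if (ssc_is_log) ssc <- log10(ssc)

plot(fsc, ssc, pch = ".", xlim = c(0, 150000),
     xlab = "FSC-A", ylab = if (ssc_is_log) "log10(SSC-A)" else "SSC-A")
```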
You don't want to be doing read.FCS one by one; you want to read all of them. This is how the function read.flowSet works: you give it the path, which is the... So this is how things work in R: you can specify path = ..., the parameter that the function read.flowSet is expecting, or trust that the function is written well enough that it can guess what you're trying to do. For example, length takes x, something called x. You can do length(y) (remember our vector y? we had our vector y like this), or you can specify length(x = y). It's still actually using y; the parameter name is x within the actual function. So in read.flowSet you have to specify the path to your data, which is the folder name relative to the current folder you're in, full fcs. And then this thing, pattern, which you can set to ".fcs". What this does is look at all the files in the path you specified, and only read the ones that contain the string ".fcs". What could happen is that you have hundreds of files here and a few of them are Excel spreadsheets, and you don't want R trying to read those as flowFrames. Or you have organized your data really well and you don't need to worry about that, in which case you don't need to specify this parameter at all. You can just do that. So now run that. It's going to take a couple of seconds, six or seven. Remember how, when it reads, it gives you all these warnings, because the files aren't necessarily perfectly annotated and there are some standard issues. And then print it out. Remember how, when you printed f, it said flowFrame object at the top? Now it's a flowSet object. It's like a set of flowFrame objects, very similar to a list. And remember how we did colnames(f), and that gave all the channel names? You can also do that for the flowSet. It expects all the things you're reading in to have the same colnames.
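Two details from above, sketched in base R. First, as far as I know, read.flowSet matches pattern as a regular expression against the file names (as list.files() does), so in ".fcs" the dot matches any character; "\\.fcs$" is stricter. Second, positional and named arguments reach the same parameter. The file names are hypothetical; a real call might look like read.flowSet(path = "your_folder", pattern = "\\.fcs$").

```r
files <- c("a.fcs", "notes.xlsx", "b.FCS", "bfcs_summary.csv")  # hypothetical folder

loose  <- grepl(".fcs", files)                          # TRUE FALSE FALSE TRUE
strict <- grepl("\\.fcs$", files)                       # TRUE FALSE FALSE FALSE
any_case <- grepl("\\.fcs$", files, ignore.case = TRUE) # TRUE FALSE TRUE FALSE

# Positional vs named argument: both pass y as length()'s parameter x
y <- seq(1, 100, by = 1)
same <- length(y) == length(x = y)  # TRUE
```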
They have the same channels. It's going to complain if you try to read in a bunch of FCS files where some of them have, you know, CD3 and some of them have CD10. It's going to be like, no, I can't put these together. Here are some other functions for working with a flowSet object. sampleNames: it names the samples based on their file names, right? These were the file names, if you look at the folder there. You can do length: I only have three FCS files in my flowSet. We did this one already, colnames. I want to get this first one, and there are two ways of getting it: by index or by name. Remember with the list, how I had my list and it was like first and second? I could either use the double square brackets and put 1 if I wanted the first object in that list, or I can use its name instead. So if I had been reading in a bunch of FCS files and I wanted to get Bob's, and I knew his name was in the file name, which is terrible... never put the name in. Oh, yeah, me too, it's terrible. So this is a lot like a list of flowFrame objects. Notice how the first one is just a flowFrame object, right?
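A base-R analogue of the flowSet accessors above, using a plain named list of synthetic matrices so it runs without any FCS files: names() plays the role of sampleNames(fs), and [[ ]] works by position or by name. make_frame is a hypothetical helper, not part of flowCore.

```r
set.seed(7)
make_frame <- function(n) {  # hypothetical helper: synthetic cells x channels matrix
  cbind("FSC-A" = rnorm(n, 80000, 20000), "SSC-A" = rlnorm(n, 7, 0.8))
}
fs_like <- list("sample_01.fcs" = make_frame(300),
                "sample_02.fcs" = make_frame(500),
                "sample_03.fcs" = make_frame(400))

names(fs_like)   # like sampleNames(fs)
length(fs_like)  # number of files: 3
first_by_index <- fs_like[[1]]
first_by_name  <- fs_like[["sample_01.fcs"]]

# Every element should carry the same channels, as a flowSet requires
same_channels <- all(vapply(fs_like,
  function(m) identical(colnames(m), colnames(fs_like[[1]])), logical(1)))
```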
You know how we used nrow before? So let's say I have this first one and I want to see how many cells are in there. I can do nrow of this flowFrame object: 65,000. Now, if you have a hundred FCS files, you don't want to be doing nrow of fs[[2]], fs[[3]] and so on; you don't want to be typing that. So there's this fsApply function, which applies a function to each FCS file within the flowSet. For example, the function nrow, which normally gives you the number of cells when you apply it to a flowFrame: when I apply it to the flowSet, it applies that function to each entry of the flowSet, so each flowFrame gets that function. So, flowSet apply: to each part of the flowSet, apply this. But is fsApply actually the name of the function? Yes, yes, it's just... yes. If I call my flowSet flowSet1 instead, it's fsApply(flowSet1, ...). The first thing is always which flowSet we are going to be applying this to, and then what we are going to be doing: nrow. And notice how it has named each entry, so that if I make this into a matrix b, the first row is that, and it also has a name here. Or I can say, okay, for this file, how many cells were in there? So again, even in a matrix you can name your rows, and instead of saying I want the first row or the tenth row, I want the row with this name. So the flowSet is a matrix within a matrix? Yeah, a little bit, something like that. It's more like a list of matrices. The flowSet is a list: one thing in the list, another thing in the list, another thing in the list. Each of those things is a flowFrame, which has a matrix. Here's a fancy fsApply. What did this do?
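With a real flowSet, fsApply(fs, nrow) does the per-file counting described above. The sketch below shows the same idea on a plain list with base R's sapply(), so it runs without FCS files, and then looks a row up by name in a one-column matrix. The list and its names are hypothetical.

```r
set.seed(8)
fs_like <- list("sample_01.fcs" = matrix(rnorm(300 * 2), ncol = 2),
                "sample_02.fcs" = matrix(rnorm(500 * 2), ncol = 2),
                "sample_03.fcs" = matrix(rnorm(400 * 2), ncol = 2))

# Apply nrow() to each element; the names carry over from the list
counts <- sapply(fs_like, nrow)

# As a one-column matrix, rows can be picked by name instead of position
b <- matrix(counts, ncol = 1, dimnames = list(names(counts), "cells"))
b["sample_02.fcs", "cells"]  # 500
```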
Yeah, it went through, and the first one had this tube name, the second one this tube name, the third one this tube name. Okay, I have about four minutes left, so... almost. Is everyone okay if I take four minutes? Good. So everything inside the flowSet is a flowFrame, right? So you can subset the flowSet and pretend it's just f from before, and do things like plot it just like you did before. There, it looks a little different from the first one, right? This is the second one. You can save the matrix of expression values just like we did before, for, let's say, the first one. And now let's extract just the forward scatter values from this matrix e. Remember, the rows are the cells and the columns are the parameters. And now I have extracted only the FCS... oh, sorry, the FSC, forward scatter, values. I'm going to print out the very first 10 of them: the first 10 cells, and these are their forward scatter values. Now I can plot the density of these. There are actually, you know, 60,000 of them, right? I only printed the first 10. Let's plot the density. So in this flowFrame, most of the cells are between 20,000 and 100,000, actually. So when you were cutting off your plot, you could have cut it off at 100,000 and you would have seen pretty much most of them. There's a little kink here; do you see it? Very small, a little kink there. Those are all the cells on the margin: that's where the cytometer can't measure anything higher than that value, so it just gives them that value. We're going to get rid of those later. Another super cool thing about plotting in R, and this is the last thing I'm going to show you before lunch, is par. It stands for parameter, and it sets things up before we start plotting: I'm going to make my plotting setup a little better. mfrow means the next plotting thing I do is going to have three rows and one column. And this is what happens.
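A base-R sketch of looking at the first values, plotting the density, and counting events piled up at the top of the measurable range (the kink described above). The data and the ceiling value are hypothetical; the 500 saturated events are added on purpose so the pile-up exists.

```r
set.seed(9)
ceiling_value <- 262143  # hypothetical top of the measurable range
fsc <- c(rnorm(60000, mean = 60000, sd = 25000),
         rep(ceiling_value, 500))         # hypothetical saturated events
fsc <- pmin(fsc, ceiling_value)           # nothing can read above the ceiling

fsc[1:10]                                 # forward scatter of the first 10 cells
d <- density(fsc)
plot(d, main = "FSC-A density")
n_at_ceiling <- sum(fsc == ceiling_value) # events stuck at the ceiling
```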
I'm going to plot the first... what I'm plotting here is the first flowFrame, right? I'm taking the first one, forward scatter and side scatter. Let's do the same here for the second and third. So what this par(mfrow = c(3, 1)) thing did was set up my plotting region to have three rows and one column, so the next three things I plot go into those slots. That was not... um, yeah, it was on purpose. I wanted to see... Very good. See, I copied and pasted and I didn't change it. Yeah, so main means the main title of the plot. When you add main = ... to the plot function, you can also say "third", oops, so now when I plot it, it says "third". And that's all I'm going to show you right now. Uh, sort of, yeah. Yeah. There are a lot of options, so you can definitely do that, but you would have to... yeah, that's the long way. Or, um, I honestly haven't done the exact thing you said, but if you look at ?par, which is what I used here to make the region three by one, you can scroll on forever and ever and read about all the things you can set. Here's what I was using, and it explains what it does. So there is definitely a way; I just don't know it offhand. I could probably google it. Okay, is this good, or does anybody want to ask me anything? Feel free.
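A sketch of the three-row layout with a title per panel. It loops over a named list so each title comes from the list name, which avoids the copy-and-paste title mistake above, and it restores the previous layout afterwards (par() returns the old settings). The three matrices are synthetic.

```r
set.seed(10)
frames <- list(first  = cbind(rnorm(3000, 80000, 20000), rlnorm(3000, 7, 0.8)),
               second = cbind(rnorm(3000, 90000, 20000), rlnorm(3000, 7, 0.8)),
               third  = cbind(rnorm(3000, 70000, 20000), rlnorm(3000, 7, 0.8)))

old <- par(mfrow = c(3, 1))  # 3 rows, 1 column; returns the previous setting
for (nm in names(frames)) {
  # main = panel title, taken from the list name
  plot(frames[[nm]], pch = ".", main = nm, xlab = "FSC-A", ylab = "SSC-A")
}
par(old)                     # restore the previous layout
```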