How many are you getting? It's time to report it. So, can you guys open up RStudio? And remember how yesterday we transformed our flowSet and saved it? Now we're going to load it, so we don't have to re-transform it from scratch. What was it saved as? trans.fs.Rdata. You can find it in the working directory, or specify the full path; you just need to start with a clean console. This is module 5, so you can open up the R script for module 5; it should already be in there. So basically, what we're going to do today: we're going to start with the transformed data. Remember, we pre-processed it, got rid of all the debris and margin events, and gated our population of interest as our starting population; in this particular data set we did the best we could to exclude all the cells we don't care about. Then we transformed the channels that needed transformation. Today we're going to actually analyze the data set and try to get something out of it. We're going to take the discovery route of analysis, where we use flowType and RchyOptimyx to try to come up with the phenotype that differentiates two groups of patients. These 20 samples are all HIV positive; I have chosen 10 of them with a very long survival time and 10 with a very short survival time, and we're going to try to find the phenotypes that best differentiate between these two groups. As a first step, we're going to get the live CD3 positive cells using 1D gating, one dimension at a time, with the flowDensity package that Ryan was just talking about.
And this step we're going to do just like we did yesterday for the debris: we're going to make a pooled flow frame, look at what CD3 versus the viability channel looks like, see where the gate should be, and then grab just the CD3 positive live cells. After that, in order to use flowType and RchyOptimyx, we first have to tell them what counts as positive and negative for every channel, so we're going to define thresholds, gates, for each and every one of the channels. We'll do this again with a pooled frame: we'll plot it, look at it, and say, okay, it looks like the CD4 gate should be here. Then we're going to use flowDensity to do that exact same step, but instead of us just looking at the plot and eyeballing it, you know, the gate should be 5 or 6 or whatever, flowDensity will define the gate for us, and we'll convince ourselves that it's doing the same thing we would do by looking at the data. We're going to visually assess the suitability of each gate, just to double-check that it actually works. After that, in the second part, we're going to do flowType and RchyOptimyx, and I'm not going to lie, that's going to be a little hard, a little computer science-y. Like Ryan said, this is the first version of those last two packages, flowType and RchyOptimyx. Those are the ones that make those colorful plots, where the really red one is your best phenotype, and they look really good for publishing papers. This is the very first version of them, but they're currently finishing up the newer version of these packages, and once those come out they're actually going to be much easier to use.
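The pooled-frame step in that plan can be sketched in base R, with plain matrices standing in for flow frames; `pool.frame` below is a hypothetical stand-in for the `getGlobalFrame` helper used later, and the `times = 2` default mirrors the multiplier described in class.

```r
set.seed(1)
# Three mock "frames": matrices of cells x channels
frames <- lapply(1:3, function(i)
  matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("CD3", "Dump"))))

# Pool a random sample from each frame; 'times' scales the pooled size
# relative to the average frame size
pool.frame <- function(frames, times = 2) {
  n.per <- ceiling(mean(sapply(frames, nrow)) * times / length(frames))
  do.call(rbind, lapply(frames, function(f)
    f[sample(nrow(f), min(n.per, nrow(f))), , drop = FALSE]))
}

pooled <- pool.frame(frames)
nrow(pooled)  # roughly 2x the average frame size
```

On the real data the same idea gives one representative frame to pick gates on, instead of eyeballing all 20 frames separately.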
So you guys are going to be super smart and super qualified to use the newer versions, which are easier. And me teaching you the harder version is going to give you a lot more intuition about how these algorithms actually work, which will help you interpret the results. That's my excuse. So today I'm not going to use RStudio at all; I'll only have the code on the slides, and you guys are going to be running everything, because you can debug your own issues now. You know what you should be seeing, so you don't need to see it on my screen as well. I will have some pictures. The first thing we're going to do is this very first command here: it clears the current workspace. Today you just started up your computers, so there's nothing in there, but once in a while you should do this to clear all the variables so you don't accidentally use a stale one. Then graphics.off(): this closes all the plots you might have open, and it gives you a little bit of a speed boost. And now we're going to bring in all the libraries we'll be working with today: flowCore, flowDensity, and GEOmap, which is something used by flowDensity for plotting purposes. When the package is published it will actually include the GEOmap dependency, so you won't have to load it yourself.
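The session setup looks roughly like this (a sketch; the library() calls are left commented out because installing the Bioconductor packages isn't covered here, and the detector names are the ones mentioned in class, so check yours with colnames() on a frame of your own flowSet):

```r
rm(list = ls())   # clear the workspace so no stale variables linger
graphics.off()    # close any plots left over from a previous session

# Bioconductor packages used today (installation not shown):
# library(flowCore)     # flow cytometry data structures
# library(flowDensity)  # automated 1D gating
# library(GEOmap)       # plotting helper used by flowDensity

# Short names for the two detectors, so we never retype the long strings
cd3  <- "R780-A"   # detector carrying the CD3 stain
dump <- "V450-A"   # detector carrying the dump/viability stain
```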
Then we're going to set our working directory, to tell R where our data is stored, and then load trans.fs.Rdata and print it out, just to see that it's the thing we were working with, the one with 20 frames in it. At this point, if I were working on this and hadn't touched it in a while, I would plot at least one of them and double-check that it is what I think it is, properly transformed, so feel free to do that. Does everyone have that ready? Has everyone opened up RStudio and loaded in trans.fs? Good. And what time is it, should we take a break now before I start actually doing things? Okay, take a break; if you would like to continue working, you can just plot CD3 versus the dump channel and see what it looks like, to remind yourself, because that's the first thing we're navigating: we're going to get the CD3 positive live cells. Okay, so we have our transformed flowSet loaded up and ready to go, we have a plan of what we're going to do with it, and the first step is to get the CD3 positive live cells. I'm going to be using the CD3 channel and the dump channel, which is the viability stain and the CD14 and whatever else they put in together. I'm not going to be typing out these detector names, R780-A and V450-A, over and over and over again, because I know that's what I'd be typing a lot; I'm just going to set these two variables, cd3 and dump, for convenience. Does that make sense? So we're going to define the pooled frame first. Again, because we have this flowSet, we're going to try to pick one gate that works for all of its frames, so I'm going to take a random sample of cells from each and every flow frame that I have. And remember, before you can use this function getGlobalFrame, you first have to tell R to read the file that I've written; that's what this source line does, it loads the file I had written for you guys,
we don't need to retype it; we just want to get the algorithm down. This function, getGlobalFrame: if you actually look at it in supportFunctions.R, if you open that script and scroll down to where getGlobalFrame is defined, you can see that it takes a flowSet plus one more parameter. That parameter is optional; there's a default value for it that I calculate automatically, or you can set it yourself. What that value means is, relative to the average size of a flow frame, how big should my pooled frame be. In this flowSet each file has about 20,000 cells, and the default value is 2, so the pooled frame you get will have about 40,000 cells. You can play with that value: increase it if you feel you're not getting a good enough representation and want 60,000 cells in the pooled frame instead of 40,000, or decrease it if you don't need that many. Because it has a default, you don't actually have to specify it. Then we're going to plot our pooled frame using plotDens, the flowDensity plotting function; you just give it these two channels, like this. How does it look, is everybody seeing it? About the random sample: it's a uniform distribution, so if you have 100 cells it randomly pulls out, say, 10 of them. So I'm looking at this; the x axis is CD3, the y axis is the dump channel, and obviously I want to gate this population here, where my hand is, the CD3 positive live cells. I'm just eyeballing it for now, just like we did yesterday. A value of one looks right to me; remember these are transformed channels, it's a logicle transform, so that one is a transformed value. Does everybody
agree with my gating here, with what I intend to do? I intend to set the gate on CD3 at 1 and the gate on the dump channel at 1.5. You would put it a little tighter? Feel free to do that. 1.2 and 1.3? Okay, good, 1.2 and 1.3. Yes, and that's actually one of those things about how flowDensity gates: like Ryan said, it works in many different ways. It either finds a nice separation between the two peaks, or uses a standard deviation, or a percentile, or those slope-based methods he talked about. And one way we choose which method to use is by talking to you guys: how do you set your gate, how tight do you make it? Because that tells us whether we should use two standard deviations or three standard deviations from the mean; if you want it really tight, two standard deviations. Now, do you do this just once, or try different pools? What would happen is, if your data set moves around, if each frame is shifted a little bit, and you make a pooled frame and plot it, it's not going to look nicely separated; it's going to look like one smeared population, and it wouldn't be clear to me where the gates should go. If I do it once and it's very clear where the gates should go, I don't need to do it again. If it doesn't look clear, I'll do it again but increase my parameter to get more cells, and make sure it's even possible for me to set one gate for the whole set. In some cases it is simply impossible; sometimes there is enough variation that I don't feel comfortable just using one value, and I will mention how to address that. So let's say you've picked your values; feel free to pick your own, you don't have to use mine. And let's plot them: this code here plots every single flow frame with those gate lines drawn on it. And I did not do that,
so I will let you guys make this plot, look at it yourselves, and tell me what you think about it. Is it clear what we're plotting here? We did this yesterday, but there is a lot going on. The first line of code opens up a plot of 5 rows and 4 columns, because we have 20 samples and I want to plot them all in the same plot so I can get a general overview of what's going on. The second part, mar, sets the margins around each plot: the first number, 3, is how many lines of empty space should be below the plot; the next number, 3, is how much free space on the left; the third number is above; and the last number, 1, is on the right. So when you plot it, there shouldn't be too much space between neighbouring plots, because we don't have a legend here or much axis labeling, so we don't need that much space. And the third setting, mgp, just brings the labels a little closer to the axis; it's all about saving space in the plot region. Do look at ?par: whenever you get to the point where you've made a plot that looks really good and you want to save it, and you want to make it pretty and adjust the sizes of your labels and things like that, you can go through ?par, and it has some really, really cool options for improving how your plot looks. So that first line just prepares the layout, and then there's this for loop: because I want to plot every one of them, I'm going to cycle through 1 to the length of the flowSet, which is going to be 20, right? But I don't hard-code that; instead of putting 20, I put length(trans.fs). Even if you know it's 20, it's always best to leave it as something more generalizable, because what if you want to reuse your code and read in a new flowSet that's 30, and you forget that you had that 20 hard-coded in there? That happens a lot: you forget that something's hard-coded, and you're wondering why is
it not doing what you expect. Then the first line within the loop uses plotDens: it takes the i-th frame inside the flowSet and plots it for these two channels, CD3 and dump, and then draws the two gate lines. Remember, v means vertical line, h means horizontal line, and lwd = 2 means make the lines a little bit thicker than normal, and blue. Any questions about this? Is this the transform on both sides, the one we did yesterday? Yes: yesterday we pre-processed, removed all the debris and such, then we transformed and saved trans.fs. It was the logicle transform. So everyone's okay with this, everyone got the plot? Okay, and it looks fairly good. I mean, it's not perfect, but that's biological variation; it's always going to be a little different. Okay, so we have decided that we are fairly satisfied with our gating strategy according to how we visualized it, so now we actually have to apply it: just plotting it doesn't make it so. You have to actually remove those cells and retain only the CD3 positive live cells, and we're going to do that exactly the same way we removed the debris before. Remember how you set a threshold for forward scatter and took the cells less than that threshold, set a threshold for side scatter and took the cells less than that threshold, intersected them, and those were your cells? We're doing exactly the same thing. But remember how we used a for loop for that, doing this for each flow frame within the loop? That's a little bit slow in running time, computational time. So instead, because this is something we apply to each and every flow frame, we're going to actually try using fsApply this time. Remember, fsApply is a function that applies a function to each flow frame of a flowSet object. For example, the function nrow: remember, when you give nrow a flow frame, so just one of the flowSet's frames, not the whole flowSet, nrow
of a flow frame gives you the number of cells you have in that flow frame. If you do fsApply and give it the flowSet, comma, nrow, it takes that function nrow that you put in there and applies it on each and every one of the frames inside the flowSet, and gives you: flow frame one has this many cells, the second one this many, this many, this many. It's just one line of code, but it does all of those things. We're going to try to use the same idea here. So, what is our function? There, the function we were applying to the flowSet was nrow; what is our function here? A function that gates the CD3 positive live cells. There is no built-in function for that, so we have to write our own function now. It's just like that function we used for the pooled frame, getGlobalFrame: that's a function I wrote somewhere else, so that you guys can just reuse it every time you want to get a pooled frame. You just call that function, you don't have to type out the whole thing I typed out for you before. You're going to experience this a lot when you're analyzing your flow data: a lot of your steps are very similar, and you don't want to be typing out the same long code every single time, trying to change the channel names or whatever. So you should learn how to create your own functions for whatever you, for your experiments, find useful. In this case I find it useful to get the live CD3 positive cells; in your case you might find some other common step useful, maybe you can write your own function for a custom debris removal for your type of data. So, how do you go about writing a function? We want a function which does something to a single flow frame, right, because then we're going to apply that to the whole flowSet. So let's just start with a single flow frame, design the steps that we want to apply to this one flow frame, and just write the code
for one flow frame; then we're going to turn that into a function that R recognizes as something it should run. So, let's just focus on one flow frame for now. This is how we're going to remove the dead cells and the CD3 negative cells, or rather, retain only the CD3 positive live cells. Let's just start, for example, by taking the very first flow frame and calling it f. First we find, for example, the CD3 positive indices: which of these cells are CD3 positive? Well, those are just the ones whose expression values of f in the CD3 channel are greater than one, or whatever your gate was, 1.2 if yours was tighter. Does that make sense, does that one line of code make sense so far? Okay, so that gives you all the indices of the cells that are beyond the CD3 threshold. Then we find the other indices of interest, the dump-channel-negative cells, and again that's very similar: which expression values of f in the dump channel are less than whatever gate you decided, 1.3 or whatever. And we combine them by taking the intersection of these two sets of indices, and that gives us the little quadrant we're actually interested in. Then, to subset, to take just those cells, we put them into the final result that we want: viable.f equals f, subset by those indices. You could have instead found the CD3 negative cells, taken the union of the CD3 negative cells and the dead cells, and then subtracted those, right? It doesn't matter, do it whichever way makes sense to you. I have chosen to do it this way, but you could have done it the other way, subtracting the unwanted cells instead of retaining the cells we're interested in. So does this make sense? So if you want to write a function that's going to repeat a process for each flow frame, the first thing you do is figure out how to do that thing you want to do for the one flow frame.
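The whole recipe can be sketched on a mock expression matrix (on a real flow frame you would index exprs(f) rather than f directly; the 1 and 1.5 thresholds are the eyeballed gates from earlier, and the overlay-and-outline part previews the visual check discussed next):

```r
pdf(NULL)  # null graphics device, so the example runs without a display
set.seed(2)

# Mock "flow frame": 1000 cells x 2 transformed channels
f <- matrix(runif(2000, 0, 4), ncol = 2,
            dimnames = list(NULL, c("CD3", "Dump")))

cd3.pos  <- which(f[, "CD3"]  > 1)    # indices of CD3-positive cells
dump.neg <- which(f[, "Dump"] < 1.5)  # indices of live (dump-negative) cells
viable.f <- f[intersect(cd3.pos, dump.neg), , drop = FALSE]

# Visual check: all cells in black, retained cells overlaid in green
plot(f, pch = ".")
points(viable.f, pch = ".", col = "green3")

# Gate outline: convex hull of the retained cells, with the first point
# appended so the polygon closes
h    <- chull(viable.f)
gate <- viable.f[c(h, h[1]), ]
lines(gate, lwd = 2, col = "blue")
dev.off()
```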
So this is your very first step: you write this code so that it works before you try to abstract it into some function that you call later. You have to first make sure that the thing you're going to put in your function actually works, so let's make sure that it works. Here are the plotting steps: you plot the expression of f. I have actually plotted this in a couple of different ways; I have the plotDens version. So first you plot it. Remember, f is not altered: f is just the transformed frame, it's not the one that we have gated; viable.f is the one we actually gated. So first you plot f, and then use the points function to overlay, in green, the cells of interest. Did everybody get that? Does everybody see it? Is it legitimate, what we're plotting? Is our function doing what you expected? Sometimes, when you're writing this kind of function, if it's doing something more involved than what we just did, you might see something odd when you try to validate it: maybe all the cells are accidentally green or something, so you must have had a typo in your code or done something wrong. So always, always make sure that your function is doing what you expect it to do before you use it on your whole entire flowSet. I didn't include it here, but there is a way; did I include this plot in the VirtualBox image, in the code? Sorry?
Yeah, I did, okay. So basically it's the convex hull of the points that are within that region. If you go to ?chull you can see an explanation of what that function does, and if you scroll down to the end it gives you an example. What I find most useful, if I'm reading someone else's code and I don't know what some function is, how they did that, what it means, how they even got that, how am I supposed to know that, is to read the header of the help file to get a gist of what the function is doing, and then scroll down to the bottom and find the Examples section. The Examples section usually has some code that you can actually execute yourself and see what the code does step by step. You will see: okay, so in the example they first load this data, and then they plot the points, I get that, and then they run this function, and then what does that do? Oh, there's what it does. You just run the code and you see that, oh, that's the code that places the gate around the points. So if you're not comfortable with just copying my code, which you shouldn't be, look through the Examples of that help file; it's one of the best ways to learn, just by example. I mean, obviously read the description, it will help you sort of understand, but it won't really click until you run the example. So is everyone okay with that? A convex hull is when you have a bunch of points and you want to place a gate that is as tight as possible but encapsulates all the points, every single dot. So the reason why the gate is not very tight looking is because there's, you know, one dot here, there's one dot there, and there are no dots in between, so it just kind of cuts across with a straight line, as tight as it can. So this is how I would try to replicate a gated plot. It's not going to look super satisfying to
biologists, because they're going to ask why the gate bulges out over there, and it's just because there are a few dots there that probably don't make a huge difference whether you include or exclude them. And like Craig was saying, if your gate is tighter it probably will look more comparable to a manual gate. So you get the gate points with this chull function, but then you modify them: what the function gives me is the indices of where these lines should go, all the points that surround my points, except it doesn't connect the last one back to the first. So what I have done is take all those points, the coordinates of where it should draw the line, and also append to the very end of it the first point where it started, so that it will close the gate. That's what the [1] in square brackets does, exactly. You can print out some of these things and see what they actually are, see what the first element is, if it doesn't make sense. Your gates might look a bit different from mine; it's okay, you can say they're better. But because I defined the viable gate, if you changed the numbers, which cells are greater than 1.2 instead of 1, which cells are less than 1.3, you get different indices. And the reason the outline goes like that instead of straight across is because there's one point out there. It might not satisfy how you would manually do it, but functionally it is the same thing; you could say anything greater than zero counts, right? So is everyone okay with this plot so far? Remember, if you're reading this later and you forgot what was going on, just do the question-mark help and go over the examples; that answers most of your questions. A question: when we're drawing the lines, in the subset of the gate points, why is there nothing after the comma? Oh, that's because it's a matrix, so I'm only taking those
indices, the rows, and all the columns; the only two columns I have in there are the CD3 and the dump channel, so leaving the space after the comma empty means take every column. No more questions? There are some, there are some, okay. So, what have we done? We have written some code that does the thing we want to do on a single flow frame, and we have verified that indeed the way we composed our algorithm is valid, it does what we expected it to do, and we're happy with it. Now we want to go and put it into a function, so that we will never have to write this code ever again; we'll just call it as a function. And this is how you write a function. First you name it something; let's name it getViableFrame. How does R know it's a function? By putting this assignment arrow here and then saying function. And by convention you do this little camel-case thing where you put capital letters in the middle of the name; yes, that's just for readability. Some people use the convention where they put underscores between the words, or dots, or whatever, but this style avoids any of those special characters. And by convention the first letter is lowercase, because it's not that important; there are more important things in R that should be capitalized than your little function. I'm just kidding, it's just a convention. It's just like when you're defining a vector or a matrix or a list, you would say list of something; this is a function, so it's a bit more special than those things. And the way that you activate this function, the way R starts making it usable, is you just execute this whole chunk of code. R is not going to do anything visible; it's just going to read it and be like, okay, from now on, when you say getViableFrame, I'm going to know what to do, I'm going to do these things that you're saying. Exactly, that's exactly what it is. So you give it a name, and then you say this is a function of, and then this is your input variable, the thing you expect people to give to you. In our case it's going to be a flow frame, and you
can call that variable whatever you want, it doesn't matter at all. You can call it x, you can call it a; don't call it flowFrame, though, that's already a reserved name. Also, don't name your function something that you know already is a function: don't name it transform, for instance, because there's already a transform, and if you do name yours transform and then execute it, you have masked the previous one. That's a name collision. There have actually been times when I used some old versions of packages where someone didn't think hard enough about their naming: I would load flowCore, and then I would load this other package, and this other package had functions that replaced some of the core functionality of flowCore. So funny things would happen when I ran my old code, and I couldn't figure out why, until I realized, okay, it's because loading that package masks some of the functions that I relied on previously. So just try to name things something very specific to what you're doing, not some generic term that probably already exists. Okay, so the first line is: you give it a name, you say that it's a function of some variable that you're going to be working with within it, you open a curly bracket, you write your functionality, and you close the curly bracket. So here's the functionality. Whatever you give to the function, that's going to be our f. Now remember, here, when we were just practicing our algorithm, we said f is going to be the very first flow frame, just for practice purposes; now your f is going to be whatever the person calling the function sends you. f is given to the function, so we don't have an f definition inside. Once we have our f, we do the same things as before: we define our CD3 positive indices, greater than one; the dump channel indices, less than 1.5; combine them; and subset. The viable frame is now going to be the subset of the frame that was given to you, and then you have to
return viable.f. So when I call my function, I will give it a flow frame, and what I want to get back is just the viable cells, the viable CD3 positive cells; that's what we return, because that's what we want to get back. When you call a different function, such as getGlobalFrame, you give it a flowSet, but in return you get a flow frame, right? That's what my function returned to you: I do some stuff with the flowSet that you've given me, then I create this flow frame and I return that flow frame to the caller. Does this make sense? Now, if you just select that code and run it, execute it, R will read it and know that, okay, now every time you say getViableFrame, I am going to know what to do. Now let's try what we did before with the same flow frame f, remember, the first one of trans.fs, the one that we used to practice, where we wrote those five lines of code. Now we can do the exact same thing in just one line of code: getViableFrame(f). Why don't you play around with that for a minute? I'm not sure that I put code in there to plot it again; okay, plot it again and make sure that it's the same thing that we had before, just now in one line of code instead. So it seems like I have replaced five lines of code with one, but really I also had to write those seven lines of code above to define this function, so how does that really save me anything, right?
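Here is a sketch of that function on the same mock-matrix setup (the real version operates on a flowFrame, indexing exprs(f), and would be applied with fsApply(trans.fs, getViableFrame); the default gate values are the ones eyeballed earlier):

```r
getViableFrame <- function(f, cd3.gate = 1, dump.gate = 1.5) {
  cd3.pos  <- which(f[, "CD3"]  > cd3.gate)
  dump.neg <- which(f[, "Dump"] < dump.gate)
  f[intersect(cd3.pos, dump.neg), , drop = FALSE]  # the returned value
}

# Three mock frames standing in for a flowSet
set.seed(4)
frames <- lapply(1:3, function(i)
  matrix(runif(400, 0, 4), ncol = 2,
         dimnames = list(NULL, c("CD3", "Dump"))))

# The five-line recipe is now one call per frame; lapply() here plays the
# role fsApply() plays on a real flowSet
viable <- lapply(frames, getViableFrame)
sapply(viable, nrow)  # surviving cell count per frame
```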
I can reuse it again, that's true, that's one benefit. Another benefit is that I would take this out of my analysis script and put it in a separate script, just like I have my supportFunctions.R with a bunch of little functions defined in there. They're not inside my main analysis script; if they were, I would have a thousand lines of code to do the analysis of one dataset. Because I'm calling them from this other script, my main code reads really nicely: read the FCS files, transform them, getGlobalFrame, getViableFrame, plot the frame, done. So your code becomes more readable the more you separate the functionality into modularized, well-defined little functions. Even just five lines of code, if you take them out, it looks much better when someone's reading your code and trying to figure out what you were doing. They don't have to figure out, okay, what is he doing here, he's getting the CD3 indices, and then he's getting this, and then that, and combining them and subsetting them; oh no, he's just getting the viable cells. Also, another point: make sure your function names are informative about what the function does. Don't call it function1 or anything like that, because that's no good. It also helps a lot with maintainability: if you're doing the same thing over and over and over again across a lot of different scripts, and then down the road you find you made a mistake, a bug in your code, you don't want to have to go back and fix it in every script and every assay; you just fix it in that one spot. And you will find bugs; although if you don't look back, you might never find them. So these are the rules I have: if you're doing stuff more than once, then try to automate as much as you
can. If you're using the same bit of code again and again, wrap it up in your functions and put them aside — you have less repetition. Does that make sense? Is everyone OK with writing their own functions now? "As long as I can just copy your code!" There was a question here: "I'm not seeing my points — I'm not seeing what the function plotted." Good point. If you're plotting something and you're not seeing what you expect to see — you rerun it from the start, remove all the variables, start again, and it's still not showing what you think it should — what are some ways you can debug and identify the issue? Is it really RStudio that's not plotting what it should, or did one of the things you're trying to plot pick up a bug somewhere earlier, so it's not what it should be? For example: she plotted these dots here in black, and then over top she tried to plot the green ones, but they weren't showing up for some reason. So she checked nrow of viable.f, for example, and it was 0 cells. It wasn't that RStudio wasn't showing the green points — there were no points in the frame. So obviously, somewhere above, one of her less-than or greater-than signs is probably the wrong way around. That's just something to keep in mind when it's not doing what you expected — which is going to happen way more often than it working on the first try, right?
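To make that empty-result diagnosis concrete, here is a toy sketch in plain R. An ordinary data frame stands in for a flow frame, and the column name and threshold are made up for illustration:

```r
# Toy stand-in for a gating helper: subset "events" above a threshold
# and print how many survive, so an empty result is obvious immediately.
get_viable <- function(df, threshold = 1) {
  viable <- df[df$cd3 > threshold, , drop = FALSE]
  print(nrow(viable))        # quick visual check: 0 means the gate is wrong
  if (nrow(viable) == 0) {
    message("ERROR!!! no events passed the gate - check your < vs > signs")
  }
  viable
}

events <- data.frame(cd3 = c(0.2, 1.5, 2.3, 0.8))
ok   <- get_viable(events)      # prints 2
none <- get_viable(events, 5)   # prints 0 and warns
```

The same pattern — print nrow before returning, then upgrade the print to a loud warning — is exactly what we discuss next.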
Well, now that you have your function, and you've tested it and made sure it works, it shouldn't happen — but it will happen occasionally. Ryan: when I used to do a lot of coding, I'd put print statements in all the time, especially when you're learning to code. Whenever you have a bunch of functions that are taking stuff in and transforming it, doing their three or four steps, just put in a print statement to show what's actually in that thing. That's one way you keep code like this honest: you're always doing visual checks to see that there is something in there, and that it's in the range you would expect. So what can you do here to address that? For example, say you want to make sure you're subsetting correctly. One thing you can do is, just before you return, print nrow of the frame. What happens then is that every time you call the function, once it has done its little subsetting of the CD3-positive viable cells, it prints the number of cells inside that frame just before returning it to you. If you see a zero there, that should tell you: why is it not finding any CD3-positive viable cells? Maybe there's a bug in my function, or maybe the frame I passed in already didn't have any cells in it. So that's one place you could add a print statement. The next thing that happens is you start to ignore that always-printed row count, because it's always working — so then you start putting checks in instead. If your function is supposed to return the viable cells, you're expecting the number of viable cells to be greater than zero. So you write the code; by
the time you get tired of seeing that number all the time, replace the unconditional print with an if statement: if you're expecting some viable cells to be returned, check that that actually happened. Check whether the number of viable cells is greater than zero, and if not, print something to the screen — that way you get a warning that something's not happening. I usually print "ERROR" in capital letters with a bunch of exclamation points, spread over a few lines, so I'll see it on my screen and won't just ignore it — it doesn't look like a thing I normally see. And you can make it verbose only while you're debugging. These tricks apply no matter what code you're writing. Now, your question: "I'm looking at the CD3 gate line. I execute it, and when I ask for the events that were greater than one for CD3, I get integer(0)." Right — that means it did not find any cells greater than one. So: does your original flow frame have any cells in it? Not your viable.f, your f — can you plot it, CD3 versus the dump channel, and make sure you have cells over one? Maybe your transformation from yesterday got fudged, in which case I'd suggest you open up the module 3 code, run the whole thing, and make sure your trans.fs is saved properly. Do plotDens of f with the CD3 and dump channels concatenated, see what it looks like, and make sure it's on the same scale I was looking at and that there are cells
over one. So — is everyone comfortable with the function and with how we use it? You just call it, just like we did with getGlobalFrame: you give it the thing it takes as input. What we might do is copy the function out: open a new script, getViableFrame.R, paste it in, and then when we want to use it in our main script, we source getViableFrame.R from this directory. Exactly — that's exactly it; you can do that now if you want, and maybe share it with the class. Now that we have this function, we don't have to apply it one frame at a time anymore: we can use fsApply, which takes a flow set and applies a function to each one of its entries — to each of the flow frames. fsApply is part of flowCore; it's a built-in thing, I didn't write it. Remember how we used nrow with it before, and it gave us a list of the numbers of cells in each flow frame? Now we can just put another function in there instead. Could you do a for loop instead? Yes, you could. "The plot pane squishes everything and you can't scroll" — yes, that's RStudio, and there is something you can do. I rarely plot things at such a large scale — twenty panels on one plot — just to look at on my screen; instead you can write an image file directly to your file system and give it the dimensions you want the picture to be. Maybe you want it really long and narrow, or huge; then I open that file up and look at it. We'll do that later. In plain R, if you're using a terminal and a plot window pops up, you can resize that window; in RStudio you can resize the pane too, but it's a limited region. So this is how fsApply works — is everyone OK with that? Let me take a walk around.
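A sketch of those two ideas together — applying the helper across the whole flow set, and writing a large plot straight to an image file. This assumes flowCore is installed, trans.fs is loaded, and getViableFrame.R defines the helper, as in the workshop scripts:

```r
library(flowCore)

source("getViableFrame.R")      # our one-frame helper, kept in its own file

# Apply the helper to every flow frame in the transformed flow set at once;
# because the helper returns a flow frame, the result is again a flow set.
viable.fs <- fsApply(trans.fs, getViableFrame)

# For big multi-panel plots, write straight to an image file with the
# dimensions you want, instead of squinting at the RStudio plot pane.
png("all_frames.png", width = 2000, height = 800)
# ... plotting code for all 20 frames goes here ...
dev.off()
```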
Sorry — did you save it? Let's look at your images... OK, so it must not be all of it; this starts from yesterday. Once you get it, just run everything up to the point we're at. So when we execute this fsApply line: you give it the flow set you're starting with, trans.fs; you give it the name of the function you want to apply; and you make sure that function takes in a single flow frame and returns a single flow frame. That way the result becomes a flow set. So now: viable.fs — print it out to your screen, viable.fs, enter — and make sure it's 20 flow frame objects inside a flow set. If it prints as a flow set, your function worked. So now that we have this flow set, we can examine it a little. If you do fsApply(viable.fs, nrow), that gives you the number of cells in each of the flow frames of the viable flow set, right? Dividing that by fsApply(trans.fs, nrow) — that second part gives the number of cells that were in the flow set before I took just the CD3-positive viable cells. So these live.counts are actually proportions of live CD3-positive cells. If you're not sure how this works, take just the fsApply(viable.fs, nrow) part, run it in your console and see what it gives you; then copy the second piece and print that out; then convince yourself that the ratio is the live proportions. Go to your other machine and bring up the VirtualBox — is it working?
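The proportion calculation just described, as a short hedged sketch (again assuming flowCore and the trans.fs / viable.fs objects from the workshop):

```r
library(flowCore)

# Cells per frame after gating, divided element-wise by cells before gating,
# gives the proportion of live CD3+ cells in each of the 20 samples.
n.viable <- fsApply(viable.fs, nrow)
n.total  <- fsApply(trans.fs,  nrow)
live.counts <- n.viable / n.total   # element-wise: one proportion per sample

print(n.viable)      # inspect each piece on its own first
print(n.total)
print(live.counts)   # should all be between 0 and 1
```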
Yeah, it's fine — you can drag it up if you want. OK, so your box is working now. Does everybody have the live counts? The next line plots the density of the live counts. Does it make sense what those live counts are? They're proportions — basically, what percentage: if you opened this in FlowJo and put this gate here, it would give you 30% or something like that; that's exactly what it is. How does the division work? When you have two vectors of the same size, R knows to divide element-wise. That's how R works — not necessarily how other packages or languages work, but that's what R does by default. Play around with it: plot the histogram as well as the density and see which one you find more informative. And remember, with the histogram you can also put a comma and then a number, and that tells it how many little bars (breaks) to plot.
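The element-wise division and the histogram-versus-density comparison can be tried on any numeric vector; the numbers below are made up:

```r
a <- c(10, 20, 30, 40)
b <- c(100, 100, 200, 200)
props <- a / b            # element-wise: 0.10 0.20 0.15 0.20

# Two views of the same distribution:
hist(props, 10)           # second argument suggests ~10 breaks
plot(density(props))      # smoothed density estimate
```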
One more RStudio thing: you know how you can execute line by line with Ctrl+Enter, or select a section and press Run? If you click Source at the top, it runs everything in that script. So if you want to rerun your code from scratch — because you think you maybe ran one line a couple of extra times — you can just Source the whole thing. (For the VirtualBox question: in the window where your VirtualBox is open, there's a Devices menu — like the one I have here — do you have something like that? Yes? There you go.) So, what happens when you plot the density of these live counts: it shows you, across your whole flow set, what kind of live percentages — CD3-positive live percentages — you're working with. And it looks like there are two peaks, right? Does it look like you have two peaks, with a little dip in between them? Whenever you see two peaks, it kind of looks like you have two groups, and maybe you can separate them somehow. So maybe even just these live CD3-positive counts could be enough to separate your data — who knows; we would need some kind of p-value to evaluate that more precisely
or even to make sure it's not just chance — that some samples happen to have a lower count and some a higher one, with nothing to do with survival time. On that note, maybe now it's time we actually take advantage of the survival data we have. Like I said, for these 20 patients I took 10 with very low survival times and 10 with fairly high survival times — the whole dataset had 466 patients, and I just picked 10 of one kind and 10 of the other. Normally you would read in an Excel spreadsheet holding all the clinical information, including the survival times for each patient. In this dataset, it just so happens that that information is already inside the FCS files. If you recall, when we were scrolling through the description of the flow frame yesterday, it had this keyword: CD survival time from seroconversion — the number of days of survival before the patient either progressed to AIDS or died. So how do we get that information for the whole flow set, the survival time for each patient? Remember how with fsApply we put in a flow set and then a function? And you know how, when you plot the expression of f, you can either first define a matrix e from the expression of f and then plot e, or just plot the expression of f directly? You can save yourself some code by writing it in one line. In this case, if you wrote a special function, saved it in a separate file, and sourced it so you could call it, it would be a one-line function — not a five-line function like we wrote earlier. For that reason you can actually put it directly in there. So what we're generating is a new vector of keyword values from the viable flow set, and to do it we're applying
a one-line anonymous function. It's not a matrix we're building — remember, when you ask for a flow frame's description, it gives you all those keywords? I'm accessing only one of those keywords, the one called CD survival time from seroconversion; that is all this function does. It takes in a flow frame — why a flow frame? Because I'm applying this to a flow set, and fsApply takes a flow set and applies whatever comes next to each frame, one at a time. So for now, don't look at this second part; just look at fsApply(viable.fs, ...): I'm applying something to this flow set, to each flow frame one at a time. From this comma on, whatever I put is only ever applied to one flow frame at a time. "Could you pull out the acquisition dates instead?" Yes — you could just replace this with the date keyword, and that is what I would do if I wanted to check that my controls were acquired on the same day as my non-controls, to make sure I'm using the correct controls. This function here is just a local definition; I'm never going to reuse it — it's only alive inside the brackets of this one line of code, and I'm never going to call it again. That's why it has no name. "Why can't you just pass the @ accessor directly, the way we passed nrow?" Unfortunately no — that is exactly the only reason I have to do this slightly complicated function(x) ... x@... thing. With nrow I just gave the name, no brackets; the @ symbol is a different kind of thing, so I can't pass it like that. Don't worry about it too much: just know that if you need to access one of the things you get with the @ symbol, you have to do it like this. That is all
you need to learn for now — but you will see this, and that's why I'm showing it; I'm not trying to confuse you or teach you things you'll never see. You'll never be forced to write this yourself — there are other ways of getting this information — but you will see it, so I wanted to demonstrate it. Don't worry about being able to do it from scratch; just know it's one way of defining a function you're only going to use once, so you don't need a separate script file for it — you can define it on the fly. "Does the function need an argument?" Yes — the function needs an argument name, and whatever you call it, you use that variable name inside the function body for the thing the function operates on. If I put f here, I would have an f here. And that won't affect the f from the previous lines at all — it's just a temporary variable, purely for the function's purposes. So, when you run this one line of code, print survival to the screen — just this line, don't run the next line yet. What does it look like? Right — and do you see how the days are in quotation marks? The numbers of days are actually not numbers; they're character strings, because someone entered them into the flow cytometer's metadata, and R isn't going to just assume they should be numbers. Maybe they shouldn't be — maybe they're patient ID labels, not numerical values you should be adding or multiplying. So when I run the next line, which says as.numeric, it forces this into a vector of numbers — something I can actually plot, add, and multiply. That's why I have this line of code here.
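As a toy version of the same pattern, with plain lists standing in for flow frames (in flowCore you would access the keyword via the frame's description inside fsApply; the list layout here is invented for illustration):

```r
# Two fake "frames", each carrying a keyword list like an FCS description.
frames <- list(
  list(keywords = list("CD Survival Time from Seroconversion" = "2000")),
  list(keywords = list("CD Survival Time from Seroconversion" = "150"))
)

# One-line anonymous function, defined on the fly, never reused:
survival <- sapply(frames, function(x)
  x$keywords[["CD Survival Time from Seroconversion"]])

print(survival)                    # character strings: "2000" "150"
survival <- as.numeric(survival)   # now numbers you can plot and add
```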
You can do ?as.numeric and it will tell you about it; towards the end of the help page there's a "See Also" section with related functions. There's as.character, for when you have numbers but you don't want them to be numbers — you want them to be text, so nobody accidentally adds or multiplies them — and as.integer for whole numbers 1, 2, 3 versus 1.2 and so on. "Can you plot the strings anyway?" I was curious what the error message would be — it's possible in certain cases, but a fancier plot, say a density plot or a histogram, may well fail. It will fail at some point; you shouldn't be trying to plot words, even if those words happen to look like numbers to you. So, is everyone clear on what this plot is? The x-axis is the survival time; the y-axis is the live counts. What am I plotting exactly? For example, this dot here: this patient survived 2000 days and has about 20 percent CD3-positive live cell proportion. Does that make sense? When I look at this — there are very few samples plotted, so I can't really see much — but there seem to be a few more of these people who don't survive very long and also happen to have fairly low live CD3-positive proportions, and a few more of the higher-survival people who also tend to have higher proportions of CD3-positive live cells. Maybe this is a biologically irrelevant thing I'm plotting. Probably what you're thinking is that I shouldn't be plotting the proportion of CD3-positive live cells out of all cells, but the CD3-positive cells out of the live ones — I
shouldn't be including the dead cells in this at all, which I am by defining my proportion like this. But just for illustration, let's assume it's somehow useful. This is an exploratory data analysis step: you visualize your data and try to see whether there's a pattern. If you see some kind of pattern — and perhaps this is one; with many more samples you'd see it more clearly — then from there you could get the idea of doing some kind of p-value analysis, associating a p-value with it to demonstrate significance. But as a first step, just plot it and see what it looks like; if the dots were all over the place, it would definitely not be interesting. Now — I decided not to go into detail about k-means, because I think it would be too much information, but what is k-means? Has anybody heard of it before today, when Ryan mentioned it? Yes — it's a clustering method. And what's a clustering method? Clustering, intuitively, tries to group together the cells that are somehow close to each other into one cluster, and another group of cells into another cluster. It doesn't have to be cells — it could be anything; it could be patients. It tries to group patients that are similar to each other based on some criterion, such as their CD3 live proportions: if they're similar, it puts them in the same group, versus other patients whose live CD3 counts look very different from the first group's. That's what clustering is — a method to group like things together. k-means is one clustering method, one approach, and what it does is — well, there are different ways you can define distance here, but it essentially takes the
Euclidean distance — the distance from this point to that point. All these points are close to each other, and those points are closer to each other, so if you ask k-means for two clusters — which is what this line does: "k-means, here's my data, just the live counts; break it into two clusters for me" — it will say these dots are one cluster and those are another. If your dots are spread out in a band like this, it kind of just splits them in half; maybe that's not very principled, but it splits them. So here it is: these are the clusters I've plotted in colour. The first cluster is in black — all these dots were grouped together — and the red dots are the other cluster. It clearly drew the line right around 0.3: "this is the best split line I can find for two groups." That doesn't seem very smart, does it? It's not such a big deal. I wanted to demonstrate this because you usually associate "clustering method" with something complicated, advanced, and fancy — and it can be; clustering methods can get extremely fancy, mathematically much more robust and statistically grounded than this. But the basis of what a clustering method is, is just grouping things. Just because something is called a clustering method does not mean it's automatically amazing. The way k-means works, like I said, is that it literally looks at the distance between this point and that point and says, "I'm going to be part of this cluster — this point is way more similar to me than that one." That's all it's doing. There are much fancier clustering methods which
use a statistical distribution to estimate how the points are dispersed — much more elegant ways of deciding which group a point belongs to. Imagine you have a tight clump of points, then a set of widely spread-out points over here, and then one point in between. It might actually be more likely that it belongs to the spread-out population, not the tightly clumped one — but k-means will put it with the clump, because the literal distance from it to the clump is shorter than to the spread-out points. Accounting for spread is one way methods get fancier — more intelligent, more robust, more reliable, and more sensible about real data. But at its basis, clustering is really not a big deal, and k-means is a fair thing to try. All you tell it is how many clusters you want: you have two groups of patients, so we give it two. And if we happened not to have these three dots and those three dots, we would have gotten a really good separation — that's why I circled in blue the ones where I wish those other patients weren't there, because then it would have been really great. We would have said, "wow, k-means agrees with us that low live-cell counts correlate well with survival time" — it would have supported our visual intuition about what's going on. "What if you cluster on survival itself?" If you cluster survival, it will find a perfect separation — because I purposely picked patients with really low survival and patients with really high survival. I
actually did that first when I was preparing this workshop — "oh sweet, these people are going to be so amazed" — and then: wait, I can't cluster on survival, because that is by definition doing what I'm hoping it would discover on its own without me telling it. You can't include the variable you're trying to predict — survival time — in your analysis. Do ?kmeans and it will have an example. live.counts right now is a vector, but you can actually make it a matrix: use cbind (or rbind — I think cbind is the one) to create a matrix where one column is live.counts and the other column is survival, and give that to kmeans; you can give it as many dimensions as you want. The examples at the bottom of ?kmeans use a matrix rather than a vector, and it looks much better that way. When you run this kmeans line and then print km, you can see what it looks like on screen and try to understand what these things are: it has this 1 1 1 2 1 2 2 and so on — that's just your 20 patients; it's assigning a cluster label to each of them: you belong to cluster one, you belong to cluster two, and so on. That's why, when I pass col equal to km's cluster field, the points get coloured: the colour is now a vector of length 20, and for the first point it plots, it uses the first colour I passed; for the second point, the second colour; and because I'm using the cluster numbers, some points get colour one, which is black, and some get colour two, which is red. You can also define a colour variable just before your plotting call: everywhere the cluster is one, you can say the colour is black, and where it's two, blue.
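Here is a minimal, self-contained k-means run on made-up live counts (20 values with two obvious groups), colouring the plot by cluster as just described:

```r
set.seed(1)
# Fake live proportions: ten low-ish and ten high-ish values.
live.counts <- c(runif(10, 0.10, 0.25), runif(10, 0.35, 0.55))

km <- kmeans(live.counts, centers = 2)   # ask for exactly two clusters
print(km$cluster)                        # a label (1 or 2) for each of the 20 samples

plot(live.counts, col = km$cluster,      # colour 1 = black, colour 2 = red
     pch = 19, ylab = "live CD3+ proportion")

# With more dimensions, bind columns into a matrix first, e.g.:
# km2 <- kmeans(cbind(live.counts, other.feature), centers = 2)
```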
Yes, cbind — there you go. So, is everyone OK with this? I'm not teaching you k-means; I'm just showing you what it looks like. If you're interested, go read about it on your own time — I included a link here to a really nice, simple-to-follow tutorial that explains exactly how k-means goes about separating your points into two clusters, and it's a nice illustration of computer logic. When you look at points yourself, your eyes are actually really sophisticated in the way they work with your brain — you immediately see the pattern; you have a really good aptitude for seeing patterns. A computer is not biased like that — it won't just assume; it's an equal-opportunity clusterer. "That's my problem — looking at my colours, all I can see are the populations I first defined. No matter how hard I try, I find myself going back to the populations I've always looked at." Yes — you need to remove all the variables from the workspace of your brain. What time is it? Ten more minutes, then — 12:30 is lunchtime, I forget. OK. So far we've extracted the viable CD3-positive cells — the ones that are biologically relevant for us — and now we have a few more colours to work with, to play around with the phenotypes and see which phenotypes are actually responsible for the survival-time difference. So now we have to gate. Remember, in order for us to check all these phenotypes, we first have to define the gates for each marker individually — because otherwise, how am I going to define a phenotype like CD4-negative CD127-positive? I have to tell RchyOptimyx
what I mean by positive and negative — what cutoff value I'm using to call anything positive or negative. So as a first step, let's consider CD4, and let's define a gate the way we did before. First I'm going to name CD4 to be this "V655-A" channel — I'll just put that in a variable, CD4, so I don't have to type the long name ever again. Step one, like before: make a pooled frame — getGlobalFrame of this viable flow set, the one with the CD3-positive live cells — and plot it. There it is. Is everyone happy with the plotting and the getGlobalFrame step? We just did that. I can see the gate looks like it should be around 1.5, in the middle — but, as you've been asking all along, we don't want to just eyeball it: is it 1.5, 1.4, 1.42? We want to do this in an automated fashion, and deGate is the function in flowDensity that does the density gating — the "de" stands for density gate. "I'm getting a warning message about the iteration and negative infinity." Yes — there was one event there; that's probably fine. So when you run deGate and print out cd4.gate, what value does it give you? 1.5-something — there you go. I wouldn't have guessed that precisely by eyeballing, but it's very close to what you're seeing, right? And if you set graphs equal to TRUE in deGate, it will actually generate this plot for you — the density plot — and this is how it does it. So this is CD4 over here on the axis of the nice-looking plot, and you see there's a bunch of cells
here a bunch of cells here and sort of like very few cells in between them so the density looks exactly like this there's a big peak where the negatives are a slightly smaller peak where the positives are what dense dense flow density does is identifies all the peaks in your density data so all the peaks so it will find this peak it's at one right the value one it will find this peak which is approximately at the value two and then it will actually find the minimum point in between those two values between the two peaks so it literally tries to find the exactly best position to place this gate that you would otherwise be eyeballing like where should I place it a little bit tighter towards the upper population or lower what so it actually places exactly where it's optimal density wise so if there were three peaks that have some very detailed reasoning logical reasoning first of all if there's a very very tiny little peak there actually is one you can hardly see it if that peak is 20 times smaller than the highest peak in the population it's going to ignore it as a like kink of the data like one cell there two cells there it's just a little noise basically so it's going to ignore the extremely small peaks that are clearly just noise in the data if there's three peaks that are then I'm not actually sure what it does right now because I didn't write the final code but it has some kind of reasoning as to which one to return like if it's yes it only returns one yes unless unless you if you read the help on the flow density package especially if you do question mark DE gate there's actually a bunch of other parameters you cannot hear if you add graphs equals true it will all give you the gate but also print this plot for you if you do comma all dot cut equals true this here we call it a cut point it's a point of which we decided to cut the data into two if you do all dot cut equals true it will actually if there was a third peak here it will also give you the second cut 
point, so it gives you all the cut points that it found; if it found four peaks, three cut points. And then, if you know your data and you know what you expect to see (say, two negative populations and one positive, whatever is biologically meaningful), you can totally make the choice yourself: do you want to take the first cut point or the second, or do you want to do some kind of test using those cut points and then decide which one makes sense for you? So it's actually very versatile, and it's not as black a box as you may fear. It won't tell you "by definition, this is where you should draw your gate"; that's up to you. It helps you out by using the density and giving you a logical place to draw the gate, but in the end the choice is yours. And as long as it's a peak, it will find it.

That's where what Ryan was saying about the five ways it will set a gate comes in. Let's ignore the first one. If there's only one peak, I'm not sure what the default method is, but what it can do is use a standard-deviation method: first it locates your peak, your mode, then it estimates the standard deviation of the distribution, treating it as roughly normal, and then it places the gate two standard deviations away from the peak. First it tries to estimate whether this population is all positive or all negative, roughly based on whether the peak sits higher or lower than the middle of the range of values for that channel; not quite that, but something along those lines, and you can read the details. Then, if it decides this is mostly a positive population, it puts the gate two standard deviations below the mode.
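The peak-and-trough idea just described can be sketched in plain base R. This is not flowDensity's actual implementation, only an illustration of the logic: estimate a density, find the modes, ignore peaks smaller than a twentieth of the tallest one, and take the minimum between the two main modes. The function name find.cut and all the numbers are invented for the demo.

```r
# Sketch of density-based cut-point finding, in the spirit of deGate.
# NOT flowDensity's code; just the core idea.
find.cut <- function(x) {
  d <- density(x)                               # smoothed density estimate
  # indices of local maxima of the density curve
  peaks <- which(diff(sign(diff(d$y))) == -2) + 1
  # drop tiny peaks (< 1/20 of the highest): those are just noise
  peaks <- peaks[d$y[peaks] > max(d$y) / 20]
  if (length(peaks) < 2) return(NA)             # unimodal: would need sd/percentile fallback
  # keep the two tallest peaks, in order along the x-axis
  top2 <- sort(peaks[order(d$y[peaks], decreasing = TRUE)][1:2])
  # the trough: minimum of the density between the two modes
  trough <- top2[1] + which.min(d$y[top2[1]:top2[2]]) - 1
  d$x[trough]
}

set.seed(1)
x <- c(rnorm(5000, mean = 1, sd = 0.2),   # "negative" peak near 1
       rnorm(2000, mean = 2, sd = 0.2))   # "positive" peak near 2
cut.point <- find.cut(x)                  # lands near the trough, around 1.5
```

The same machinery, run on a sample with a single mode, returns NA here; that is the situation where deGate falls back on its standard-deviation or percentile methods.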
You can also specifically tell it whether the population is positive or negative. And instead of two standard deviations you can tell it three, or one: if you want your gate to be tighter, decrease the number of standard deviations away from the mode; if you want your gate fairly loose ("I really want to be as far away from that population as I can, I don't want any of those cells over here called positive"), you can say four standard deviations instead. You can also use a percentile method. That's for when you have negative controls, like a data set I was analyzing that had them: if you have a negative control and you want to define a gate based on it, you can use the percentile method and say, I know all of these are negative for sure, just give me the 99th percentile. Say that value is 1.47; then on my actual data, not the negative control but the real samples, I just use 1.47. So it's very versatile: if it has two peaks, it finds the minimum cut point; if it has only one peak, it can do things like standard deviation or percentile, and it can try to detect an inflection point.

Okay, so the transition from finding the trough to switching to standard deviation, can you adjust that? If it finds two peaks, it gives you the cut point between them. If there's only one peak, it does something by default, I think the 95th percentile, unless you also specify the standard-deviation method: you can say, gate this and use the standard-deviation method. If it happens to be a case like this where there are two peaks, it's going to ignore the fact that you said to use standard deviation, because, hey, there's no need for that, you have the two peaks right there. But if it finds only one peak in one of your samples, then it's going to
fall back on the standard-deviation method that you specified. Once the new version of this package comes out on Bioconductor, I guess we'll send you an email, and there will be a much more detailed, step-by-step example of "if your density looks like this, this is what it's going to do," and so on; ?deGate will have a lot of that. Sure, yep. But there's also something I'll need you to do... ah, everything's fine, everything's working now; let's pretend that never happened.

So, what have we done so far? We looked at CD4 and plotted it, we saw there's a nice separation into CD4-positive and CD4-negative populations, and we decided that using flowDensity's deGate function is reasonable instead of trying to eyeball everything every time. So now why don't we gate the other channels we have: Ki67, CD8, and CD127. We already did CD4; we already defined it, so here I'm just defining it again. I don't want to type out all of these "V515"-style detector names, so I'm just going to rename them; then, when I'm reading my code later, it will make more sense to me. It won't say plot whatever-detector, it will say plot CD3 or CD4, for example, which is more sensible.

So we're good on this, and now I'm going to loop through all of the channels of interest, CD4, CD8, CD127, Ki67, and I'm going to use deGate to define the gates. But I want to store them in a vector: I don't want to just loop, calculate a gate, replace the value, and plot it; I want to actually keep these values. For the purpose of storing them, I'm going to define a vector filled with the value negative infinity. In the end I want that vector, store.gates, to look like this: 1.47, 2.1, 1.3, 1.6; those will be the gates for CD4, CD8, CD127, Ki67. Does that make sense? I want to have a variable
which stores my gates. Why minus infinity? You can pick any value, because it's going to be replaced when I loop through; but I personally set the starting value to minus infinity because, by default, I don't want the gates to make any sense. If I initialized all of the gates to the value 1, then if something went wrong (I missed a line, or didn't loop through all of them by accident) and my final gates came out as 1.2, 1.3, 1.1, 1, I might miss it. If you want zero you can do zero, but zero is still a legitimate value, still within the bounds of reasonable. It's a personal choice. Exactly; well, it's not imaginary, it's not a number at all, but that's a different workshop altogether. So this is my own personal programmer's choice of default value for a gate; you can choose your own, and it will be replaced.

I am also, just as good programming practice, going to give names to the elements of the vector, because I want to make sure I don't accidentally mix up whether the first number was the CD4 one or the CD8 one. I don't want to have to keep straight in my mind that I decided to do CD4 first, then CD8, then CD127; and it helps for debugging too. So if you print out store.gates, you should see a bunch of negative-infinity values, with names.

Now, while I'm assigning values to this vector, I'm also going to plot and visualize them, just to make sure deGate is doing what I expect it to do. I'm going to go through all the channels in this order: CD4, CD8, CD127, Ki67. What I do in this loop is plot the pooled frame using flowDensity's plotting function, with CD3 on the x-axis and chan, the channel I'm looping over, on the y-axis, so
I expect to see four plots: CD3 versus CD4, CD3 versus CD8, CD3 versus CD127, CD3 versus Ki67. Because I'm looping, one of the channels changes each time; for the other one I chose CD3. You can choose side scatter, you can choose forward scatter, something that doesn't change, for the x-axis; it's up to you, I chose CD3 just for visualization.

On this line: remember how you can get the first entry of a vector by writing your vector name and then square brackets, [1]? That gives you the first entry; or, if the entries have names, instead of 1 you can just use the CD4 channel name. You can index by name instead of by number, because I gave my vector names to begin with. So in that entry, instead of negative infinity, I want to put the actual gate value. deGate(pooled.frame, channel) is going to return a value like before, say 1.47 or something like that, and I put it in place of the negative infinity in store.gates, in the spot for the channel I'm at in my loop. If you want, just before this line you can add a print statement: print the result of deGate on the pooled frame and the channel. And because the plot is active, I'm also going to add a line to visualize the position of this gate. Questions?

About the names: exactly, the fourth name is Ki67, but Ki67 is actually B515, right? I'm keeping them like this because they're still like this in the flow frame. If you tried to plot the frame and said "CD3", it would complain that there is no CD3, only R780-A, in the flow frame. So CD3 here is a variable, not a string; it's a variable that just saves me from having to type the string, but really it's the detector string that's in there. So you can see here, printed out, this is what it ends up being
right: the gate for CD4 is 1.498-something, the gate for V800 is this, and so on. Are your numbers slightly different from mine? Okay, so that's the thing with this gate. Remember how I said it's not that it's wrong; you just have to help it out. If you do ?deGate, you'll see you can tell it to look at the upper side: pass upper = TRUE and re-run deGate here where you calculate the gates. You'll have to either add an if statement so that upper = TRUE is only used for this channel, or just remove that and then replace just the Ki67 entry. Yes, I'll show you; sorry, this machine is trying to have fun with me.

So, sometimes what happens is you have only a single cell population, and you know it should all be negative, but deGate by default thinks it might actually be positive. The reason is that the density, you know, sometimes looks like this, and sometimes it looks a little more like that; it tries to estimate which of those shapes it resembles, and I think based on that it decides whether the population is all negative or all positive. Yours just looks a little backwards according to flowDensity, so you can help it along by adding the parameter upper = TRUE, which tells it to put the gate on the upper side of the peak that it finds; it kind of forces it to go that way. So this is how you do that: if chan equals Ki67. This is for the people who decided to do a tighter gate for CD3 earlier and are now getting slightly different results.

This is also useful to know if you wanted to add an if statement when you're cycling through many, many channels, not just these four but thirteen or something, and you know that for most of them the default will do the right thing, but for some of them you want to make use of one of the extra parameters deGate has. For example: if the channel name is Ki67, then add something extra to the call, such as the standard-deviation method with three standard deviations; otherwise just do the normal thing, because that seems to be right for the other channels.

In fact, if your gate is so bizarre that there is really no logic to it, but you have decided it should go in a certain place, you can even do something like this instead: just give it the value directly, rather than battling to find parameters that will automatically produce exactly the thing you already know you want. Why not just tell it what you want? I have had to do this with data sets that had poor compensation, where there was really no way of automating the manual analysis. I made the biologists aware that their data quality was perhaps not sufficient for any analysis, but they insisted I try something anyway. So I said fine: you tell me the number where the gate should be. I sent them a plot: "this is what I'm looking at, you draw the line where you think it should go, I can't tell." They drew the line somewhere, I put that value in, and gave them the results; and look, the results are terrible, there's nothing, it's all random, because the data was not very good. But you can still give it a try. Or maybe it's one of those markers that just always looks weird, and you trust that your expertise is sufficient to manually set that one gate. So there: now you have if statements, if you ever need to use them. And this
is what mine looked like. Right, so tell me about CD127: does it look any good, or does it look wrong to you? It looks wrong to me; I agree with you. I don't even know what CD127 should look like here. What I would do at this point is make the decision to discard this channel. I would obviously talk to the biologists a lot first, but for the purposes of this workshop, I would discard it. So, does anybody know anything about CD127 that could tell me how to gate this better? Basically it looks like what it could plausibly be; what I would prefer is to have a control. Yeah, that would be nice: an FMO or isotype control would give me an idea of where the gate might fall. What flowDensity is seeing, looking at the slope, is a little kink in the density around there, and it's thinking maybe that's what you're looking for; it's trying to please us. But when I look at this, I would have to have a very long and deep discussion with the biologist: please explain to me how this makes sense, and why there is no control for a stain that looks like this. And if they can convince me that there is some kind of logic to it, or if they tell me, "you know what, just this one time, let's run it," then I would just use their manual value: I would add another if statement. But the others look okay, right? Are we okay with the others?

One more thing, because remember, this is the pooled frame I'm plotting. What if the problem is not with CD127 itself but with my pooled flow frame? Maybe, for this particular channel, there is enough variation between samples that if I plotted just one sample it would be obvious where the gate is, but because I'm combining all of them, I'm superimposing them to the point of hiding the split point. So what I have done is plot side scatter versus CD127 for every single sample, and tried to see whether it's still unclear where the gate should go; to me it still is. I would also present this to the biologists and ask: is this what's going on, does this make sense?

Did I try plotting it against other things? Yeah, that's a good point; I did, and it didn't help, I still don't think it's obvious. It looks like it could be a compensation thing, right, exactly; I tried plotting it against other parameters as well, and it didn't make it any more obvious. Although, actually, plotting against Ki67 gives you something, because Ki67 is a proliferation marker and CD127 is a memory marker, so you do start to see gates emerge there; that's because each patient has a different proportion of proliferating and resting cells. It's hard. So I'm going to ignore this marker for the purposes of this workshop. There is a way you could handle it if you really needed the result: "I have a patient in the clinic as we speak, I'm about to walk in right now." That's a very good point as well; yes, in that case, just for this one, absolutely.

So, things to try when something doesn't look nice at all: try a different transformation. The transformed values depend on the transformation, and that's why you need a good one: not just so that we look at it and it looks good, but so that the method, looking at it, is able to separate things. When you have two peaks that are extremely close together, sometimes the method won't be able to believe that those two peaks are actually meant to be two separate populations
when they're so close together; it could be an artificially created double peak. Because what deGate actually does, when it estimates the density of your points, is smooth it out a little. If you took every point exactly, the actual density would be all little kinks: every single point you have would be a tiny peak of its own. So there's a bit of a smoothing procedure going on, and it can smooth right over a split if you don't transform the data well. So things to consider: possibly go back to before you transformed your data and transform just this one channel using a different transformation, maybe with different logicle transform parameters. Other things: plot against other variables, other parameters. This is something the biologists should be able to help with: they're the ones who designed this panel, so hopefully they knew how they were going to gate it before they built it. Maybe not. Another thing to consider: for the purposes of this workshop I truncated the data; there were actually 13 colours, so maybe one of the channels I removed would have helped us here. Unfortunately, analyzing the full thing would take months. So these are just things to keep in mind: go back and try a different transformation, like you said, plot against something different, and, ideally, if it is possible to have some kind of control, incorporate it in this if statement: if the channel is CD127, define the gate to be the one based on my control. You would previously have calculated a gate for CD127 based on the control, using deGate and a percentile: read in your control FCS file, transform it the same way you did this data, and use deGate with a percentile, say the 99th or 99.5th, whatever you feel comfortable with, to calculate the cut for the CD127 channel. Maybe that value would have been 1.67, and then here you would put 1.67. That's just one thing you may encounter as something useful to do.

So for now, let's just ignore CD127. At this point we only have CD4, CD8, and Ki67 to work with, and we can at least plot the gates for those and make sure we're happy with them for all of our samples. I'm only showing the CD4 one. Does this plotting make sense? It's the same plotting we've been doing; we're just looping. First of all, I define my plotting region to be 5 by 4, with my margins a bit smaller than the default so we can actually see it in RStudio. Then, for i from 1 to 20, plotDens, the flowDensity plotting function: here's my flow frame and the two channels I want to plot. I'm plotting CD4 versus CD8, and then CD8 versus Ki67; you can do whatever other combination, I just want Ki67 to be plotted too. Then I draw straight lines where my gates were: abline, where v means a vertical line, and the value I want for the vertical is store.gates["CD4"], whatever value it was, 1.67 or 1.5, I don't know; lwd = 2 makes the line twice the normal width, and it's blue. Any questions about this plotting? No? Everyone's totally cool with it, because we did it so many times, and you can do it from scratch now too.

I saw this; where did we define that? So here, this line: store.gates. rep means repeat: negative infinity, four times. It's going to give you a vector that has the values negative infinity, negative infinity, negative infinity, negative infinity. Does that make sense? A quick way of doing it without a for loop. What do you mean? Oh yeah, sorry; so, the ways you define a vector... there's one way, yeah, exactly, right. It's just a
function for defining a vector, just like how we define the vector of values from 1 to 100 with 1:100, which gives you 1, 2, 3, 4, 5, 6, 7 and so on; that's the standard thing you'd normally use in programming. We also had the other way of defining a vector, seq, sequence, where you give it a starting value, an end, and a by: if I want to go from 1 to 100 but take every third number, by = 3, it gives 1, 4, and so on. rep is repeat, because very often you just want to initialize things, often to zero, say a hundred of them. It's just one other way of defining a vector. I have probably defined a vector in all of those ways here, deliberately a different way every time, so that you see them all, because you will see people using all the different ways. Does that make sense? Yeah. And the plotting: we all know how to plot this stuff, and it all looks fairly all right, right? Okay.

So before I go on to flowType, I didn't want to do too much at once and get you completely confused, so I'm going to just say this. We used deGate to define the gates using the density, which was very nice and automated and so on, but we really used the pooled frame to do that: we took one pooled sample from all of our data and used deGate on it to find the gates, and we're applying that exact same value to every single flow frame. But as you all know, typically the gate varies just ever so slightly between samples. It would be really cool if it automatically moved for every flow frame individually, and you technically have the tools to do that; you should be able to do it on your own, with for loops and things like that. How would we do it? Where do I get the gates? Instead of having a vector for the gates, my vector here

with one gate for CD4, one gate for CD8, one for this, one for that, I could have a matrix where each row is a sample: each flow frame gets its own row. And I could create this using another for loop, a for loop within a for loop: for each flow frame (the first, the second, the third...), for each channel, calculate the gate, not on the pooled frame but on viable.fs[[i]], the i-th frame, for that channel. Don't overthink it. Does that make sense? Perfect, okay; because sometimes, when I hear myself, it sounds like it doesn't.

And because you look at this and you can see that the gates on many of these are off, that makes you very upset; I can see it, and it should make you upset. For the purposes of this workshop, though... Honestly, this analysis really is the steps I go through every time I analyze a data set. I'm not hiding anything, I'm not doing anything fancier than this, honestly. It's just that for every data set there are slightly different things to handle: maybe the transformation will take me a bit longer to get right, maybe removing the debris won't be so easy and I'll have to fudge with it a little. With all these little things, it will take me a month to analyze a data set like this, anywhere from 18 days to two months. Published studies, yeah. And if you're doing something at that scale, you'd probably not run all of it here; on a more powerful computer, running the full data on servers is easy to do and everything works; we just don't try large files in the workshop, maybe next year. So what if
we break it? Well, it would still be based on each channel of viable.fs, but first you'd have to initialize: remember how we initialized store.gates, the vector, with the negative infinities? Now you have to initialize your matrix. You can do ?matrix; and yes, just matrix. Okay. Can everyone type library(flowType) and library(RchyOptimyx), just to make sure; I'm sure it will work on the first try. Did everyone's gates kind of look like this at some point? Yeah, okay. Did it work for you? Oh, well, let me show you what you should add in your store.gates; that's the problem. Sometimes you just want to get it done and don't want to waste your time when you can't see what it is doing. [Some individual debugging here: the problem was an if/else that replaced the Ki67 entry and then replaced it again afterwards, overwriting the corrected value.] Can we move on to flowType? Yeah, okay; there's a coffee break coming up soon, so you can finish it then.

Okay, so: flowType. Remember our RchyOptimyx plot, with the red and yellow and blue and green, with the phenotypes in it? That's the one we want. In order to create that plot, we first have to find the proportions of all of those phenotypes. Say one of the phenotypes is CD4-positive: we have to calculate the proportion of CD4-positive cells in each of the samples, right? That's what flowType does. flowType is a package which calculates the proportions of all the phenotypes possible in your data.
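The bookkeeping flowType automates can be done by hand with which() and intersect(), exactly as we have been doing. Here is a toy base-R sketch of that idea on a made-up expression matrix; the marker names, gate values, and data are all invented, and real flowType of course works on flow frames, not plain matrices.

```r
# Toy illustration of what flowType computes: the number of cells in
# every +/- combination of the markers, via which() and intersect().
set.seed(7)
expr <- cbind(CD4 = rnorm(1000, mean = 1.5, sd = 0.5),
              CD8 = rnorm(1000, mean = 1.5, sd = 0.5))
gates <- c(CD4 = 1.5, CD8 = 1.5)   # pretend these came from deGate

# indices of positive and negative cells for each marker
pos <- lapply(colnames(expr), function(m) which(expr[, m] >  gates[m]))
neg <- lapply(colnames(expr), function(m) which(expr[, m] <= gates[m]))
names(pos) <- names(neg) <- colnames(expr)

# all four phenotypes for two markers (flowType enumerates these for you)
counts <- c("CD4-CD8-" = length(intersect(neg$CD4, neg$CD8)),
            "CD4-CD8+" = length(intersect(neg$CD4, pos$CD8)),
            "CD4+CD8-" = length(intersect(pos$CD4, neg$CD8)),
            "CD4+CD8+" = length(intersect(pos$CD4, pos$CD8)))
counts / nrow(expr)   # the proportions, per phenotype
```

With three markers there are already eight such combinations, plus all the partial phenotypes where a marker is left unspecified, which is exactly the enumeration you don't want to type by hand.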
So you give it your flow frame (one frame, not the whole flow set), and you give it the markers that you're interested in. We're not going to calculate phenotypes based on forward and side scatter, right, so we don't give it those; we specify that we only care about CD4, CD8, and Ki67. So when you call flowType, the first thing you give it is the one flow frame. Then the prop markers, the proportion markers, the ones you care about: here they are 7, 6, and 4, because CD4 is the seventh channel, CD8 was the sixth, and Ki67 was the fourth in our flow frame. Then there's the method you have to supply to flowType: how do I separate positive from negative? For each of those proportion markers you're giving it, what method should it use to split positives from negatives? Just the gates, right? There is an alternative where, instead of gates, you specify some automated method of calculating them, like k-means or something, but that's not going to work very well here, so we're just always going to work with our own gates: first calculate our gates, then give them to flowType. So here are our gates, store.gates, and what does this part do; does anyone know why I have it? Let me actually do this in the console. These are our gates; remember, CD4, CD8, CD127, and Ki67 are these. To get the CD4 one, I can just index it like this, right? And if I wanted to rearrange them, or subset them, taking just these three for example, I could do that, right? So sometimes, when you have many, many parameters, many channels, many gates, you may accidentally get the order wrong in one of them. So, to keep your order consistent, instead of subsetting with indices like 1, 2, 3, try to use names as much as you can. The reason why I have this here is that, remember, I had CD127 before and now I don't, right?
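Indexing by name rather than by position looks like this in plain R (the gate values here are invented):

```r
# A named gate vector: names make subsetting self-documenting.
store.gates <- c(CD4 = 1.47, CD8 = 2.1, CD127 = 1.3, KI67 = 1.6)

store.gates["CD4"]                    # one gate, by name
store.gates[c("CD4", "CD8", "KI67")]  # drop CD127 without remembering it was third
store.gates[c(1, 2, 4)]               # the same result, but fragile if the order changes
```

Both subsets return identical values; the difference is that the name-based one stays correct even if someone reorders the vector later.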
So if I hadn't done this part... I can't just give it store.gates, because that has four things in it, but flowType expects three, since I told it I'm only interested in the three markers. So which gates should I be passing? I could have said: I know that CD127 was the third gate, so give me store.gates elements one, two, and four. That's the same thing, right? But so that I don't get myself confused (the seventh channel is CD4, but the first gate corresponds to CD4...), I'm just going to subset by name: CD4, CD8, KI67. I will always use that order; alphabetically, yeah, that's exactly it. So this part means that the method flowType is going to use to define my phenotypes, which cells are positive and which are negative, is the gates that I calculated in my own way. However you want to calculate them, flowType does not care how you got your gates. You could have just given it manual numbers, 2, 4, 6; or you could have used flowDensity, or any other method you want.

And then, just so that flowType gives the phenotypes nice-looking labels, nice names, I'm first redefining the colnames of viable.fs, and I'm going to give it those, so that it knows which antibodies I'm working with. This is how you can rename the colnames of your flow set. Remember how before they were forward scatter, side scatter, V650, and so on? From this point on I'm going to be looking at phenotypes, and I don't want a phenotype called "V650+R780-A-"; I want the marker name with a plus or a minus. Technically speaking, you could have done this as the very first step, when you first read in your flow set; you could have renamed everything.
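The renaming works the way column renaming works for any matrix-like object in R. Here is the mechanics on a plain matrix, with a made-up detector-to-marker mapping; flow frames and flow sets behave analogously through their colnames.

```r
# Rename columns so downstream labels read "CD4+" instead of "V655-A+".
# The detector-to-marker pairing below is invented for the example.
expr <- matrix(rnorm(40), ncol = 4)
colnames(expr) <- c("R780-A", "B515-A", "V655-A", "V800-A")  # detector names

colnames(expr) <- c("CD3", "KI67", "CD4", "CD8")  # replace with marker names
colnames(expr)
```

After the replacement, anything that builds labels from colnames(expr) produces marker-based names automatically.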
I didn't do that because sometimes, when I'm plotting things, flowDensity also adds the channel name next to the antibody name, and for some people that has meaning — which detector was used. That's why I didn't do it earlier. But at this point, all I care about is the phenotypes, which are based on exactly the marker names. Does it make sense how I renamed the colnames of viable.fs? I just replaced whatever colnames(viable.fs) would normally return with the thing I want it to return from now on. Okay, so this is how you call flowType for one flow frame. So FT1 — what it's going to do is take your flow frame, the first one, viable.fs[[1]]. It's going to go through CD4, CD8, and KI67, and using your gates it's going to calculate: okay, how many cells are CD4 positive? Then: how many cells are CD4 positive and CD8 positive? How many are CD4 positive, CD8 positive, and KI67 positive? How does it do that? It just uses the same things we did before: which cells express CD4 greater than the CD4 gate, intersected with which cells express KI67 greater than the KI67 gate. It's a combination of the which and intersect statements we've been using so far, but it does that for every possible combination of these three markers in pluses and minuses. So it's just for your convenience. It's not doing anything super crazy; it's just saving you some time. Why viable.fs[[1]] with double brackets? A flow set is like a list, right? A vector you index with single brackets; a list with double brackets, because a list is a deeper structure than a simple vector. A vector just has numbers, like 1, 2, 3, 4. A list can hold much more complicated things. If an actual programmer heard me explaining this stuff, they'd wince. But that's just intrinsic — that's just the way it is, yeah.
It's just learning the language, yeah. We only have so many characters to work with. So what does FT1 contain? Did you take a look at it after you calculated it? When you print it to the screen, it has a lot of things. Instead of scrolling all the way up, how could we explore it more efficiently? FT1@ and then tab. Did everyone get that? I typed FT1, then @, and pressed tab to see the possible slots. So let's look at the first one, the cell counts, and its first entry: CD4-CD8-, 700. There are 700 cells which are CD4 negative, CD8 negative. And see what I mean? If I asked you to count, on your own, how many cells are in the phenotype CD4-CD8-, what would you have done? You'd have one line of code saying: the CD4-negative indices are which exprs of my frame, CD4, is less than the CD4 gate, and then intersect that with which exprs CD8 is less than the CD8 gate, and so on. That's what flowType does for you; it just takes care of that. Will that work if you split a marker into three levels instead of two? Not this version, but the new version will: instead of always having just negative or positive, you'll be able to have an intermediate population. When is it coming out? I'm actually not sure — it should be out soon; Kieran and Nima are just finishing it up. You can't make them do anything. Okay, it's coming out soon. So does it make sense what flowType has done here? Okay. What else does it have? For those of you who are interested in MFIs, it has also computed those.
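The which-and-intersect counting that flowType automates can be mimicked on a plain expression matrix. The simulated data and gate values below are made up, just to show the mechanics.

```r
set.seed(42)
# Simulated expression matrix: 1000 "cells" x 3 markers (invented data)
exprs.mat <- cbind(CD4  = rnorm(1000, mean = 2),
                   CD8  = rnorm(1000, mean = 2),
                   KI67 = rnorm(1000, mean = 2))
gates <- c(CD4 = 2, CD8 = 2, KI67 = 2)   # assumed gate positions

# Count the CD4-CD8- phenotype the way we did it by hand before
cd4.neg <- which(exprs.mat[, "CD4"] < gates["CD4"])
cd8.neg <- which(exprs.mat[, "CD8"] < gates["CD8"])
n.cd4neg.cd8neg <- length(intersect(cd4.neg, cd8.neg))
```

flowType simply repeats this for every plus/minus combination of the markers you list, so you don't have to write out all the lines yourself.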
If you have a data set that you feel would benefit from using MFIs instead, it's there. These are transformed values, though. Yeah, this is where it starts to get interesting. So there are a couple of options. We did the logicle transform, right? There's an inverse logicle. You know how we defined the logicle transform and then applied it to our data to actually transform it? You can define the inverse the same way — why don't we try it? Is it important to be able to compare MFIs? It depends on what you're doing with them. First of all, I don't like using MFIs, because there's instrument and day-to-day variability, and when people tell me, "oh no, trust me, there's no variability," I don't believe it. So I don't use them, but some people insist on comparing them, maybe for standardization purposes. The project Ryan is involved in right now is trying to get different centers to standardize their panels, and part of the procedure — how do we make sure all these centers are doing what we tell them to do — is that we actually try to get their MFIs to match. So in that case you would want it. There's no need for a special package for the conversion itself; all you have to do is... I see what you mean. Lauren, did you hear her question? Does the flowQB package have a converter between MFI and MESF?
No, that would require a lot of information you'd have to give it — what voltage you're using and things like that — in order to convert. If I just tell you, hey, my MFI over here is 700, what's the MESF? I need some extra information to transform that. Okay, so you run calibration beads through the system, you see what their MFI values are, then you get the MFI-to-MESF calibration curve and use it to look up your unknown value. Yeah — which is roughly what existing tools do right now. Okay, so: say we have MESF units for the fluorescence. For the CD4-CD8- population, the KI67 MFI is, say, 1.59 on the transformed scale, and let's say I know from somewhere else what the untransformed value, 20 or so, corresponds to in MESF. So the question is: how would you transform this number back to the original scale? Yeah, exactly. So remember what we did: we defined our logicle transform. Here's what we did. Then my value was 20, for example, and transformed it's about 0.6. Now I want to go back: from 0.6 back to 20, right? I wasn't sure if that was possible, so I did ?logicleTransform. The "See Also" section is what I go to when I know the function I'm looking at is not quite what I want, but something related to it might be. And there was something called inverseLogicleTransform, which I thought might be it. I scrolled down to the example section so I could quickly test my theory. So here's what I'm looking for — I misspelled it — and I see 19.9.
I mean, I didn't get the exact value back — there's always a little bit of error when you transform and then reverse the transform, because logicle has no exact closed-form solution, so the inverse is always slightly estimated. So let's do it. First I defined the transformation function I'll use; I didn't actually transform anything yet. Notice that when I call logicleTransform() and open and close the brackets without putting anything in, I'm accepting the default parameter values. In the help-page example, if you notice, they did not accept the defaults; they set their own — they decided they knew better. And then I transformed my value, 20. Yesterday we transformed all of our expression values of the frame, for every channel, using this logicle transform. Now, say I did my thing and calculated my MFI and it's 1.59 — but that's the logicle-transformed 1.59, and I want the original scale. So I defined an inverse of my logicle. It will not be the same as the inverse in the help example, because their logicle uses different parameters than mine did. So it's important that you define explicitly how to invert your own logicle. And then I applied that inverse transformation. In fact, to see it a little better — oops, I misspelled it — the inverse logicle of the logicle of 200 is 200. Does that make sense? Question: what if we want to see the MFI values for the other markers, the ones we didn't include? Then you would have had to include them in the flowType call. Or you can do it on your own, without flowType, right?
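The round-trip idea — transform, then invert with an inverse built from the same parameters — can be shown in base R using the asinh transform, a common cofactor-based alternative to logicle; the cofactor value here is an assumption for illustration. flowCore's logicleTransform / inverseLogicleTransform pair works the same way, except logicle's inverse is numerically estimated, which is why 20 came back as 19.9.

```r
# asinh-based transform with a cofactor (a stand-in for logicle here)
cofactor <- 150                          # assumed value, for illustration
tf  <- function(x) asinh(x / cofactor)   # forward transform
itf <- function(y) sinh(y) * cofactor    # its exact inverse

transformed <- tf(200)        # the kind of value we stored after transforming
recovered   <- itf(transformed)  # back on the original scale
```

The key point carries over to flowCore: build the inverse from *your* transform object, because a logicle with different parameters has a different inverse.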
So say you want the MFI of CD127 for this subpopulation. Then you would find that subpopulation yourself — find the indices. flowType doesn't really give you a way to do that after the fact. Oh wait, maybe it does... no, it doesn't, sorry. I would have to actually do the dirty work that flowType is doing for me. So flowType only goes as far as the markers you gave it. Typically, you would just include those markers in the flowType call; then you get your MFIs that way, and you can simply ignore the phenotypes you don't care about. They will be included, yes — so let me just clarify this for everyone. Here are all the phenotypes we computed. If we had included CD3, we would have had phenotypes like CD3-CD4-CD8- and so on, but also a bunch of phenotypes that don't involve CD3 at all. So it doesn't hurt to add an extra marker — well, it hurts in that it slows you down a little, more computation — but you can just ignore the phenotypes involving CD3 afterwards. For this data set, for example, we were interested in the CD3-positive cells to start with, so we started by gating CD3 positive. We didn't have to do that: we could have let flowType do it for us, by pretending we didn't know where to start and just adding CD3 as one of the markers. Then at the end we could scroll through and say, I don't care about all the CD3-negative ones, I'm just going to focus on these. Or, who knows, maybe flowType would have found something interesting in the CD3-negative cells that we didn't even think we cared about. Okay, so — time-wise, what is it? 3 o'clock.
We'll have a break, and after the break I'll actually run flowType across the whole flow set. How much time can I steal from your slot? If we return at 3:15, I can take until maybe 3:45. Is that okay with everyone? Then Ryan's going to shorten his talk — he's going to speak faster. He was already going to speak really fast. Okay. So far we applied flowType to just the one flow frame, right? For that one flow frame alone we had all these phenotypes and cell counts and MFIs, but one flow frame on its own is not useful. What we want is to have these phenotypes — the cell proportions or counts — calculated for every single flow frame. Why? What are we going to do with that? Once we have that, we're going to try to separate the two groups: identify which phenotypes have good p-values for separating the low-survival patients from the high-survival patients. So first we use fsApply, right? Because we need to apply flowType to every single flow frame in our flow set, and that's what fsApply is for. How does fsApply work? You first give it a flow set, and then what comes after is a function that will be applied to each frame of the flow set. And here's the function: you give me a flow frame x, and I hand it to flowType with the same CD4, CD8, KI67 setup. The function inside the brackets receives one flow frame at a time, because it's fsApply on the outside. Yes, correct — and don't apologize for being correct; it's great that you're following.
Never mind — it's not a stupid question. Okay, I'll carry on. I'm going to give it the gates, just as before, and the marker names so that it knows how to name the phenotypes: not "seventh marker plus, sixth marker minus," but "CD4+CD8-". Is PropMarkers a function? No, PropMarkers is a parameter of flowType. Yes, exactly. And Methods is a parameter of flowType. Yep. And MarkerNames is a parameter of flowType. Yes. And CellFreqs? No — CellFreqs is not a parameter; it's a slot, an attribute of the flowType object that gets returned. Remember in the VirtualBox, that's what flowType returns. So in one line I have done many things: a call to the function flowType with all the parameters flowType wants, and then, rather than keeping everything it returns for each flow frame — the cell frequencies and the MFIs and so on — I'm only keeping the cell frequencies. Okay? Keep it simple, one thing at a time. You could rerun this and use MFIs instead — not a whole other workshop, but something you can do on your own time. And then, because I want proportions rather than counts, I'm dividing by nrow(x). So this is what's happening: for the first flow frame, flowType is called with that frame and all these parameters; it returns the flowType object; I take only the cell frequencies; and I divide all of them by the number of cells in my flow frame. I don't want the raw cell counts — remember, that's what it returns: 700 cells, 46 cells in this phenotype, 3200 cells.
I would rather have proportions: 11% of the cells are CD4-CD8-, 0.7% are CD4-CD8-KI67+. I agree that it's a little too much in one line, so you have to read it carefully, one piece at a time. And I haven't seen it yet, but as far as I understand, in the newer version of flowType and RchyOptimyx this should be simplified a little. You shouldn't have to specify that many things; some of them it should infer on its own. When you give it the flow set, it should be able to get the colnames automatically, right? Why should you pass both the flow set and the colnames? Anyway, it's going to be a bit simpler, with slightly fewer parameters to decipher. Right now you're learning the tougher version. What is x, in function(x)? For the purposes of this line of code, I'm writing the function that will be applied — because of the fsApply out front — to each flow frame. The flow frame I'm currently working with, I'm going to pretend is called x, just so I can tell R what to do with it. So this returns, basically, my cell proportions. Again, I prefer to annotate my variables as I go, so things don't get out of order on me. Just like I keep CD4, CD8, KI67 as names rather than indices, I'm going to name my rows according to the sample names — I just want to keep track. When you print out the flowType result, every row corresponds to one patient. The first patient's sample name was this number dot FCS. And the cell proportion for the CD4-CD8- phenotype was 13%; the CD4-CD8-KI67+ proportion was this percentage.
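The counts-to-proportions step that the fsApply line performs for each frame can be sketched with a plain list of count vectors standing in for the flowType results; every number and name below is invented for illustration.

```r
# One vector of phenotype cell counts per sample (stand-ins for @CellFreqs)
counts.per.sample <- list(
  "sample1.fcs" = c("CD4-CD8-" = 700, "CD4-CD8+" = 46, "CD4+CD8-" = 3200),
  "sample2.fcs" = c("CD4-CD8-" = 130, "CD4-CD8+" = 12, "CD4+CD8-" = 850)
)
# Total cells per frame, playing the role of nrow(x)
totals <- c("sample1.fcs" = 6000, "sample2.fcs" = 1400)

# Divide each count vector by its frame's cell total, then stack:
# rows = samples, columns = phenotypes, just like the ft matrix
props <- t(sapply(names(counts.per.sample),
                  function(s) counts.per.sample[[s]] / totals[s]))
```

The result is a samples-by-phenotypes matrix of proportions, which is the shape everything downstream (the t-tests, the p-values) works on.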
Yours may be slightly different, because your gating was maybe slightly different from mine. Do the entries of this ft object make sense to you? Every row is one patient, and every column is one phenotype, right? Yeah, sorry, that's what I meant. Question about the 7, 6, 4 — those reference the columns: one, two, three, four, five, six, seven. The seventh marker is CD4, the sixth is CD8, the fourth is KI67. It's so flowType knows what marker order I want: first CD4, then CD8, then KI67. Okay — that's where it clicks. Why is it like this? Because flowType wants it like this; I didn't write flowType. Maybe it will be fixed in the new version, I don't know. Okay. So we're fine with what ft represents: all the information we want. What are we going to do with it now? We're going to evaluate how well these phenotypes separate the two groups of samples we have. Some have fairly low survival time, some fairly high — in fact, in a moment let's define exactly what we mean by low and high. But first, remember yesterday, when we were gating the live cells? Some of the samples had almost no CD3-positive live cells. In fact, why don't we check how many cells. Notice how the first one has about 6,000 CD3-positive viable cells. Great. This one has 73. This one has only 16 cells. Do we really want to trust phenotype percentages computed from 16 cells? Probably not. So first, let's do some quality checks and flag some samples.
This data set has over 400 patients; some are bound to have almost no live cells, who knows for what reason. You want to make sure you don't accidentally feed those to flowType and have flowType get confused — you know what I mean? You have to catch the samples you shouldn't trust. So here, let's remove the low-cell-count patients. Remember, fsApply(viable.fs, nrow) gives me all the cell counts — that's what I just did — and as.numeric makes it a vector rather than a matrix, so I can actually do the less-than-1,000 comparison. I've decided, somewhat out of thin air, that anything with fewer than 1,000 cells gets thrown away for the purposes of this workshop. In real life, I would instead flag them and ask the biologists: here are these samples — should I remove them from the analysis, or are they important enough to keep despite the low counts? So now I have identified the indices of the samples with low cell counts. Is everyone okay with this? It's a variable, yeah — the result of a which(). You can even print it out and convince yourself those are the rows with 16 cells and so on. And the minus sign tells R to drop those rows from my matrix of phenotype proportions. Now, because I removed those rows, I should also remove the same indices from the survival times — remember, the survival times were the days each patient survived. Also, just in case I decide to go back on my removal of anything under a thousand — say I run this and think maybe 500 would be better — I'm going to keep track of this variable before I remove the information. That's just for my own purposes.
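The filtering step just described can be sketched with a made-up vector of cell counts standing in for fsApply(viable.fs, nrow); the 1,000-cell cutoff is the one chosen in the text.

```r
# Hypothetical cells-per-sample counts (invented, echoing the 6000/73/16 examples)
counts <- c(6000, 73, 16, 2500, 4100)

remove.low <- which(counts < 1000)   # indices of samples we shouldn't trust
kept <- counts[-remove.low]          # negative indexing drops those entries
```

The same remove.low indices then have to be dropped from every parallel structure — the proportion matrix and the survival times — or the rows stop lining up.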
You lost it? Yeah, good point — maybe I should have saved that one first. Excellent point. Now, look at the values saved in survival. I'm not even going to bother looking at it myself — you guys look: just type survival and Enter in your console. I personally decided to define my group of people with low survival using 1,000 days as a cutoff. Patients who survived fewer than 1,000 days I'll call group one, low survival; patients who survived more than 1,000 days, high survival. Do you agree with my choice of value? For the purposes of this exercise — well, you can't really judge, because you haven't seen the whole data. It's 466 patients and I only picked a few. And we wouldn't normally do exactly this kind of analysis on survival data; we'd do something slightly different. This is for illustration. Typically you might have a group of patients that has a disease and a group that doesn't, and those would define your group one and group two, right? There's a fancy way of choosing a cutoff — ROC curves, operating points — and there's the practical way, which is you just pick something and try it. Choose the one you like. But in order to use the RchyOptimyx plot, we need two groups. Okay, so now that we have our two groups, we have to come up with some p-values. Remember the RchyOptimyx plot with the red and the blue and the green and the yellow? Those colors are based on a p-value, or some kind of statistical significance estimate. So here's one very straightforward way to calculate a p-value, just to give the plot something to visualize.
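The grouping by the 1,000-day cutoff can be written down directly; the survival times below are invented, and I've arbitrarily put ties at exactly 1,000 days into the high-survival group.

```r
# Hypothetical survival times in days, one per remaining sample
survival <- c(250, 1800, 940, 3100, 120, 2600, 700, 1450)

group1 <- which(survival < 1000)    # low survival: row indices into ft
group2 <- which(survival >= 1000)   # high survival
```

group1 and group2 are just row indices, so ft[group1, ] and ft[group2, ] pull out the proportion rows for each group.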
Okay, so we have to define a p-value. Here's one way to define one p-value: a t-test. What goes into this? ft — that's the matrix of all our phenotype proportions. group1 — I'm only taking the rows that belong to the low-survival patients. And I'm choosing this one phenotype, CD4-CD8+, as the column. The function t.test takes two vectors, x and y: x has the numbers from one group, y has the numbers from the other, and the t-test tells you, are these two groups different, or are they kind of the same? That's pretty much what it does. So the first thing I give t.test is the proportions of the phenotype CD4-CD8+ for the low-survival patients, group one. The second set of numbers is the proportions of CD4-CD8+ cells in the high-survival group. I'll give you a moment to play around with that, see what t.test returns, and see that when you take the p.value attribute of the result, it gives you the p-value. What is it? 0.087. Yep. Unfortunately, we're not going to see anything particularly exciting today, because we have very few patients. So, yes, $p.value is what retrieves the p-value. You can delete that part, run the thing again, and see what it gives you then: the t.test function returns several things, including the p-value. I'm only interested in the p-value, but you should totally check out what else is in there.
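A single t-test on one phenotype looks like this; the proportion vectors are simulated here in place of ft[group1, "CD4-CD8+"] and ft[group2, "CD4-CD8+"].

```r
set.seed(7)
# Invented proportions of one phenotype in each patient group
low.surv  <- rnorm(10, mean = 0.10, sd = 0.02)   # stand-in for ft[group1, pheno]
high.surv <- rnorm(10, mean = 0.13, sd = 0.02)   # stand-in for ft[group2, pheno]

res <- t.test(low.surv, high.surv)
p <- res$p.value   # the single number we keep per phenotype
```

Printing res shows everything else t.test returns (the t statistic, degrees of freedom, confidence interval, group means); $p.value just pulls out the one piece we need.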
If you got an error, it means you mistyped one of these things, or didn't run something earlier. Did you define the groups — group1 and group2? Is that what yours look like when you print them in your console, these exact values? And ft looks like this, a bunch of numbers. What does nrow(ft) give you? Ah — it's because you ran the command that removes the low-count rows twice, so you removed a few more rows than you wanted. You'll have to rerun the part where ft is defined to begin with. Here's what happened: this line removes some rows, right? remove.low — and what is remove.low? The rows that had fewer than a thousand cells; I defined it to be rows six, seven, eight, and sixteen. So she removed those rows — but then she ran that line again, which removed rows six, seven, eight, and sixteen of the newly shrunken matrix too, even though none of those rows were actually low-count anymore. She removed four rows, then four rows again; that's why she ended up with 12. It's just like accidentally applying your logicle transform twice. Did you rerun it, Suzanne? Then see if you can recalculate your p-value. Everyone's okay with this now? Yeah? Good. So that's one p-value, but I want all the p-values — for all the columns of ft. This was just for the one column, the phenotype CD4-CD8+. So I'm going to initialize a vector, p.values, to all ones: I repeat the number one, ncol-of-my-phenotype-matrix times. Each column is a phenotype, so the number of columns gives me the number of phenotypes. I chose the value one this time because
the worst possible p-value you can have is one, right? So if something goes wrong — if I made a bug and skipped a phenotype — I don't want to accidentally see a p-value of zero there and be misled into thinking I found some magical phenotype. That's why my default value here is one. Then I loop over each phenotype, one to ncol(ft) — the number of columns is the number of phenotypes, and I want to go through all of them. And this is the tricky part. First, ignore the if and the else; look at just this one line: p.values[i] — where i is the phenotype number I'm at — is the t-test of ft (my matrix of phenotype proportions), rows group1 (the low-survival patients), column i (the i-th phenotype), versus the same column for group2. Is it the same or not; give me the p-value. Does that one line make sense? Okay. Ideally, that would be the only thing inside my for loop. However, t.test throws an error if you try to compare two things that are constant. If my vector x is the values 0, 0, 0, 0, can you t-test that against 0, 0, 0, 0? It gives an error, because in order to compute a t-test it calculates the standard deviation of your data, and when the standard deviation is zero — because your values are constant — it ends up dividing by zero, and it doesn't handle that very gently: it just stops with an error. I didn't write the t.test function. So, for that reason, before I call t.test I check with sd — sd is the function in R that calculates the standard deviation. Make sense? If the standard deviation of that whole phenotype column is zero — that phenotype probably has zero cells in all patients —
imagine a phenotype with some really weird combination of markers that will just never biologically happen. You'll have zero cells in that phenotype, no variation between the two groups, of course, and the standard deviation of those proportions across your whole data will be zero. So if that occurs, I leave the p-value at that spot as 1, because 1 is the worst possible p-value you can have: this phenotype is definitely a bad one for differentiating the two groups. Does that make sense? What you would have done, if I hadn't told you this, is write just the one core line of the for loop — p.values[i] equals t.test, blah, blah, blah — run it, all excited that it's going to do the right thing, and then suddenly get an error: data essentially constant, values undefined. Then you would have spent some time reading the help for t.test, which is what I actually did — I don't know these things the first time either. I read the help and realized: oh, okay, this must be what's happening — there's no variation, it can't divide by zero, so it errors. So I'd better make sure I never ask it to do that for values where the error will occur. I try to catch my error scenario and do something meaningful with it. That's why this is a slightly complicated for loop: because of that if statement. Next, I again want to make sure my p-values don't lose track of which phenotype each one corresponds to — I don't want to have to remember that the first phenotype is CD4-whatever. I'm going to use names: I name the p.values vector with the colnames of ft, which are the phenotypes.
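The whole loop, with the standard-deviation guard, can be sketched end to end on a toy proportion matrix; the data, group indices, and phenotype names are invented, including one deliberately constant column to exercise the guard.

```r
set.seed(3)
# Toy proportion matrix: 8 samples x 3 phenotypes, one biologically absent
ft <- cbind("CD4-CD8-"      = runif(8, 0.08, 0.15),
            "CD4-CD8+"      = runif(8, 0.02, 0.05),
            "CD4+CD8+KI67+" = rep(0, 8))   # zero cells in every patient
group1 <- 1:4   # low survival (assumed split)
group2 <- 5:8   # high survival

p.values <- rep(1, ncol(ft))     # default to the worst possible p-value
for (i in seq_len(ncol(ft))) {
  if (sd(ft[, i]) != 0)          # guard: t.test errors on constant data
    p.values[i] <- t.test(ft[group1, i], ft[group2, i])$p.value
}
names(p.values) <- colnames(ft)  # keep phenotype labels attached
```

Without the sd check, the constant third column would stop the loop with "data are essentially constant"; with it, that phenotype simply keeps its default p-value of 1.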
Remember, the columns of ft are the phenotypes. Then you can print out p.values and check that you have your p-values and that each one is associated with a phenotype. Does everybody have their p-values? Yours might be slightly different, because your gates were slightly different, and the pooled frame sampling was random. Okay, we have our p-values, right? Now the next hard thing: RchyOptimyx. This is a lot of bookkeeping, so if you don't fully comprehend everything, don't worry — it's supposed to be simplified in the upcoming version, where some of these things will be generated automatically by default. I wish that version were already ready, so I didn't have to go over this. So RchyOptimyx takes some variables that you must specify, and it crashes if you don't, so you have to do it exactly as I've done here. The first one: you know how we have all these phenotypes? Like CD4-CD8- — that was the first one, I think — and then CD4-CD8+KI67-, and so on. This variable, signs, is a way to encode where the minuses and pluses are in each phenotype. We have three markers, always in the same order, and we need a way to tell the computer which phenotype we're looking at: is it the minus-minus-plus phenotype? The plus-minus-plus phenotype? The way this was programmed: if a marker is negative, that's encoded with the value 0; if the marker is not used in this phenotype — neutral — that's encoded with the value 1; and if it's positive, that's the value 2. So the phenotype CD4-CD8- is encoded 0, 0, 1: 0 meaning CD4 is negative (that's the first marker), 0 meaning CD8 is negative, and 1 meaning KI67 is neutral — we're not using it in this phenotype. A bit much, I know; it's just how it was programmed.
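All 3^3 = 27 codes for three markers can be generated in one line with expand.grid. Note this is just to make the 0/1/2 encoding concrete: the row order expand.grid produces is not necessarily the order flowType lists its phenotypes in, so in practice you should follow the RchyOptimyx help-page example word for word, as the text says.

```r
# 0 = marker negative, 1 = marker not used (neutral), 2 = marker positive
signs <- as.matrix(expand.grid(CD4 = 0:2, CD8 = 0:2, KI67 = 0:2))

nrow(signs)   # 27 = 3^3 possible phenotype codes for three markers
```

Under this encoding, the row c(0, 0, 1) is CD4-CD8- with KI67 neutral, and c(2, 0, 2) would be CD4+CD8-KI67+.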
Unfortunately, this version of RchyOptimyx requires you to specify this matrix; it doesn't generate it for you automatically. This is how it's generated, and it's always the same: the 3 here is the number of markers you have. When you're doing this at home, just look at the help file for RchyOptimyx, follow the example word for word, and change in your own information, the number of markers you have. And you annotate it again just to keep track of things and make sure they're consistent: the row names are the phenotypes, and the column names are the markers, first, second, third. Let's move on; I'm not even going to ask whether you understood that, because it doesn't matter. It's something you must supply, and you can copy and paste it without attempting any deep understanding of it. That's the first thing you must supply. Then you must supply, and this is the important part, some kind of statistical score which tells RchyOptimyx how to color the plots and set the arrow thicknesses. We have already computed our p-values, right? That's what we're going to use for significance. You know how the lower the p-value, the better? But when you're coloring things, you want the higher the number, the higher the score. So instead of lower-is-better, I want higher-is-better, and taking the negative log10 of the p-value does exactly that. It doesn't have to be the negative log10; you could define any score you want, but this is the standard way. For example, if your p-value is 0.01, the log10 of that is -2, and that's why I put the negative sign: now the score is 2. Does that make sense? If not, print the p-values and then the negative log10 of the p-values, and convince yourself that "the lower the p-value, the better" now corresponds to "the higher the negative log10, the better."
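The score transform is one line; here is the quick sanity check just suggested, using the pvalues vector built earlier.

```r
# Flip p-values into a higher-is-better score for coloring the tree:
scores <- -log10(pvalues)

# Convince yourself the direction flipped: smaller p-value, bigger score.
-log10(0.01)   # gives 2
-log10(0.001)  # gives 3
```

Because names(pvalues) was set to the phenotype names, scores inherits those names too, so each score stays attached to its phenotype.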
What RchyOptimyx does is build this complicated tree of phenotypes: you go from no phenotype at all, to CD4, then add another marker like CD8, then a third, with negatives and positives, and so on. It's a really, really huge tree, and if you looked at the whole thing you would not make any sense of it whatsoever, because there would be way too much, especially when you have 13 colors. If you tried to look at every possible phenotype in one plot, you would not be able to see anything. So RchyOptimyx tries to get rid of the ones that don't really matter so much: all the phenotypes with such high p-values that they aren't worth looking at, it tries to hide in a smart way. The start phenotype is the one that goes at the very bottom of the tree: one that you have noticed has a fairly low p-value, and by specifying it you make sure that RchyOptimyx doesn't accidentally trim away some path just above it and miss this phenotype that you know is important. It's just one way of pruning the tree so that you look at a manageable piece instead of something ginormous. I believe this part will also be more automatic in the new version; I don't know. Now, how do you find a good start phenotype? There's a function in R called sort, and it sorts things for you, so if you want to read through the few smallest p-values, you can just do sort(pvalues) and read them off. Okay, my smallest one is 0.11; that's pretty bad, right? But for today, with so few samples, so few colors, and especially with my rough, not exact, gating, that's what you can expect: the p-values won't be amazing. So this is my lowest p-value, and this is the phenotype it corresponds to.
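Reading off candidate start phenotypes with sort looks like this, again using the named pvalues vector from earlier.

```r
# Smallest p-values first, each still labeled with its phenotype name:
head(sort(pvalues))

# The name of the single best (lowest-p) phenotype:
names(sort(pvalues))[1]
```

Because sort keeps the names attached, you can go straight from this printout to the 0/1/2 code you hand to RchyOptimyx as the start phenotype.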
Maybe it's like the example Ryan gave, the Parkinson's data set: there's nothing interesting in this data. I'm going to go out on a limb and say that's the case here, because I selected very few patients and probably removed most of the interesting colors by accident. Here's one way you can show whether anything is going on. This phenotype here, CD4-Ki67+, is the one that had the lowest p-value for me, and I want to tell RchyOptimyx: when you're drawing this tree, please include this one, I want to see it somewhere in there. I know it's important because of the p-values, so I'm giving it a hint to use that one. So what did I say it is? CD4 negative, no CD8, Ki67 positive. CD4 negative: the code for negative is 0. CD8 is missing: the code for missing, neutral, is 1; it's just not there, not being used. Ki67 positive: the code for plus is 2. So the code is 0, 1, 2, and that's what I have done here. Then I looked at a couple of other interesting-looking phenotypes and built RchyOptimyx trees for them as well. You can then merge these trees into one, because a lot of them will overlap: some phenotypes are common to all of them. You don't want to look at three trees when you could combine them, and merge combines them in the best way possible so you're not looking at duplicated information. Do you have to specify a start phenotype? If you don't, it won't be very interesting, I think, for this data set, but you can try it without.
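The call pattern just described might look roughly like this. This is a hedged sketch, not the workshop's exact code: the argument names below follow the RchyOptimyx help page for later releases, and the first version used in this workshop differs, so check ?RchyOptimyx in your installed copy before running. Signs and pvalues are the objects built earlier in the script; the second start phenotype "002" is just an illustrative choice.

```r
library(RchyOptimyx)  # Bioconductor package

# One tree anchored at CD4-Ki67+ (code 0,1,2), one at another phenotype.
# Argument names are assumptions from the package help; verify locally.
res1 <- RchyOptimyx(Signs, -log10(pvalues), startPhenotype = "012", pathCount = 1)
res2 <- RchyOptimyx(Signs, -log10(pvalues), startPhenotype = "002", pathCount = 1)

# Trees merge two at a time; overlapping phenotypes are combined so you
# don't look at the same information twice.
merged <- merge(res1, res2)
```

For a third tree you would merge again: merge(merged, res3), and so on, two at a time.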
For this data, you probably will not see anything particularly interesting, but for yours you can try it without a start phenotype and it will automatically try to find the best ones. And again, in the new version it's going to be easier to set the parameters and hopefully more intuitive, so keep that in mind. Now, how to save your plot. Here's what you were asking about, Stephen: how to save an image in R. R has this function pdf(), which basically tells the computer to start creating a PDF document. It doesn't actually open a window and plot for you; the computer is plotting it on the inside. You give it the file name you want the output written to, and then you create your plots. plot(merged, ...) knows that this merged thing is a RchyOptimyx object, so it uses the plot function supplied by the RchyOptimyx package, which knows how to draw it. For the coloring, you supply the negative log10 p-values; it prints them as a color bar and uses them to color the plot. Then, to tell R that you're done plotting and it should finish writing the PDF file rather than wait for another plot, you call dev.off(), device off: turn off the PDF device, whatever little thing the computer is using on the inside to create this document. Now look at where I saved it. In the virtual box, go to that folder, Documents, Workshop, and right there is where you should see a PDF file. Does everybody have a PDF file? I chose my starting phenotypes, 0, 1, 2 and 0, 0, 2 and so on, so that it would look pretty, so yours may look a bit different, but not that different.
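The pdf/plot/dev.off pattern is a minimal sketch like this. The file name is illustrative, merged is the merged RchyOptimyx object from before, and the phenotypeScores argument name is an assumption based on the package's plot method; check the help page for your version.

```r
# Open an invisible PDF plotting device; nothing appears on screen.
pdf("rchyoptimyx-tree.pdf")          # illustrative file name

# Draw into the device; RchyOptimyx's plot method colors nodes and
# sizes arrows from the scores you pass in (argument name may vary).
plot(merged, phenotypeScores = -log10(pvalues))

# Close the device: this is what actually finishes writing the file.
dev.off()
```

If you forget dev.off(), the PDF stays open and incomplete, which is the most common reason the file looks empty or corrupt when you try to open it.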
Yeah, for you this looks much better than mine; for me it wasn't very good, I agree. So how does one interpret this plot? Remember, the higher the score, the better the phenotype: the more red, the better. What does it say at the top? "All cells." What phenotype is that? No phenotype, just all the cells, and since we're looking at proportions, those are all 100%. It has a score of 0, because you can't predict anything when you know nothing. If we only look at CD4- proportions, it's still not very good: it's light blue, not particularly informative. If I were designing a panel for this, I would say CD4 alone is clearly not sufficient. Ki67 alone might even be sufficient, because it looks fairly high, almost the best phenotype; for my data and my gating, Ki67 on its own might actually be enough. Going from CD4- and adding Ki67+ improves the predictive ability of that phenotype significantly; that's why the arrow is so thick. Adding an additional marker, CD8, decreases it, so it seems CD8 adds enough randomness to my data that it actually makes things worse, and I'm better off just checking the proportions without it. When you're looking at a plot like this, pick a few phenotypes that look fairly interesting. Not just one or two: you would normally have way more markers, so you would have at least five interesting phenotypes, and you should try to pick ones with substantially different marker combinations. I would maybe pick out this one and this one, although the common theme is Ki67, so it's not especially varied here. Pick out a few of the significant-looking phenotypes, go and manually gate them, confirm they are real, interesting phenotypes, and make your conclusions accordingly. Make sense? And I think I will stop there. How does one output the p-values?
By phenotype? Yes, they're just sorted from smallest upward. And I forgot to mention one other thing I did here: I saved my results as a spreadsheet, in case you want to do something else with them. You can do ?write.csv on your own and see what it does. You can also see the file in the virtual machine if you open the same folder where the RchyOptimyx PDF was saved: results.csv, a comma-separated-value file. When you open it, make sure it's set to separate by commas. Then you can share it, plot the numbers, or do whatever else you want with them later using Excel. So, right after the dev.off() call, I have a few more lines of code. Remember, FT is my matrix that had all my phenotype percentages. I used the rbind function to add another row to that matrix which stores the p-values; I don't want to lose those. Then write.csv: you give it a matrix, which I called results, then comma, file equals, and the exact file name, including the full path, where you want the CSV written. Okay. Yes? Say you're looking at six different possible phenotypes: do you build the individual trees and merge all six? You can. To be honest, I only specified phenotypes here because it wasn't going to look interesting otherwise. You can try without specifying any, and if that doesn't give you enough information and you feel there should be more, then specify. And yeah, you can only merge two trees at a time: you merge two of them, then merge the third into the result, and so on.
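Those last few lines of the script, as described, come down to this sketch. FT and pvalues are the objects from earlier; the file path is illustrative, not the workshop's exact path.

```r
# Append the p-values as one extra row under the phenotype-proportion
# matrix, so percentages and significance travel together in one table:
results <- rbind(FT, pvalues)

# Write the combined table as a CSV you can open in Excel. The path
# here is illustrative; use the full path where you want the file.
write.csv(results, file = "results.csv")
```

Because rbind keeps the column names, each column of the CSV is still headed by its phenotype name, with the proportions above and the p-value in the last row.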