All right. Hello, everyone. Welcome to the penultimate session of the conference. It's my pleasure to introduce Garth Kong. He's graciously said that you can interrupt him with questions. However, his talk will be rather quick, so if it's a lengthy question, maybe hold it to the end. There'll be plenty of time. Take it away.

All right, thank you. Am I good to go? All right. Hello, everybody. My name is Garth, and today we'll be replicating the interactive flow cytometry workflow in CITE-seq. So without further ado, let's get started.

For today's presentation, I want to go over several things. First, I'll give you some background for CITEViz: what is CITE-seq? What are blood cells? What is flow cytometry? Then, about CITEViz itself: what it is, why you would use it, and when to use it. During the demo, I'll be showing one-layer gates, two-layer gates, something we call a back gate, and how you can use those gating schemes to inform further analysis, such as differential expression. And lastly, I'll show you how you can get the code, the data, the presentation, and everything.

So what exactly is CITE-seq? In summary, it's single-cell RNA-seq plus your cell surface proteins. This is achieved using something called antibody-derived tags, or ADTs for short. What they are is antibodies that have DNA barcodes on them, and these DNA barcodes have poly-A tails. In a normal experiment, you introduce these ADTs to your cells. They bind to the cell surface proteins and get carried through the 10x droplet sequencing platform. When the cells are lysed in the oil droplets, the poly-A tails from your mRNA, as well as from your ADTs, hybridize to the poly-T sequences on your beads. Then, with further processing, you get two sequenceable libraries: one for your transcriptome and one for your proteins. So on the data end, you get one matrix that's RNA by cells and one that's ADT by cells.
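As a rough illustration of the two matrices just described, here is a minimal Python sketch. The counts, barcodes, and the `cell_profile` helper are hypothetical toy values for illustration, not data or functions from the talk; the point is only that both matrices are indexed by the same cell barcodes.

```python
# Hypothetical toy data: two feature-by-cell count matrices that share barcodes.
# Real CITE-seq matrices are large and sparse; nested dicts keep the sketch small.
rna_counts = {
    "MS4A1": {"cell_1": 5, "cell_2": 0, "cell_3": 0},
    "NKG7": {"cell_1": 0, "cell_2": 7, "cell_3": 1},
}
adt_counts = {
    "CD19": {"cell_1": 120, "cell_2": 3, "cell_3": 2},
    "CD56": {"cell_1": 1, "cell_2": 95, "cell_3": 4},
}


def cell_profile(barcode):
    """Collect the RNA and ADT measurements recorded for one cell barcode."""
    return {
        "rna": {gene: cells[barcode] for gene, cells in rna_counts.items()},
        "adt": {protein: cells[barcode] for protein, cells in adt_counts.items()},
    }


profile = cell_profile("cell_1")
print(profile)
```

Because both modalities are keyed by barcode, a set of cells selected on protein values can be looked up directly in the RNA matrix, which is what the gating workflow later in the talk relies on.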
And generally, the reason we need these surface protein markers is that they help classify cell types. In this example, I'll be using cells from the human blood cell lineage. For a little bit of background, each cell type usually expresses a specific profile of surface protein markers.

Flow cytometry, on the other hand, is already a proven method for classifying cells using cell-surface protein markers. How it works is you quantify the cell surface proteins, and then you gate them in two dimensions. So for example, we have CD34 on the x-axis and CD38 on the y-axis. To get to certain populations of cells, for example the stem cells, you do this iterative filtering called gating. In this example, if we filter for these cells, these are CD34-positive, CD38-negative cells. Then we take this subset of cells and replot them using CD45RA and CD90. And over here is our stem cell population, so we would call these CD45RA-negative, CD90-positive cells.

How this relates to the previous slide is that we follow this path that we've made: 34 positive, 38 negative, and then walk this path to CD90 positive, 45RA negative. For the common lymphoid progenitors, the CLPs, we take a different path: 34 positive, 38 positive, replotted using these features, and then we use that to label this cell population. And so on for the common myeloid progenitors, the CMPs, which are over here. For the CMPs, we take this one extra path right here.

So when we were developing CITEViz, we thought that if we could replicate the flow cytometry workflow in CITE-seq, then we could give researchers more methods to help classify their cell clusters. The technical distillation of the question is: can we build an interactive gating program for CITE-seq using the R Shiny platform? And we want to do this because we want to relate everything in antibody space.
So on the left, we have CD4 versus CD8, and then we relate your cells in antibody space to dimensional reduction space.

There are a couple of reasons why we developed CITEViz. We found it extremely useful in that it helps wet lab biologists leverage their knowledge of flow cytometry and apply it to single-cell data sets. It also helps users do more rapid data exploration, because the alternative is kind of painful: you have to draw four borders. You draw the bottom border and jitter it around, then the top border, jitter it around, then the left and right borders. You take those cells, and then you re-plot again. So the alternative is pretty painful, and we thought that drawing physical gates could get us much farther.

CITEViz can help for [unclear] data. Yes, in this presentation, we apply it to PBMC. Yep.

And lastly, we understand that these multiomic single-cell data sets can be difficult to generate. They're also difficult to analyze. So we are doing our part to make the analysis a little more accessible to wet lab biologists, and ultimately to help users with novel hypothesis generation so they get to novel discoveries faster.

So when would you actually use CITEViz? Let's say you've finished an experiment. You do your library preparation. You do bioinformatic processing. And you get out a processed Seurat data set, as an RDS file. We find CITEViz very useful here because it kind of unites the bench scientists and the computational scientists. It's not like you throw it to the bench scientist and they'll happily gate. We find CITEViz is best when used by both parties together. And ultimately, that's all to help us get to novel discoveries faster.

Today's example data set is a PBMC data set with eight patients and three time points. It's integrated using the weighted nearest neighbor algorithm, with a total of 160,000 cells and 228 antibodies.
And for today's presentation, we've sub-sampled it down to 10,000 cells, but we're going to keep all 228 antibodies.

All right, so what I'm going to do here is head on over to the Orchestra workshop. I'm going to import CITEViz and run it. What I'm uploading now is the processed PBMC data in RDS format, the one with 10,000 cells. I'll let it upload, and I'm just going to head on over to the gating page.

For our first example, I'm going to start drawing some one-layer gates, and we're going to go for B cells. One of those features is CD27. Right now I'm choosing my y-axis feature, and that's going to be IgD, against CD27. And so now I've drawn my gates. What I can do now is hover over this feature scatter plot and select my cells. Immediately we'll see those cells reflected over in dimension reduction space. This is really useful because you can do it multiple times to help fine-tune your gates. So let's go ahead and select this, give it a title, B cells, and then hit the gate button. What happens is we've captured all the cells within this gate. We started off with 10,000 cells and ended up with 575 cells. OK, so that was one example of a one-layer gate. If you scroll down to the bottom, you can hit download to capture all the cell barcodes that you've gated.

I'm going to give you another example of a one-layer gate. This time we are going to be gating for natural killer cells. One of the features is CD3, and for the y feature we're going to go for CD56. Again, we create our selection, and we see that a majority of these cells are located in the natural killer cluster. So we're going to go ahead and give this gate a label and hit the gate button again. Great. All right, so what I've shown is two examples of a one-layer gate.
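Conceptually, a one-layer gate like the ones just shown keeps every cell whose two feature values fall inside the drawn region. The Python sketch below illustrates that idea with a standard ray-casting point-in-polygon test. It is not CITEViz's implementation; the cell values, the polygon coordinates, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apply_gate(cells, x_feature, y_feature, polygon):
    """Return the barcodes of cells whose (x, y) values fall inside the gate."""
    return [
        barcode
        for barcode, features in cells.items()
        if point_in_polygon(features[x_feature], features[y_feature], polygon)
    ]


# Hypothetical normalized ADT values for four cells.
cells = {
    "c1": {"CD3": 0.5, "CD56": 3.0},
    "c2": {"CD3": 2.5, "CD56": 0.2},
    "c3": {"CD3": 0.2, "CD56": 0.5},
    "c4": {"CD3": 0.8, "CD56": 2.5},
}

# Hypothetical gate drawn around the CD3-low, CD56-high corner of the plot.
nk_gate = [(0.0, 2.0), (1.0, 2.0), (1.0, 4.0), (0.0, 4.0)]
nk_cells = apply_gate(cells, "CD3", "CD56", nk_gate)
print(nk_cells)
```

The returned barcodes are what would then be highlighted in dimension reduction space and written out by the download step.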
And now I'm going to take a detour: download the information for those two gates, take all their cell barcodes, and do a differential expression analysis. It can be as easy as importing Seurat. I'm importing the 10,000 PBMC cells, and then I'm going to read in the B cells RDS gate, so B gate and NK gate.

I want to show that this specific class is rich with metadata. For example: the number of gates that were used, what your assay was, what the input cells for that gate were, what the output cells are, what your x-axis was, what your y-axis was, and the coordinates of the gate that you drew. And then a list of simple statistics here. Then, to get your cell barcodes, it's as simple as B gate, dollar, gate one, subset cells. And there are your cell barcodes for the B cells. So let's go ahead and get those.

What I'm doing here is making sure there are no overlapping cells between the two gates. Sometimes these gates aren't entirely perfect, so I'm just making sure one cell doesn't appear in two gates. Then we can run FindMarkers, comparing B cells to NK cells, and I'm going to go ahead and view the output.

Now we can do a sanity check. MS4A1, that's the top differential gene. This gene should be highly upregulated in the B cells. Is that true? Yep, I buy that one. And on the flip side, this gene should be upregulated in the natural killer cells. So it's enriched in the NK cells compared to the B cells. Great. So that was an example of how you would use the gate metadata to facilitate further downstream analysis. In this case, it was just simple differential expression.

I'm going to reopen the app and re-upload my 10,000 PBMC cells. In this example, I'm going to show a two-layer gate, and this time we're going to be selecting for the CD8-positive T cells.
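A two-layer gate of the kind introduced here is two filters applied in sequence, where the output cells of the first gate become the input cells of the second. The Python sketch below illustrates this with simple rectangular gates and a small metadata record per gate, loosely modeled on the fields listed in the talk (axes, input cells, output cells). The field names, thresholds, cell values, and the use of CD4 as the second y-axis are assumptions for illustration, not CITEViz's actual data structure.

```python
def rectangle_gate(cells, barcodes, x_feature, y_feature, x_range, y_range, label):
    """Filter `barcodes` to cells inside a rectangle and record gate metadata."""
    selected = [
        bc
        for bc in barcodes
        if x_range[0] <= cells[bc][x_feature] <= x_range[1]
        and y_range[0] <= cells[bc][y_feature] <= y_range[1]
    ]
    return {
        "label": label,
        "x_axis": x_feature,
        "y_axis": y_feature,
        "input_cells": list(barcodes),
        "subset_cells": selected,
    }


# Hypothetical normalized ADT values for four cells.
cells = {
    "t1": {"CD11b": 0.2, "CD45": 3.0, "CD8": 3.5, "CD4": 0.1},
    "t2": {"CD11b": 0.3, "CD45": 3.2, "CD8": 0.2, "CD4": 3.0},
    "m1": {"CD11b": 3.0, "CD45": 2.5, "CD8": 0.1, "CD4": 0.5},
    "t3": {"CD11b": 0.1, "CD45": 2.8, "CD8": 2.9, "CD4": 0.3},
}

# Layer 1: CD11b-low, CD45-high cells, starting from every cell.
gate1 = rectangle_gate(cells, list(cells), "CD11b", "CD45",
                       (0.0, 1.0), (2.0, 5.0), "lymphoid")

# Layer 2: re-plot only the layer 1 output and keep the CD8-high cells.
gate2 = rectangle_gate(cells, gate1["subset_cells"], "CD8", "CD4",
                       (2.0, 5.0), (0.0, 1.0), "CD8 T cells")

for gate in (gate1, gate2):
    print(gate["label"], len(gate["input_cells"]), "->", len(gate["subset_cells"]))
```

Keeping the input and output barcodes for each layer is what makes the "started from N cells, went down to M cells" bookkeeping possible after the fact.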
All right, so the first gate selects for cells that are on the lymphoid side of the blood cell lineage. So let's go for CD11b and CD45, and I'm going to go ahead and select this cluster here. We see we've captured all of the lymphoid family of the blood cell lineage, but we also have some bleed-over into the myeloid cells. So these gates aren't always black and white, but our next filter should really clean up the cells. Let's call this lymphoid and myeloid cells. We're going to hit the gate button, and then we have to re-plot.

These T cells have really high expression of CD8, so what we're looking for is a cluster that has a really high CD8 signature, like this one. And those cells appear over here on our UMAP. So let's go ahead and call that CD8 T cells and hit gate. All right, so what happened was that on our first gate, we started from 10,000 cells and went down to 7,200 cells. And from that gate, we went from 7,200 down to 1,447 cells. If you hit the download button, you'll get a list that contains both of your gates.

The next example I want to show is something we call a back gate. So far we have been going from antibody space into dimension reduction space. We thought it would be interesting if we could go backwards, from dimension reduction space back into antibody space. We envisioned this as more of a top-down type of approach. So for example, we can select this cluster of B cells, and over in antibody space, it'll show where it is on the heat map.

All right, so in addition to gating, CITEViz contains more features that allow users to holistically evaluate their data. One of them is feature co-expression, because a lot of the time we're interested in the relationship between the expression of two features.
So let's go ahead and, oh yeah, for example, to show off the differences in the monocyte cluster, we can input CD16 and CD14. In this case, the more red you are, the more CD16 you have compared to CD14. And the more blue you are, the more CD14 you have. And that shows up in these specific clusters here.

We also went multiomic with this. If you're interested in the correlation between the mRNA level and the protein level, then we can go multiomic as well. What we're doing here for the x-axis is CD14 protein, and on the y-axis, we're going to the mRNA. And here we're seeing the correlation between mRNA and protein. So let's see. The more red you are, the more CD14 protein you have. Or if you're blue, then the more CD14 mRNA you have. Due to the sparsity of this type of data, we didn't want to help you misinterpret your data, because if 0 is low and 1 is high, then the color really exaggerates the difference. So we made sure to include the unit of expression for both scales, so we don't help you misinterpret your data.

Then we have single-feature expression as well. These are always fun to play with. So here, let's say I want to really highlight the monocyte population: it's over here. Or if I want to really show off the CD8 T cells, then it's pretty clear that they are here as well.

When it comes to quality control of your data, one of the most useful features we thought to include is common quality control metrics, but split by any categorical data in your Seurat metadata. So here we have RNA, the distribution of counts, split by donor. There were eight patient samples. And is there any confounder? Possibly, for the unique ADTs per cell. In this case, to me, it seems like patients 5 to 8 may be pretty different from 1 to 4. And do I really believe that? Over on the left, we can just click around to explore the data a little bit.
And so these are patients 1 to 4, and this is 8. So yeah, I believe they look quite different from each other. I buy that. And then lastly, there's a clustering page. 3D plots are supported as well, but this data set did not include a third dimension. And that's OK. You don't need to go to three dimensions sometimes.

OK, so I've gone over a lot of information today. Mainly, for the gating, we've gone over one-layer gates, two-layer gates, back gates, and how you can use those gating schemes to help your downstream analysis, such as differential expression. For feature expression, we have multiomic co-expression feature plots, as well as single-feature expression. For QC, you can split common QC metrics by any categorical data in your Seurat metadata. And there's 2D and 3D cluster visualization as well. To get the code, the data, and the presentation, just head on over to the Bioconductor workshops, and you'll be linked to this page.

And so yeah, that's pretty much it. I was fortunate to work with a really fantastic team of scientists. The team at OHSU really defined their need for a tool that could help them explore their data faster, and they really articulated what they needed. The University of Oregon team were the people who took those requests and implemented them into the package that is CITEViz today. For future directions, we plan to submit CITEViz to Bioconductor within the next month, and later implement performance patches and long-term maintenance as well. This has been a really fun project, and I really look forward to more OHSU and University of Oregon collaborations. So thank you, everybody. Any questions?

This might be in there. So this really is a Blade Runner reference. But can you enhance, and then enhance, on your gating? Because that is something that flow cytometrists love to do, enhancing on those gating options, right?
Is that something available in your app? Just zooming in.

You're just zooming in.

Sorry. Maybe too old of a reference. OK, great. And is it agnostic to the feature type you have? So can I actually be gating on the mRNA instead of the antibody signature?

Yeah.

OK.

So RNA is really sparse right now. We built CITEViz to accommodate future changes in sequencing technology, and we hope that the data becomes less sparse. So we did keep that feature in there. But you just have to be very careful, because the data is very sparse on the RNA side.

Yeah, I wasn't sure about the labeling and how it would render inside of the plot. And yeah, OK, great.

So the gating for NK cells and B cells and so forth is a cool example. But early in the presentation, you presented some more subtle gating, from Jason's paper for example, with LMPPs versus MPPs and some relatively more subtle types of splits. It wasn't super clear to me what the workflow would be for comparing when projecting back and forth, when your clusters maybe aren't going to be as useful in resolving some of these populations. Could you comment on the relative ease or difficulty of doing this in practice with the current workflow?

So in this example, we selected rather easier cells to test, ones that are fully differentiated. Another way, I guess, is to take one of those bone marrow CITE-seq data sets and really start gating on those as well. We just didn't have the time to dive deeper into the bone marrow CITE-seq data and try to reproduce other people's results. Yeah. But PBMC is kind of the easier level one.

So there's a question from online: I was wondering if you could save the code for replicating the same gates with a new data set, for reproducibility. I think you have a lot of the information stored in that data object, but I think the question is about the actual code.

Right, right.
So you can put it in a pipeline or something and make it more reproducible for a figure in a paper.

Yeah, I think you could. I think the human would just have to do the exploration part first to get those parameters. And I guess we don't really have any functions to help them in that regard. But that's definitely a cool feature to implement, to make things a little more reproducible. Yeah, that's a good point.

Thank you. Great work. The idea of gating especially is pretty useful, because I've been struggling, not with PBMC but with some other cell types, where you have to figure out which new combinations of protein markers can be useful. So one suggestion would be to actually suggest, at the time of gating, which protein markers could be useful to gate against. With a new data set, you don't really know. Basically, if you run FindAllMarkers based on some clusters and then suggest that these are the potential options you could gate against, that might improve the usefulness of the gating strategy, because the unique IP of this method is gating. And I agree with the previous suggestion: concentrating just on the protein might be underselling the tool, because you can actually go into RNA or peaks, and as newer and newer technology comes in, you can gate on that. But I also agree with your point that sparsity can be a problem. It can be user dependent, though, and you can put a warning there. So that can be really useful. Thank you.

So it'd be kind of cool if we could sort the features by the amount each contributes to the UMAP or the principal components or something. That way, you don't scroll down to the bottom of the features; you can already filter things out. Thank you for the suggestion.

So you touched on something, or something was touched on, in the process of discussing it.
This tool is almost complementary to another one that goes by the name of scGate, which was published by the Carmona Lab. They concentrated, maybe not exclusively, but heavily, on creating gating models and training them for splitting cells into pure and impure. And we have some experience with it. I was pointing out that it's going to be really handy to compare what we get with CITEViz versus scGate. But it's also going to be tremendously handy to see where the differences are. You know, why is it that sometimes we get really good results with scGate in terms of impurity and sometimes not? We know some of it is due to feature contribution. In some data sets, we won't have all of the ADTs that we'd really like in order to simulate the flow cytometry gating process on something like, say, subpopulations of dendritic cells. So sometimes we'll shim in mRNA features, and then we'll be scratching our heads: well, where did this go wrong? It would be wonderful to have the ability, like you said, to look at how well or poorly that works. We know that the correlation between ADTs and mRNAs is often awful. We've plotted this for an awful lot of ADTs. So we know that using mRNA instead of an ADT is really not even second best. It's just one step up from the worst. But sometimes we don't have any option if we want to set up a gate. How soon do you think there'll be functions to retrieve a gating scheme that the user has set up, so that it can be compared against something that was computationally prepared?

So almost being able to save a profile of the things that you normally go to?

A kind of plug-in scheme where you can retrieve it and see what it looks like.

Right, right, right. So maybe the next time you generate a similar data set, you can just plug it in, check your answer again, and see any differences.

And that's good.

Right, right, right.
I think maybe one hiccup would be that the shape of the UMAP would be different.

But you can learn a UMAP and then project new data.

Yeah, yeah.

That doesn't have to be a showstopper.

Right, right, right. So that would be the workaround. But yeah, that's a good feature to include as well.

Because we rarely do things with just one experiment and done. It's always going to be a series of experiments, improving the reproducibility and the parameters. So that would be good.

Sorry, one last question, about the back gating. It's only now coming to my mind, but you were selecting a UMAP cluster, right? Once you've selected the group of cells, can you suggest a back gating, in the sense of which gating features you need to use? Like the most parsimonious way to tell the biologist: if you want this type of cell, you use this combination of features in this way, so that in, let's say, a real experiment, you might get these cells. I'm not sure if I'm making sense, but the most parsimonious group of features that defines the selected group of cells. So it gives you a suggestion in the reverse direction. I'm not sure if it's making sense, but it could be a very cool algorithmic problem.

No, yeah. That's totally a good idea, if there was an algorithmically smarter way to help suggest the features that could differentiate certain clusters. We haven't really thought that far. But these are all good features to include. And we'd love to keep in contact. We'll show you the new updates or something. Thank you for the suggestions.

Right, so I'd just train a decision tree to maximize the discrimination that you would get. That's a little [unclear], but let's call it a circle around the thing. And that seems like the most powerful way to do it, right?
We all want to [unclear] and have some kind of initial model-based clustering instead of what it should be. That's not how it actually works in actual cells that are transitioning between states.

Oh yeah, sorry.

Sorry. I was just agreeing with one of the other presenters that it seems sort of ideal to have both data-driven and supervised approaches. We often refer to it as semi-supervised. If we didn't know anything, then yeah, we'd want the data to do all the driving. And if we knew everything, then we wouldn't need new data, because we'd already know where to drive. But that doesn't describe most research problems. Most research problems are somewhere in the middle. We're trying to figure out how much driving the data should do and how much the experimentalist or domain expert should do. So I'm kind of looking forward to something like scGate and CITEViz meeting in the middle. That's all I was saying.

I concur. Yeah, that's an interesting point. I think it's also a difficulty with single cell: when should it be hands-on versus unsupervised? So I don't know, it's quite philosophical at this point. Maybe we can... But yeah, I totally understand where you're going. Thanks for all the discussion points.

Sorry, not to beat this dead horse, but I think I'm latching on to the same ideas here. The back-gating thing: maybe you could use this for something like biomarker discovery, that kind of thing. Back to the flow of the time.

Okay, thank you. Thank you, everybody.