Okay. Hello. Welcome. I'm Jenny Drenovic, and I will be chairing this session. This is the single-cell track for talks. We only have three; we had one cancellation. My script for people online: please put your questions in the chat. I think we have another moderator somewhere who will collect those for me. We will do about 10 minutes for each talk and then take all questions at the end. Okay. So, first up is Avi Srivastava, who is a postdoctoral fellow at the New York Genome Center, and he will be talking about characterizing cellular heterogeneity in chromatin state with... Okay. How do you say your package? You'll do it. Okay. All right. Come up. You have to use the microphone. And how do you move forward with this? I guess, yeah. Okay.
Hello, everyone. Thanks for being here, and thanks, Jenny, for the introduction. My name is Avi Srivastava. I'm a postdoc in Rahul Satija's lab at the New York Genome Center, and today I'm going to talk about a very exciting new technology, single-cell CUT&Tag-pro, and how we leveraged it to design a computational model for understanding the functional role of DNA elements. The tool's name is single-cell ChromHMM. My research interest over the years has been to understand the causes and consequences of cellular heterogeneity, and along the way I've developed multiple tools for single-cell RNA-seq analysis. You might be familiar with Salmon, kallisto, Seurat, and tools like those. For single-cell ATAC-seq, there was Signac and a couple of other tools I've contributed to. But all these methods are very small pieces of a very complex biological puzzle, right? The field is actively moving towards profiling newer cellular modalities, specifically to understand the functional role of non-coding DNA elements at single-cell resolution, and for that we need much finer-grained single-cell technologies.
But before moving on to the technology we developed, let's look at some of the limitations of currently available technologies, going all the way back to ChIP-seq, where we can profile histone modifications. The problem was that it cannot be scaled to thousands of single cells, right? Relatively recently, single-cell ATAC-seq came along, where we can profile open chromatin regions at single-cell resolution. But the problem with this measurement is that you're basically looking at a binarized profile: each DNA element can be open or closed, right? Actual biology is much more complex than just open or closed chromatin. For example, you cannot really tell poised regions from active regions by looking only at open chromatin profiles. You need more information, and one way to get it is to look at histone modification profiles. Relatively recently, single-cell CUT&Tag came along. You stain the cells with an antibody against the histone modification of interest and then add a protein A-Tn5 transposase fusion, so the transposase is targeted to the genomic regions carrying that mark. That way you can profile a specific histone modification at single-cell resolution, which is very important for studying cellular heterogeneity. Now, the problem with single-cell CUT&Tag is that the data are sparse. And beyond sparsity, it is the combination of histone marks that defines the functional role of a DNA element, yet there is still no technology that profiles many histone modifications within a single cell. This slide is actually already outdated, because the New York Genome Center is a powerhouse of new technology development and we can now measure a couple of histone modifications together, but we have to go even further than that.
And right now, we cannot. The basic idea is that you need multiple histone modifications for each and every cell, and that is one of the limitations of single-cell CUT&Tag. To improve on that, we developed an extension of single-cell CUT&Tag, which we're calling single-cell CUT&Tag-pro. The "pro" part comes from protein: you can measure cell surface proteins and a histone modification within the same cell. So there's no integration and no interpolation; you are actually measuring both the cell surface proteins and the histone modification in one cell. The idea is relatively simple. You stain the cells with the antibody for the histone modification of interest, you also stain them with antibodies against cell surface proteins, and then you run the 10x single-cell ATAC-seq protocol on them to generate the dataset. Using this technology, we profiled four histone modification marks: a couple of them were activating marks, and two were repressive marks, H3K27 trimethylation and H3K9 trimethylation, which mark the repressed regions of the genome. So one question that really comes to mind is why we went through all the effort of profiling protein in the same experiment, right? I was talking earlier about the main problem with single-cell CUT&Tag data, which is sparsity. One way to work around sparsity is to pseudobulk groups of cells, right? To pseudobulk, you first need to figure out which groups of cells are relatively similar, and it has been shown multiple times that you can build KNN graphs to find them. But here the real problem is, let's see if I can point it out. Okay, yeah. The problem shows up if you just do unsupervised clustering of the CUT&Tag dataset alone, with no protein at all, right?
It separates B cells, myeloid cells, and lymphoid cells, but it doesn't have the resolution needed to really go into, let's say, naive cells, memory cells, and regulatory cells. That's why protein information is so useful: you have specific antibodies that can distinguish these finer cell types. And since we profiled the histone modification and the cell surface proteins together, you can use the protein information to subgroup the CUT&Tag data. That's what I'm trying to show on the extreme right: a hybrid, supervised-PCA-based UMAP embedding where the histone modification data is plotted, but the nearest-neighbor graph was built from the protein information. So using protein, we can separate out finer-grained cell types, right? That solves, or at least starts to solve, one of the problems with single-cell CUT&Tag. But there's another problem: we cannot profile multiple histone modifications together, and we need multiple histone modification profiles to learn chromatin dynamics across cell lineages. To solve that, we designed another computational tool, which we're calling single-cell ChromHMM. The basic idea is to start with integration, using protein as the central modality. One very good thing about protein data is that it is statistically robust; unlike single-cell RNA-seq, it doesn't suffer from dropout, right? It's robust for defining cell types, as we have shown multiple times with CITE-seq. Since the protein data are robust, and we have protein measurements alongside every histone modification, we use protein across multiple experiments as a central anchor and integrate all the data into one framework, right?
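To make the protein-guided pseudobulking idea concrete, here is a minimal sketch, assuming a cells-by-antibodies protein count matrix and a cells-by-bins histone count matrix from the same cells. It builds a k-nearest-neighbor graph in protein space and sums each cell's sparse CUT&Tag counts over its neighbors. The CLR-style normalization, the helper names (`clr`, `knn_indices`, `neighbor_pseudobulk`), and the toy data are illustrative assumptions, not the pipeline used in this work.

```python
import numpy as np

def clr(adt_counts):
    """Per-cell centered log-ratio style transform of antibody (protein) counts.

    Rows are cells, columns are antibodies; log1p keeps zero counts finite.
    This particular CLR variant is an assumption chosen for illustration.
    """
    logged = np.log1p(np.asarray(adt_counts, dtype=float))
    return logged - logged.mean(axis=1, keepdims=True)

def knn_indices(features, k):
    """Indices of each cell's k nearest cells (squared Euclidean), self first."""
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(d2, -np.inf)  # guarantee each cell counts as its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_pseudobulk(histone_counts, neighbors):
    """Sum each cell's histone-mark bin counts over its protein-defined neighbors."""
    # histone_counts: (cells, bins); neighbors: (cells, k) -> result: (cells, bins)
    return np.asarray(histone_counts)[neighbors].sum(axis=1)

# Hypothetical toy data: two protein-defined groups of three cells each.
adt = np.array([[50, 1, 1], [48, 2, 1], [52, 1, 0],
                [1, 40, 2], [0, 45, 1], [2, 42, 3]])
histone = np.array([[1, 0], [0, 1], [1, 0],
                    [0, 0], [0, 2], [0, 0]])  # reads in two genomic bins per cell
neighbors = knn_indices(clr(adt), k=3)
pseudobulk = neighbor_pseudobulk(histone, neighbors)
```

Here cells 0 to 2 share a protein profile, so each of them receives the pooled counts `[2, 1]`, even though no single cell had reads in both bins. The grouping comes entirely from protein, which is the point the talk makes about sparse histone data.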
That's what I'm trying to show you on the right: there's CITE-seq, which combines protein and RNA-seq; the histone modification data; and ASAP-seq, which combines ATAC-seq with protein. So there's a whole new suite of technologies that you can integrate into one consistent framework. The idea is that once all the cells are placed in one consistent framework, you can annotate cell types for all of them in a consistent way, and that's where the real beauty of this data comes in. Here you're looking at the CD8 locus, right? There are nine cellular modalities, each with six or seven cell type annotations. ATAC-seq shows higher coverage at the CD8 locus specifically in CD8 T cells. The activating marks show enrichment there as well. At the same time, the repressive marks are enriched in all the cell types except CD8 T cells. Along with that, you have RNA-seq showing high expression in CD8 T cells, and protein, specifically CD8 protein, which is also enriched there. So you have eight or ten modalities all integrated into the same framework, enabled by exciting technologies like CITE-seq, ASAP-seq, and single-cell CUT&Tag-pro. As you can imagine, the infrastructure needed to handle this kind of dataset is even more complicated: how you integrate, how you compute a low-dimensional embedding, and so on. So it's a work in progress, but this is a proof of concept that you can visualize everything together and then maybe use it downstream to understand the functional role of DNA elements. Once we have integrated everything at the level of single cells or groups of cells, a general way to understand the functional role of DNA elements is through ChromHMM.
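One simple way to picture "protein as the central anchor" is nearest-neighbor label transfer in a shared antibody space: a labeled reference and an unlabeled query, stained with the same antibody panel, are compared on protein alone. This is a minimal sketch with a hypothetical three-antibody panel and toy counts; it stands in for the idea only and is not the integration method used in this work, which would also have to correct for batch effects between experiments.

```python
import numpy as np
from collections import Counter

def clr(adt_counts):
    """Per-cell centered log-ratio style transform of antibody counts (assumed variant)."""
    logged = np.log1p(np.asarray(adt_counts, dtype=float))
    return logged - logged.mean(axis=1, keepdims=True)

def transfer_labels(ref_adt, ref_labels, query_adt, k=3):
    """Label each query cell by majority vote of its k nearest reference cells.

    Distances are computed in protein space only, so both matrices must use
    the same antibody panel (same columns, same order).
    """
    ref = clr(ref_adt)
    qry = clr(query_adt)
    # squared Euclidean distances, shape (n_query, n_ref)
    d2 = ((qry[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return [Counter(ref_labels[j] for j in row).most_common(1)[0][0] for row in nearest]

# Hypothetical panel (columns: CD4, CD8, CD19) shared by a labeled reference,
# e.g. CITE-seq cells, and an unlabeled query, e.g. CUT&Tag-pro cells.
ref_adt = np.array([[40, 2, 1], [38, 3, 0], [45, 1, 2],
                    [2, 50, 1], [1, 47, 2], [3, 44, 0],
                    [1, 2, 60], [0, 1, 55], [2, 0, 58]])
ref_labels = ["CD4 T"] * 3 + ["CD8 T"] * 3 + ["B"] * 3
query_adt = np.array([[42, 2, 1], [1, 1, 57]])
predicted = transfer_labels(ref_adt, ref_labels, query_adt, k=3)
```

Because every assay in the framework carries the same protein measurements, the same vote can, in principle, put cells from different assays under one consistent set of annotations.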
It's a tool that uses the Baum-Welch algorithm to learn a multivariate HMM and assigns each 200 base pair region of the genome a state: whether it is a promoter, an enhancer, a repressed region, and so on. The idea is shown on the left. Here, H3K4 dimethylation and trimethylation are high, which is representative of a promoter region, so I'm denoting these two states, which are labeled as promoters, in green. States three and four are labeled as enhancers, states 9 and 10 as repressed, and the last ones as heterochromatin. But how do you visualize them? Since bulk ChromHMM cannot really work with a single-cell dataset, the idea is to perform pseudobulk analysis, for example grouping all the B cells into one profile. That's what I'm showing at the bottom: each 200 base pair region of the PAX5 locus, where the transcription start site shows an H3K4 dimethylation pattern, marking a promoter region. As we go into the gene body, it's H3K4 monomethylation, marking enhancer regions. Similarly, H3K27 trimethylation marks this region as repressed in the other cell types, right? The combination of histone modification marks at the promoter is shown in green, and as we go into the gene body it gets bluer and bluer, marking the state as an enhancer, right? But the problem is still that we are not at single-cell resolution; this is again pseudobulk. We are talking about B cells as a whole, CD8 T cells as a whole, right?
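Before any HMM is fit, a ChromHMM-style analysis turns read coverage into presence/absence calls per 200 bp bin. The following is a minimal sketch of that step for one pseudobulk track, assuming a uniform Poisson background whose rate is the mean reads per bin. ChromHMM's own binarization is also Poisson-based, but its exact background model and default cutoff may differ, so treat `p_cutoff` and the uniform background as assumptions; the helper names are hypothetical.

```python
import math
import numpy as np

BIN_SIZE = 200  # bin width in base pairs, as in the talk

def bin_reads(read_starts, region_length, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = math.ceil(region_length / bin_size)
    return np.bincount(np.asarray(read_starts) // bin_size, minlength=n_bins)

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the summed lower tail."""
    lower, term = 0.0, math.exp(-lam)
    for i in range(k):
        lower += term
        term *= lam / (i + 1)
    return 1.0 - lower

def binarize(counts, p_cutoff=1e-4):
    """Call a mark present where the count is unlikely under the Poisson background."""
    lam = float(np.mean(counts))  # assumed background: mean reads per bin
    threshold = 0
    # smallest count whose upper-tail probability drops below the cutoff
    while poisson_upper_tail(threshold, lam) >= p_cutoff:
        threshold += 1
    return (counts >= threshold).astype(np.int8), threshold

# Hypothetical 2 kb pseudobulk track: 12 reads pile up in bin 3, one stray read in bin 7.
reads = [600 + 15 * i for i in range(12)] + [1450]
counts = bin_reads(reads, region_length=2000)
binary, threshold = binarize(counts)
```

With a background of 1.3 reads per bin, only the pile-up in bin 3 clears the threshold; the stray read in bin 7 is treated as noise. Running this once per mark gives the bins-by-marks binary matrix that the HMM is trained on.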
That's why single-cell ChromHMM extends the previously published ChromHMM method: it uses the model learned on the pseudobulk profiles and applies it at the single-cell level. And again, visualizing this complicated dataset, for example each 200 base pair region of the genome for 20,000 cells across 12 states, is not really doable in the current format, right? The workaround we came up with was to visualize each 200 base pair region independently. Here I'm showing the pseudobulk profile at the bottom, but we can take the highlighted yellow region and look, for each cell, at the repressive state probabilities and the promoter state probabilities. For example, this region is a transcription start site; it shows a higher promoter posterior probability, and a repressive posterior probability across all other cell types. I really encourage you to check out the paper; we can do a lot more with these posterior probabilities, but I have to wrap up for the sake of time. To summarize, we developed a new technology called single-cell CUT&Tag-pro and a new computational framework called single-cell ChromHMM, which together let you examine what we jokingly call a mega-omic profile in terms of combinations of histone modification marks. With this, I'd like to acknowledge my lab and our funding sources, and I'll take questions after everybody else has finished their talks. Thank you.
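The per-cell promoter and repressive posterior probabilities described in the talk are the kind of quantity an HMM's forward-backward recursion produces. Below is a minimal sketch, assuming a multivariate HMM with independent Bernoulli emissions per mark (the model family ChromHMM uses) whose parameters were already learned on pseudobulk profiles, for example with Baum-Welch. The parameters are hand-set toy numbers, not a trained model, and the talk does not say how marks a cell was never profiled for are handled; this sketch simply marginalizes them out, which is an assumption.

```python
import numpy as np

def emission_likelihoods(obs, emit):
    """Likelihood of each bin's binarized marks under each state.

    obs:  (n_bins, n_marks); 1 = mark present, 0 = absent, -1 = not measured in this cell
    emit: (n_states, n_marks); P(mark present | state)
    Unmeasured marks contribute a factor of 1, i.e. they are marginalized out.
    """
    like = np.ones((obs.shape[0], emit.shape[0]))
    for m in range(obs.shape[1]):
        p = emit[:, m][None, :]
        like *= np.where((obs[:, m] == 1)[:, None], p, 1.0)
        like *= np.where((obs[:, m] == 0)[:, None], 1.0 - p, 1.0)
    return like

def state_posteriors(obs, init, trans, emit):
    """Scaled forward-backward: P(state at bin t | all bins) for one cell."""
    like = emission_likelihoods(obs, emit)
    n_bins, n_states = like.shape
    alpha = np.empty((n_bins, n_states))
    beta = np.ones((n_bins, n_states))
    a = init * like[0]
    alpha[0] = a / a.sum()
    for t in range(1, n_bins):  # forward pass, rescaled each step
        a = (alpha[t - 1] @ trans) * like[t]
        alpha[t] = a / a.sum()
    for t in range(n_bins - 2, -1, -1):  # backward pass, rescaled each step
        b = trans @ (like[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical 3-state model (promoter, enhancer, repressed) over 3 marks
# (H3K4me3, H3K4me1, H3K27me3), standing in for parameters learned on pseudobulk.
init = np.full(3, 1 / 3)
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
emit = np.array([[0.90, 0.50, 0.05],
                 [0.10, 0.80, 0.05],
                 [0.02, 0.05, 0.80]])
# One cell, five consecutive 200 bp bins; H3K4me1 was not measured in this cell.
cell_obs = np.array([[1, -1, 0], [1, -1, 0], [-1, -1, 1], [0, -1, 1], [0, -1, 1]])
post = state_posteriors(cell_obs, init, trans, emit)
```

Each row of `post` is one bin's distribution over states for this cell, so reading off the promoter and repressed columns for a single bin across many cells gives the per-region view described above.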