Okay, sorry for the delay. It's my pleasure to introduce our next keynote speaker, Dr. Jalees Rehman. Come on up, right there. He's the Benjamin Goldberg Professor and, as of just a few weeks ago, the head of the Department of Biochemistry and Molecular Genetics at the University of Illinois College of Medicine, which is up in Chicago. One of the few silver linings of COVID is that last year I was able to watch him give a local departmental seminar, even though I'm 150 miles downstate at the University of Illinois Urbana-Champaign. The research in his lab focuses on inflammation and immune responses to pathogens, which took on extreme importance after the onset of COVID. His lab has also produced a couple of R packages: one to infer the activity of transcription factors in single-cell RNA-seq data by leveraging ChIP-seq data, and another to identify dynamic genes in time-course data. They are going to be presenting a package demo today at 3:30 in the JNB SoundGarden if you want to check out their packages. I will now turn it over to Dr. Rehman to present his keynote on assessing cellular heterogeneity across time and disease.

Thank you, Jenny, for the kind introduction, and thanks also to the whole Bioconductor team. I think it's just wonderful to have this conference in person, and the hybrid format allows people who can't make it here to still participate. It's been a really fun few days and I look forward to the remaining sessions. As Jenny already mentioned, my lab comes from a biological and disease perspective, and roughly five to seven years ago we started developing computational approaches to study tissue injury and tissue inflammation. I'll give you the biological background so that you understand what got us excited about using computational approaches. Our interest predates COVID by quite some time.
But I think it's of value in the era of COVID, when we try to understand what makes COVID so devastating. Here is an alveolus of the lung, and what happens in severe COVID-19 is that it fills up with fluid. That's why people need a ventilator: they can't breathe. It fills up with fluid because the blood vessels that run through the lung, the capillaries, start leaking. The lining of the blood vessels, the so-called endothelial cells, gets damaged and broken down, so fluid leaks in. In addition to fluid, immune cells get in there, and the immune cells cause massive inflammation, starting a vicious cycle of inflammation, destruction of more tissue, and more leakage. So it's often not the virus that kills people but the tissue destruction and the hyperactivation of the immune system in the lung, even long after the virus is gone. We started this research before COVID-19, so at that time our models were bacterial pneumonia and bacterial injury, which are just as devastating. This is an example of an animal model, a transgenic mouse. The red fluorescent tag marks the non-vascular parts of the lung. These big holes are the airspaces where the lung breathes, and they're all covered by blood vessels, because that's how a lung exchanges oxygen. So every part of the lung is coated by green fluorescent protein labeling the blood vessels, and red and green go hand in hand. These are baseline conditions: this is how a healthy mouse lung, or a human lung, would look, with the two nicely apposed to each other. If you inject a bacterial toxin called LPS into the bloodstream, it first hits the blood vessels of the lung, and those blood vessels get eviscerated. All this nice green structure is destroyed; the green fluorescent cells have disappeared.
You still have the red fluorescent cells, which are the epithelial cells at the air interface, but the blood interface is gone. Then, after three days, in humans and mice that are fortunate enough to survive, the green cells start regenerating. They grow back, and although this is not quite back to baseline, you can see that a lot of the green has been restored, because the blood vessels in the lung have the ability to grow back. We found this about three or four years ago, when we were studying bacterial injury just before the COVID-19 pandemic, and we asked: what drives the regeneration of the green cells, and what drives the injury in the first place? To our surprise, as reported in this eLife paper that you can look up, when we compared the blood vessels of the lung, the brain, and the heart using the same injection of bacterial toxin into the bloodstream, it was the lung that had the strongest inflammatory response. The blood vessels of the lung activated all these genes involved in immune host defense, immune cell activation, and leukocyte proliferation, much more than the blood vessels of the brain or the heart. So we asked ourselves why this would make sense. Why would our lungs...? And this slide is during homeostasis; there's no injection at this point. So even at baseline, our lungs are primed for a massive immune and inflammatory response. We thought that maybe that's how we evolved, because the lungs are the one organ continuously in contact with the outside world. We were always breathing in pathogens, long before COVID. So it makes sense that the lung blood vessels are primed to activate the immune system, but that comes at a cost: maybe being primed is what makes our lungs so vulnerable. When we do get hit with a severe infection, they tend to overreact.
Whereas the blood vessels of our brain and heart don't react so strongly. This is the biological context that prompted a lot of our research. We asked: do all blood vessel endothelial cells in the lung have that response, or are there subpopulations? That's why we turned to single-cell RNA sequencing. This project was led by Lucy Zhang, who is now an assistant professor at the University of Pittsburgh with her own lab, and Shang Gao, who is actually one of the workshop leaders today; he's a PhD student about to graduate. We injected mice that had a red fluorescent tag in the endothelium and harvested the lung tissue at three hours, six hours, 12 hours, and so on, up to seven days, which is when the lung fully regenerates in the surviving mice. We tried to use a non-lethal dose of the bacterial toxin LPS. Single-cell RNA sequencing is obviously something all of you are familiar with. This is just a UMAP plot at baseline. Even at baseline, we found two predominant clusters in the mouse lung endothelium. One, without any injury, has an immune response and antigen presentation signature, suggestive of our idea that the lung is primed to respond, just waiting to be attacked by pathogens in the air. That's our hypothesis, but I think it makes sense. The other interesting thing is that there's another set of endothelial cells that are ready to regrow; they'll regenerate once the injury occurs. So even in the resting state they have more of a developmental signature. These heat maps show some of the characteristic genes. Histocompatibility complex genes are expressed in the subset of endothelial cells that is primed for the immune response, and more developmental genes, like SOX17, a transcription factor that is very important during development, or HES1, are expressed in the other subset.
These genes are already expressed during the baseline homeostatic state in the lung. Once injury occurs, as early as six hours later, the distance between these clusters increases. Of course, we all know that UMAP distances are not linear, and we shouldn't overinterpret how far apart the clusters grow. But it becomes very clear that there are distinct, clearly demarcated cell populations. This is Seurat clustering, and the UMAP visualization shows it nicely: one population with more of an inflammatory response and one with more of a developmental response. We think the immune endothelial subpopulation is the one that really drives interferon production and massive cytokine release, while the developmental endothelial cells still express genes as if they're ready to regrow, regenerate, and develop. At three days, which is when we had biologically observed the massive regeneration, a new cell population emerges that has a proliferative signature: mitosis, nuclear division, chromosome segregation. All these UMAPs show the different cell cycle genes that are activated in this proliferation. Building trajectories with Monocle was a little challenging, because many pseudotime trajectory methods are designed for developmental processes, which are more linear. Inflammation is a cyclical process: cells get inflamed and then the inflammation goes away. So it's sometimes better not to take the whole process, but instead to take chunks of the injury process and use those for pseudotime building. What we found is that these proliferative endothelial cells likely emerge from the developmental population that has been there all along, which apparently gets activated in this time frame. Now, this was just a simple observation.
We saw these different cell populations emerge, but the question we are most interested in is what drives the shifts in the clusters. What drives the dynamics? We had picked a few transcription factors, mostly based on biological hypotheses, but what if we had a more unbiased approach to identify the transcription factors driving injury and regeneration? That led us into probably one of the most interesting new directions of our lab, which is developing a computational focus. We developed an algorithm, BITFAM, a Bayesian inference model of transcription factor activity in single-cell data. It was again led by Shang, and I'll briefly walk you through it; I think you'll see that it reflects our biological bias. Our idea was: when we analyze single-cell RNA sequencing data, is there prior biological knowledge we could use to infer transcription factor activities? Here that prior knowledge is a ChIP-seq transcription factor by target gene matrix. We went to a comprehensive ChIP-seq database, GTRD, because it tells us which target genes each transcription factor has, and created a matrix of whether or not each gene is a target of each transcription factor. The other input is normalized single-cell RNA sequencing data. Our hypothesis was that there are transcription factor activities we could learn with a machine learning approach from these two inputs, that these inferred activities drive the single-cell RNA-seq expression levels we can measure, and that there are gene weights, because not every potential target gene in the ChIP-seq database is equally likely to be a real target of a given transcription factor.
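To make the structure of this model concrete, here is a minimal Python sketch. It is not BITFAM: BITFAM is a Bayesian model, whereas this toy swaps in plain non-negative matrix factorization fit by projected gradient descent. What it shares with the idea described above is the ChIP-seq prior used as a hard mask, so a gene can only load on a transcription factor whose target list contains it. The function names, the dict-of-sets format for ChIP-seq targets, and the learning-rate settings are all hypothetical.

```python
import random


def build_prior_mask(genes, tfs, chip_targets):
    """Binary gene-by-TF mask: 1.0 if the ChIP-seq target lists name the gene
    as a target of the TF, else 0.0 (a stand-in for a GTRD-style matrix)."""
    return [[1.0 if g in chip_targets.get(tf, set()) else 0.0 for tf in tfs]
            for g in genes]


def reconstruction_loss(X, W, A):
    """Sum of squared errors between X (genes x cells) and W @ A."""
    K = len(A)
    return sum(
        (sum(W[g][k] * A[k][n] for k in range(K)) - X[g][n]) ** 2
        for g in range(len(X)) for n in range(len(X[0]))
    )


def fit_masked_factors(X, mask, n_iter=2000, lr=0.01, seed=0):
    """Projected gradient descent for X ~ W @ A with W >= 0 and A >= 0,
    where W[g][k] (gene weight) is forced to 0 wherever mask[g][k] == 0
    and A[k][n] plays the role of TF activity k in cell n."""
    rng = random.Random(seed)
    G, N, K = len(X), len(X[0]), len(mask[0])
    W = [[rng.random() * mask[g][k] for k in range(K)] for g in range(G)]
    A = [[rng.random() for _ in range(N)] for _ in range(K)]
    for _ in range(n_iter):
        # Residual R = W @ A - X, shared by both gradients of 0.5 * SSE.
        R = [[sum(W[g][k] * A[k][n] for k in range(K)) - X[g][n]
              for n in range(N)] for g in range(G)]
        grad_W = [[sum(R[g][n] * A[k][n] for n in range(N)) for k in range(K)]
                  for g in range(G)]
        grad_A = [[sum(W[g][k] * R[g][n] for g in range(G)) for n in range(N)]
                  for k in range(K)]
        # Gradient step, then project: clip at 0 and re-apply the ChIP-seq mask.
        W = [[max(0.0, W[g][k] - lr * grad_W[g][k]) * mask[g][k]
              for k in range(K)] for g in range(G)]
        A = [[max(0.0, A[k][n] - lr * grad_A[k][n]) for n in range(N)]
             for k in range(K)]
    return W, A
```

For example, with `chip = {"TF_A": {"g1", "g2"}, "TF_B": {"g2", "g3", "g4"}}`, gene `g1` can only be explained by `TF_A`, so any weight it learns on `TF_B` stays exactly zero. A Bayesian treatment would instead place a prior on every weight and let the data move it, which is the distinction drawn in the talk between prior and posterior weights.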
So if we could use a machine learning approach to infer not only the activities but also the weights from this data, that might be a very exciting way to get a handle on transcription factor activities and on the preferred targets of any given transcription factor. What can we do with this once we have it? We could use the inferred activities to cluster cells. Seurat, for example, is an excellent way to cluster cells, but what if you were interested not in all variable genes but primarily in variable transcription factor activities? Would that give you another perspective on clustering cells, with a different biological meaning? Could you use the inferred transcription factor activities to build trajectories? Transcription factors act in a sort of hierarchical way, so maybe they are more important for cell identity than the 1,000 or 2,000 variable genes you take in Seurat clustering. And could you identify population-specific transcription factors? Here is a key summary. We take the GTRD database, which provides the prior distributions of the weights in this Bayesian approach, based simply on whether or not a gene is a target of a transcription factor in that database. The posterior distribution after learning gives the weight, that is, how likely the inferred interaction between a transcription factor and its target gene is in the dataset you're analyzing, which differs from the prior distribution. Now, an important caveat. I really appreciate that at this meeting everybody who presents an algorithm or software package points out the caveats; all of our approaches have them. The accuracy of our prior distribution depends on the quality and relevance of the ChIP-seq dataset.
In our paper we found that for some transcription factors it might matter whether your input ChIP-seq data is derived from a relevant cell type. For example, if you are studying immune cells in the lung, a ChIP-seq dataset from, let's say, a leukemic cell line might be much more relevant than one from a glioma cell line from the brain. So for some transcription factors, the source of the ChIP-seq data might affect the accuracy of your inference, which is why I think you should always put the data BITFAM gives you into biological context when you interpret it. We use RStan for the inference. Tabula Muris is of course something we all use, and we used it to benchmark and understand the approach. I think you're all familiar with it, so I won't spend much time on it, but it's a great place to start, and now there are human atlases and other atlases that you can use too. I won't go into the details of either package today, because Shang and Xinge are presenting both BITFAM and TrendCatcher at the workshop at 3:30. I'll just give you examples of how they helped us gain biological insights. So what does BITFAM generate? It generates a heat map of inferred transcription factor activities. The labels here are biological labels, which we're using as ground truth for the cell types. These are alveolar macrophages, labeled using traditional alveolar macrophage markers. This is all lung tissue: these are lung endothelial cells, and these are B cells in the lung, from the Tabula Muris database. And this shows the inferred activity of each transcription factor. I'm not sure if you can read it, but these are MAFB, PAX5, and TAL1. When we first showed this to our colleagues, we had no idea what PAX5 was.
But someone in the audience said: PAX5 is the B cell transcription factor; it really determines B cell fate. That was nice for us, because we really came to this agnostically. If you look at the corresponding population here, you wouldn't find much PAX5 mRNA, because transcription factors are lowly expressed; we all know the issues with sequencing depth in single-cell RNA sequencing. But according to the machine learning approach, its targets were apparently highly expressed in that population. So it gave us a handle on PAX5 activity even though you can hardly see any expression of PAX5 itself by single-cell RNA sequencing. Same with MAFB. It's a macrophage fate transcription factor, and its inferred activity is high here, matching the biological label of alveolar macrophages, the main macrophage in the lung. We did not have much luck with its mRNA levels; if we had gone by mRNA levels alone, we would not have found it. With TAL1 you do see the mRNA nicely here, but the learned transcription factor activity shows it much more clearly and widely, because its targets are so well expressed in the lung endothelial population. So this is the heat map BITFAM generates. You can use biological labels, Seurat labels, or other labels, and it gives you a very good handle on which cell types have which inferred transcription factor activities. Importantly, it also gives you the posterior distributions: after the learning process, you get weights for each transcription factor and its target genes. To me this was one of the most interesting findings, and as a biologist coming to computational biology, it really highlighted the value of this approach.
When you look at the ChIP-seq database and ask what the ChIP-seq targets of TAL1 are, these are the processes you find: RNA metabolic process, macromolecule modification, protein modification. For PAX5: metabolic process, macromolecule modification. Why is that? Because what most transcription factors do is keep cells going; they help maintain cells. That's important, because if they stopped doing it we would all be dead right now, and we don't want to be dead. So we're happy that they do this, but it's not intellectually that exciting, and it doesn't tell us much about heterogeneity, because they do it in all cells. Now, if you look only at the genes that are the top-weighted targets from our learning approach, and not at all potential targets, it turns out that TAL1 has cell differentiation, circulatory system and cardiovascular system development, and vascular development. So the top targets are enriched for genes that much better match the biological function of TAL1, and I thought it was very exciting that we were able to pull that out. The top-weighted targets told us much more about the actual function of TAL1 than the 1,000, 2,000, or 3,000 potential targets in ChIP-seq. Same with PAX5. All potential PAX5 targets in ChIP-seq are, again, a lot of maintenance genes. The top-weighted targets give B cell activation, leukocyte activation, leukocyte differentiation, and lymphoid development, which tells you much more about what PAX5 really does. So again, this helped validate in our minds the biological value of the approach. Now to clustering. We did not originally intend BITFAM as a clustering algorithm; we're really using clustering more as a validation.
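The contrast between all ChIP-seq targets and the top-weighted targets can be illustrated with a small Python sketch: rank one transcription factor's targets by learned weight, keep the top n, and test a GO-style term for over-representation with a one-sided hypergeometric test. The weight dictionary format, the gene names, and the choice of a hypergeometric test are assumptions for illustration, not a description of how the BITFAM paper ran its enrichment analysis.

```python
from math import comb


def top_weighted_targets(weights, n_top):
    """Return the n_top genes with the largest learned weight for one TF.
    `weights` maps gene -> weight (hypothetical input format)."""
    return sorted(weights, key=lambda g: weights[g], reverse=True)[:n_top]


def hypergeom_enrichment_p(hits, universe_size, set_size, n_drawn):
    """P(X >= hits) for a hypergeometric X: draw n_drawn genes from a universe
    of universe_size genes, of which set_size belong to the term."""
    upper = min(set_size, n_drawn)
    numerator = sum(
        comb(set_size, i) * comb(universe_size - set_size, n_drawn - i)
        for i in range(hits, upper + 1)
    )
    return numerator / comb(universe_size, n_drawn)


def term_enrichment(targets, term_genes, universe):
    """Overlap between a target list and a term, plus its enrichment p-value."""
    universe = set(universe)
    term = set(term_genes) & universe
    hits = len(set(targets) & term)
    p = hypergeom_enrichment_p(hits, len(universe), len(term), len(targets))
    return hits, p
```

Running `term_enrichment` once on all ChIP-seq targets and once on `top_weighted_targets(weights, n)` mirrors the comparison in the talk: housekeeping terms tend to dominate the full list, while a focused list can surface terms closer to the factor's known function.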
We've heard some really great talks on this; for example, one of the first talks, by Sandrine Dudoit, was a great introduction to the different ways you can cluster cells. I think it's important to realize that there are multiple clustering approaches, and that they should be guided in part by the biological questions. So could you cluster cells just by the inferred transcription factor activities? What we find is that yes, you can. The biological labels match the clustering based only on transcription factor activities, and it's comparable to using, for example, Seurat, or to combining another transcription factor inference algorithm like SCENIC with a clustering algorithm like Louvain. In some cases there's not much of a difference, and in some cases there is. But again, I'm using this as a way of validating the approach, not to say that one clustering approach is better than another. I really think clustering should be contextual: what do you want to cluster the cells for? Now, going back to why we got into this, we wanted to return to lung injury, and I'll show you just one slide of those results, because that's what inspired us. If we look at the cells six hours after injection of bacterial endotoxin, LPS injury, these are the genes that are activated: these immune genes. Before, we had looked at the expression of transcription factors, which is what led us to SOX17, and that was very hypothesis-driven, because we knew that SOX17 was important in regeneration and in suppressing inflammation. Instead, this is what the inferred activities told us. These are UMAPs of the inferred activities, and this is what we found: autoimmune regulator, vitamin D receptor.
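Clustering cells on inferred activities only changes the input: each cell is represented by its vector of transcription factor activities instead of its variable-gene expression. The Python sketch below uses plain k-means for brevity. This is a swapped-in algorithm, not the Louvain graph clustering mentioned in the talk, and the activity values in the usage example are made up.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two activity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cells(activities, k, n_iter=50):
    """Cluster cells (rows = inferred TF activity vectors) with plain k-means.
    Centers start from a deterministic farthest-point initialization."""
    centers = [list(activities[0])]
    while len(centers) < k:
        # Next center: the cell farthest from all centers chosen so far.
        far = max(activities, key=lambda v: min(_sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(activities)
    for _ in range(n_iter):
        # Assign each cell to its nearest center.
        labels = [min(range(k), key=lambda j: _sq_dist(v, centers[j]))
                  for v in activities]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(activities, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For instance, with columns standing for three transcription factors, `kmeans_cells([[2.0, 0.1, 0.0], [1.8, 0.2, 0.1], [0.1, 1.9, 1.5], [0.0, 2.1, 1.7]], 2)` places the first two cells in one cluster and the last two in the other. Comparing such labels with biological labels is the kind of validation described above.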
Not necessarily things we would have stumbled upon through hypothesis-driven research. Take the vitamin D receptor: right now this is the sunniest and hottest Seattle I've ever seen in my life, so we're getting a lot of vitamin D, and it is important in immune cell regulation, but it's not something we would have naturally gravitated towards. Interferon regulatory factor is very important in inflammatory responses. And for PPAR gamma, there are a lot of ways to intervene therapeutically. So looking at inferred activities, rather than only at expression levels of transcription factors, has opened up many more research directions for us. Because we lost a bit of time to technical difficulties, I won't go into much detail here; this is just a summary, and it's on GitHub. What I feel is our main contribution is that we try to integrate prior biological knowledge, empirically derived from ChIP-seq. But maybe there are other ways to integrate additional knowledge. For example, what about intersecting the ChIP-seq data with ATAC-seq data, so that we further filter the potential targets down to those whose chromatin regions are accessible? So there are more ways to build on BITFAM, and we hope that future iterations, by us or by others, will do this even better. Now I want to touch on a second question, COVID-19, and the TrendCatcher platform, which was developed by Xinge Wang, also a PhD student in the lab. For time reasons I won't go into too much detail, because Xinge is doing a live coding session and demonstration this afternoon at 3:30. It can be used for both bulk and single-cell RNA-seq.
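The ATAC-seq filtering idea raised here could look roughly like the following Python sketch: keep a transcription factor and target pair only if at least one supporting ChIP-seq peak overlaps an accessible region. The tuple formats are hypothetical, coordinates are assumed half-open, and the linear scan is for clarity only; real data would call for an interval index such as `GenomicRanges::findOverlaps` in R.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Overlap test for half-open intervals [start, end)."""
    return a_start < b_end and b_start < a_end


def filter_targets_by_accessibility(chip_peaks, atac_regions):
    """Keep (tf, gene) pairs with at least one ChIP-seq peak that overlaps an
    accessible ATAC-seq region on the same chromosome.

    chip_peaks:   iterable of (chrom, start, end, tf, gene)  (hypothetical format)
    atac_regions: iterable of (chrom, start, end)
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in atac_regions:
        by_chrom[chrom].append((start, end))
    kept = set()
    for chrom, start, end, tf, gene in chip_peaks:
        if any(overlaps(start, end, s, e) for s, e in by_chrom[chrom]):
            kept.add((tf, gene))
    return kept
```

The surviving pairs would then define a sparser, cell-type-aware prior matrix in place of the full ChIP-seq target list.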
Single-cell RNA-seq requires that we pseudobulk the samples, so it works best if you have fairly stable cell populations, such as PBMCs, where you know these are lymphocytes or CD4 or CD8 cells, which makes them easy to pseudobulk. If new subpopulations emerge, pseudobulking can be a bit challenging. But if you're working with samples you can confidently pseudobulk, or you just do bulk RNA-seq, what does TrendCatcher do? It uses a curve-fitting approach: it models the trajectory of your dynamic genes and then calculates a gene-wise dynamic p-value, so you identify which genes are dynamic. Looking at the whole trajectory, it gives you the trajectory patterns. Something I'll highlight is the time heat map, and I'll emphasize COVID-19, because in COVID-19 we found it is so essential to look at the dynamics of gene expression and not just at which genes are dynamic overall. I'll walk you through that. Using simulated data, Xinge compared TrendCatcher to ImpulseDE2, DESeq2, and DESeq2 with splines; we use a spline approach in our curve fitting, which is why we included that comparison. We do fairly well in terms of accuracy. What helps is that the more time points you have, and the more complex the trajectory is, the better TrendCatcher performs compared to other approaches. For a biphasic trajectory, meaning up and then down, I don't think TrendCatcher is much better than the others. But for a multimodal trajectory, it does much better. So I think the reasons to choose a package for analyzing your longitudinal time course data are the complexity of the trajectories and the number of time points: the more time points and the more complex the trajectory, the better TrendCatcher performs.
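The notion of a gene-wise dynamic p-value can be sketched in Python as follows. This is a simplified stand-in, not TrendCatcher's model, which uses spline-based curve fitting: here each later time point's replicate mean is compared with the baseline replicates using a normal approximation, and the per-time-point p-values are combined with Fisher's method. The replicate layout and the normal approximation are assumptions.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under
    H0. For even df the survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!."""
    k = len(pvalues)
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # X / 2
    term, total = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half_x / i  # (X/2)^i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total


def dynamic_p_value(baseline, later_timepoints):
    """Compare each later time point's replicate mean with the baseline (t0)
    replicates, then combine the per-time-point p-values with Fisher's method.
    Returns (combined p-value, list of per-time-point p-values)."""
    n0 = len(baseline)
    mu0 = sum(baseline) / n0
    var0 = sum((x - mu0) ** 2 for x in baseline) / (n0 - 1)
    sd0 = max(math.sqrt(var0), 1e-8)  # guard against zero baseline spread
    pvals = []
    for reps in later_timepoints:
        mean_t = sum(reps) / len(reps)
        z = (mean_t - mu0) / (sd0 * math.sqrt(1.0 / n0 + 1.0 / len(reps)))
        pvals.append(two_sided_normal_p(z))
    return fisher_combined_p(pvals), pvals
```

A gene that stays near its baseline at every time point gets a combined p-value near 1, while a transient spike at one or two time points drives it toward 0, which is the "is this gene dynamic at all?" question. The per-time-point p-values are also what a pattern or heat-map summary could be built on.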
This is an example of the various datasets we looked at in our paper, the JCI Insight paper. This was early in the pandemic. The datasets are non-human primate PBMCs, single-cell data from human patients with COVID-19 (SARS-CoV-2 infection), bulk whole-blood data, and other PBMC data. I'll first show you the non-human primate data. This is what TrendCatcher gives you: which genes are upregulated over the whole dynamic time course, and which are downregulated. For me, one of the important questions is how we communicate these results to biologists. We can always do gene set enrichment analysis and tell our colleagues which genes are dynamic or different. But I think what's really key is to look at when they peak: which genes peak early, and which peak late? The time heat map shows the mean fold change of the statistically significant genes in a given pathway, and it also shows what percentage of the GO pathway is dynamic and the number of genes. So you can immediately see a 2.61 log2 fold increase in the mean expression of genes in the virus defense response pathway, which then gradually wanes. Each change is relative to the prior time point: day 0 to day 1, day 1 to day 2, day 2 to day 4. So it gives you a very nice sense of an immediate peak followed by a gradual decline. This became really important for us, and it had a lot of translational relevance when we looked at whole-blood RNA-seq in patients who developed severe COVID-19. We found that they had a massive increase in immune activation at the beginning and a very gradual decrease. Here are the individual trajectories.
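The values in one row of the time heat map described here can be approximated with a short Python sketch: for a pathway, take its dynamic genes, compute each gene's log2 fold change between consecutive time points (day 0 to day 1, day 1 to day 2, and so on), average across genes, and report the fraction of the pathway that is dynamic. The input format and the pseudocount are assumptions, and TrendCatcher's own implementation may differ, for example by working from fitted rather than observed means.

```python
import math


def interval_log2fc(means, pseudocount=1.0):
    """log2 fold change of each time point versus the previous one."""
    return [math.log2((b + pseudocount) / (a + pseudocount))
            for a, b in zip(means, means[1:])]


def pathway_time_heatmap_row(gene_means, pathway_genes, dynamic_genes,
                             pseudocount=1.0):
    """One row of a time heat map (simplified sketch).

    gene_means: gene -> list of mean expression per time point (hypothetical)
    Returns (mean log2FC per interval over the pathway's dynamic genes,
             percent of pathway genes that are dynamic,
             number of dynamic pathway genes).
    """
    hits = [g for g in pathway_genes if g in dynamic_genes and g in gene_means]
    if not hits:
        return [], 0.0, 0
    per_gene = [interval_log2fc(gene_means[g], pseudocount) for g in hits]
    mean_fc = [sum(col) / len(hits) for col in zip(*per_gene)]
    percent = 100.0 * len(hits) / len(pathway_genes)
    return mean_fc, percent, len(hits)
```

A row such as `[2.0, -1.0, 0.0]` reads as "up sharply in the first interval, partly back down in the second, flat afterwards", which is the early-peak-then-wane shape described for the virus defense response.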
Here, red is the severe patients, blue is moderate, and green is mild. Neutrophils are the immune cell types that are meant to fight off bacteria, but they get activated even in a viral infection. In patients who had mild COVID-19, the green line is more or less flat: there's virtually no significant mean change in the expression of neutrophil activation genes in whole blood. These samples were all taken at a time when it was not yet known whether patients would develop severe, moderate, or mild COVID-19. This was early on, right after they were diagnosed with SARS-CoV-2, and they all had similar symptoms. But the patients who already had an upward trajectory of neutrophil activation in the first week, which TrendCatcher identifies really nicely, are the ones whose neutrophils stayed active. You can no longer find virus at week two, but the neutrophils remain active. We suspect this pattern in patients who develop severe COVID-19 might also be very important for what's now called long COVID, where you have a hyperactive immune system long after the virus is no longer being actively transcribed. You might still have viral antigens, but you're no longer contagious, and yet your neutrophils remain activated. So this is an example of how understanding the time course can be immensely beneficial when analyzing the data. I know we're running out of time and there's an awards ceremony very soon, so I don't want to take up that time. But the other point is that severe COVID-19 is not only about having a hyperactive immune system; it's also about inadequate activation of the right immune cells. Neutrophils are the cell types that are hyperactive in severe COVID-19: they're hyperactive early on and they stay high, coming back to this here.
They are active here and then remain high. But the immune cells and signaling pathway that actually need to be activated involve type I interferon signaling; that's the pathway that helps eliminate the virus early on. Patients who go on to develop only moderate COVID-19 have a very rapid interferon pathway response in cell types other than neutrophils, such as natural killer cells, monocytes, B cells, and T cells, which then subsides. Patients who develop severe COVID-19 have a suppressed response here. So when you follow the time course, you see that the best way for the immune system to avoid severe COVID-19 is to rapidly activate type I interferon signaling, eliminate the virus, and avoid having neutrophils come in and cause more harm than good, because they are not the cells your immune system really needs to activate. Again, for time reasons I'm not going to go into this further. If you have questions about TrendCatcher or BITFAM, Shang and Xinge will gladly answer them at the workshop. COVID-19 is just one example. I would love to see this applied to, for example, Alzheimer's disease and other diseases, not just acute and subacute diseases but also chronic diseases, where we're going to get more and more longitudinal bulk and single-cell transcriptomic data. Let us look at the temporal trajectories of different diseases and severities, but also of biological processes. During embryonic development, what are the bursts of certain pathways? And if you combine TrendCatcher with BITFAM, what are the bursts of transcription factor activities that are turned on and then off again at defined stages of development? Now, in the last few minutes, I want to touch on something very simple that we're just now experimenting with.
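Looking for bursts that switch on and then off, or telling a biphasic trajectory (up, then down) apart from a multimodal one, comes down to summarizing the direction changes of a trajectory. Below is a minimal Python sketch with a tolerance below which changes count as flat; the labels and the tolerance parameter are illustrative choices, not TrendCatcher's pattern definitions.

```python
def trajectory_pattern(values, tol=0.0):
    """Summarize a trajectory as its sequence of up/down moves, e.g. 'up-down'
    (biphasic) or 'up-down-up-down' (multimodal). Changes whose absolute size
    is <= tol are treated as flat and skipped."""
    moves = []
    for a, b in zip(values, values[1:]):
        if abs(b - a) <= tol:
            continue
        step = "up" if b > a else "down"
        if not moves or moves[-1] != step:
            moves.append(step)  # record only changes of direction
    return "-".join(moves) if moves else "flat"


def count_peaks(values, tol=0.0):
    """Number of 'up' moves immediately followed by a 'down' move (local maxima)."""
    moves = trajectory_pattern(values, tol).split("-")
    return sum(1 for a, b in zip(moves, moves[1:]) if (a, b) == ("up", "down"))
```

Applied to an inferred transcription factor activity averaged per time point, `count_peaks` would flag factors that switch on and off more than once, which are the multimodal cases where the talk suggests a time-aware method helps most.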
I just wanted to throw this out there: we now want to look at dynamic changes in single-cell ATAC-seq data, and the first challenge we had is how to even represent single-cell ATAC-seq data so that we can discover new aspects of it. I'm not going to show you any real results, just share some thoughts on how we're starting to approach this problem. Many of you who work with single-cell ATAC-seq data probably use the peak-by-cell matrix: you take all the different peaks in your single-cell data and generate a matrix entry for each individual cell. That's one way to do it, but here is something else that Shang and Xinge are working on right now, which maybe touches on our prior-knowledge approach. What if we annotated the cell matrix based on what we know about genome structure? What if we separated intergenic regions from exon 1, exon 2, exon 3, intron 1, intron 2, intron 3, and the promoter region? Would that be of value, so that we don't just look at accessible versus non-accessible regions but ask: accessible where? And what would that tell us? Coming up with how to represent the matrix is where we are right now, and we're now thinking about what we can do with it. To give you an example, here is one gene, HES1, very important in the developing mouse brain. This is single-cell ATAC-seq data from the 10x Genomics website. HES1 is part of the Notch pathway and helps drive differentiation. This is the peak matrix, and for this gene it only captures the transcription start site; it doesn't have the resolution to tell you where the chromatin is accessible. If you now look at exons and introns, you get much better resolution, and then you can cluster or sort the cells.
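The annotated gene-segment representation could be built roughly like this Python sketch: label a gene's promoter, exons, and introns, then count, for each cell, the fragments overlapping each segment. It assumes a plus-strand gene with half-open coordinates, a fixed-length promoter immediately upstream of the first exon, and a simple (barcode, start, end) fragment format; a real implementation would read gene models from an annotation file and handle strand. All names here are hypothetical.

```python
from collections import Counter


def gene_segments(exons, promoter_len=2000):
    """Label segments of a plus-strand gene (assumption): promoter upstream of
    the first exon, then exon_1, intron_1, exon_2, ... Half-open coordinates."""
    exons = sorted(exons)
    first_start = exons[0][0]
    segments = [("promoter", max(0, first_start - promoter_len), first_start)]
    for i, (start, end) in enumerate(exons, 1):
        segments.append((f"exon_{i}", start, end))
        if i < len(exons):  # intron between this exon and the next one
            segments.append((f"intron_{i}", end, exons[i][0]))
    return segments


def segment_by_cell_counts(fragments, segments):
    """Count fragment overlaps per (cell, segment).
    fragments: iterable of (cell_barcode, start, end) on the gene's chromosome."""
    counts = {}
    for cell, f_start, f_end in fragments:
        row = counts.setdefault(cell, Counter())
        for label, s, e in segments:
            if f_start < e and s < f_end:
                row[label] += 1
    return counts


def to_matrix(counts, segments):
    """Dense cell-by-segment matrix; segments with no fragments count as 0."""
    labels = [label for label, _, _ in segments]
    cells = sorted(counts)
    return cells, labels, [[counts[c][label] for label in labels] for c in cells]
```

Each row then says not only whether a cell is open near the gene but where: promoter, a specific exon, or a specific intron. Intergenic regions could be added the same way as extra labeled segments.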
Right now, this is just sorting the cells by where the accessibility is taking place. We could do the same thing for the intergenic regions, for example by defining certain enhancers or certain epigenetic modification sites. If we could break up these sites, that might be a different approach to looking at single-cell ATAC-seq data than the peak matrix. Again, these are just different examples of genes that we're looking at right now. We now want to use this approach, perhaps with new machine learning or deep learning methods, to learn cell identities with this better resolution and this new way of representing single-cell ATAC-seq data. So this is our phase two. Can we identify cellular heterogeneity with this annotated gene segment matrix, in which we annotate coding and non-coding regions? Does this annotation approach help us identify distinct cell types, so that we don't just have the binary "accessible or not accessible" but "accessible where"? Will it help us track real-time or pseudotime changes as disease progresses? If we want to create something like a chromatin accessibility velocity, or capture changes in state, would this be helpful? And can the genomic distance between accessible segments across the whole genome help define cell states? Are there certain genes that become co-accessible or not? These are ongoing projects. And again, I'll take a few questions at the end. This is the work that has been done. Shang and Xing have really been driving this, with our collaborators, Yang Dai, and Lucy Zhang at Pittsburgh. This is the Bioconductor workshop by Xing and Shang today at 3:30 p.m. in the GMB building. And one last thing I want to plug: I just became department head, and one of the biggest mandates I have is to really expand our genomics and computational biology section.
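One simple way to ask whether segments "become co-accessible" is a Jaccard index over cells. For two segments, it is the fraction of cells accessible at either segment that are accessible at both. This is a swapped-in illustration, not a method described in the talk. The input format (a set of accessible segment names per cell) and the segment names are assumptions.

```python
from itertools import combinations


def co_accessibility(accessible):
    """Jaccard index for every pair of segments across cells.

    `accessible` maps cell -> set of segment names accessible in that cell
    (hypothetical input, e.g. segments with at least one cut site).
    """
    cells_by_segment = {}
    for cell, segments in accessible.items():
        for seg in segments:
            cells_by_segment.setdefault(seg, set()).add(cell)
    scores = {}
    for a, b in combinations(sorted(cells_by_segment), 2):
        shared = len(cells_by_segment[a] & cells_by_segment[b])
        either = len(cells_by_segment[a] | cells_by_segment[b])
        scores[(a, b)] = shared / either
    return scores


# Invented toy data: three cells, three segments of one gene.
accessible = {
    "c1": {"GeneA:promoter", "GeneA:exon1"},
    "c2": {"GeneA:promoter", "GeneA:exon1"},
    "c3": {"GeneA:promoter", "GeneA:intron1"},
}
for pair, score in co_accessibility(accessible).items():
    print(pair, round(score, 2))
# ('GeneA:exon1', 'GeneA:intron1') 0.0
# ('GeneA:exon1', 'GeneA:promoter') 0.67
# ('GeneA:intron1', 'GeneA:promoter') 0.33
```

The same pairwise loop could also record the genomic distance between the two segments. That would allow co-accessibility to be examined as a function of distance.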
So we are very actively recruiting graduate students for a new genomics and computational cell biology track that we just created in our program, as well as postdoctoral fellows and senior data scientists. We also have several tenure-track positions in genomics and computational biology. If you're interested in any of these positions, please email me. I'll still be here tomorrow, and I'll gladly talk to you. So thank you. Thank you, Dr. Rahman. I guess we'll have time for a few questions. I know we're running late, but I know everybody has some interesting ones. That was a wonderful talk, Dr. Rahman. Incredibly clear. Towards the end, though, you got closer to the frontier, in terms of representing accessibility in different ways in different cells. We've struggled a bit with this. Yesterday, [inaudible] from Mike Love's lab presented, in a workshop, ways to test out representations of this. And I'm sure you know that the Greenleaf Lab and Jeff Granja have also addressed this. What do you see happening with this as people move towards multiome? I mean that on the research side, but also in terms of simplifying it so that it's more clinically actionable, cheaper and faster, and strikes a balance between good data and useful data, or lots of data and useful data. Where do you see this going? I'll talk about the research side first, because as of three weeks ago, I don't see patients anymore. The research side is fascinating for me, because I think multiome is what really helps us anchor, since we are trying to understand cell state shifts over time. We would like to track changes in the same cell. In a multiome experiment, we'd use the RNA-seq data to label the biological function of the cell, and then understand changes in the ATAC-seq data with the different representation approaches.
So we would love to benchmark our approaches, and our learning of cell identity, using multiome data. I think that will be fantastic. Right now, so much of the multiome data is limited to PBMCs, and there's not that much available for the other cell types we'd be very interested in studying. So I hope more multiome data becomes available. And like you said, we're obviously not the first lab doing this; there are many other labs that are more advanced in working out how to represent it. But where I see this field going is toward improving the resolution of how we represent single-cell ATAC-seq data. I'm especially interested in the intergenic regions. In some very early pilot work, we tried to identify cells using a single-cell ATAC-seq atlas, inferring gene activity scores to say what the cell type is. We then asked: what if we only looked at the intergenic regions? Would that be sufficient? Some very early data suggest that the non-coding regions might actually be sufficient to identify a cell, without the coding regions. So that's where I see the field going. As for the clinical question, I just think there's such a disconnect. We need so much more work done to understand the data. Personally, I would love to say that we're very close to interpreting single-cell RNA-seq or ATAC-seq data for patients, but I just don't see that right now. I think we need much more work on the computational end to meaningfully interpret the data. Thank you so much. OK, now we'll do one online question. This is from Ian Smith, who said: very interesting transcription factor model. The question is, do you expect the inferred TF-gene weights to be time dependent, for example due to changes in chromatin state, and can the model account for this? Follow-up: how can the inferred TF-gene weights be validated?
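The intergenic-only question can be framed as a feature-restriction experiment: drop every gene-body column and see whether cells can still be assigned to types. The Python below sketches this with a nearest-centroid classifier. That classifier is an illustrative choice, not the one used in the pilot work. The `intergenic:` feature-name prefix, the cell types and the counts are hypothetical.

```python
import math


def centroid(profiles, features):
    """Mean value of each feature over a list of per-cell profiles (dicts)."""
    return [sum(p.get(f, 0) for p in profiles) / len(profiles) for f in features]


def nearest_centroid(cell, centroids, features):
    """Label of the centroid closest (Euclidean) to `cell`, using `features` only."""
    vec = [cell.get(f, 0) for f in features]
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Toy labelled reference cells; feature names and counts are invented.
reference = {
    "neuron": [{"intergenic:enh1": 5, "intergenic:enh2": 0, "GeneA:exon1": 3},
               {"intergenic:enh1": 4, "intergenic:enh2": 1, "GeneA:exon1": 2}],
    "glia": [{"intergenic:enh1": 0, "intergenic:enh2": 6, "GeneA:exon1": 3}],
}
# Keep only intergenic features, ignoring gene-body segments entirely.
features = sorted({f for cells in reference.values() for p in cells for f in p
                   if f.startswith("intergenic:")})
centroids = {label: centroid(cells, features) for label, cells in reference.items()}

query = {"intergenic:enh1": 4, "intergenic:enh2": 0, "GeneA:exon1": 9}
print(nearest_centroid(query, centroids, features))  # neuron
```

Comparing accuracy with and without the gene-body columns on held-out labelled cells would be one way to test whether the intergenic features alone are sufficient.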
Yes, so first of all, I do think it's possible for the weights to be time dependent. Intersecting the model with single-cell ATAC-seq data, for example if we had multiome data, would be a fascinating way to approach this question. I think the weights would be much more accurate if we could filter down to genes that are actually accessible. So multiome data would be very useful, and it needs to be integrated into our BITFAM model. The model currently allows for this integration, but it's not something we routinely build in, so I would like to see it built in in the future. I think the second question was, how do we validate it? As you all know, ground truth is always the biggest challenge we face. What we did for this paper was use CRISPR-Cas9 deletion of multiple transcription factors, and we tested whether our inferred activities went down following deletion. Of course, the better our deletion efficiencies get, and the more data sets we have with targeted Cas9 deletion of transcription factors, the better validation we will have. But we did see that our model was partially validated with that approach, and I'd like to have more data sets available with transcription factor targeting to further validate it. Thanks. Thank you. Jen, Dr. Luan. Okay, so cross your fingers. We're going to close this Webex and immediately open the other one for the awards, which may actually already be open. So we'll see you in just a minute or two.
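The knockout-based validation can be sketched as a simple per-TF check: after a transcription factor is deleted, its inferred activity should be lower in knockout cells than in control cells. The Python below assumes per-cell activity scores are already available, for example from a model such as BITFAM. It uses only a difference in means with a hypothetical `min_drop` margin. A real analysis would add a proper statistical test and account for deletion efficiency. The function name and the scores are invented.

```python
from statistics import mean


def knockout_check(control, knockout, min_drop=0.0):
    """Compare mean inferred TF activity in control vs knockout cells.

    `control` and `knockout` map TF name -> list of per-cell inferred
    activities; knockout values come from cells in which that TF was deleted.
    Returns TF -> (drop, passed), where drop = mean(control) - mean(knockout).
    """
    results = {}
    for tf, ko_values in knockout.items():
        drop = mean(control[tf]) - mean(ko_values)
        results[tf] = (drop, drop > min_drop)
    return results


# Invented scores: TF1's activity falls after deletion, TF2's does not.
control = {"TF1": [2.0, 2.2, 1.8], "TF2": [1.0, 1.1, 0.9]}
knockout = {"TF1": [0.5, 0.7, 0.6], "TF2": [1.0, 1.0, 1.1]}
for tf, (drop, passed) in knockout_check(control, knockout).items():
    print(tf, round(drop, 2), "validated" if passed else "not validated")
```

A TF that fails this check may point to a model error, but it may equally reflect an inefficient deletion.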