Good morning. I'm Lee Cooper, from the Center for Comprehensive Informatics at Emory University. Today I'm going to be talking about some of our morphometric analysis of glioblastomas and the correlations between morphometry, patient outcome, and molecular data. So let me get the laser pointer here.

Our group is sort of imaging-centric. We are contributors of tissue to TCGA, but we also consume the data. Our science PI, Dan Brat, is a neuropathologist, and we also have an associate, David Gutman, who leads a group of neuroradiologists who are examining radiology data. So most of our questions focus on the idea that, you know, you observe something in an image. How does that relate to patient outcome or molecular status or response to therapy, something like that? TCGA is really a unique data set in the sense that you have a large number of samples where you have histology that's linked with patient outcome and linked with molecular data.

If you're not familiar, there are these new devices that are proliferating, called slide scanners. Basically you put 200 or so slides in, and overnight it produces high-resolution images for you. These images typically have more than a billion pixels, they're at 20x, and you can see the tissue in clear detail all the way across. Within TCGA, we have scans of frozen tissue. For each chunk, they take a top slice and a bottom slice, and that's done for quality control. So those scans are out there. There are also scans of diagnostic block permanent sections, and that's mostly what we use for our automated image analysis. These are at 20x magnification. If there's somebody out there from TCGA, if you're interested in doing 40x, we'd really like to have that, if that's possible.

One of the other things that we have besides these images are teams of pathologists who basically look at them and rate criteria.
So the basic things are percent tumor nuclei, percent necrosis. But I know in GBMs, they rated a lot of histological criteria. Is there a presence of gemistocytes? Is there an oligo component? Quite a few categories: lymphocytic infiltration, things like that.

So why would you want to analyze histology in glioblastoma? Well, it turns out that glioblastoma is very heterogeneous in terms of the way it looks. There are a lot of sort of discrete cell types that show up. You have large cells. You have gemistocytic components. But as another part of the story, on the left here, what you can see is that even though GBM is a grade four astrocytoma, frequently we see oligodendrocytic components. And there are also cells that are sort of in between an astro and an oligo type cell that don't really fit into any kind of discrete category. So a lot of this is not understood. While some of those component cell types are linked to specific genetic alterations, the whole picture is not clearly understood.

So what we're getting at here is to try and see: is there any kind of clustering in the morphology of GBMs? If we can describe it using some kind of algorithm, do patients cluster in terms of their morphology? And then the obvious question to ask after that would be, if they do cluster, what are the links between these clusters and outcome, molecular data, et cetera?

So this is just a 5,000 foot view of the sort of pipeline we've come up with. We have several layers involved in this. But the general idea is that we use image analysis to capture some description of the cells in a whole slide image that belongs to a patient. From those descriptions, we calculate a morphology signature for the patient. And then what we do is cluster these morphology signatures, so you're essentially clustering the patients into different groups.
And once you have those groups, you can do all kinds of correlative analysis: look at outcome, look at significant differences in expression, et cetera. So I'm going to go into each one of these components in a little more detail.

Really the core of the analysis is this image analysis component. Jun Kong in our group has developed a system that goes into these slides and circles every single nucleus. So the nucleus is circled in red. And then he defines a high-confidence area of cytoplasm, since we don't have any kind of membrane marker, and these are glial cells. And then what he does is describe these cells using a set of features that capture the shape, the staining characteristics, texture, things like that. So each cell gets its own description, and these are all stored in a database for, you know, ease of use. And then for each patient, we calculate a morphology signature by just taking the arithmetic mean of their cells. So basically what you're looking at is, using these descriptors, what does the average patient cell look like?

Once we have these patient morphology profiles, we pass them into the clustering engine. Because of the nature of processing slides, et cetera, there's normalization that needs to be applied. We also do feature selection to eliminate redundant and non-informative features. And then we use the consensus clustering method to get a really robust clustering of the patients. And then we can do all kinds of visualization in low-dimensional spaces, et cetera.

So once we have those cluster labels for the patients that are driven by morphology, what we can do is just follow the sort of normal pattern for an integrative type analysis. We look at survival. We look at the relationship between morphology clusters and molecularly defined clusters, or, you know, classifications like the Verhaak classification of GBM, or the G-CIMP phenotype.
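The signature step described above (one feature vector per cell, averaged per patient, then normalized before clustering) can be illustrated with a minimal sketch. The patient IDs, the two features (area, mean intensity), and the choice of z-score normalization are assumptions made for illustration; the talk does not say which normalization was used.

```python
from statistics import mean, pstdev


def patient_signature(cells):
    """Arithmetic mean of per-cell feature vectors: one vector per patient."""
    n_features = len(cells[0])
    return [mean(cell[j] for cell in cells) for j in range(n_features)]


def zscore_columns(signatures):
    """Standardize each feature across patients (assumed normalization)."""
    ids = list(signatures)
    n_features = len(signatures[ids[0]])
    out = {pid: [] for pid in ids}
    for j in range(n_features):
        column = [signatures[pid][j] for pid in ids]
        mu, sd = mean(column), pstdev(column)
        for pid in ids:
            # A constant feature carries no information; map it to 0.
            out[pid].append((signatures[pid][j] - mu) / sd if sd > 0 else 0.0)
    return out


# Hypothetical toy data: each cell is [nuclear area, mean stain intensity].
cells_by_patient = {
    "patient_a": [[100.0, 0.25], [120.0, 0.5]],
    "patient_b": [[200.0, 0.5], [220.0, 0.75]],
}

signatures = {pid: patient_signature(c) for pid, c in cells_by_patient.items()}
normalized = zscore_columns(signatures)
print(signatures)
print(normalized)
```

The normalized signatures are what a clustering step would consume, so that features measured on very different scales contribute comparably.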
We also want to check against our expert pathologists and the ratings they've provided and see whether there is any enrichment of certain components that they can describe, like small cells or gemistocytic components, et cetera. And we also check against the limited set of recognized genetic alterations. From there, we pass it into a whole genome analysis where we do, you know, deep analysis looking across the genome for differences in expression among the clusters, differences in copy number, methylation, et cetera.

So in an analysis of 200 million nuclei from 162 GBMs in the TCGA data, we found three clusters, and we named these clusters after the functions of genes that are associated with them. We have the cell cycle cluster on the left, the chromatin modifying cluster in the middle, and the protein biosynthesis cluster on the right. And these groups are prognostically significant. The chromatin modifying cluster has a worse outcome. If you compare that to the other two groups, it's statistically significant.

The next thing we did, now that we have these clusters, is some type of visualization. For each patient, we picked the cell that's closest to their morphology signature. So this is sort of the average-looking cell for each patient. And we put these into groups. Based on our pathologist's feedback, there are some differences. The cell cycle cluster is more hyperchromatic. It's darker. It also has a slightly larger size. The chromatin modifying cluster has more basophilic cytoplasm, so it's kind of speckled, and has the least intensely stained nuclei. And then the protein biosynthesis cluster is kind of a mixture of the two. It's sort of somewhere in between, less distinct.

We validated this finding in a separate set of GBMs that we obtained from our collaborators at Henry Ford. So we just looked at, you know, the clustering, again, doing a de novo clustering using the selected features from before.
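The representative-cell visualization mentioned above (for each patient, the cell closest to their morphology signature) could be sketched as follows. Euclidean distance and the toy feature values are assumptions; the talk does not specify the distance measure, and in practice the features would presumably be normalized first.

```python
import math


def representative_cell(cells):
    """Return (index, signature) of the cell nearest the mean feature vector."""
    n = len(cells)
    n_features = len(cells[0])
    signature = [sum(cell[j] for cell in cells) / n for j in range(n_features)]
    # Euclidean distance is an assumed choice of "closest".
    nearest = min(range(n), key=lambda i: math.dist(cells[i], signature))
    return nearest, signature


# Hypothetical cells for one patient, two features each.
cells = [[1.0, 1.0], [2.0, 2.0], [6.0, 6.0]]
index, signature = representative_cell(cells)
print(index, signature)
```

Here the mean is pulled toward the outlying third cell, but the second cell is still the nearest real cell, which is why a nearest-cell picture can be more informative than a synthetic "average" image.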
And you know, immediately we recognized there's the cell cycle cluster and the chromatin modifying cluster. The PB (protein biosynthesis) cluster doesn't, you know, immediately appear, but there's some kind of mixed component that's in between. And the survival trends remain the same as they did in the TCGA dataset. So this is encouraging.

So now, just to go on to the associations, we looked at, you know, several different things. We looked at association with molecular subtypes, and we found some things that are mildly significant, but nothing really definitive. The same goes for the pathology ratings. There's some small cell enrichment in the cell cycle cluster, some lymphocyte enrichment in the chromatin modifying cluster. But nothing is so significant, or so specific to those clusters, that it's definitive. The same goes for the genetics. So, you know, we wanted to dig a little deeper.

This just drives that point home a little more. Each cluster is a bar here, and you can see the distributions of the Verhaak subtypes among these clusters. There is some variation, but it's pretty close to uniform. So that doesn't really explain what we're observing here.

But when we looked at the genome-wide analysis, we did find some significant results. There are quite a few genes that are differentially expressed between these groups, and we've subjected those to all kinds of ontology and pathway analysis. One thing I would note is that this chromatin modifying cluster does have the most hypermethylated samples. There are 244 genes that are hypermethylated there compared to the other clusters. So that's a good validation too.

One of the interesting things about the GO analysis of these gene sets is that the nuclear lumen was the most highly enriched term in all of those.
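For readers unfamiliar with term enrichment: one common way to score whether a GO term is over-represented in a gene list is a one-sided hypergeometric test. The sketch below is an illustration only; the talk does not say which test or tool was used, and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random,
    without replacement, from n_background genes of which n_annotated
    carry the term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated with the term,
# 4 genes selected as differentially expressed, 3 of them annotated.
p = enrichment_p_value(20, 5, 4, 3)
print(p)
```

A small p-value means an overlap that large would be unlikely if the selected genes were a random draw from the background; real analyses also correct for testing many terms at once.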
So we're analyzing nuclear morphology, and in the genes that we pulled out when we compare these groups, you know, the most highly enriched term is related to the nucleus. Other terms that were enriched were, of course, the names for the clusters. That's where the cluster names come from. But also things that you would imagine could affect shape like, you know, M-phase or DNA repair. When we also subjected these lists to an IPA analysis, we found differences in cancer-related pathways. So among the clusters we have, you know, ATM and TP53 damage checkpoint activation differences, the NF-κB pathway, Wnt signaling, PTEN and AKT signaling.

So our conclusion is that, you know, maybe these clusters are not definitive, but it seems that there really is signal within these images that relates to molecular status and also to patient outcome. One of the things we're working on is to develop some more complex models to account for some of the heterogeneity that's in these samples, always with the risk of, you know, not wanting to overfit things. So we're developing more complex models so that we can answer questions better, correlate things better, and have more, you know, specific results.

And I just want to, you know, thank the TCGA for providing a terrific data set. Here are some of our collaborators in our group. There's Joel Saltz, our director. Dan Brat is here. David Gutman will be giving a talk later. If you can stick around, he'll be doing the radiology portion of this. And I also want to thank our collaborators at Henry Ford for providing slides for us. So that's Lisa Scarpace and Tom Mikkelsen. I'll take your questions.

Very nice presentation. Comment first, it's really nice to see a relationship back to traditional pathology and where it's going in the future. There's a lot of information in an H&E that was discovered hundreds of years ago and that we're finally extracting.
My question is, have you looked at the relationship between tumor nuclei and stromal cell components by different categories? Andy Beck has done some nice work recently in breast cancer using that type of analysis. Are you pursuing the same thing in GBM?

Yeah, so I'm not a pathologist, so I can't really comment about stroma and the role in GBM specifically, but I'm familiar with Andrew Beck's work. And we are looking at a more, you know, you could say a more complex description of structure, et cetera, instead of just focusing on individual cells. So it'd be nice to know, you know, who lives close to the blood vessels and, you know, what's happening around the necrosis, et cetera. So we just need to sort of boost up some of our algorithms to be able to do those kinds of things.

Hi, are you considering a supervised approach, finding individual features more correlated with prognosis, for example?

Yes, so it's interesting, when you correlate these features with prognosis, any one feature does not, you know, come out as significant. But when you do clustering analysis and you're in higher dimensions, they do seem to segregate in a way that provides some prognostic significance. So, you know, we'd be interested in other methods where we can do sort of a more interesting regression type analysis. Maybe those features would pop out.

I'll just let you know that we learned from you guys. So back in February of this year, we actually changed the requirement to 40X.

Oh, great. Thank you. That's good news.

Okay. Thanks, Lee.

Thanks.