So, our first speaker is Pierre Bellec from Montreal. He will tell us about dealing with clinical heterogeneity in the discovery of new biomarkers of Alzheimer's disease. Thank you.

Well, thank you very much, thanks for the invitation. Yes, the microphone works. So, I'd like to start with a confession, because I messed up a little. I was invited to give this talk, I said what I would talk about, and then I finally checked the program this morning and realized I'm not in the right session. So I'm not necessarily going to talk about open science and reproducibility. I thought I would change my talk to cover that, but then I realized I don't have a whole lot to say about it, because the way I deal with open science in my daily life is more about the way I do science, not the topic of the research itself. So what I decided to do is keep talking about the things I had planned, which are more machine learning and clinical neuroscience, but comment extensively on the open science and reproducibility aspects of that work. If you feel like you're being scammed, you're welcome to leave. I'm sorry.

So, I'm interested in Alzheimer's disease. When I started my lab about five years ago, I decided to focus on just that application. Alzheimer's disease, for those who don't know what it is: the brain shrinks a lot, and people start suffering from major cognitive impairment, the most dramatic form being dementia, where they're not able to function in their daily lives. That picture suggests it's a really dramatic change that should be very easy to diagnose if you look at somebody's brain, and to some degree that is the case if you compare somebody extremely healthy with somebody at a late stage of dementia. But people now start to think that the late stages of dementia are too late to really make an impact on the disease, and that we should try to develop drugs to prevent rather than repair neurodegeneration. So, what that graph shows is, on the y-axis, the degree of cognitive impairment. At the bottom you would have dementia, mild to moderate. Here you have the prodromal stage of Alzheimer's disease, often referred to as mild cognitive impairment, where people can still function in their daily life but are starting to show objective cognitive decline. And then there's a very long preclinical phase where people don't have symptoms at all but still have Alzheimer's pathology in their brain, and that can start 10, 20, maybe 30 years before the onset of dementia. So that's the y-axis. On the x-axis you have years, and the points are clinical trials and what they're trying to recruit. Back 10 years ago, people were focusing more on patient populations who suffer from dementia, and over time they've started recruiting people at milder stages, the prodromal stage, or even, now, prevention trials recruiting people who don't have any objective cognitive symptoms at all. But the thing is, how do you know you're actually enrolling people who have Alzheimer's pathology in their brain? Because at this stage they don't have a completely shrunken brain that's easy to pick up on an MRI. So here come biomarkers. If you ever attend any Alzheimer's talk you will see this schematic. I tried to be a bit creative here: I picked up the variant by Sperling et al., but it was really originally proposed by Cliff Jack, and it tries to show the cascade of events.
It's a theoretical model; there's no data behind this, really. What you have here is clinical function, so really what is going to drive the diagnosis of mild cognitive impairment and dementia, and that gets picked up really late in the disease. That is typically preceded by the shrinkage of the brain, that's the blue line, atrophy, and by a decline in cognitive abilities, but that decline is subjective: you're not yet an outlier on cognitive tests from a clinical perspective, but you do decline, you're worse than you were before, while still being kind of normal. And the idea is that this would be preceded by an accumulation of proteins in the brain under forms they shouldn't take: beta amyloid plaques and tau neurofibrillary tangles. The beta amyloid would start decades before, the tau would arrive a bit after, and in between you would have synaptic dysfunction, which maybe we can pick up with functional imaging. So the general idea is: can we use a mix of those things and try to diagnose Alzheimer's disease, and future progression to dementia, quite early?

So that's where the story gets complicated. Here what I'm showing you are maps of cortical atrophy in a fairly large sample called ADNI. This analysis is based on about 400 subjects, mixing people who are cognitively normal but elderly and people who suffer from mild dementia. We generated maps of grey matter density with SPM, and that matrix, the contrast is not really good on this screen, measures for each pair of subjects how similar their patterns of brain atrophy are. Highlighted along the diagonal are seven clusters of subjects with relatively high similarity values. That means the spatial organization, the topography, of their atrophy looks more or less similar within each cluster. For each quadrant, what you have here is a map of their atrophy after subtraction of the overall mean. So what you can see, for example in this pattern here, is people with very low grey matter volume in the temporal pole. And this type of topography seems to be found more prominently in patients who suffer from dementia, which we did expect: we know that Alzheimer's disease tends to create atrophy in the temporal lobe. It's not the only one, though. You've got this one, which is a more posterior atrophy pattern, also seen predominantly in patients with dementia. And you have this one, which is an atrophy that's really spread out all over the cortex. So this picture seems to suggest that you have seven discrete subtypes of atrophy in that population. I'm not suggesting that. On the lower panel, what you have, for each of those seven subtypes, is a measure of similarity between that map, which is an average over a subgroup, and the map of each individual. And what you can see is that it's more or less a continuum. If a subject falls into a given subtype, their map will tend to look like that subtype, but sometimes you can see some overlap. So I think about the subtypes more as a continuous, low-dimensional summary of the variation you have in your data. I could have picked many other types of dimension reduction; what I like about a cluster analysis like this is that I can interpret those maps really easily.
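For readers who want to see the shape of that subtyping step, here is a minimal sketch, assuming the grey-matter maps have already been vectorized into a subjects-by-voxels array; the function and variable names (extract_subtypes, gm_maps, n_subtypes) are illustrative, not the names used in the actual pipeline.

```python
# Minimal sketch of the subtype-discovery step described above.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def extract_subtypes(gm_maps, n_subtypes=7):
    """Cluster subjects by the similarity of their atrophy patterns.

    gm_maps : array (n_subjects, n_voxels) of grey-matter density maps.
    Returns the subject-by-subject similarity matrix, cluster labels,
    demeaned subtype maps, and continuous subtype weights.
    """
    # Subtract the grand-average map so clusters reflect deviations, not the mean.
    demeaned = gm_maps - gm_maps.mean(axis=0, keepdims=True)

    # Similarity of spatial atrophy patterns between every pair of subjects.
    similarity = np.corrcoef(demeaned)

    # Hierarchical (Ward) clustering of subjects based on their similarity profiles.
    labels = fcluster(linkage(similarity, method="ward"), n_subtypes,
                      criterion="maxclust")

    # Subtype map = average demeaned map of each subgroup (easy to interpret).
    subtype_maps = np.stack([demeaned[labels == k].mean(axis=0)
                             for k in range(1, n_subtypes + 1)])

    # Continuous subtype weights: how much each subject resembles each subtype.
    weights = np.stack([[np.corrcoef(sub, st)[0, 1] for st in subtype_maps]
                        for sub in demeaned])
    return similarity, labels, subtype_maps, weights
```

The continuous weights returned at the end are what makes the subtypes behave like a low-dimensional summary rather than hard categories.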
Those subtype maps are just averages of subgroups, which with a PCA or an ICA actually becomes really challenging very fast, as illustrated by results from Pamela, I don't know where Pamela is, but her talk this morning was highlighting those problems of interpretation in linear mixture models. My group has not been the first to describe this by any means; there are a couple of papers that have reported it recently, Wang et al. in 2016 in particular. These are three of the subtypes we found associated with dementia, and those are the three that were reported by Wang et al. And that's where the open science and reproducibility angle kicks in for a little while. It's interesting that these two analyses coincide quite well: here you've got a subtype with big atrophy in the temporal pole, here you have the posterior one, here in our sample, here in their sample, and then that's the one they call the diffuse one. Again, the contrast is not great on this screen, but you can see this sort of temporo-frontal pattern, which you find here too. They also used ADNI, but they didn't use the same set of subjects, and we used SPM to generate those maps while they used a CIVET pipeline with cortical thickness measurements. We actually also ran our own version of CIVET on the same data and found good convergence as well.

The reason we used SPM, and this is the open science part of the story, is that I wanted those tools to be easily reusable, including in an industrial setting: if a company wants to use those tools and replicate what we have, I found it very important that they could do so. So Angela Tam, the student who did that work, spent a couple of weeks disassembling SPM, getting rid of all the MATLAB-only parts, keeping only the code that runs on Octave, and building a container that replicates this pipeline, and that's on Docker Hub right now. So if you want to pick up the ADNI sample and run that pipeline, you should get the same maps, because you also have the code to do it. Actually, I have a student who left the lab to start a company, and recently I was listening to him speak on a panel about transitioning to industry. He was complaining a bit that he had to use all open science tools during his PhD and that he couldn't keep the IP from his work for his company, but he took that pipeline, which he did not develop, and is using it in his company, so in a way it benefited him on other levels.

All right, so once we've established this heterogeneity, the next question is: can we leverage it in terms of prediction? The thing is, there's a bit of a glass ceiling in terms of early prognosis for AD. People seem to be hovering around 80% accuracy when they try to predict progression to dementia within two or three years, and there doesn't seem to be a technique that cracks that open, except papers that just don't cross-validate the data properly, which is apparent from the methods section, and they do deep learning too, but anyway. One way of dealing with heterogeneity is illustrated by a very simple schematic that captures our motivation; Christian Dansereau developed it, he was on a panel yesterday. You have two labels, you try to separate red dots and blue dots, and here you have two clear groups you can separate very easily, but there's also another subgroup where those two labels are completely mixed.
If you fit a simple linear support vector machine, what it's going to do is make a tradeoff: cut the space in two with a line, and you're going to get a not-so-great accuracy. Not so bad, actually, because the accuracy is fantastic on the left, but it's about chance level on the right, so you'll probably be at 75% accuracy if you do that with this heterogeneous data. So the idea Christian had was to say: what if we rerun that classification many times on subsets of our data and check whether the classification is actually very robust for some of our subjects? If you do that, what you'll find is that those points on the right have very unstable, unreliable predictions associated with them, while the ones on the left are very hard to mess up: they're very clear, and the model will always converge right on them. So now you can train a second classifier that aims at just those high-confidence points, and what you end up with is a classifier that picks up those red dots here and those blue dots there. In effect, you've traded sensitivity, you're consistently missing a lot of the red dots here, for specificity, you almost never pick up blue dots, and you also gain in precision. Precision is a measure of: out of everything I predicted to be red, how many are actually red? Here my precision is almost 100%, whereas before it was probably something like 60%. Funnily enough, those numbers are what we face in AD: models are about 80% accurate, with about 50-60% precision. It's pretty standard in machine learning to do something like this; usually people look at how far points are from the classification boundary and just stay away from it, but in this configuration that strategy would perform really badly, because in order to get rid of that whole mixed cloud you would have to go very, very far from the separating line. So we found that this simple two-stage model had a lot of potential, and we applied the idea, and surprisingly, one of the first times it's happened like that in my career, it worked almost right away without tweaking things much.

So here are the results for a separation of patients with dementia versus controls on ADNI. The top row is the ADNI1 sample, which, as I said, has about 400 subjects, and we tried different models, because cognitive deficits in and of themselves can distinguish those populations very well. So you have only structural features, only cognitive features, and a mix of cognitive and structural features; for each graph you have sensitivity, specificity and precision, and every time there are two bars: on the left a simple SVM model, and on the right the model pushed into the high-confidence regime. What you can see is that, consistently, you lose in sensitivity, exactly like in the toy example, you gain in specificity, exactly like in the toy example, and you gain in precision as well. Cognition alone works really well; if you add structural imaging, you gain a little bit in terms of percentages; and structural imaging by itself does an okay job: especially if you look at the high-precision classifier, you can be 90% precise on this type of task just by looking at the MRI and this very simple hierarchical clustering. That model was trained on the ADNI1 sample.
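Here is a hedged sketch of that two-stage, high-confidence idea using scikit-learn; the subsampling scheme, thresholds and helper names are assumptions for illustration, not the exact settings of the published model.

```python
# Sketch of a two-stage "high-confidence" classifier, assuming X (features) and
# y (0 = control, 1 = AD) as numpy arrays.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.utils import resample

def fit_high_confidence(X, y, n_subsamples=100, stability_thr=0.95, seed=0):
    """Stage 1: estimate how stably each subject is classified across subsamples.
       Stage 2: train a classifier that targets only the stably-positive subjects."""
    rng = np.random.RandomState(seed)
    votes = np.zeros(len(y))
    counts = np.zeros(len(y))
    for _ in range(n_subsamples):
        idx = resample(np.arange(len(y)), n_samples=int(0.8 * len(y)),
                       random_state=rng)
        clf = LinearSVC(C=1.0).fit(X[idx], y[idx])
        out = np.setdiff1d(np.arange(len(y)), idx)   # held-out subjects
        if out.size:
            votes[out] += clf.predict(X[out])
            counts[out] += 1
    stability = votes / np.maximum(counts, 1)

    # High-confidence positives: true positives that are (almost) always hit.
    hc_target = ((y == 1) & (stability >= stability_thr)).astype(int)

    # Stage 2 aims only at that high-confidence subgroup.
    return LinearSVC(C=1.0, class_weight="balanced").fit(X, hc_target)

# Usage: flags = fit_high_confidence(X_train, y_train).predict(X_test)
# trades sensitivity for specificity and precision, as in the toy example.
```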
Then we took our model, didn't touch it, and ran it on the ADNI2 sample: ADNI1 is 1.5T data, ADNI2 is 3T data, that data was not used at all to estimate the model or the subtypes, nothing, and we replicated the performance almost exactly. So we felt okay. The folks being tagged with high confidence as having dementia: can we now find such people among folks who don't have dementia yet? So we did that. We went and took the MCI sample from the database, ran our classifier, and asked: do you recognize people with dementia with high confidence? And it did. So we stratified our sample in three: at the bottom, the people that a linear SVM would not say have dementia; in the middle, the people a linear SVM would say have dementia, but not the high-confidence version of the model; and on top, the people tagged by the high-confidence version of the model. On the y-axis you have a cognitive score, the ADAS-Cog, so higher is worse, and on the x-axis you have time after the imaging session, up to three years. You can see that the model picks people who have worse cognition to start with, which is perfectly logical: it's a machine learning model fed cognitive variables and trained to recognize people with dementia, so those people will have bad cognition to start with, that's normal. But you can see that they also degrade very quickly, while people tagged negative by the model are almost perfectly stable, and you have this nice gradient in between. Those error bars, I insist, are error bars of the population, not error bars on the mean, so there is almost no overlap in the distribution of ADAS-Cog scores between the high-confidence AD subjects and all the other MCI patients.

So we tried to see what the actual predictive power is in terms of progression to dementia. Looking at the ADNI1 and ADNI2 samples, among the negative subjects only 30% progress to dementia, while among the high-confidence subjects 93% progressed to dementia in ADNI1, and in ADNI2 it was 80%. Now, those numbers seem quite different, but in ADNI1 almost half of the MCI subjects overall progressed, while in ADNI2 it was only about 25%. So if we adjust our precision numbers to the actual prevalence of progression in the two samples, we get 80% for ADNI1 and 87% for ADNI2. To my knowledge, those are the highest precision metrics out there at the moment, even though we're only using a simple MRI and a cognitive score. Here is a breakdown in terms of sensitivity, specificity and precision, just like the one I showed you for AD versus controls, but now for MCI progressors versus stable, and we get the exact same behavior: we've essentially traded less sensitivity for more specificity and precision overall. So there's nothing special about the machine learning model here; it's not some very clever deep learning or anything. The idea is that we didn't choose a better model, we chose a better objective: we try to achieve high precision instead of overall accuracy on the whole sample, we moved the goalposts, in a way. Importantly, the models used to make these predictions had never seen an MCI dataset before; they were trained entirely on AD versus controls on ADNI1, and we actually had the pipeline ready before we started this work, so there was no p-hacking involved. That's my open science touch.
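The prevalence adjustment mentioned above can be made explicit with a small worked formula: precision (positive predictive value) recomputed for a given base rate of progression, from the measured sensitivity and specificity. The numbers in the usage line are placeholders, not the figures from the talk.

```python
# Prevalence-adjusted precision (positive predictive value).
def adjusted_precision(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence                  # expected true positives per subject
    fp = (1.0 - specificity) * (1.0 - prevalence)  # expected false positives per subject
    return tp / (tp + fp)

# Example with placeholder values (not the talk's figures):
print(adjusted_precision(sensitivity=0.4, specificity=0.98, prevalence=0.25))
```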
Okay, great. So historically I've mostly been a functional MRI researcher, so we also tried this with functional MRI, actually we started with that, but I felt it made more sense to begin with the structural MRI study. You can do the same kind of approach with fMRI maps; I don't know if you're familiar with fMRI, but if you're not, it's a technique that gives you a map, and you can do it for many different networks. With structural imaging you get one map; with functional MRI you can get many, many maps. Here we got seven maps, one per network; this one is the network called the default-mode network, a very popular network. So that's an average across all subjects, those are individual maps, we compute a similarity matrix across subjects, we run a cluster analysis, we find three subgroups, and then we can compute averages per subgroup, which here have not been demeaned, meaning it's a straight average, we haven't subtracted the grand average shown on the left. Our sample size is much smaller here, though: we have about 60 controls and 60 patients, some with MCI, some with dementia, so those results are to be taken with a grain of salt. Unfortunately ADNI only started acquiring fMRI in ADNI2, and only in a third of the participants, so we suffer from small-sample-size syndrome in fMRI for ADNI, but that's how it is.

For fMRI there's a big concern in the community that fMRI maps are not reliable, in particular for short time series, and by short I mean about seven to ten minutes, which is what we have in ADNI. So we took the subtypes generated in ADNI and looked at a test-retest dataset that we shared openly as part of CoRR, the Consortium for Reliability and Reproducibility. In those data we have two sessions, and in each session we have two runs, so we can test the test-retest reliability within and between sessions. We generated subtype weights, that is, how much any given subject looks like any specific subtype, for run one and run two of the first session, and from those replications you can compute an ICC score, and it's about 0.5 to 0.6, which is moderate.
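For reference, here is one way the test-retest reliability of those subtype weights could be computed; the talk does not specify which ICC variant was used, so this sketch assumes the common two-way random-effects, absolute-agreement form, ICC(2,1).

```python
import numpy as np

def icc_2_1(Y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Y : array (n_subjects, n_measurements), e.g. the weights of one subtype
    for run 1 and run 2 of each subject.
    """
    n, k = Y.shape
    grand = Y.mean()
    ms_r = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between subjects
    ms_c = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between runs
    ss_e = ((Y - Y.mean(axis=1, keepdims=True)
               - Y.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    ms_e = ss_e / ((n - 1) * (k - 1))                             # residual
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
```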
Is moderate enough to do machine learning? Well, we'll see, but it's not clear. It's not catastrophic, but it's not great either, and that's very consistent with what's been said in the literature: resting-state fMRI connectivity has okay, but not great, reproducibility. So the first thing we did was just check whether there is a group-wise difference between people with dementia or MCI and controls, and sure enough there were some differences. The differences here are illustrated by the distributions of the weights: for the DMN we've got two subtypes associated with dementia, one more present in patients, the other more in controls. So this one is more in controls, this one is more in patients; it shows differential connectivity in the DMN, which seems reasonable and consistent with the literature. The effect sizes, though, were small to moderate.

Then we tried to mix structural MRI metrics and functional MRI metrics, not looking at cognition at all, and did the same kind of study we did with structural MRI. First, discriminating patients versus controls: what you have here is the performance of fMRI alone, structural MRI alone, and structural MRI mixed with fMRI. fMRI alone is bad, we are looking at a precision around 60% after we do the boosting, though at least it's above the chance baseline. Structural MRI is good, very similar to what we observed with the much bigger sample size in the first study I presented. But when you combine the two, you get a boost in precision, which relates to a small boost in sensitivity here and specificity there. So the best performing model was a combination of structural and functional MRI, and we are almost at 100% precision when we use those two sets of metrics together to distinguish our cohorts. And just like before, we reused that model and looked at whether we could find people tagged as high-confidence AD in the MCI population, without training the model on MCI at all. That gave us the following: out of all the MCI subjects in the sample, about 33% progressed to dementia, and out of the people who were tagged, 10 of them, 9 progressed, so that's 90%. It's a fantastic number, 90%, but it's a very small count, 9 out of 10; that's just how it is, we don't have enough data. We've been trying to publish those results for a year and a half now, I'm starting to be a bit sensitive about it, but we've got new data, we're pooling everything, and soon we'll resubmit with double the sample size, and hopefully it's going to go through. Something interesting, though, about those 10 people: first of all, the one person who did not progress actually dropped from the study, we don't know exactly what happened to them, but the notes mention constant cognitive decline, so it's almost nine and a half out of ten. They were also all AV45 positive, meaning they have beta amyloid in their brain. It costs a huge amount of money to test for beta amyloid positivity, and here we only use MRI data, so if the only thing we could do with this technology is detect people who are beta amyloid positive, that in and of itself would be super interesting for pharma. We also enriched our sample for something called APOE4, which is a known risk factor for Alzheimer's disease.

So, in the couple of minutes I have left, a couple of announcements on open science and reproducibility. I really think the next level for this research is to look at more than just AD and controls, because people who come to the clinic just aren't in those extremely stratified pools.
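As a small illustration of the group comparison and the metric mixing just described: Cohen's d for the patient-versus-control difference on one subtype weight, and the mixed model as a simple concatenation of structural and functional subtype weights before classification. Variable names are illustrative.

```python
import numpy as np

def cohens_d(weights_patients, weights_controls):
    """Effect size of the group difference on one subtype weight."""
    n1, n2 = len(weights_patients), len(weights_controls)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(weights_patients, ddof=1)
                         + (n2 - 1) * np.var(weights_controls, ddof=1))
                        / (n1 + n2 - 2))
    return (np.mean(weights_patients) - np.mean(weights_controls)) / pooled_sd

# Mixed model: structural and fMRI subtype weights side by side as features,
# e.g. X = np.hstack([weights_structural, weights_fmri]), then classify as before.
```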
So, as part of the Canadian Consortium on Neurodegeneration in Aging (CCNA), we're acquiring a much more representative sample. It is going to have over 1,500 subjects, about 500 have been recruited so far, and they're going to span a number of the known, canonical variants of dementia in aging. We're going to have people with Alzheimer's dementia, but also frontotemporal dementia, Lewy body dementia, vascular cognitive impairment, mixed-etiology dementia, and healthy controls too. That's across Canada; we currently have 15 sites, and I really agree with what Gaël was saying this morning: if we want to train models that generalize in real life, you need heterogeneity, heterogeneity in the background of your subjects, biological heterogeneity, but also in how you scan them. So here we have GE, Philips, Siemens, a big mixture. In our core protocol we have functional and structural MRI as well as diffusion MRI, and we're going to do genomics and metabolomics on those subjects. Those data are going to be available, with a first release scheduled in a couple of weeks. I've also worked with Canadian scientists to generate high-quality, standardized derivatives based only on open tools that are free to use in an academic or industrial setting, so we'll get functional, diffusion and structural derivatives for all those folks. Our plan is to identify those high-confidence subtypes and share them as well, and hopefully we'll be able to find subtypes from imaging that are very closely associated with clinical symptoms. A big question mark is whether that's feasible at all, because we're going to have all those scanners, and fMRI has only a 0.5 to 0.6 ICC, which is probably not that great.

So here's a little experiment, in a couple of seconds; if you want to learn more you can go see Amanpreet Badhwar at poster 66, she has a full poster on this. In that experiment we took data from a subject called SuperSimon, who has traveled across Canada for the past two and a half years, and we have multiple scans of him at several sites. Hopefully soon we're going to scale up to over 30 sites, but right now it's about 15 sites, with GE, Siemens and Philips scanners. We mixed those data with data from China, where we have 30 subjects who were each scanned 10 times over a month, and then we did a fingerprinting experiment: every time, we select two scans for each subject, we generate functional connectivity maps in different networks, we do a cluster analysis, and we check whether the two scans of each subject have been paired together. We can do that for every subject. The box plots show you the distribution for seven networks in the brain; the box plots are for HNU1, the Chinese sample, all scanned at the same place, and the red dots are SuperSimon. That means that if you look at a map of the salience network, you can almost always tell that SuperSimon is SuperSimon and not somebody from China, with about 90% accuracy, despite the fact that he's been scanned over two and a half years, all over Canada, on every single manufacturer. So is a 0.5-0.6 ICC good enough? For fingerprinting, apparently it is.
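Here is a minimal sketch of such a fingerprinting check, simplified to nearest-neighbour matching rather than the clustering used in the actual experiment: a subject counts as identified if their second scan is more similar to their own first scan than to anyone else's.

```python
import numpy as np

def fingerprint_accuracy(maps_scan1, maps_scan2):
    """maps_scan1, maps_scan2 : arrays (n_subjects, n_voxels) of connectivity
    maps for one network, two scans per subject, same subject order."""
    n = len(maps_scan1)
    # Correlation between every scan-1 map and every scan-2 map.
    sim = np.corrcoef(maps_scan1, maps_scan2)[:n, n:]
    # A hit: the most similar scan-2 map belongs to the same subject.
    hits = (np.argmax(sim, axis=1) == np.arange(n))
    return hits.mean()
```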
There's also a paper from CAMH that just came out showing the same thing, but with four traveling phantoms; here we did a bit of a mix-and-match experiment with different datasets because we only had one traveling subject, and they only have three sites while we have 13, but the results are very convergent. So, to conclude: with large samples you can reliably identify subtypes of brain phenotypes, structural and functional. I really believe those fully data-driven subtypes are not that exciting in themselves, they're only weakly to moderately associated with clinical symptoms, but if you look at specific subgroups who share a signature across different subtypes, then you can find signatures that are highly predictive of clinical condition, both in diagnosis and prognosis. And you don't need a fancy machine learning model; all you need is to tweak your model to be less sensitive and more specific. I believe it would still be great to have deep learning models that could learn complex mixtures; what I described really works if you have two clear groups plus noise, so that's a limitation. And open science tools have been key in all this work: it's all open data we've used, and hopefully we'll be able to give back to the community and help it reach the next stage with the CCNA study. Thank you very much.

Thanks for a great talk about open science and reproducible science. We have time for a few questions.

Yeah, so I have two questions. The first is: when are you going to be satisfied with the ADNI1 performance metrics? Is there a point where you say, this is what humans are probably doing themselves in terms of diagnosis, especially between MCI and AD, which I presume is not consistent across physicians?

So, if you compare AD diagnosis based on cognition with what people actually have in their brain when you cut it open, the accuracy is actually pretty bad: depending on how picky you are in defining AD pathology, about half of people with a dementia diagnosis don't have AD pathology. So I don't think we can hope to build a machine learning model that perfectly predicts our clinical labels; if we do, we're probably not actually predicting what we're interested in. The point of the talk was to say we maybe need to shift our objectives: maybe we cannot do accurate prediction for every subject, but maybe we can find an at-risk subtype. The analogy I draw is with genes: some genes have very low penetrance, they're associated but explain maybe 1% of the phenotype, while for others, if you have it, you're going to have Alzheimer's disease every time, and within a window of five years. I believe there are such high-penetrance subtypes in brain phenotypes; we just haven't discovered them yet. That's the point of the talk.

Okay, and a follow-up question, a bit unrelated: you mentioned a lot about discovering subtypes here; could you compare that with the event-based model work from the group in London, which covers more the progression side, like a disease staging index?

All right, so yes, there are a number of multivariate models of disease progression, including, we have a very prominent researcher here, Yasser Iturria-Medina, who was working in Alan Evans' group, and those lines of work are fascinating in that they link many facets of the disease. As far as I know, they haven't been evaluated as predictive tools, so I don't really have a comparison to make in terms of the metrics I care about as of now. Maybe I'm wrong, but if I
am, please find me and correct me.

One more question while the next speaker is setting up. Yeah, hi, thank you for the talk. There's one point I didn't understand, about the idea of heterogeneity and, you know, the different scanning acquisition protocols and all that: in your cluster discovery group, do you want to have a lot of heterogeneity or not? I didn't catch that part.

So the question is: do we really want a lot of heterogeneity in the discovery group? I would say yes. We have a study on schizophrenia, Pierre Orban is first author, that clearly demonstrates that heterogeneity in the training sample helps the generalizability of the findings. Now, here we took two extremes, controls and people with dementia, and I think that helps the model, because it's a fully data-driven model, so you want the clinical phenotypes to actually explain a lot of variance in your data. I don't think the UK Biobank, for example, which is just a general population sample, would necessarily be the best to find subtypes of dementia. Actually, for rare forms of dementia they will have very few subjects: even though they have 100,000 individuals, they will have very few with frontotemporal dementia in the end, because it's just a very rare condition.

Then maybe just a follow-up question: in your testing and validation datasets, do you also want a lot of heterogeneity, or, presumably, if you've captured it well with your cluster discovery group, can you just use whatever?

So, I don't have a strong opinion. The reason we did what we did, which is controls versus dementia patients and then looking at how it goes in the MCIs, is that people before us did it that way, so we just followed what's been done in the field. There's probably a cleverer way of doing it, but I haven't thought it through. Thank you.

Hi Pierre, entertaining talk as always. It seems to me that the holy grail here is to associate those functional connectivity subtypes with behavioral subtypes, so could you speak a little bit to that? We want to get at the mechanisms of the disorders rather than just classifications thereof.

So yes, indeed, clearly here we are just focusing on clinical diagnoses, those very binary entities. It would be a lot of fun to subtype behavior, and we've tried to do that on HCP data with activation maps and the battery of behavioral measures, and we found that the associations are pretty weak. I think that's consistent with what the Oxford group has reported: they report significance, as usual, because you have a thousand subjects, but the actual associations are pretty weak. So, I don't know; there may be, as I said, some aspects of behavior for which you have high penetrance from brain phenotypes, and others where it's really education, the schooling you've done, things like that. I don't have a clear view on which aspects of behavior we should focus on. I can say that for big phenotypes like Alzheimer's we've had some success with this technique, and when we tried to do the same thing with more subtle phenotypes it didn't work so well. To be continued.

Okay, hi, it's Michi, thank you for the talk. I have one question regarding the features for functional MRI: could you tell me more about what you used for each subject? Because you talked about different networks.

Absolutely. We use seed-based connectivity maps, like Bharat Biswal in '95, after standard preprocessing, using an entire network as a seed, from a group-level atlas that is freely available and open.

Okay, thank you very much. Let's thank
Pierre again