Thank you, Pierre. Thanks a lot. And thanks also to Benelah, who gave a really great talk right before me. So, yeah, I'd like to put forward the concept of psychoinformatics, as opposed to neuroinformatics. And I'd like to show that machine learning on brain imaging can really bring something to the field, and has brought something in the past. So what I would really like to do here is to propose a research program and to illustrate it with things we've done in the past. Really, the starting point of this research program is that, these days, we're accumulating functional brain data that is extremely rich and that is getting more and more standardized and reusable, thanks largely to the efforts of people who are here. And really, the question is, how are we going to use this to ground theories of the mind? And by theories of the mind I mean laws for perception, decision, action, emotion: basically, for the cognitive functions. And if we think about how we currently study mental processes, typically we craft an experimental condition that recruits a given mental process, we do some form of elementary psychological manipulation, and then, most likely, we contrast the two. So what this is telling us is that we typically study mental processes via oppositions, and I find that this is a very strong reductionism. I find that this is a limiting factor, and one of the challenges is that to interpret the results, we're bound to a paradigm: an experimental paradigm, but sometimes a conceptual paradigm. So this is a real challenge, it is not an easy problem, and what we're proposing is that generalization is really a guiding principle that can be used to build broader theories.
And the kind of generalization we're looking at is: can we go from one given experimental paradigm, for instance showing a stimulus to subjects, to another one, for instance the subject thinking about the stimulus? And why this is important is that the experimental details are different, so you can believe that your results are broader than the experimental details. So the challenge that we're going to be facing is how do we bridge paradigms, and this is actually a very difficult challenge. We're going to be using brain imaging to do this, mostly functional brain imaging, and basically we're going to be using it to learn brain-mind associations. I've been doing this for a little while, and one thing I've learned over maybe ten years is that the imaging part was the easy part. At least I believe that modeling the signal, which is basically what I've been doing for ten years, is the easy part, and the hard part is modeling the brain, the behavior and the cognition. So Russ Poldrack and myself have just finished writing an opinion paper that claims that predictive models can broaden theories by generalizing brain-mind associations to arbitrary new tasks and stimuli. Hopefully this is going to be out soon. I want to explore and give concrete examples of the challenges that we face in doing this and how we can address them, and I'll show several examples. I'll basically be walking in Pamela's footsteps, so I'll start with encoding models, and I'll show you that they can go beyond oppositions. This is mostly based on a paper that we published last year with Michael Eickenberg. So we were interested in vision, and the challenge in vision is how do we break down the ecological situation, which is complex natural images. We need to break down those complex natural images into elementary psychological processes. And historically, this has been done.
Vision is an area of cognition where we know many things, and maybe the paradigmatic experiment is the one of Hubel and Wiesel, where they showed that V1 neurons behave like Gabor filters: they are receptive to edges. And to do this, they crafted specific stimuli and showed them to cats. Later on, people continued working in this direction, crafting more complex stimuli and showing different receptive fields. And so the current picture of the visual system is one of a cascade of regions that are specific to specific aspects of the stimuli. And so we have, you know, an edge detector, we have a corner detector, we probably have a face detector. So maybe we have a foot detector and maybe we have a left-big-toe detector, okay? So that's all good. I can craft an experiment that will test this: I'll craft stimuli that have a left big toe and I'll oppose them to stimuli with faces. I'll acquire fMRI data, lots of fMRI data, I'll do a contrast, and if I have enough stimuli, I'll get something that's significant, and I'll publish about my left-big-toe region. And it probably won't go through at a good journal. Why? Well, because people won't believe that I've used good stimuli, that they're well balanced, and people will not believe that I have a proper cognitive theory underlying my experiment. So what I'm showing you here is that boiling down cognition to an opposition is extremely fragile. And if I take a silly opposition, then I'll get a result that is significant, maybe interpretable, in the sense that I will have found a region that differs between faces and left big toes, but not interesting. And I claim this is because we're reducing cognition too much in those experiments.
And the nice thing is that encoding models allow us to go further, because they can first be used to build complex representations of the stimuli, not simple ones, and once we have this complex representation of the stimuli, the only way we can do statistical analysis is basically by predicting the brain response. And that's for technical reasons: those very high-dimensional objects basically call for high-dimensional statistics, and high-dimensional statistics and machine learning are pretty much the same thing. So if we go back to visual stimuli, how do we decompose them? Well, we can follow in the footsteps of Olshausen and Field, who basically tell us that the reason low-level visual cortex is tuned to Gabor filters is that those are the simplest representations of natural image statistics. Basically, any kind of simple analysis that you do on natural images will show you that they are well represented with Gabors. So what drives high-level representations? We can use the modern, powerful statistical tools for natural images, which are basically convolutional neural nets. And so what we can do then is to take convolutional neural nets and build an encoding model from them. Now, the way this works is that we take convolutional neural nets that have been trained on natural images, so basically pretrained convolutional neural nets, and then we take a data set where subjects have seen natural images. We represent those natural images with the convolutional neural nets, because these convnets have several intermediate representations. From those intermediate representations, we build an encoding model that basically tries to predict the brain activity, all right? So we were not the first to do this. A bunch of people did this.
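To make the encoding-model idea concrete, here is a minimal sketch with scikit-learn. Everything is synthetic: random matrices stand in for the intermediate-layer activations of a pretrained convnet and for the measured fMRI responses, and all names and dimensions are illustrative, not taken from the actual study.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins: in the real pipeline these would be intermediate-layer
# activations of a pretrained convnet, one row per presented image.
n_images, n_features, n_voxels = 200, 128, 50
layer_features = rng.standard_normal((n_images, n_features))

# Simulate voxel responses as a linear readout of the features plus noise.
true_weights = rng.standard_normal((n_features, n_voxels))
bold = layer_features @ true_weights + 0.5 * rng.standard_normal((n_images, n_voxels))

X_train, X_test, y_train, y_test = train_test_split(
    layer_features, bold, random_state=0)

# Encoding model: ridge regression from image features to brain response,
# evaluated by predictive R^2 on held-out images.
encoder = Ridge(alpha=1.0).fit(X_train, y_train)
print(f"held-out R^2: {encoder.score(X_test, y_test):.2f}")
```

The point of the sketch is the direction of the prediction: from a rich stimulus representation toward the brain signal, with out-of-sample fit as the only test, rather than a contrast between two hand-picked conditions.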
And so what we and others actually showed is that these features of natural images, the intermediate representations, explain the brain responses much better than handcrafted features. And what we found really interesting is that they not only explain brain imaging, but they also map the visual system. So what you're seeing here is which intermediate representation, which layer of the convolutional neural net, maps best to brain activity. What you're seeing is a gradient from low level to high level, and we're really mapping the areas of the visual cortex, but in a way that's actually fairly weak in terms of hypotheses: we haven't decomposed our stimuli into oppositions. So once we have this, what we have is basically something that can predict brain activity, so we can use it to simulate classical experiments. What we've done here is a synthetic retinotopy: we've taken our model of brain activity, we've predicted the brain activity for wedges, and then we did a standard analysis on this. So what you're seeing here is a response to eccentricity, but artificial; it's an artificial retinotopy. It's showing us that our model has captured things that artificial stimuli also capture. And so we also did the same thing using face-versus-place oppositions. To generate these maps, we used the face-versus-place stimuli of the historical experiment, and what you can see is that they actually have different low-level statistics than those in the Gallant data sets. And so what we're showing you is that the high-level properties carry over, even though the low-level properties are different.
So what I really like about this work is that it really shows that we can go beyond oppositions, that without oppositions we can recover the full hierarchy of visual modules, but more importantly that it generalizes across different paradigms. The results show that we've worked on data sets that were natural images, yet we can reproduce classical results that were obtained with extremely non-natural stimuli. This I personally find exciting. Now the question is: can we go one step further? Can we generalize to arbitrary paradigms across tasks? This can be useful to characterize the function of brain structures. So really the question here is how much of the observed activity is a consequence of the specificities of the paradigms. And the school of thought, I would say, is the idea that you can use large-scale decoding for principled reverse inference. So basically we would like to conclude on the function of given brain structures, and decoding across paradigms is useful for this. But this is challenging, because paradigms that study similar mental processes may differ in many ways. For instance, we could be giving a visual stimulus versus an auditory stimulus. And here I should have put the auditory stimulus in French. So is this a confound or not? There are two kinds of heterogeneities. One of them is technical: scanner, stimulus, modality. But one of them is a fundamental problem in psychology, which is paradigm isolation. And this is a problem that has been well known and that has been put forward, for instance, by Newell in 1973, when he claimed that you cannot play 20 questions with nature and win. So there's really this challenge of generalizing across paradigms. And so basically the question we're asking is: if you give me an image of brain activity, what can I conclude about the ongoing mental processes?
And so the way we tackle this is that we describe the tasks by their cognitive components. We did this in a predictive setting, and in this case this is what's known as multi-label prediction. It's not multi-class prediction, in the sense that you're not trying to assign each experiment to one class; you're trying to characterize each experiment by a bunch of different labels. So we're using a cognitive ontology to label experiments, and we're using CogPO. And then we're trying to generalize across studies. So basically, we're trying to describe tasks that we've never seen from the evoked brain activity. What we're trying to get at is regions that are specific to certain facets of cognition. But "specific" is a word that has many different meanings, so I'd rather say that they predict facets of cognition. We've done this on a fairly large database that was actually assembled many years ago by Yannick Schwartz. It has many studies, many different subjects and many different experimental conditions. But what I find really interesting is that it has data that comes from different cognitive science labs. And having worked with different cognitive science labs, I know that there are different opinions on what is the right way to conduct a cognitive neuroimaging experiment. I find it really interesting to see if we can generalize across those different ways of working. And the answer is yes, we can. We get a reasonable prediction accuracy across a variety of different concepts. Some of them are harder to predict than others, though I wouldn't jump to conclusions, because with only 30 studies we don't get a lot of redundancy. What we found is that it's really important to use some form of ontology to shape the labels to predict, and this gives us better prediction.
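The multi-label setup can be sketched with scikit-learn's one-vs-rest wrapper: each cognitive label gets its own binary decoder, and a single activation map can carry several labels at once. The data here is entirely synthetic (label names and the "prototype map per label" generative model are my assumptions for illustration, not the study's actual features).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Toy setup: each of 4 cognitive labels (think "visual", "auditory",
# "language", "motor" -- purely illustrative) has a spatial prototype,
# and an activation map is the sum of the prototypes of its active labels.
n_maps, n_voxels, n_labels = 400, 100, 4
prototypes = rng.standard_normal((n_labels, n_voxels))
Y = rng.integers(0, 2, size=(n_maps, n_labels))   # multi-label indicator
X = Y @ prototypes + rng.standard_normal((n_maps, n_voxels))

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Multi-label decoding: one binary decoder per cognitive component,
# rather than assigning each map to a single mutually exclusive class.
decoder = OneVsRestClassifier(LogisticRegression(max_iter=1000))
decoder.fit(X_train, Y_train)
per_label_acc = (decoder.predict(X_test) == Y_test).mean(axis=0)
print(per_label_acc)
```

The contrast with multi-class prediction is the shape of `Y`: a binary indicator matrix, so a map showing both a visual stimulus and a motor response is labeled with both facets instead of being forced into one class.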
Then I said that we wanted regions, and there we have to address a challenge that Pamela mentioned, which is the fact that when you have a non-identity covariance, you can actually pick up regions that are not marginally related to what you're trying to predict, but conditionally related to it via the noise. The way we tackle this is that we do some form of conjunction between a standard analysis, a forward inference, and the decoder. I won't go into the details, but basically we're trying to have something that is both conditionally and marginally related to the aspects of the stimuli we're interested in. And if we do this, we're really narrowing down on the important regions. This is for the concept of "place" in the brain. Here I'm comparing different approaches. I'm comparing crafting contrasts, and what you find is that crafting contrasts is actually really hard when you're working across experiments, because the stimuli are not well balanced. It's extremely hard to balance the stimuli. So here we're looking at the visual cortex, and we find that we tend to overload the mid-level visual areas, because it's extremely hard for us to cancel out stimuli there. The way people work, by the way, is that within a given experiment they iterate until their stimuli are balanced. But across experiments or across paradigms, it's very hard to do. I'm also showing the Neurosynth reverse inference, and it actually does much better, but it still has trouble separating those mid-level regions. And I believe the reason is that the Neurosynth reverse inference uses a naive Bayes decoder. And a naive Bayes decoder is basically a univariate decoder: it's just a sum of univariate decoders, so it's not conditioning on the activity of other voxels. And so what we're doing is using a multivariate decoder, basically some form of complicated linear decoder.
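The naive Bayes versus multivariate point can be shown with two toy "voxels": one carries signal plus a shared noise source, the other carries only the shared noise. A naive Bayes decoder sums univariate evidence, so it cannot exploit the noise-only voxel, while a multivariate linear decoder can subtract it out. Fully synthetic sketch, not the Neurosynth implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Two voxels: voxel 0 = signal + shared noise, voxel 1 = shared noise only.
n = 2000
y = rng.integers(0, 2, size=n)
shared_noise = rng.standard_normal(n)
X = np.column_stack([
    y + shared_noise + 0.1 * rng.standard_normal(n),
    shared_noise + 0.1 * rng.standard_normal(n),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive Bayes treats voxels independently: essentially a sum of
# univariate decoders, so it cannot cancel the shared noise.
acc_nb = GaussianNB().fit(X_train, y_train).score(X_test, y_test)

# A multivariate linear decoder can learn the contrast voxel0 - voxel1,
# conditioning on the noise voxel and recovering the signal cleanly.
acc_multi = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)
print(f"naive Bayes: {acc_nb:.2f}, multivariate: {acc_multi:.2f}")
```

With this construction the multivariate decoder is near perfect while naive Bayes is stuck with the noisy marginal of voxel 0, which is exactly why conditioning on other voxels matters.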
And when we're doing this, we're conditioning on the activity of other voxels. By doing this, we're actually removing the joint information, and this is helping us narrow down on the most important regions. So we're basically covering the cortex with this, breaking down the cortex into different relevant regions, and we're getting what I call the map of reverse inference. So I've shown you that generalizing across tasks with decoders is a powerful tool. To do this, we've had to move away from the classic oppositions that classification basically uses, and use the multi-label setting. One thing that we've found, and that we've found quite often, is that prediction using discriminative models is more robust to heterogeneity. Here we have paradigm heterogeneity, and standard statistics pick up imbalances very fast. But discriminative models, because they're discriminative, can be more robust to those imbalances. One thing I'm excited about is that these results are grounded on a clear measure of prediction accuracy, and I think this makes those regions more trustworthy than regions that are just delineated from a review. The reviewers of this paper didn't agree with this, by the way, and the paper didn't go through at PNAS because similar drawings had been published. There is an epistemological difference between the mathematician that I am and cognitive science. So for me this is exciting, because this is giving us an avenue toward what I call an atlas of cognition. This is one way where, as we gather more data, we can hope to have more resolved atlases of cognition. Now, something that was crucial for this work was that we needed some ontology of cognition. And it turns out that these things are terribly hard to build. And even when you have them, it's extremely hard to label studies with such an ontology of cognition.
And I see that NG is nodding, because we have been doing this and they have been doing this. And let me tell you, it's just harder than you think. So we tried to go one step further, and we tried to say: well, can we actually use the terms that people use in their own studies, even though they're sometimes somewhat horrible, because they're complete jargon? Can we still use these terms and build universal representations of cognition with them? The idea being that we're going to do decoding, but decoding with a latent structure in the middle. And that latent structure is going to allow us to do oppositions the way people do them, but still find commonalities across studies. In machine learning, this is typically a multitask learning problem: we're solving many related, but different, problems. So it's many classification problems, not a multi-label problem, and that allows us to go around the ontology problem. The idea is that we're going to learn commonalities via the intermediate representations. For this, the first thing that we want to use is deep learning, deep architectures. They're really great for multi-output, so for multitask, and basically they learn intermediate representations that can capture a lot of information. Now, the problem, as we've heard in the talk before, is that they typically come with a huge number of parameters, and we probably don't have enough data for this. So we're going to simplify them. We could simplify them even further, but then we'd basically get a softmax on a single layer, which is a logistic regression. So what we're going to do is basically a deep logistic regression, and I'll explain a bit more. I know this slide is kind of complicated; I don't expect you to see everything there. What we're doing is actually a model with two hidden layers and multiple output heads, so a three-layer deep linear model. And why is it linear?
It's because we have no nonlinearities in the middle, and empirically they didn't add anything. So we're sticking with no nonlinearities. The first layer is actually trained unsupervised, in a way that is similar to what the first presenter was presenting, or actually in a way that's similar to learning ICA or PCA components. We're using non-negative matrix factorization, or dictionary learning, which is much the same thing, on a huge amount of resting-state data. And what we found is that the more resting-state data you have, the better. We're learning 512 components, so 512 localized brain modules. And then, on top of this, we're training a factorized logistic regression, or a deep logistic regression, call it what you want. This logistic regression actually has multiple output heads, because it's a multitask problem: we're solving classification, we're solving decoding in each study jointly. Technically, getting this thing to work was really hard. My student, Arthur Mensch, did this really amazing work, and what he had to do was to use Bayesian deep learning techniques, which we found were good regularizers for this. So if there's one technical take-home message: Bayesian deep learning is interesting. What this gives us is intermediate representations that we call task-optimized networks. And these are really powerful, because we can then reuse them; I'm talking about these guys here. And on top of this, what we have is just a simple logistic regression. So these intermediate representations improve decoding in each study. If we compare plain logistic regression to learning on these intermediate representations, what we find is that we almost always get a gain in accuracy, ranging up to 15%, with sometimes a loss of a few percent.
And what we're finding is that this gain is actually related to the study size. In our group of studies, the very large studies don't benefit; the small ones benefit a lot. We can go even one step further and compare the decoding accuracy of a simple logistic regression and our deep linear logistic regression, and what we can see is that the fewer subjects we have, the bigger the impact of this model. And it makes sense, because this model is actually transferring information from the big studies to the small studies. So that's one thing I'm excited about. We know that we're typically underpowered in cognitive neuroscience. We also know that we need many studies to be able to explore the different aspects of cognition. So I hope this is an avenue forward for this. And so I can show you the corresponding brain networks, and I'm sure you can't see anything here, so let me zoom in. What we have is brain networks, but we can also represent how they load on the original names that people give to their contrasts. And you can find that there are all kinds of horrible names. For instance, I don't know if you can see, but in purple there's C01 and C02. C stands for complexity: it's people who are looking at the complexity of sentences in language. Now, capturing this in an automated analysis is terribly hard. It really requires you to read the paper, and sometimes to ask questions of the person who wrote the paper. What we're finding here is that we're able to use this information without manual labeling, because we're able to use the original labels that people give. So what I've shown you here is that using multi-task decoding across many studies, we're able to extract some form of universal cognitive representation. And I call it universal because if I bring in a new study, it improves decoding power in this new study. For this, we use deep linear models that basically share intermediate representations across studies.
And I think this is very useful because it will improve the statistical power of small studies. Once again, if we look at the networks, they're somewhat meaningful, sometimes extremely meaningful. And if we look at the names that are associated with them, we understand them. For instance here, it's motor, but it's also "right auditory visual click" or "object grasp". But doing this in an automated way is extremely challenging. As a human being, you understand that object grasp and motor are related things, but it's a knowledge representation challenge. I think it's an important one, but we're not there yet. So I want to say a few words really fast about learning across subjects, and then I'll wrap up. Learning across subjects promises to give us biomarkers, and this is an important challenge. And I hear a lot, for this and for other things, that heterogeneity is going to kill this research agenda. This heterogeneity can be in the scanning, but it can also be in the clinics. Typically, if you're looking at, for instance, autism, autism is a condition that has a lot of heterogeneity. And what we've been finding, last year and this year, with a competition that we ran in the lab, is that if you have enough data, you're able to sample this heterogeneity. So if your heterogeneity is across sites, then you use multi-site data and you predict across sites, and if you have enough data, as far as I can tell, you're able to address it. So I don't think that heterogeneity... well, I'm sure it will become a problem, but first I want to see it, because so far what I've seen is that more data always performs better. And so we ran a competition on autism; I don't have the full results yet. Prediction was on a large data set, and the hidden test set was a site that we were not allowed to share.
And on this competition, people did 81% AUC, which as far as I'm concerned is huge for autism. And they actually did better on the hidden test set, and the reason is that the data quality was better. So, more evidence that heterogeneity is not a problem. Then the other thing I'd like to talk about is that we're really struggling with labels in this research agenda. We seldom have the information that we would like to have, but we can actually learn from labels that seem uninteresting. This is work that was done by Franz Liem a few years ago. What he did was try to predict the age of people by looking at their brains. And this sounds really stupid: I can just look up their birth date and know their age. But the idea is that there's a difference between brain aging and chronological age. He was able, using multimodal prediction, to go down to a mean absolute error of 4.3 years, which is quite nice. But then the amazing thing was that the discrepancy between people's chronological age and their brain age actually correlates with cognitive impairment. And I mean, we were kind of hoping for this, but not at all to see it in such a clear way. So my take-home message here is that even labels that are not the most thrilling bring us interesting information, because they're basically allowing us to extract quantitative information from brain images. All right, so I want to wrap up. I'd like to pitch the concept of psychoinformatics, as opposed to neuroinformatics: focusing on the mind, and using brain images to do so. And I find that prediction is a guiding principle for broader theories. Partly because, if you're interested in cognition, you can use AI to model stimuli or the world. But partly also because explicit generalization across paradigms is something that's extremely powerful, though a bit challenging. And so this should allow us to go beyond oppositions.
For instance, to use encoding models with a complete description of the task, and not just, you know, a tiny description of the task, or to decode multiple facets of cognition. And this is useful even if we have imperfect labels: we can extract common representations across many imperfect labels, and we can extract surrogate biomarkers. I skipped over all the technical details. I'm a computer scientist and a mathematician, so basically I spend my life doing technical details; you can find them in my papers. Or you can use the software that we developed, Nilearn, which doesn't have all our fanciest models, because I'm very worried that my fanciest models are maybe not the best, so we don't put them in Nilearn by default. It has simple tools that scale easily. With that, thank you.