So in the past couple of days we heard about computational approaches and statistical methods to process brain data, and this talk is quite different. I'm going to be talking about how the brain might implement some computational processes, and what happens in the brain when it thinks about statistics. Hopefully, by the end of this presentation, I'll have convinced you that the brain is able to learn about statistical properties of the sensory environment, and that it can do so even when it is engaged in a very demanding task that requires a lot of attentional resources.

So, a few words about predictive coding, and we'll hear more about it later today from Michael. Predictive coding really thinks about the brain as a predictive machine. It says that what we perceive of the world is a combination of what we expect to happen and what is actually happening, so the brain is really comparing these two sources of information. And it turns out that the brain is really good at detecting prediction violations. I just want to give an analogy: when we go to a concert we might know a piece by heart, so we know which note follows which. But if the musician makes a mistake, the brain is really good at detecting this, right? Not only is it very disappointing, but it turns out that there is a huge brain response to these prediction violations. We can call these prediction errors, and there are all sorts of prediction errors in the brain. I'll be focusing on the sensory ones, but there's a great deal of work looking at reward prediction errors in reinforcement learning.

In sensory processing, and especially in auditory studies, a hallmark of testing these ideas of prediction is the use of auditory oddball paradigms. This is just an example, perhaps the simplest example, of an oddball paradigm, where we have a sequence of sounds that have a given frequency and then once in a while we'll have sounds that deviate from this frequency.
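As a rough illustration of the simplest oddball paradigm just described, here is a minimal sketch that generates a sequence of standard tones with occasional deviants. The function name `oddball_sequence`, the two frequencies, and the deviant probability are all hypothetical values chosen for illustration; none of them come from the talk.

```python
import random

def oddball_sequence(n_trials, standard_hz=500.0, deviant_hz=550.0,
                     p_deviant=0.1, seed=0):
    """Return a list of (frequency_hz, label) pairs for one oddball block.

    All default values are illustrative assumptions, not the parameters
    of the study described in the talk.
    """
    rng = random.Random(seed)  # seeded so the block is reproducible
    sequence = []
    for _ in range(n_trials):
        if rng.random() < p_deviant:
            # Rare sound that deviates from the standard frequency
            sequence.append((deviant_hz, "deviant"))
        else:
            # Frequent sound with the fixed, expected frequency
            sequence.append((standard_hz, "standard"))
    return sequence

block = oddball_sequence(200)
n_deviants = sum(1 for _, label in block if label == "deviant")
print(f"{n_deviants} deviants out of {len(block)} trials")
```

Real experiments usually add constraints this sketch leaves out, such as a minimum number of standards between two deviants.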
So these are called deviant sounds, or oddballs. When we measure the event-related responses to these standard sounds and deviant sounds using EEG or MEG, we see waveforms which look like this. You see the evoked responses to standards in blue and the evoked responses to deviants in green. And when we take the difference between these two we get the so-called mismatch negativity, or sensory prediction error response.

We can think about this classic oddball paradigm as sampling standards and deviants from two delta functions. So what we asked in this study was: what if what is standard, or normal, is actually a bit more variable than just a given frequency? What if we sample our standards from a Gaussian distribution, and still sample our oddballs or deviants from this other delta function? Can we still see prediction error responses such as the mismatch negativity? And indeed, when we measure with MEG and compare sounds that coincide with the mean of this Gaussian with sounds that are in the tails of this distribution, we see a prediction error response. Here I'm showing you the results of the study we did with magnetoencephalography when I was at UCL. We see that these prediction errors are distributed over a network of areas including secondary auditory cortex and inferior frontal cortex.

So here I have the paradigm. Perhaps it's slightly too small, sorry about that, but this is a description of the paradigm we used. You see in blue shading the Gaussian distribution which is narrow, as compared to the Gaussian distribution in red, which is slightly broader. Both of these Gaussians are centered at the same frequency, and we have standards being sampled from these two Gaussians. In one block we have sounds sampled from the narrow Gaussian, and in the second block we have sounds sampled from the broad Gaussian.
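To make the narrow and broad blocks concrete, here is a small sketch that samples standards from two Gaussians with the same mean and compares how unlikely one and the same tail sound is under each. Using the negative log density as a measure of "surprise" is an assumption made here for illustration, not necessarily the measure used in the study, and the mean, standard deviations, and tail frequency are hypothetical numbers.

```python
import math
import random

def gaussian_surprise(x, mean, sd):
    """Negative log density of x under Normal(mean, sd), in nats."""
    z = (x - mean) / sd
    return 0.5 * z * z + math.log(sd * math.sqrt(2.0 * math.pi))

def sample_standards(n, mean, sd, seed=0):
    """Draw n standard-tone frequencies from Normal(mean, sd)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]

# Hypothetical values: both blocks share a mean and differ only in spread.
MEAN_HZ = 500.0
NARROW_SD = 25.0
BROAD_SD = 50.0
OUTLIER_HZ = 650.0  # physically the same tail sound in both blocks

narrow_block = sample_standards(100, MEAN_HZ, NARROW_SD)
broad_block = sample_standards(100, MEAN_HZ, BROAD_SD)

s_narrow = gaussian_surprise(OUTLIER_HZ, MEAN_HZ, NARROW_SD)
s_broad = gaussian_surprise(OUTLIER_HZ, MEAN_HZ, BROAD_SD)
print(f"surprise in narrow block: {s_narrow:.2f} nats")
print(f"surprise in broad block:  {s_broad:.2f} nats")
```

Under these assumed numbers the same sound sits six standard deviations from the mean in the narrow block but only three in the broad block, so it is less likely, and in this sense more surprising, in the narrow context.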
And then, importantly, our outliers are still outside this distribution on the first block, but are physically exactly the same in the first and the second block. So our question was whether we can evoke prediction errors when we compare outliers and means, but also, if we compare these two outliers, which are identical, can we find a higher prediction error when we have a narrow context? Our prediction is that yes, these sounds should be more surprising in a narrow context than in a broad one.

So we did this experiment in the PLOS CB paper in 2013, and what we did afterwards was to ask: what if we have people engaging in a demanding task which requires attentional resources, such as the N-back task? I have a schematic here of the N-back task. Basically we have a stream of letters appearing on the screen, and people are asked to detect repetitions of a letter one after the other; this is the one-back task. In the two-back task, people have to detect a repetition of the letter from two positions back. So you have here two T's separated by another letter. So really here we have a two-by-two-by-two design: we've got means and outliers, in the context of a narrow and a broad distribution, while people are performing either a low-load task or a high-load task.

What we did here was to measure with EEG this time, not with MEG, so we actually first tried to replicate our MEG experiment using just this statistical learning paradigm, without the N-back task. We just had a very simple detection task, and yes, indeed, we did replicate our results. As predicted, we see that both outliers, in red and blue, evoke a larger response around the mismatch negativity time, which is between 100 and 250 milliseconds, and we see that responses to the means sort of go together and are a lot smaller than responses to outliers.
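For the N-back task and the two-by-two-by-two design described above, a minimal sketch may help. The helper `nback_targets`, the example letter stream, and the condition labels are hypothetical names introduced here for illustration only.

```python
from itertools import product

def nback_targets(letters, n):
    """Return the positions where a letter repeats the one shown n steps earlier."""
    return [i for i in range(n, len(letters)) if letters[i] == letters[i - n]]

# Hypothetical letter stream: "TT" is a one-back target, "T R T" a two-back target.
stream = "KTTRTQ"
print("one-back targets at positions:", nback_targets(stream, 1))
print("two-back targets at positions:", nback_targets(stream, 2))

# The 2 x 2 x 2 design: sound type x distribution width x task load.
CONDITIONS = list(product(("mean", "outlier"),
                          ("narrow", "broad"),
                          ("low load", "high load")))
for sound, width, load in CONDITIONS:
    print(sound, width, load)
```

The list comprehension starts at index `n` because the first `n` letters have nothing `n` positions back to match against.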
Now what happens when we have people engaging in a demanding task? That's over here. This is for the low-load task, or the one-back task, and again we don't see much difference, right? People are still able to detect outliers, and there is a higher prediction error in the context of a narrow distribution, as you see in blue. But what if people are engaging in an even more demanding task, such as the two-back? I have to confess that we did hope for a modulation by cognitive load, or cognitive demands, but we didn't actually find any. So that's the bad news. The good news is that the brain is still able to learn this statistical structure in the environment even if it is engaged in a very demanding task, and I can tell you that the two-back is quite a hard task to do.

Sorry, here I should say this is just one channel, around a frontocentral electrode, which is typically presented in mismatch negativity studies. But really with EEG we have coverage of the whole scalp, and so what I show you here is a video of the statistical map. If you are familiar with SPMs, statistical parametric maps, for fMRI, this is the same but now for this EEG data, and the statistical map is across space and time. And so, how do I get this back on?
Oh, it's still going, sorry. So there's nothing significant later on; all the significance is early, from about 70 milliseconds to over 150. This really shows what we call the surprise effect: this is the main effect of surprise, where outliers show a greater response as compared to means. The statistics I'm showing you are corrected for multiple comparisons over the whole volume.

Then here I'll show you the interaction between surprise and variance. This is really asking the question: where in space and time are outliers bigger than the means in the context of the narrow as compared to the broad distribution? So the previous video shows that the brain is sensitive to outliers, and this one shows that the brain is sensitive to the variance of a statistical distribution.

OK, so the next one. Sorry, I should say that these first two videos are for the detection task, so there wasn't any heavy working memory load. Now, what do we see when we make people perform this difficult task? This video shows an effect of surprise across the variance and load conditions, so this is regardless of variance and load, collapsing all of the data together. So yes, we still see an effect of surprise, that's nice. We have not only the very early effects but also later effects, which might perhaps be related to the P300, but who knows whether we would have had those in the previous experiment if we had had more trials; this experiment has more trials. And then here is the surprise-by-variance interaction, now across load conditions, so high and low. I don't have a video to separate the two load conditions, but I can show you here in this figure, and basically they show very, very similar results, and again there was no difference between low and high load.

So this really summarizes the previous videos, now for all of the different conditions. Just bear with me, it's sort of a heavy slide. So here we have effects of surprise, here we
have the interaction between surprise and variance, and again these are statistical maps. The z-axis is time, and these two panels correspond to cuts in the scalp, so you can think about it as those videos that I showed you, piled up together, and here I'm just showing you a cut at one of the time points. We see very early clusters for surprise effects when we have a very simple detection task. When we have a demanding task such as the N-back task, we still see an effect of surprise, as well as the surprise-by-variance interaction. And then when we split the data between the low- and high-load conditions, we still see effects of surprise and the surprise-by-variance interaction.

OK, so that brings me towards the end. I think I might have been a bit too quick. So what does this mean? I've shown you that outliers evoke larger prediction error responses than sounds at the center, at the mean, of a distribution, and so this really shows that the brain is sensitive to outliers. I also showed that there is a greater prediction error in the narrow than in the broad distribution, so this might indicate that the brain is sensitive to the likelihood of an event given the context that it is in. And so we think that this so-called mismatch negativity response, or prediction error, might be an early marker that the brain is able to learn statistical structure in the environment. And finally, by doing this manipulation of attentional resources, we see that, well, it's not modulated, or at least we don't find an effect of this manipulation, but importantly people can still learn these statistical regularities even when their attentional resources are being taken up by a different task.

All right, so I would like to end by thanking James Ting especially; he was an honours student last year with myself and Jason Mattingly, and he did most of the hard work. And Jeremy Tyler, he's a great undergraduate student volunteering in my lab, and he's helped with
making these nice videos, and obviously Jason Mattingly. I also want to thank the ARC for the funding, and Gary for inviting me, and also CIBF for sponsoring this event. Thank you.

So thanks for a nice talk. From the videos it looked like the surprise effect was lateralized, and I wasn't sure which way the orientation of the image was, but was there a sort of right parietal effect there?

Yeah, I guess a little bit more to the right, and it's not too obvious, but to really answer precisely I would have to make the statistical comparison left versus right. But in mismatch negativity studies there is evidence for more activity on the right than on the left, and that seems to be consistent, although if the prediction error is elicited in language situations, so there are phonemes that are violated, then we see more of an effect on the left. But yeah, I haven't really explored that statistically.

Thanks, Marta. What was your error rate on the one-back and the two-back tasks, and are your results affected by performance on the letter task in any way?

That's a good question. I didn't correlate the behavioral responses with the EEG. I could do that. But the error rates, I mean, they were pretty good. We eliminated people that were below chance on any of the tasks, so they were around 70%. So we tried to make it, well, obviously not at ceiling, but not too hard. But it would be nice to correlate the behavior with the electrophysiology, definitely.

Thanks for the talk, that was great. I was trying to think of it more generally speaking. When you're studying these things, there are a lot of people doing similar paradigms, right? I mean, this is a classical paradigm. And I was thinking, how could you gather all the results of those other paradigms and integrate those results in a model that can sort of explain those things, while keeping to a small area, this specific paradigm, but still
construct a model that can explain many of the results that are in the same area? You know, it's kind of a general question, and I'm sorry about that, but it seems to me that there's a real need to start to integrate results that are close by. At the moment the results are basically in papers, and to integrate them and build something you just have to read the papers, and the model is in your mind. But how do you think, I mean, is there a way that you can see that we could formalize that?

Yes, so I think predictive coding is the answer; it is really the theory that explains these phenomena. In all of these mismatch negativity studies, that is in oddball paradigms, and even beyond, whenever we have a rule and this rule is violated, we always see prediction errors. And I think that just shows how the brain is really, really interested in making models of the world and making predictions about what will happen next, and anything that mismatches or violates our predictions is really very salient. And we can think about, you know, adaptive theories: if there's something surprising in the environment, well, that might indicate a potential danger or a potential reward. So being able to predict what will happen next can give us a competitive advantage for going after that reward and getting it quicker, or for avoiding some threats and dangerous situations in the environment. So yeah, I think predictive coding.

So basically a model like that would explain the results of those five or ten papers that you've seen, and, you know, this model would be the result of those five or ten papers?

I believe so.

OK, that's great. OK, thank you.