Thank you very much for that introduction, Graham, and we'll get started. So what I want to talk about today is integrating various tools together to improve our analysis methods in functional MRI: putting together data-driven tools and model-based analysis to help with interpretation of data-driven analyses, using this for automated noise filtering, and also a novel method that uses data-driven methods to actually help in the interpretation of flexible general linear model results. The modelling that I'm talking about, for the computational neuroscientists in the audience, is not computational neuroscience modelling; it's statistical modelling of the sort that JB was talking about. In its very simplest form, you've got a signal and you try to model it with a general linear model. You've got a priori effects of interest, which might be what the subject is doing, for example, and other effects that might be confounds, such as motion or other artefact that you know about. You try to fit this model to the measured signal, and then you can compare the size of the coefficients that you fit with the residual unexplained noise. So that's statistical modelling: we're using information in an a priori sense. Contrast that with data-driven approaches, blind source separation approaches such as principal components analysis and independent components analysis, which take the data in its raw form and try to work out what underlying sources might be contributing to the variance in the data. This is the so-called cocktail party problem: you've got a room full of people all talking, you measure the sound with various microphones, and you try to deconvolve the individual voices. So with blind source separation you don't have to put any information in at the start except for the data itself, but then you have to make sense of the result at the end of the day.
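As a minimal sketch of that statistical modelling step, with simulated data and hypothetical regressor names (not any particular fMRI package), fitting a general linear model by least squares and comparing a fitted coefficient of interest against the residual unexplained noise might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans = 120

# Hypothetical design: one a priori effect of interest (a simple on/off
# task regressor), a confound we know about (a stand-in motion trace),
# and an intercept.
task = (np.arange(n_scans) % 20 < 10).astype(float)
motion = rng.normal(size=n_scans)
X = np.column_stack([task, motion, np.ones(n_scans)])  # design matrix

# Simulated measured signal: task effect + confound + noise.
y = 2.0 * task + 0.5 * motion + rng.normal(scale=1.0, size=n_scans)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fitted coefficients
resid = y - X @ beta
dof = n_scans - X.shape[1]
sigma2 = resid @ resid / dof                  # residual unexplained variance

# Compare the size of the fitted task coefficient with the residual noise:
c = np.array([1.0, 0.0, 0.0])                 # contrast selecting the task effect
se = np.sqrt(sigma2 * (c @ np.linalg.inv(X.T @ X) @ c))
t_stat = (c @ beta) / se
print(round(float(t_stat), 2))
```

With a genuine task effect in the simulated signal, the t statistic for the effect of interest comes out well above noise level.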
You need to interpret it, and that's done in a post hoc way. So can we apply some a priori information to that post hoc interpretation of data-driven methods? Of course one can, but can we do it in an automated and objective way rather than just by subjective interpretation of components? Well, we know some things are going to pop out of an ICA just from experience. Anyone who's done this knows: if you've got head motion in functional MRI, you'll tend to get edge artifact in the resultant images. Pulsatile motion will evidence itself as ventricular activity. The non-blood-oxygen-level-dependent signal in functional MRI is often very broadband, with higher temporal frequencies than you expect from the hemodynamic response function, so if there's a lot of high temporal-frequency information present in a component, it's not likely to be a BOLD signal. And in independent components analysis, estimating the model order is a problem; you often overestimate it, and then you get lots of components that are spotty or speckled, which are non-physiological. So if we know this much already, can we apply it in an automatic classification scheme? Well, the answer is we can, and there have been some efforts on this over the last few years, starting with systematic, protocol-driven visual inspection, but of course that's very laborious. Then we developed the algorithm SOCK, a fully automated algorithm that takes those features I spoke about a moment ago, tries to identify components that fit within those categories, and labels them as noise; one can use that, and I'll talk about it in a little detail in a moment. There have been some other approaches that have also come out in the last few years, some with software that is not available, so we can't test those, but most of them are available for others to use.
This one's quite an interesting one that queries an external feature database that they've built up, and you can use it to help identify a particular resting-state network, for example; and there are methods that require training on your own data. So just to talk a little about SOCK: it's named somewhat whimsically after an acronym of the term Spatially Organized Component Klassifikator, that being the German word for classifier. In its main use, it simply identifies artifactual components and components that are not likely to be artifactual. Importantly, it can be applied to resting-state as well as task-based fMRI, it's fully automated, you don't have to train it, and the software is freely available. In distinction to most of the other packages out there, one of the key philosophies in this particular algorithm is that we've tuned it not to remove any components likely to be of neuronal origin. The particular issue here, especially in a clinical environment, is that if you've got an independent components analysis that hasn't quite split a noise source from a BOLD physiological source, you might get some mixing, and in a clinical setting, perhaps when you're doing pre-surgical mapping, you don't want to be removing anything that might be biologically plausible. So we've been conservative in that sense, although we can tune it to more of the neuroscience questions if we want to, and more aggressively remove anything that might have artifact in it. I'll just mention one of the features of the algorithm. One of the reasons you can use it straight out of the box is that it uses adaptive clustering of the features we've put in there: we basically start with the hypothesis that your fMRI data will contain certain noise classes, like I mentioned.
I mean, it's essentially impossible to get anybody in the scanner, unless they're dead, not to move, so there will be some degree of motion artifact in an fMRI dataset. In this case, the slide is illustrating the idea of spatial frequencies, so the overfitting problem where we might have spotty components compared to more biologically plausible components. Here we have a metric that looks at the ratio of low spatial frequencies to high spatial frequencies: it takes that ratio and plots it as a function of the radius of a circle in frequency space. This is essentially a Fourier transform of the data, with the low frequencies in the middle. If you plot how the ratio tracks with radius, you get these curves, and then you can cluster them into groups with K-means (or another algorithm, but K-means was sufficient for us), and then you can allocate a score based on how much high-frequency speckle there is in the data compared to the low frequencies. We can do similar things in the temporal domain for the temporal frequencies, and also for edge artifact using an edge mask and a cerebrospinal fluid mask. This works quite well, it's robust, and in an ICA analysis of fMRI it typically removes about half the components, which you then don't have to look at and interpret. So that's great for interpreting the data-driven analysis, but you can also take this one step further. Having identified the noise, you can combine this with a model-based approach and use it as a pre-filtering step. We've done this and tested it, for example, for language functional MRI in a study of 29 subjects. Here is a plot of the mean Z scores within a presumed language region of interest, without SOCK on the x-axis and with SOCK on the y-axis, showing that most of the subjects showed improved Z scores when SOCK was used as a pre-filter.
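As a rough illustration of that spatial-frequency feature, here is a sketch with simulated component maps and hypothetical parameter choices (not the actual implementation, which is in MATLAB): it computes the low-to-high spatial-frequency power ratio as a function of circle radius in the 2D Fourier domain, then clusters the resulting curves with K-means. Smooth, biologically plausible maps and speckled "overfitted" maps fall into different clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

def lowhigh_ratio_curve(img, radii):
    """Ratio of spectral power inside vs outside a circle of each radius."""
    F = np.fft.fftshift(np.abs(np.fft.fft2(img)) ** 2)  # power spectrum, DC centred
    n = img.shape[0]
    yy, xx = np.mgrid[:n, :n]
    r = np.hypot(yy - n // 2, xx - n // 2)
    total = F.sum()
    return np.array([F[r <= rad].sum() / (total - F[r <= rad].sum() + 1e-12)
                     for rad in radii])

rng = np.random.default_rng(1)
n = 64
radii = np.arange(2, 20, 2)

# Hypothetical component maps: smooth blobs vs pure speckle.
yy, xx = np.mgrid[:n, :n]
smooth = [np.exp(-((yy - c) ** 2 + (xx - c) ** 2) / 50.0) for c in (20, 40, 30)]
speckle = [rng.normal(size=(n, n)) for _ in range(3)]

curves = np.array([lowhigh_ratio_curve(m, radii) for m in smooth + speckle])

# Cluster the ratio-vs-radius curves; a score could then be allocated per cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.log1p(curves))
print(labels)
```

The log transform before clustering is just one way to tame the large dynamic range of the ratios; the real feature set also includes temporal-frequency, edge-mask, and CSF-mask metrics, which are not sketched here.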
Now, importantly, we're taking into account the fact that we're reducing the degrees of freedom, so we adjust our analysis for that, and we still get these improved Z scores. There are only three subjects that showed anything below the line, and only one that was really significantly below the line, and I'll show you that example in a minute. Here are some examples, anyway, from four of the subjects, showing the kind of improvement you can get. These are three slices through one subject's language study, without the SOCK processing and then with the SOCK processing. The red arrows here highlight activity that is likely artifact and has been removed by this filtering, and the green arrows show significant activity that has now appeared, which we weren't sensitive to because of the noise in the pre-SOCK analyses. So we get rid of ventricular artifact and edge artifact, and we enhance the activity in general. Subject 22 here was the subject most below the line in our comparison metric, and if we have a closer look at that one, we think what's actually going on is a problem with our metric more than a problem with the SOCK algorithm. The activity that's been removed is this activity here and here, which, as you can see, is actually on a contrast boundary within the image, and it's a little lower than where we expect typical language activity to be. This is the language activity we're expecting. But it just caught the bottom of our region of interest, so we counted it as a loss of real activity, whereas in fact, if you look at it with a keen eye, we think SOCK has actually removed some motion artifact. It turned out this was the subject with the most motion in the study; there was a great deal of motion going on. And it's difficult to remove this kind of motion artifact in any way other than as an independent component.
And this motion was identified because it was of such high temporal frequency compared to what we'd expect with BOLD, so ICA managed to separate it, and removing it really cleaned things up. Just to make the point, we're not reinventing the wheel here: to do much of this processing, we used existing software. We used the FSL MELODIC program to do the ICA, SPM to help with the segmentation, some other code from fmristat, and then our own custom MATLAB code to implement the rest of the algorithm and wrap the whole thing together. And this is, as I say, publicly available. So now I just want to talk about the idea of using a data-driven approach to help interpret a general linear model type approach. Why would you want to do that? Well, the particular application we were interested in is an event-related fMRI study, where you know when events are happening, you've given stimuli or whatever, and you want to analyze that in a general linear model, but you don't actually know what the response is. This can happen, for example, as follows. If you've got a light flash, you expect the response to be a delta function convolved with an HRF. But if you've got some cognitive task, or a continually flashing light where you might get a training effect, the response is not actually either a delta function or a block; or you might have an anticipatory response that you're interested in, which might happen before you present the stimuli. You can't model that with a conventional HRF. And the particular application we developed this method for is perhaps the most difficult: we're looking at interictal epileptic spikes in patients with epilepsy. We detect these with simultaneous electroencephalography, so electrodes on the scalp. These spikes happen subclinically between seizures and are often quite frequent in these patients.
But the thing about them is that we only see them on the EEG when a lot of the cortex, ten square centimetres or so, is synchronously firing, and there's quite often activity happening in the brain before that which the functional MRI is sensitive to. So we don't know the shape of that to put into a constrained general linear model. This is an example from Rolandic epilepsy, showing some of our earlier work: the average response over a group of these patients, shown here, is really quite different from the canonical hemodynamic response function. So you're not going to be able to fit these well even if you put dispersion derivatives or whatever in the model. In these sorts of situations you need to use some sort of deconvolution, basically inverting the normal equation. Here we've got some event timings that we know. We would typically convolve these with an HRF and look for that in the measured BOLD signal. Instead, what we do here is take the measured BOLD signal at these times and invert the equation to work out what this effective "hemodynamic response function" is. I've put it in inverted commas because it's really our hemodynamic response convolved with some neuronal response train that's not just a delta function. Then we get a map, and we can look at this with an F test. This deconvolution is typically effected by using a finite impulse response, a Fourier basis set, or some other flexible basis set in our GLM. So now we've got a situation where we've identified where things might be happening consistently, time-locked to these events, but there's a different response function at every single voxel in the brain. How do you actually interpret that? How can you summarize it in a way that's going to help you interpret? This is where we add in the ICA step, and we call this event-related ICA, or eICA.
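A minimal sketch of that deconvolution idea, with simulated data and an assumed non-canonical response shape: build a finite impulse response design matrix from the known event timings, then invert the convolution by least squares to recover the effective response at a single voxel.

```python
import numpy as np

def fir_design(onsets, n_scans, n_taps):
    """FIR basis: one column per post-event lag, a shifted delta train."""
    X = np.zeros((n_scans, n_taps))
    for t in onsets:
        for lag in range(n_taps):
            if t + lag < n_scans:
                X[t + lag, lag] = 1.0
    return X

rng = np.random.default_rng(2)
n_scans, n_taps = 300, 12
onsets = np.arange(10, 280, 25)          # known event timings, in scans

# Hypothetical effective response: non-canonical, with an early dip,
# of the kind a canonical HRF (even with derivatives) would fit poorly.
true_resp = np.array([0, -0.3, 0.2, 1.0, 1.4, 1.0, 0.5, 0.1, -0.2, -0.2, -0.1, 0])

X = fir_design(onsets, n_scans, n_taps)
y = X @ true_resp + rng.normal(scale=0.3, size=n_scans)  # measured BOLD at one voxel

# Deconvolution: least-squares inversion of the convolution equation.
est_resp, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(est_resp, 2))
```

In a whole-brain analysis this fit is done at every voxel, and the joint significance of the FIR coefficients can be assessed with an F test, which is where the interpretation problem arises: a different estimated response at every voxel.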
Instead of doing the ICA on the whole time course of the functional MRI, what we do is this deconvolution with a GLM, and then we assume there are specific underlying sources and use the ICA to decompose just those event-related time courses. So now we're looking at a very restricted model of what we think is consistently going on in each voxel, and we get far fewer components than you typically would if you were doing the whole fMRI time series, because you're now only getting things that are time-locked to those events. In that example of Rolandic epilepsy, if you do a conventional general linear model with a conventional hemodynamic response function for these events, you get very little activity. If you apply the eICA method, you now get beautiful activity where you'd expect it, in the Rolandic region, where the time course is significantly different from the canonical hemodynamic response. You can concatenate these time courses together across individuals to do group ICA, in much the same way as you would do group ICA in a conventional sense, and again you get robust activity and this non-canonical response shape. We've developed a package, which we're about to release, that does this analysis in an automated way. We're using FastICA, so existing software again, SPM for the general linear model, and then custom MATLAB code that implements the rest of it around that. Now, we've actually gone one step further and combined the two things I've talked about. We can pre-filter our EEG-fMRI data, or our event-related data, with the SOCK filter: that filters the whole fMRI time course, identifying and removing noise sources based on that entire time course. Then we do event-related ICA: we deconvolve the filtered time course with the general linear model and apply ICA to the event-related time courses. If we do that in Rolandic epilepsy, this is the group component that I showed you earlier.
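To illustrate the event-related ICA step under simulated conditions (two assumed event-locked sources and random mixing; this is a sketch, not the released package), one can run FastICA on the deconvolved per-voxel event-related time courses and recover a small number of component time courses with their spatial loadings:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
n_voxels, n_taps = 500, 12
t = np.arange(n_taps)

# Two hypothetical event-locked sources with different shapes, e.g. an
# early response and a later response elsewhere in the brain.
s1 = np.exp(-0.5 * ((t - 3) / 1.5) ** 2)   # early peak
s2 = np.exp(-0.5 * ((t - 7) / 1.5) ** 2)   # later peak

# Per-voxel deconvolved event-related time courses: mixtures of the sources
# plus noise. In the real method these rows come from the FIR/GLM deconvolution.
loadings = rng.normal(size=(n_voxels, 2)) ** 2      # non-Gaussian spatial loadings
R = loadings @ np.vstack([s1, s2]) + 0.05 * rng.normal(size=(n_voxels, n_taps))

# Spatial ICA on the restricted, event-locked responses only: voxels are
# samples, post-event lags are features, so only a few components fall out.
ica = FastICA(n_components=2, random_state=0, max_iter=2000)
maps = ica.fit_transform(R)      # (n_voxels, 2): spatial maps, one per component
courses = ica.mixing_.T          # (2, n_taps): component time courses
print(courses.shape)
```

Because the decomposition sees only event-locked variance, each recovered time course closely matches one of the underlying response shapes (up to sign and scale), which is exactly what makes the per-voxel FIR results interpretable.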
We got one component that showed Rolandic activity, but with the increased power due to the noise reduction from the pre-processing with SOCK, we now actually have the power to separate two components of activity. We had previously got a hint that there was some contralateral activity: the spikes are coming from this area in these patients, but there was a bit of a hint of activity on the other side. With this denoised analysis we now see that, yes, these spikes are giving us this response here, but then somewhat later there's a response coming from the other side of the brain, and we now have the power to separate that. So, in conclusion, we can combine data-driven analysis methods and modelling analysis methods to help get the best of both worlds, and we can even top-and-tail these analysis methods; using existing software to enable this makes these sorts of developments much easier to develop and fast-track. I'd like to end by thanking the other contributors to this work, with their disciplines noted here; you can see this is quite a multidisciplinary team. Thank you very much. Thank you, David. So we've got time for one question. The deconvolution aspect must be very difficult. If you just think of the magnitude of it, the model is not favourable, so what are the constraints that you put on the deconvolution? Well, the deconvolution just uses a conventional flexible basis set, as you would if you were trying to model what the hemodynamic response function was. You could use either a finite impulse response set or a number of Fourier components, so instead of having one effect of interest, you're saying your effects are actually a set of different lags.
But aren't you trying to find the neuronal event train and the hemodynamic response at the same time? Because those two things together are not identifiable; you have to put some constraint, some regularization constraint, on it. Well, the constraint is that it's time-locked with the event, and we've got multiple events. Okay, that's easy; that's the one thing I understand now. Okay, well, thank you. Thanks for that, thanks David. Okay, we need to move on.