I also want to thank the organizers for inviting me to speak here. Do you hear me? OK, today I want to talk about data-driven analysis of spatio-temporal interactions. My talk is structured as an introduction, some results on correlation analysis and how to approach it, then how to approach massively parallel data, and then a discussion. Like probably everybody here, I am interested in how the brain computes. In particular, I am interested in the cortex, which computes in relation to higher brain functions. This is an interesting piece of brain because it has an extremely high density of neurons, and there is extremely strong convergence onto and divergence from each neuron: one neuron receives about 10,000 to 20,000 synapses. So we are dealing with a highly interconnected network, and I guess I am right with the assumption that neurons do not act in isolation. But what do they do then? There must be some working hypothesis, and I actually go back to Donald Hebb, who formulated the hypothesis that cell assemblies, meaning groups of neurons, act as building blocks of information processing. How could we imagine this? Basically, this is processing by an interaction of neurons, and a signature of it would be that assembly members exhibit coordinated activity. One way of thinking about it: if you had recorded a number of neurons simultaneously and a certain assembly were active, the neurons that are part of this assembly would be synchronously active. If another assembly were recruited due to another behavior, another set of neurons, members of that other assembly, would be synchronously active. You could even have an individual neuron participating in different assemblies at different instances in time. Now, the goal I see for myself is to detect and analyze such processes. Of course, in reality we don't have colored spikes.
We want to identify expressions of assembly activity, and in particular to relate them to the dynamics of stimuli and behavior. Imagine in particular a situation where you make eye movements, as nicely shown by Harald this morning: you have very short saccades and fixations which in total last about 200 to 300 milliseconds. Within this short period of time, the brain has to process whatever is coming in and decide about the next eye movement. It is a highly dynamic process, and I would like to detect it. In addition, we want to identify the temporal and spatial scales of assembly activity. That is the goal. Now, how did I approach this, about 15 years ago? I started out by developing an analysis method which we call unitary event analysis. In principle it works like this: you record neurons in parallel and represent the simultaneous spiking activities as parallel 0-1 sequences by appropriate binning, let's say 1 millisecond bins. You end up with 0s and 1s, with 1 indicating a spike. Now you can do a lot of statistics on this. A first thing you would typically do is derive the firing probabilities of the individual neurons by a simple estimate: the number of spikes divided by the number of bins. It is a rough estimate. In addition, you can look at what the neurons do simultaneously: you can detect and count individual synchronous patterns across the neurons. However, if you just detect, you are not done. What you should do in addition is calculate whether this goes beyond what you would expect by chance given the firing rates of the neurons. So we also calculate the expected number of coincidences assuming statistical independence, based on the firing probabilities, and derive the significance of the empirical pattern counts based on the coincidence distribution. I will come back to this.
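To make the counting and expectation step concrete for the reader, here is a minimal Python sketch (not the actual unitary event implementation; for simplicity it considers only the pattern in which all recorded neurons spike in the same bin, and it uses a Poisson approximation for the significance):

```python
import numpy as np
from scipy.stats import poisson

def unitary_event_stats(binned):
    """Empirical vs. expected synchronous-spike counts for binned 0/1 data.

    binned : (n_neurons, n_bins) array; entry 1 means the neuron spiked
    in that bin. Only the full-synchrony pattern (all neurons spike in
    the same bin) is considered in this simplified sketch.
    """
    n_neurons, n_bins = binned.shape
    # firing probability per neuron: number of spikes / number of bins
    p = binned.sum(axis=1) / n_bins
    # empirical count of bins in which all neurons fire together
    n_emp = int(np.all(binned == 1, axis=0).sum())
    # expected count under statistical independence
    n_exp = n_bins * np.prod(p)
    # significance: P(count >= n_emp) under a Poisson approximation
    p_value = poisson.sf(n_emp - 1, n_exp)
    return n_emp, n_exp, p_value
```

For independent neurons the empirical count scatters around the expectation and the p-value is unremarkable; excess synchrony shows up as a small p-value.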
You can do this in a parametric fashion, but also using surrogate methods. Now, if an empirical count significantly exceeds the expectation, we say we found unitary events: excess spike synchrony. Just to give you an idea of how this would typically be done, to illustrate: in the simplest case, these are simulated data, three neurons simultaneously recorded, one box per neuron over several trials. You would typically unfold the individual trials so that you have the neurons side by side, then detect the coincidences, in this case triplets. If you find that the number of occurrences is larger than expected, you would talk about unitary events. Typically we do this analysis in a sliding-window fashion, in order to observe the dynamics of synchrony and to be able to relate it to behavior. Now, there are some issues about violations of the assumptions we make here, and I basically spent 10 to 15 years dealing with these issues. As you may know, firing rates of neurons are typically non-stationary: they change as a function of time, and in the best case they do so in the same way across trials. Or the firing rates differ from trial to trial, non-stationarity across trials. Or the firing of the neurons deviates from the assumption of a Poisson process. If you neglect these violations, you will typically, at some point, get false positives, shown here for non-stationarity across trials: if, for example, the firing rate in different trials varies between about 40 and 50 spikes per second, you start to get false positives in completely independent data. So we went through the various cases and tested when we get false positives and how we can avoid them. Typically, if you violate such assumptions, you affect the mean of the expected number of coincidences or the shape of the coincidence distribution.
As a consequence, you may reject the null hypothesis for the wrong reason, and you get false positives and/or a wrong interpretation of the data. For some of these aspects analytical solutions are available, but often the data are too complex. So instead we use surrogates. Surrogates are artificial data which we generate from our data, manipulated such that we destroy exactly what we test for. For example, if you look for coincident spiking, you intentionally destroy exactly that: a simple case would be trial shuffling, because then you destroy the simultaneity of the recorded activity; or you could randomize the spike times; or shift one spike train against the other; or dither the individual spikes by displacing each of them by a few milliseconds; and so on and so forth. So you see, there are a number of surrogates on the market. There is a small problem here, because when you manipulate your data, you also destroy other features of the data, for example the firing rates. In an extreme example, a rate step would be smoothed out to some degree. Or you affect the interspike interval distribution: if there was a preferred interval, you smooth it out, and in the extreme case you end up with Poisson processes. So you need to be careful, and you need to test and calibrate the method, which I typically do using simulated data, where I have control over everything that is going to happen. We also compare different kinds of surrogates, and you can see that the coincidence distributions originating from different surrogates may be quite different and may also mislead you. Surrogates are good, but use them carefully; that is the message. But we do this often. There is another issue related to surrogates: it is numerically expensive.
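As an illustration of one of these surrogates, here is a small Python sketch of spike dithering (my own toy version, not a published implementation): each spike is displaced uniformly within a window of a few milliseconds, which destroys fine-temporal coincidences across neurons while roughly preserving the firing rate on coarser time scales.

```python
import numpy as np

def dither_spikes(spike_times, dither=5.0, rng=None):
    """Return a surrogate spike train: each spike time (in ms) is displaced
    by an independent uniform offset in [-dither, +dither] and re-sorted."""
    rng = np.random.default_rng() if rng is None else rng
    offsets = rng.uniform(-dither, dither, size=len(spike_times))
    return np.sort(spike_times + offsets)
```

Repeating this many times and re-counting coincidences on each surrogate yields the chance distribution against which the empirical count is compared.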
In the worst case you do this in each sliding window separately, let's say 1,000 times, to derive your coincidence distribution, and you do it for each individual pattern. This is expensive. But the people in our lab developed a nice, easy approach, described in a chapter of the book Analysis of Parallel Spike Trains, which I edited two years ago, where you parallelize your processes when they are independent of each other. So what you need to do is bring in this parallelization, for example distribute the processes on a cluster; you can compile MATLAB code, or better, use Python directly. We simply make use of queuing systems and code generation for easy parallelization. This works quite well, so this problem is solved. Now let me show you some results, what kind of results we can get with such an analysis. One example, which also illustrates the procedure again, is quite old. These are experimental data, two neurons recorded in motor cortex of an awake behaving monkey. The monkey was involved in a delayed pointing task: at this point in time he got a preparatory signal and sat still, and in this case he got the go signal here, to move his arm. The monkey was also trained for other waiting times, and these are the lines you see here. What we did, in a time-resolved fashion, was to analyze the empirical coincidence counts, calculate the expectation (shown in cyan), and in each sliding window calculate the significance, which you see here. It can clearly be seen that this is highly modulated: at different instances in time there is excess spike synchrony occurring, and interestingly, at the instances in time when the monkey expected the go signal to come. Now, how do I go back?
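The time-resolved procedure just described might be sketched like this (again a simplified toy, using only the all-neurons pattern and a Poisson approximation; the window and step sizes are arbitrary choices of mine):

```python
import numpy as np
from scipy.stats import poisson

def sliding_window_ue(binned, win=100, step=20):
    """Slide a window over binned 0/1 data (n_neurons x n_bins) and, per
    window, compare the empirical synchronous-spike count with the
    expectation under independence. Returns (start, n_emp, n_exp, p)."""
    results = []
    for start in range(0, binned.shape[1] - win + 1, step):
        w = binned[:, start:start + win]
        p = w.sum(axis=1) / win                  # window-local firing rates
        n_emp = int(np.all(w == 1, axis=0).sum())
        n_exp = win * np.prod(p)
        results.append((start, n_emp, n_exp, poisson.sf(n_emp - 1, n_exp)))
    return results
```

Because the windows are independent jobs, this loop is exactly the kind of computation that can be distributed on a cluster as mentioned above.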
We were a bit concerned at that time that this time interval of 300 milliseconds, which appears here, leads to an internal clock, and that unitary-event synchrony occurs at this rhythm. Therefore, Alexa then made new experiments in which she trained the monkey on different timing intervals. What you can see here: the monkey was first trained for a waiting period of 600 or 1200 milliseconds, and as we knew from the earlier study, unitary events occur at 600 milliseconds, when the monkey expected the signal to come but it did not come. Then she retrained the monkey to wait not 600 milliseconds but 900 and 1200 milliseconds. In the first sessions the monkey was already able to do the new task, but unitary events still occurred at the old timing. After some more practice, synchrony then occurred at the new timing, meaning the timing of the synchrony changes with practice. This you can also see in the population data: if we average over the first half of the sessions and the second half of the sessions, you see it is a consistent result. When the monkey was first retrained you do not see this excess synchrony, but after some time you see excess synchrony at the expected time. This correlates with behavior in the sense that the reaction time of the monkey actually goes down while he is improving his performance. Now, coming back to the example I brought up at the very beginning, free viewing. This is an example from visual cortex, a collaboration with Pedro Maldonado, who trained monkeys to do free viewing of natural scenes. We looked for synchrony in this setting, and we found that when the monkeys started to fixate, here at this point in time, after some delay the firing rate increased and decayed slowly, but excess synchrony had a quite different temporal evolution.
It was very fast, about 50 to 60 milliseconds after fixation onset, and decayed fast. So we are able to extract this kind of dynamic activity and find strong differences between firing rates and synchrony. Now let me go on to another aspect, which is massively parallel data. I was quite excited about our results, and we could say, yes, fine, but I am still concerned that we considerably under-sample the system. Inserting five or maybe at most ten electrodes, compared to 10 to the power of 9 neurons, is just nothing, and I wonder whether we overlook a lot of assembly activity. Sequential recordings are not a solution, because you need simultaneous recordings from the neurons to see their interaction and their common activity. Also, just looking at signals summed over large populations would not help you to resolve the microcircuits, the activity going on within the cortex. So what we chose to do is, on the one hand, observe population activity directly by looking at mesoscopic signals, for example the local field potential, but relate them to elementary signals like the spikes and the corresponding processes. Just a comment: the local field potential is the signal you get when you have an extracellular electrode in the brain and band-pass the signal between, let's say, 1 and 500 hertz; if you high-pass it above, let's say, 1000 hertz, you get the spiking activity. The other thing you can do is directly observe many neurons simultaneously. I will now talk briefly about the first and then at more length about the second part.
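For readers who want to see the band splitting concretely, here is a sketch with SciPy using the cut-off values mentioned above (the filter order and the choice of zero-phase Butterworth filtering are my illustrative assumptions, not what any particular lab uses):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_lfp_and_spikes(raw, fs, lfp_band=(1.0, 500.0), spike_hp=1000.0):
    """Split a raw extracellular trace (sampled at fs Hz) into an LFP band
    and a spike band by zero-phase Butterworth filtering."""
    sos_lfp = butter(4, lfp_band, btype="bandpass", fs=fs, output="sos")
    sos_spk = butter(4, spike_hp, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos_lfp, raw), sosfiltfilt(sos_spk, raw)
```

A slow oscillation in the raw trace ends up almost entirely in the LFP band, while fast transients such as spike waveforms survive mainly in the high-passed band.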
In the first approach, relating mesoscopic signals to spikes, we were interested in the relationship of the local field potential to the spiking activity. We have quite a number of studies on this, but I just want to tell you briefly about the most exciting one, I would say, where we looked at the relationship of synchrony to the local field potential. Everybody says, of course, that the local field potential, which typically has an oscillatory structure, reflects synchrony, but did anybody show it? What we did was first extract the different kinds of spike events, unitary events, mere chance coincidences, and isolated spikes which did not have a partner, and look at their relation to the local field potential; in the simplest case, just looking at the spike-triggered average, the coincidence-triggered average, or the unitary-event-triggered average. And what you see is that the triggered average of the local field potential is much larger in amplitude for unitary events than for individual spikes or for chance coincidences. We then went on and asked whether this is a mere effect of a change in the amplitude of the local field potential, or rather a phase locking. It is a phase locking: the unitary events, the excess synchrony, lock better to the local field potential than chance coincidences do. This actually reminded me of the comment by Carl Petersen, because we looked a lot at his papers, asking whether the sub-threshold membrane potential oscillation would actually lead to synchrony. Now it seems that it really needs additional bumps. So the idea is that the local field potential likely also reflects a larger oscillation of the system, but that assembly activity is jumping from one bump to the other. This is what we see here. OK, but I would actually like to focus on the next aspect.
We will go on with this relation of the local field potential to spiking activity with massively parallel spike trains as well; this is current work. Now I would like to tell you a little about what we do with regard to observing many neurons simultaneously. There are actually no tools available, and we have to develop them. The motivation is to uncover correlation structure. This is a simulation from the group of Markus Diesmann, and what you see here are many neurons simultaneously as a function of time. What you would conclude if you look at this activity in the dot display is that the neurons coherently change their firing rates. But if you just re-sort the axis of the neuron IDs, you notice that this is not a coherent change of firing rates; there is actually synchronous activity propagating through this network, which appears as increases of the firing rates. We want to get to the point where we are able to do this in data for which we do not know the underlying structure. However, a simple extension of unitary event analysis to massively parallel spike trains leads to a combinatorial explosion of parameters: to look for individual patterns in, let's say, 100 neurons, you would have to consider on the order of 10 to the power of 30 patterns, which is basically not possible. So we need to do something else, and this is what we did. We actually did a lot in this respect, and I would like to give you an idea of how we try to approach it. First of all, we did a pairwise analysis of data from cat visual cortex recorded with a 10 by 10 electrode grid, a Utah grid. We restricted ourselves to the multi-unit activities, not single units, and just looked at the pairwise cross-correlation.
Here you can see an example. We evaluated whether a correlation is significant by using surrogates generated from the data, and we found quite a lot of significant pairwise correlations, which we then decomposed further by looking at overlapping cliques of correlated neurons in a graph. Interestingly, the whole data set decomposed into four clusters of highly inter-correlated neurons. This was interesting, and we also found that these clusters occur clustered in cortical space. But the question I was left with was: does such a cluster actually reflect the presence of higher-order correlations, meaning, do all the neurons in one of these clusters exhibit synchronous activity together, or is this just a group which has pairwise, triplet-wise, and so on, correlations? This is a very complicated task. However, we started out by looking at a simple measure, which is the population histogram. Consider you have 100 neurons simultaneously; this is again a simulation into which we inserted a certain correlation structure, which you can see here as these coincident events, and which you would hardly see with randomized neuron IDs. What you can simply do is bin the data and look at the spike count across neurons per bin, what we call the complexity. You see these high peaks reflecting coincident activity. What you can now do is use all these counts to generate a complexity distribution and compare it to a control situation where you do not have correlation in the data. And you see it is quite hard to see the little bit that is there in the tail. You see it if you, for example, take the difference of the two, but often it is just a slight distortion of the distribution.
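The population histogram idea is simple enough to sketch in a few lines (a toy version; "complexity" here is just the per-bin population spike count):

```python
import numpy as np

def complexity_distribution(binned):
    """Given binned 0/1 data (n_neurons x n_bins), return counts[k] =
    number of time bins in which exactly k neurons spiked."""
    n_neurons = binned.shape[0]
    per_bin = binned.sum(axis=0)                 # population count per bin
    return np.bincount(per_bin, minlength=n_neurons + 1)
```

Correlated data shift probability mass into the high-complexity tail of this distribution relative to independent data with the same rates.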
This you can then capture by a method we developed, by looking at the measured complexity distribution and deriving its moments, in this case cumulants. When you assume a certain type of model, in this case a compound Poisson process model in which you can define the higher-order structure, you can relate the two: from this distribution and its cumulants, actually only up to the third order, you can derive the order of the correlation, or more precisely, a lower bound on the order of the higher-order synchrony. Well, this was still a quite simple model that we assumed here. We went on and said, OK, we just assumed synchronous activity in this model; let's go to a perhaps more realistic model, suggested by Moshe Abeles many years ago, the synfire chain, which actually implements the high convergence and divergence that we find in the cortex. In the simplest case this model has groups of neurons in a feed-forward structure, where you have a high divergence from the neurons of the first group to the neurons of the next group, and so on. Just given this connectivity structure, you get the feature that if you stimulate the first group, synchronous activity propagates through the network. This is a very simple implementation, but what we did next was to embed it in a randomly connected balanced network, in which the excitatory-to-excitatory connections were replaced, so to say, to become part of a synfire connectivity; the rest was as in a randomly connected network. Then we looked at the activity.
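To give an idea of the kind of model meant here, a toy generator of correlated data in the spirit of a compound Poisson model (my own simplification: independent background spiking plus injected synchronous events of a single fixed order, whereas the real model allows a whole amplitude distribution):

```python
import numpy as np

def toy_correlated_data(n_neurons, n_bins, p_bg, p_sync, order, rng=None):
    """Binned 0/1 data with background spiking at probability p_bg per bin
    plus, with probability p_sync per bin, a synchronous event in which
    `order` randomly chosen neurons spike together."""
    rng = np.random.default_rng() if rng is None else rng
    binned = (rng.random((n_neurons, n_bins)) < p_bg).astype(int)
    for b in np.flatnonzero(rng.random(n_bins) < p_sync):
        members = rng.choice(n_neurons, size=order, replace=False)
        binned[members, b] = 1
    return binned
```

Feeding such data into the complexity distribution shows the injected order as extra mass at that complexity, which is what the cumulant-based bound exploits.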
In the first step we did what Abeles and Gerstein suggested about 15 or 20 years ago, when they developed a spatio-temporal pattern detector, and we applied this first to these data. A spatio-temporal pattern means: you have a spike, then after some time you get another spike in another neuron, or in the same one, and then after yet another, possibly small, time interval another spike; if such a pattern repeats at least twice, you consider it. This is the result on the data we generated with embedded synfire chains; there were many synfire chains embedded, and we look at this with a pattern spectrum. What you see here is the complexity of the pattern, and here the number of occurrences, and you see that even for very high complexity, with many spikes involved, you get a high pattern count. Now, just to get an idea whether this is really significant or just by chance, you can again do a surrogate and manipulate the individual spikes; what we did here was a dithering of plus/minus five milliseconds, and all the high-complexity patterns with high counts just disappeared. Currently we work on a significance evaluation, so that we can indicate which entries are really significant and which are not, which is of course not trivial, because this is a multiple-testing problem per se. If you are interested, there is a poster outside, P117, where you can get more details on that. However, we then thought of doing a different analysis: we know that the synfire chain is a sequence of coincident activity.
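A heavily simplified sketch of the pattern-spectrum idea, restricted here to synchronous (within-bin) patterns rather than the full spatio-temporal patterns of the actual detector:

```python
import numpy as np
from collections import Counter

def pattern_spectrum(binned):
    """Count each distinct synchronous pattern (a column of the binned 0/1
    matrix), keep those occurring at least twice, and group them by
    complexity (= number of spikes in the pattern)."""
    counts = Counter(tuple(col) for col in binned.T if col.any())
    spectrum = {}
    for pattern, c in counts.items():
        if c >= 2:
            spectrum.setdefault(sum(pattern), {})[pattern] = c
    return spectrum
```

Running the same counting on dithered surrogate data and comparing the two spectra mirrors the significance check described above: patterns that survive dithering are chance products.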
Now, if such a sequence of coincident activity, one particular synfire chain, becomes active again, we could imagine catching it by binning: we look at the neurons active at the point in time when the first group is active, and when the chain runs again, you would have a high overlap of identical neurons being active; the same holds for the second group, and so on. So you can build up an intersection matrix, where you enter the intersection value at the time pair given by the two instances you just compared, and then you would expect diagonal entries of high intersection values. And actually, if you do this on these data, you get exactly that: here is now the analysis of the data I showed you at the beginning. You see these little diagonal stripes, which all reflect at least one repetition of one particular synfire activation. So this is good; there are clear visual indications of synfire activity. Then we asked: what is the sensitivity of this? Can we actually deal with the 100 or 200 neurons we currently have at hand and apply this method? If you down-sample the data from the simulation to 200 neurons, this is the result of the analysis, which is now much more noisy, but you can help yourself by filtering the matrix with a diagonal filter, emphasizing this preferred stripy appearance. Let's make a control: I now show you again a pattern spectrum, but of data in which the firing of the individual groups was randomized, and if you apply the intersection matrix analysis, you notice that these diagonal stripes are missing. What I would strongly suggest, and this is what we want to do in the future, is to use different analysis methods simultaneously on the same data, in order to emphasize different aspects of the data, whether that is spatio-temporal patterns or just coincidences, and so on.
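The intersection matrix itself is easy to write down (a sketch; the normalization by the geometric mean of the two set sizes is my choice for illustration):

```python
import numpy as np

def intersection_matrix(binned):
    """For binned 0/1 data (n_neurons x n_bins), return the matrix of
    overlaps between the sets of active neurons in every pair of time
    bins, normalized by the geometric mean of the two set sizes.
    Repeated activations of the same group appear as stripes parallel
    to the main diagonal."""
    b = binned.astype(float)
    overlap = b.T @ b                            # shared active neurons
    sizes = b.sum(axis=0)                        # active neurons per bin
    norm = np.sqrt(np.maximum(np.outer(sizes, sizes), 1.0))
    return overlap / norm
```

Two bins in which exactly the same group fired give an entry of 1; disjoint groups give 0, so chain repetitions stand out against the background.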
So this is more or less where we are at the moment, and I would like to come to an end and to the discussion. I think I convinced you that correlation analysis for the identification of interactions in the network is a reasonable approach. Unitary events can indicate transiently active assemblies, and we should take care to avoid false positives by incorporating the statistical features of the neuronal data. Multiple pairwise analyses alone are not enough to conclude on higher-order correlations and assemblies; however, we expect that assembly activity is expressed as higher-order correlations. As I showed you, massively parallel spike data lead to an explosion of parameters for individual patterns, but we now make use of population measures, or of integration over time and space, as we did with the intersection analysis. Analytical treatment is basically impossible, I would say, given such complex data, and this is why we make use of surrogates to derive the significance of the occurrences. And as I just said, in general I would suggest applying multiple analysis methods to the same data sets. We now actually have the chance to observe many neurons simultaneously. I am involved in two big projects: in one we record with two Utah arrays, one in motor cortex, the other in visual cortex, while the monkey performs a visually guided motor task; this is together with Alexa Riehle, who has an electrophysiology lab at the CNRS. The other project, which we just started, is a joint German-Japanese collaboration with Hiroshi Tamura and Shigeru Shinomoto, where we want to record at multiple layers of the visual pathway of a monkey doing a free-viewing task. So we will have such data, or actually we already have them, and there are some issues to be solved: these are huge data sets.
So we currently need to develop strategies to deal with such huge data sets. We get approximately 500 gigabytes per day, and if you consider pre-processing and so on in addition, this is a lot. There is also the question of how these data get from the experimenters to us, to the people who analyze them. We are currently working on this, and in addition, something which probably sounds trivial to people from the neuroinformatics environment or to people who do simulations: we are working intensely on developing workflows for the handling and analysis of such data. This is not trivial, in particular if you have such massively parallel data, a behaving monkey, several trials, several sessions over which you want to do the analysis, and in addition analyses that go beyond simple rate analysis, looking at the correlations. What you really want to aim at is a reproducible workflow where you can change, for example, the bin width at the very beginning, run through all the data sets again, and at the end, without further intervention, get a figure for your paper. This is what we aim at, and we do this together with the G-Node, Thomas Wachtler, but also with Andrew Davison. I want to refer to two posters: P113, by Denker, on workflows, and P124 of the Electrophysiology Data Sharing Task Force. I would like to thank you, the people who work with me in Jülich, the other collaborators, and the people who fund my work. Thank you very much. Question: Can you say something about the workflow engines that you are using and the extent to which they do provenance tracking? Answer: Well, we are trying to use Sumatra, but we are not sure yet whether it will work for our applications. Question: Of the commonly available workflow engines, the one that has the most sophisticated provenance tracking is actually Kepler. Answer: Yes, we also have this on our list to check.
At the moment we are somewhere up here in our workflow, and we want to check these various options. Do you use Kepler? Yes. OK, we need to talk. Question: An understanding question about one of the earlier slides: you showed coincidences which were drifting, with sort of curved lines. Could you explain again what that meant? Answer: Curved lines, which curved lines do you mean? Question: No, it was a gray slide with about three or four curved lines, inverted Js. Answer: This one? Question: No, but I could ask you afterwards. I think it was a raster. Answer: Ah, was it a raster plot? Question: And then another one which had some curve. Yes, that's it, that's it. What is happening there? Answer: It is just a raster plot; I just re-ordered the neuron IDs. What you see here is propagation of synchronous activity through the network. This is one type of synfire chain, this is another one, and these are sequential, see? These are synchronous spikes. It is a bit compressed in time, but you see that 100 neurons are more or less synchronous, then the next 100, and so on. Question: Are these sequential in time, going up the graph, or is it sort of...? Answer: Yes, here is the time axis, look. It goes sideways, and this is a different chain. Question: And the chain idea is... OK, I will talk with you afterwards. Thank you. Question: If you go up to 10,000...? Answer: He asked what happens if we have 10,000 neurons. Well, we train ourselves on simulation data, as you see here; these are actually 40,000 neurons. So this should not be a big deal, because we use simple measures and try to reduce the complexity of the data until we have individual patterns, and can then relate these to the dynamics and the behavior.