Welcome back, everyone. Today I'm going to show you how to get Emotiv data into Weka for analysis. We have two Emotiv data files here: baseline.csv and cat.csv. For cat.csv, a person was looking at a cat and we recorded their responses. For the baseline, a person was just looking at a white screen.
If we open up baseline.csv, we can see what we have. Out of the Emotiv we get the channel it was recording on (the Emotiv has quite a few channels), and then theta, alpha, low beta, high beta, and gamma waves. What we found is that the most interesting data is in the low beta and high beta area. We can use all of it, but the interesting part is low beta and high beta. cat.csv has the same file format, just with the data for the cat. We recorded this over time, so we have all of these entries for all of the channels. I forget what the sample rate was, but I think it's every second or something like that, so we have a measurement every second or every few seconds for each of these bands.
What we want to find out is whether we can differentiate between the baseline and actually looking at the cat. I also have this both.csv, and I'll talk about that in a second. The first thing to do is open up Weka. Weka is data mining software, and you can get it free from cs.waikato.ac.nz. Just go to the download page and download whatever version is suitable for your operating system. I'm currently using it on Linux, but it works the same on basically all of them.
Okay, so let me open it up. Once Weka opens, you'll have the Weka GUI Chooser. One of the first things you might want to do, especially if you're dealing with Emotiv data or any brainwave analysis, is go to Tools and then Package manager. That pulls up the package manager, and these are all the different packages you can install. There are a bunch of extra algorithms; for example, LibLINEAR is an interesting one, and LibSVM. If we want more algorithms or more features, we can install them from here, and the package manager makes that very easy.
What you should search for is time series. I already have the packages installed, so I don't think it's going to show them to me again, but if you search for time or time series, you should see the time series packages, and there will be two of them. They let you analyze time series data directly. We won't be doing time series analysis today, just clustering, but definitely look into Weka's time series packages.
Next, once Weka is open and we have all the packages and algorithms we want installed, we go to Explorer. That opens a window where you can open your data and also start to clean it. My data is already in CSV format, and it's in Desktop, data. I choose CSV as the file type, and then I see my baseline and my cat files. The first thing I want to do is understand a little bit about my data, so I'm going to load the baseline first and click Open. Everything loaded, and it looks like we didn't get any errors. If I click on each of the attributes, I don't get an error inside of those either. These are just descriptive statistics about the levels that it sees.
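The mean that the Explorer reports for an attribute can be cross-checked from the command line. This is a minimal sketch, assuming the column order described earlier (channel, theta, alpha, low beta, high beta, gamma), so that high beta is field 5. The file name baseline_sample.csv, the header spellings, the channel labels, and the values are all made up for illustration; they are not the recording used in the video.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Toy stand-in for baseline.csv; header spellings and values are made up.
printf '%s\n' \
  'channel,theta,alpha,low_beta,high_beta,gamma' \
  'Pz,5,6,38,36,4' \
  'AF3,4,7,40,38,5' > baseline_sample.csv

# Mean of field 5 (high beta) over all rows and all channels, skipping the header.
high_mean=$(awk -F, 'NR > 1 { sum += $5; n++ } END { if (n) printf "%.2f", sum / n }' baseline_sample.csv)
echo "mean high beta: $high_mean"
```

Changing `$5` to `$4` would give the low beta mean under the same column-order assumption.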
We can see here the different values that each of the frequency bands is giving us. It gives us a little bit of an idea about what's going on, but not a lot. Like I said, we want to focus on low beta and high beta. If we click on low beta, we can see the mean is 38.7. If we click on high beta, we can see the mean is 37. I'm going to make a note of that in a text document: for the baseline, high beta 37 and low beta 38. There is other information here that might be interesting, but with the baseline data that's what I was interested in: what is our mean, at least for low beta and high beta? I'll show you what we can do with that in a second.
Now I want to open up another file, cat.csv. This file is where the user was actually looking at a cat, and I want to differentiate between not doing anything and looking at a cat. So I open that up, and because it's in the same format, everything loads again and we can see all of the instances we have. I'm going to focus again on low beta, and we see that our mean for low beta is 42. And my mean for high beta is 123. This is already somewhat important, because just between the baseline and the cat, the mean high beta is much, much greater when we're looking at the cat. Now, what exactly is this? If we look back at our data, we have low beta, high beta, and then we have all of these channels. What exactly are we looking at here?
Well, we are looking at this whole low beta column combined: the average over all of these channels. We didn't filter based on channel, so we're looking at all data from all the channels. If we wanted to focus on a particular channel, for example, I think Pz was the back of the head, right? So whenever we're doing image pattern recognition, Pz might be the most interesting. We might need to remove or somehow filter out all of these other channels and just focus on Pz. But right now we're loading in all of this data, including all of the channels, and averaging each band over all of them.
From that average, we are getting a relatively high mean for high beta whenever we are looking at a cat. The average for high beta is much greater when we're looking at the cat than when we're not. This means that we might actually have a pattern we can detect, so it's already looking somewhat interesting. I'm going to focus, then, on high beta only. What I want to do is see if I can make a classifier that can differentiate between looking at nothing, which is the baseline, and looking at the cat. I want to figure out if I can differentiate that using machine learning, or just find it automatically. One of the things we're going to try today is clustering, to see if we can detect the differences. Using just the means, I've already figured out I want to focus on high beta, because I see a big difference in the mean. For low beta, I see a little bit of a difference, but not much, right?
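Filtering down to a single channel, as mentioned above, can be done before the file ever reaches Weka. The sketch below assumes the channel name sits in the first field and is spelled exactly Pz; the file names, header spellings, and values are invented for illustration, so check how your own export labels its channels.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Toy stand-in for cat.csv; header spellings and values are made up.
printf '%s\n' \
  'channel,theta,alpha,low_beta,high_beta,gamma' \
  'Pz,6,5,42,120,4' \
  'AF3,5,6,43,90,6' \
  'Pz,7,5,41,126,5' > cat_sample.csv

# Keep the header line plus only the rows whose first field is exactly Pz.
awk -F, 'NR == 1 || $1 == "Pz"' cat_sample.csv > cat_pz.csv

# Compare the high-beta mean over all channels with the Pz-only mean.
all_mean=$(awk -F, 'NR > 1 { s += $5; n++ } END { if (n) printf "%.2f", s / n }' cat_sample.csv)
pz_mean=$(awk -F, 'NR > 1 { s += $5; n++ } END { if (n) printf "%.2f", s / n }' cat_pz.csv)
echo "all channels: $all_mean, Pz only: $pz_mean"
```

The two means differ in this toy data, which is the point: an all-channel average can hide or dilute what a single channel is doing.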
So it could just be noise, it could be who knows what, but with high beta it looks like there's actually quite a bit of a difference. Now, whenever I want to start clustering, whenever I want to automatically detect or pull out these things, I need to combine cat and baseline together into one file.
Before I forget, which I already did: once we've loaded the data, we can look at it in the Visualize tab. One of the first things I tend to do is not only look at the mean, but also look at the actual plots of all of these attributes against each other. Here, for example, we have high beta. I can see that I have a cluster at relatively low values, and then some data spread out a little bit higher, and the same versus low beta. So I have some interesting patterns here. If we go back to Preprocess, open up the baseline, and then go back to Visualize and look again, high beta versus theta is maybe almost the same. High beta versus alpha is almost the same. Low beta looks a little bit different. So I go through and visualize all the data initially, to get an idea of where the data is falling for the baseline and where it's falling for looking at the cat. So think about this Visualize tab: go through both of your data sets and see if anything jumps out at you. That's one way that I start.
I'm going to go back, because right now we need to combine our cat data with our baseline data. I've put both of these into a file called both.csv. In Linux, you can use cat, which just means concatenate: cat baseline.csv, then one greater-than sign, then both.csv.
This takes everything that's inside baseline.csv and puts it into both.csv. Then I echo a blank space into both.csv, and here I'm using two greater-than signs. A single greater-than sign overwrites: it basically makes a new file and puts the data inside of it. Two greater-than signs append. So I'm appending an empty line to both.csv, and now both.csv has everything that was in baseline plus an empty line.
The next thing I want to do is cat cat.csv, so basically take all the data inside cat.csv, but we need to do something special with it: tail -n +2, I think it was. I'll check to make sure it's working. If we look at the head of cat.csv, at the beginning we have this header: channel, theta, alpha, low beta, high beta, gamma. We want to remove that before we put the data into the file, so we use tail -n +2, which lists out all of the data without the header line. So I'm going to put that into my both file: cat cat.csv, pipe that to tail -n +2, then two greater-than signs and both.csv. Hit Enter.
Then if we do cat baseline.csv piped to wc -l, we can see how many lines are in baseline: we have 4359. If instead of baseline we do cat both.csv, we have a lot more. It should be cat plus baseline, and that looks right, so we know that we have all of the data inside both.csv.
Now I can go back to Weka and open my file both.csv. We have as many records as we expect to find, and we still have all of the same attributes. But we're not really interested in anything except high beta; I'm looking at pulling out patterns specifically with high beta.
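Written out, the shell steps for building both.csv look roughly like the sketch below. The two small input files are made-up stand-ins (hypothetical header spellings, channel labels, and values) so the commands can be run anywhere; with real recordings you would skip that part. The sketch also differs from the video in two small ways: it passes cat.csv to tail directly instead of piping it through cat, which gives the same result, and it leaves out the blank-line step on the assumption that baseline.csv already ends with a newline.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Toy stand-ins for the two recordings; header spellings and values are made up.
printf '%s\n' \
  'channel,theta,alpha,low_beta,high_beta,gamma' \
  'Pz,5,6,38,36,4' \
  'AF3,4,7,40,38,5' > baseline.csv
printf '%s\n' \
  'channel,theta,alpha,low_beta,high_beta,gamma' \
  'Pz,6,5,42,120,4' \
  'AF3,5,6,43,126,6' > cat.csv

# 1. Start both.csv from the baseline file. A single > overwrites the target.
cat baseline.csv > both.csv

# 2. Append the cat rows without their header. tail -n +2 prints from the
#    second line onward, and >> appends instead of overwriting.
tail -n +2 cat.csv >> both.csv

# 3. Sanity check: both.csv should have (baseline lines + cat lines - 1),
#    because exactly one header line was dropped.
wc -l baseline.csv cat.csv both.csv
```

If the line count of both.csv is off by one, the usual suspects are a missing trailing newline in the first file or a header that was not stripped from the second.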
So I'm going to remove everything else: just select everything you don't want to include and click the Remove button. Then I have high beta only, and this is all of my data. I can already see that there are quite a few entries here, and then there's something going on over here. I don't know what that is, but something's happening.
Then I go to Cluster, because I'm going to try some basic clustering. Instead of EM, I'm going to choose SimpleKMeans. K-means is a very simple type of clustering and a relatively effective one. It's a very interesting algorithm. I won't talk about it here, but I recommend you read up on it. We do want the Use training set option. That's pretty much it: select SimpleKMeans with your combined data and click Start.
Now it's going through, and it found two clusters with about 5369 entries each, just with high beta, since we only have high beta. It found two different clusters, so that's pretty good; we were expecting two. Let's see what we got here. Cluster one 99 cluster 283. So basically, whenever the high beta value is relatively large, it's cluster 0, and whenever the value is relatively small, it's cluster 1.
If we right-click on the SimpleKMeans result, we can choose Visualize cluster assignments, and that pulls up this window. Cluster 0 is this top cluster here, and cluster 1 is the bottom cluster. Now, if we go back and look at our means, let's focus on high beta for now. The cat's mean is about 123, and 123 is about here. So everything roughly above 123 was clustered as the cat.
And everything around 37, which is probably about here, more or less, was clustered as the baseline. What this tells us is that we seem to be able to differentiate automatically between the baseline and someone looking at the cat. With this information, we could go in and start to do some categorization, and try to detect specifically when this is happening. Or if we saw some new data that looked like this, would it be a cat or would it be something else? Essentially, what we're looking at here is whether we can automatically differentiate between the baseline and looking at something else through brainwaves. And it seems like we're probably getting a pretty good cluster here, about where we would expect.
The next thing I would probably do is go to Classify and try to make a classifier. I would have to change the data slightly, but using a classifier I might be able to classify or detect new data coming in. Or we could potentially just keep using clusters, depending on what your data actually looks like. I thought this was pretty interesting: we're using just very basic data to go through and ask whether we can differentiate between two different things. And it looks like our clusters were pretty accurate, with high values for looking at the cat and low values for not looking at the cat. So that's it for how we can do simple k-means over Emotiv data. Thank you very much.
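As a footnote to the clustering step, here is a rough idea of what k-means does with a single attribute such as high beta. This is a plain textbook version (Lloyd's algorithm with k = 2) written in awk purely for illustration; it is not Weka's SimpleKMeans, which has its own initialization and options. The ten values are made up to sit near the two means noted in the video, and the file name high_beta.txt is hypothetical.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Made-up high-beta values: a low group near 37 and a high group near 123.
printf '%s\n' 35 36 38 39 37 120 125 121 126 123 > high_beta.txt

# Plain 1-D k-means with k = 2. Illustration only, not the Weka implementation.
result=$(awk '
{ x[NR] = $1 + 0 }
END {
    # Start the two centroids at the smallest and largest value.
    c[0] = x[1]; c[1] = x[1]
    for (i = 1; i <= NR; i++) {
        if (x[i] < c[0]) c[0] = x[i]
        if (x[i] > c[1]) c[1] = x[i]
    }
    for (iter = 0; iter < 100; iter++) {
        sum[0] = sum[1] = 0; n[0] = n[1] = 0
        # Assignment step: each value joins its nearest centroid.
        for (i = 1; i <= NR; i++) {
            d0 = x[i] - c[0]; if (d0 < 0) d0 = -d0
            d1 = x[i] - c[1]; if (d1 < 0) d1 = -d1
            k = (d1 < d0) ? 1 : 0
            sum[k] += x[i]; n[k]++
        }
        # Update step: move each centroid to the mean of its members.
        changed = 0
        for (k = 0; k < 2; k++) {
            if (n[k] > 0) {
                m = sum[k] / n[k]
                if (m != c[k]) changed = 1
                c[k] = m
            }
        }
        # Stop once no centroid moved.
        if (!changed) break
    }
    printf "low centroid: %g (n=%d)\nhigh centroid: %g (n=%d)\n", c[0], n[0], c[1], n[1]
}' high_beta.txt)
echo "$result"
```

With well-separated groups like these, the two centroids settle on the group means after a couple of passes, which mirrors the low and high clusters seen in the Weka plot.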