live. Always having trouble going live, on LinkedIn especially. The best one is YouTube, no questions asked. Twitch is great. Facebook, not so much. We just have to deal with that.

Okay, so we're doing this Kaggle competition. I just found out about it; it had already been going for a while — it started a month or two ago. We're not so worried about actually submitting anything to the competition; we're more interested in the data, because it's open. So we have a bunch of labels, a bunch of seizures from quite a lot of patients. We already spent yesterday on this. We have this little tool that we made, where we can select the EEG ID. Some of the recordings are not so great, and some of them are repeats. This doesn't work for some reason — in theory it does work, because we have... Yeah, this will probably be an example of how not to store your data. I don't know when this data was collected; it must have been some time ago.

So we have data from these different electrodes. There are not too many of them; I don't know why. Normally there would be about 20, and obviously there are fewer here. Is it a standard montage? I think it might be. We have the EKG as well, and in most cases it doesn't look very good. Like here, for example — obviously rubbish; there's no EKG at all. If this were a real EKG, this patient might be having other problems. But then the EEG recording — as you can tell, the EEG is not so great either. There are DC shifts in it which shouldn't be there. Yeah, a big DC shift here.

So there are a lot of patients, but this data is segmented. It's not continuous data, and I don't particularly like segmented data. Another example of how not to store the data. So what do we have? We have the EEG ID — I think each entry is 50 seconds of recording — and then there's a sub ID. This first ID goes up to something like 87 seconds, but it's meant to have ten sub IDs in it.
So we're still figuring out how to display it properly. This is what we have in the training data: an EEG ID and a sub ID. Essentially, each one of those files can carry different labels — in this case they're all labeled seizure, I think. Yeah, all seizure. And say this next ID has only two sub IDs, zero and one. And this is the interesting bit: each entry for an EEG ID gets duplicated. So you see the same number with sub ID 0 and the same number with sub ID 1, but with a different "EEG label offset in seconds." The first one is at 0, the second is at 6. So it's not quite clear — in theory the original recording should be... it just doesn't sound right, these label offsets. For the second EEG ID we have 50 seconds and label offsets of 0 and 2. Really weird. A good example of how not to store data, at ID 1012. This one has offsets 0 and 18; the next one has 0 and 28.

So in theory, the offset... Let's check again what that offset means: "the time between the beginning of the consolidated EEG" and this entry. Okay. Even if I assume they are all 50 seconds — the sampling rate is 200 Hz for this dataset, by the way — it doesn't make sense. With a two-second offset the consolidated recording would be 52 seconds, but two separate 50-second entries should add up to about 100 seconds of data — so either the segments overlap almost entirely, or something's off. Let's check the next one, the ID ending in 444. How do these numbers not add up? So the one ending in 584 has offsets 0 and 18, and its length is 68 — that's 50 plus 18, so that makes sense. But without overlap it would have to be 118. Same for 710: it comes to 78. Yes, 78, but it still doesn't quite make sense to me. They're both labeled as seizure. And here you have the agreement or disagreement between experts — those agreement numbers are actually quite interesting.
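Those agreement counts can be turned into a consensus label and a certainty score per entry. A minimal sketch, assuming one vote-count column per class (the column names here are made up to mirror the metadata, not taken verbatim from the file):

```python
import pandas as pd

# Hypothetical vote columns modeled on the competition metadata:
# one count column per class, one row per labeled entry.
votes = pd.DataFrame(
    [
        {"seizure_vote": 7, "lpd_vote": 0, "gpd_vote": 0,
         "lrda_vote": 0, "grda_vote": 1, "other_vote": 8},
        {"seizure_vote": 0, "lpd_vote": 1, "gpd_vote": 0,
         "lrda_vote": 14, "grda_vote": 0, "other_vote": 1},
    ]
)

vote_cols = [c for c in votes.columns if c.endswith("_vote")]
n_raters = votes[vote_cols].sum(axis=1)              # experts per row
consensus = votes[vote_cols].idxmax(axis=1)          # winning class
certainty = votes[vote_cols].max(axis=1) / n_raters  # agreement fraction

print(consensus.tolist())           # ['other_vote', 'lrda_vote']
print(certainty.round(2).tolist())  # [0.5, 0.88]
```

The certainty column is exactly the "7 vs 8" situation: the first row is a 50/50 call, the second is a confident 14-of-16, so we could use it to prioritize which segments to trust.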
Yeah, here you had essentially 16 experts labeling the data: one labeled it LPD, one labeled it Other, and 14 labeled it LRDA. A quick description of what each means: LPD is lateralized periodic discharges. GPD is generalized periodic discharges — so that's throughout the brain. Lateralized is essentially what's called focal, as in a focal seizure, originating in one area of the brain. LRDA is lateralized rhythmic delta activity, GRDA is generalized rhythmic delta activity, and then there's Other. So you had supposedly up to 16 experts labeling this data. I don't understand why in this next case only three experts did the labeling, and in this one 13 — or is it 12, 14, 15? It's good to see some disagreement. In this case seven labeled something as seizure and eight labeled it as Other, so the consensus was Other, and that's what you get for the label. When analyzing this data we could assign priority: this one would count as uncertain, because it's seven to eight, whereas in other cases you have much higher certainty — like here, where 14 experts agreed on a specific label. So that's what we're dealing with. We'll just try to sort out the display.

And is there a spectrogram file as well? We could probably make our own spectrograms, but supposedly the provided spectrograms cover a wider time window. From the description: the raters were expert reviewers, and the data they reviewed was 50-second-long EEG samples plus matched spectrograms covering a 10-minute window centered at the same time, and they labeled the central 10 seconds. Many of these samples overlap and have been consolidated. Okay, this is really weird — another example of what not to do. Actually, I think I understand now what's going on. Take that earlier example again, say 710 — that one's fine now. In that example the reviewers labeled two 50-second windows: one from 0 to 50, and one from 28 to 78. So when you look at the whole thing, you have 78 seconds.
That does make sense. The two windows have a high overlap — what is it, almost 70%? — which is a bit odd, but okay, we just have to deal with it. Same case there. And for the first EEG ID we have this 90 seconds of data, because it's 50 plus 40 — that last number being the offset. So the reviewers labeled ten 50-second segments: the first from 0 to 50, the second from 6 to 56, then 8 to 58, and so on, and the last one from 40 to 90. Therefore the whole consolidated segment is 90 seconds. Right. Not sure why there are so many overlaps — the labels are all the same anyway.

So that's the Other thing. This one had three reviewers and they all voted seizure. So they're voting seizure, and this Other — abnormal activity — would that be the label for anything that doesn't fit the named categories? It's a bit odd, but okay, you do have a label for Other, whatever exactly that covers.

This is not working as expected. I don't know why. This one was labeled as seizure, and that's updating correctly, but it doesn't look like this one is updating correctly — we'll have to fix it. Yeah, this is super weird; I would just discard something like that. I mean, you can sometimes discard based on ECG — like when the ECG electrode isn't sitting well — but that would just be an indication that potentially something else is wrong too. Like in this case, 5533 — I'll have to check the labels. Are there any cases at all where you have the same EEG ID but different labels? I don't think so. They're always the same labels anyway, which is kind of silly. It looks like the extra entries are just padding — creating more data. It's essentially fake. It's hard to explain... it's not just a hunch, though: we can explain it by the fact that the entries are copy-pastes of one another. They all have the same scores.
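The offset bookkeeping worked out above — each (EEG ID, sub ID) entry is a 50-second window starting at its label offset, so the consolidated recording spans max(offset) + 50 seconds — can be sketched as:

```python
# Sketch of the consolidation logic: each labeled entry is a 50 s
# window starting at its offset into the consolidated recording.
WINDOW_S = 50

def window_spans(offsets):
    """Return (start, end) in seconds for each labeled window."""
    return [(off, off + WINDOW_S) for off in offsets]

def total_duration(offsets):
    """Consolidated length: last window's start plus one window."""
    return max(offsets) + WINDOW_S

offsets = [0, 6, 8, 40]          # offsets like the example above
print(window_spans(offsets))     # [(0, 50), (6, 56), (8, 58), (40, 90)]
print(total_duration(offsets))   # 90
```

The same arithmetic reproduces the earlier head-scratchers: offsets [0, 18] give 68 seconds, and [0, 28] give 78 — the windows overlap, they don't concatenate.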
They're all called LRDA — lateralized rhythmic delta activity. This patient's offsets go up to 260, so we expect about 260 seconds in this data. Essentially it doesn't matter how long the segment is; they were all labeled the same. Just go somewhere into the middle and you can tell: exactly the same label, same scoring.

This one has a lot. Okay, this is interesting — no, the scoring here was different, which is just weird. So how many experts looked at this segment, three or five? Essentially this one was labeled by three experts as seizure, and it was eventually labeled as seizure, but interestingly, this segment in the middle was also looked at by another two experts: one said one thing and the other said something else. So no decision — but why didn't all five label the whole lot? Which is weird. It's also suspicious that in this case, for example, you had 13 experts label just this one segment alone, and the second one was also labeled by 13, with a bit more disagreement between them. Yeah, this is an interesting one: it was eventually labeled as seizure because seven experts labeled it as seizure, but six labeled it as something else. So even if you take this one as a seizure, it's probably not clear-cut.

Anyway, we want to display all this stuff in our tool and then do more processing — spectrograms and other things. Eventually we'll be training a fuzzy logic system to try to do the detection and classification. (Yeah, NeRF is dead.) You shouldn't see things like this — probably some sort of motion artifact. I can bet the ECG isn't there either. I'll be doing quite a bit of filtering in all of the comparisons, but I'd be inclined to discard things like that outright — there's obviously something very wrong. Yeah, that one looks like a seizure.
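Since we'll be doing our own spectrograms as part of that processing, here's a minimal sketch with `scipy.signal.spectrogram`, using a synthetic one-channel signal in place of a real EEG column (the 200 Hz rate is the dataset's; the test signal is invented):

```python
import numpy as np
from scipy.signal import spectrogram

FS = 200  # Hz, this dataset's sampling rate

# Fake one channel: 50 s of noise with a 6 Hz rhythm appearing at t > 20 s,
# standing in for a real EEG column from the parquet files.
rng = np.random.default_rng(1)
t = np.arange(0, 50, 1 / FS)
x = rng.normal(scale=0.5, size=t.size)
x[t > 20] += np.sin(2 * np.pi * 6 * t[t > 20])

# 2-second windows with 50% overlap -> 0.5 Hz frequency resolution
f, seg_t, Sxx = spectrogram(x, fs=FS, nperseg=2 * FS, noverlap=FS)
print(Sxx.shape)  # (frequency bins, time bins)
```

For a delta/theta-range rhythm like this, the 6 Hz band lights up in the later time bins, which is exactly the kind of view the provided spectrogram files are meant to give over a wider 10-minute window.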
So the spiking is an interesting one. We'll have a better tool to review this stuff. I want to be able to pull out, for example, a seizure that had very high agreement among a large number of experts — so we want to prioritize. Those will be the parameters: some sort of agreement score. There's actually a kappa — what's the kappa score? We'll have to look it up. Maybe we'll do that, but essentially it should tell you something about the agreement level between experts. And we'll go from there. This one was only labeled by one expert, which is a bit odd.

Okay, so what shall we do next? We have this code loading data, and we have the data downloaded and ready. This is actually a good example of what is more likely to be a seizure. There's still no ECG in it. This one seems to be spiking on all channels — you can even tell it's more pronounced on some channels than others, where the baseline noise is smaller. Is it on one side of the head? Those would be the left — even electrode numbers are the left side and odd numbers the right, or the other way around; it doesn't really matter here. So you have the seizure potentially on both hemispheres — that's why they labeled it seizure. I don't know why the other labels are not seizure; we'll have to check.

Anyway, we have this working as a Flask application. If it's doing something useful and you want to access it, we can make it available eventually on bionicchaos.com — if you haven't checked bionicchaos.com, go check it out; sometimes you even get ads relevant to you — and we'll add this there as another example. The license permits us to use it for educational purposes: it's Creative Commons, attribution, non-commercial. The download is 26.5 gigabytes, and that's a zip file; once you open it, it's obviously larger... actually, that's about how large it is — still around 26.4. So the zipping didn't help much, mainly because the data is parquet — I don't know how to pronounce it — files.
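On that kappa score: with many raters per segment, the usual choice is Fleiss' kappa, which compares observed within-segment agreement against what random voting with the same overall label proportions would give. A small NumPy sketch (the vote table is invented for illustration):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an (n_items, n_categories) table of vote counts.
    Assumes the same number of raters for every item."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                  # raters per item
    p_j = counts.sum(axis=0) / counts.sum()    # category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Two segments, 16 raters each: one unanimous, one split 8/8.
table = [[16, 0], [8, 8]]
print(round(fleiss_kappa(table), 3))  # 0.289
```

A value of 1 means perfect agreement, 0 means chance-level; that 8/8 split drags the score down hard, which is exactly the "not clear-cut" signal we want to surface per segment.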
So we're opening those parquet files — Python can do it fairly easily. I like that example because it obviously looks more like a seizure, and there are 124 seconds of it. The other problem is that there is no onset. That's why we prefer continuous time-series data rather than chopped-up segments: you can see the evolution of the seizure, and what's happening before and after is very important.

So it won't display all the channels; you just use the drop-down. We will, however, try to sort out the sub ID and the label offset: it will display only 50 seconds, and for each EEG ID you'll have another drop-down for the sub ID, and we'll use the offset in seconds to move the window along. Now I understand what it's doing. Let's see if GitHub Copilot will be able to sort this out for us, because it doesn't have all the context — well, we do have a text file describing the data structure and how the files are loaded. So: sub ID. Let's pop that in, because it's currently not displaying correctly. Also, ideally, the idea was that you select based on expert consensus — based on the label — instead of the EEG ID; currently you select based on the EEG ID. And that looks a lot like baseline EEG. I have to make sure the expert consensus is placed below, and do this quickly. Make sure it's actually... yeah, there's some hard artifact there. Those will be ignored, although I wonder how they were presented to the experts — were they filtered out somehow? This one looks more like EEG. So the label offset — if it's actually working we don't have to display it. Let's see. I'm doing a lot of debugging at the moment; that's okay. Okay, so the data for the sub ID — yes, we have the offset. We don't need that; we do need this. And the problem is that it's not being shown correctly: "EEG sub ID" and "label offset in seconds" are empty on the front end — they're not being populated. We have this populateDropdown function; I think it's too generic for my liking.
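The loading-plus-offset logic described above — read one EEG's parquet file, then slice the 50-second window starting at a label offset — can be sketched like this (the file path and channel names are illustrative; here a synthetic frame stands in for the real file):

```python
import numpy as np
import pandas as pd

FS = 200        # Hz, this dataset's sampling rate
WINDOW_S = 50   # each labeled entry covers 50 seconds

def labeled_window(eeg: pd.DataFrame, offset_s: float) -> pd.DataFrame:
    """Slice the 50-second window starting at a label offset (seconds)."""
    start = int(offset_s * FS)              # seconds -> sample index
    return eeg.iloc[start : start + WINDOW_S * FS]

# In practice the frame would come from something like
#   eeg = pd.read_parquet(f"train_eegs/{eeg_id}.parquet")
# here we fake a 78-second, 2-channel recording instead.
rng = np.random.default_rng(0)
eeg = pd.DataFrame(rng.normal(size=(78 * FS, 2)), columns=["Fp1", "O1"])

seg = labeled_window(eeg, offset_s=28)
print(len(seg) / FS)   # 50.0 seconds
```

The seconds-to-samples multiplication is the same one the front end needs: offset 28 s at 200 Hz means starting at sample 5600.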
We'll probably have to change it; it's trying to cover all the different cases at the moment. It's an async function, which is good. Okay: sub ID, offset 6 — that's working now. Really good. Let's just double check. Yes, we have the EEG ID, we have the 50-second EEG labels, and for the sub ID of 8 the offset in seconds says 40, so we should always be plotting 50 seconds using that offset. But the label offset display is empty — can it fix the code so it's actually populated? Hopefully it understands the context. Surprised we're not doing this already. Right: updateLabelOffsetDisplay should accept the data as input — that helps, doesn't it? So let's do that. That can go, for sure. Okay, and it's not updating — need to make sure the input is being used everywhere. Okay, it's working. No, it's showing "current sub ID"; we need to fix this bit. Will it understand what the problem is? "The updateLabelOffsetDisplay function is being called before the drop-down value is updated." Okay, that seems important. We probably don't need all that debug output at the moment. So, that's better — it's cleaner.

Okay, so we have the subject ID — sorry, this is not the subject ID. That does look like seizures. Oh yeah, it looks like something. So we need to make sure of a couple of things. First, there's an obvious problem here: I don't know why the label offset also shows "current sub ID" instead of just the number of seconds. And also, it seemed to display correctly the first time around, but it's not updating when a different EEG sub ID is chosen. ...I think we are in business. Okay, so when loading, it just selects the first EEG ID, sub ID 0, offset 0 — so that does work. We don't want to spend too much time on this one. Ideally, when you select an EEG ID it will show you the whole thing.
And when you select — okay, not sure how hard this will be to do, but ideally, on load it would select the default EEG ID and show the whole number of seconds, all the data points in it, and only when selecting a specific EEG sub ID would it focus in. Oh, come on — the function tells me the default ID and then... that's not what I meant. The idea is: on load, select the default EEG ID and show the whole number of seconds, as it does now (this is already happening), and then, when selecting a specific EEG sub ID, the chart should zoom onto the 50 seconds at the selected offset. Does it understand what I mean? Probably not. Okay, let's see. "To implement this feature we need to modify fetchAndUpdateChart." I also don't understand why we have a fetchAndUpdateChart function and then — pretty sure — an updateChart as well. We'll sort it out later; don't worry about it. So it's checking whether the drop-down is populated, and if so it calls the complete function, and then there's the rest of the code. It's multiplying by the sampling rate, which is 200 — why? Okay, yes: everything needs to be multiplied by the sampling rate, because these values are in seconds, and once multiplied by the sampling rate they become numbers of data points. So we have fetchAndUpdateChart and then updateChart — a bit odd. And this is where we divide by the sampling rate, which is a fixed constant of 200, defined at the beginning of the code, blah blah blah. Can it update updateChart? Normally if we just say "fix it", it will ask what we want fixed; maybe say "fix and combine". We essentially need it to regenerate the whole function — that's what we need. Okay, we have an offset default value of zero, that's fine; the index is looking good. Okay, what is that zero there? Something stopped working... must be errors... no, there are no errors. Why? Something is up. No — that error is okay.
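For context, the back-end half of that fetch is roughly this shape — a hypothetical Flask sketch, not our actual route (the route name, parameters, and fake data store are all invented): given an EEG ID, electrode, and offset, return exactly 50 seconds of samples.

```python
from flask import Flask, jsonify, request
import numpy as np

app = Flask(__name__)
FS = 200        # Hz
WINDOW_S = 50   # seconds per labeled window

# Stand-in store: {eeg_id: {electrode: samples}}; real code would read
# the corresponding parquet file instead.
FAKE_DATA = {"1234": {"Fp1": np.sin(np.arange(90 * FS) / FS)}}

@app.route("/eeg")
def eeg_window():
    eeg_id = request.args.get("eeg_id", "1234")
    electrode = request.args.get("electrode", "Fp1")
    offset = float(request.args.get("offset", 0))
    start = int(offset * FS)                       # seconds -> samples
    sig = FAKE_DATA[eeg_id][electrode][start : start + WINDOW_S * FS]
    return jsonify({"offset": offset, "samples": sig.tolist()})

with app.test_client() as client:
    data = client.get("/eeg?offset=18").get_json()
print(len(data["samples"]) / FS)   # 50.0
```

The front-end fetchAndUpdateChart then only has to plot whatever comes back, rather than doing the seconds-to-samples conversion in two places.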
Has the JavaScript changed correctly or not? Kill it for a sec. Selecting a different EEG ID stopped working — it's not updating the chart anymore. How is that possible? It even lists the changes to the function. Yeah, it's working again — that's the beauty of Copilot; I don't know how it happened. So that bit is working again. Then select — for example, for this EEG ID we have sub IDs 0 and 1; sub ID 0 is at offset 0 and sub ID 1 at offset 18, and in theory, when we select one of those, we should display just the 50 seconds of the actual segment. How do we do that? Do we actually need it? Once something becomes difficult, you ask yourself: do you really need it, or can you move on? No, we do need it, because those are the labels — the labels only apply to 50 seconds. So we have fetchAndUpdateChart in the code — let's put a console.log in there. Yeah, that might be the problem: we might not hit that line anymore. And the label offset display is not required — the offset in seconds is already being shown in the EEG sub ID drop-down, so I removed it from the HTML. But we would still like to use the offset for the start and end index calculation — let's take the offset from the sub ID drop-down. Yep, that's right... no, that's wrong. How do we do that? Do we have this function ready?
populateDropdown — for the sub ID drop-down? Yeah, we have that already, and then update the offset display in the HTML and call updateChart. Because... yes, it actually works. No — the variable is defined but not accessible in the fetchDataAndUpdateChart async function. The endpoint's JSON returns the data... that horizontal scroll is odd. We're not passing the data properly. These tools are great — it's great — but when you just start using them... maybe it changes; you probably have to give it the whole code to look at. Let's see if it makes a different suggestion. It can change anything, by the way.

If you haven't checked bionicchaos.com, please go do so — there are a lot of interesting tools, and surely you will find something useful there. And when you do, don't forget to provide your feedback.

So, just an overview: we're still trying to display this properly — the Harvard Medical School harmful brain activity classification dataset — particularly focused on the data-handling part of the code. We initially want to review the data properly, so we're making this tool where the idea is eventually to scroll quickly through the data, to get a better understanding of what the dataset is all about. Yeah, we've been having problems all along, as you can tell. This code was working a second ago, so hopefully GitHub Copilot can fix it for us. Yes, it's working again. We should have some sort of unit tests to check it's working correctly against what we expect. And what do we expect? This doesn't work now. We don't need an extra function, that's for sure — whoops, that's not an extra function, we actually have it. Why can't it just regenerate the whole thing? So, it's the sub ID: when we change that, nothing happens when selecting a different value from the drop-down, and when changing the electrode name we get the following error. We selected the whole JavaScript code, because it's obviously a problem with the JavaScript — there are no errors in the back end. No, that is not the right —
well, that is how the back end is accessing the data; the data handling could probably be simplified. The seizure there — it looks like it's really working again; seems like we have a seizure. Okay, so now changing the electrode from the drop-down works well, but when selecting a different EEG sub ID, nothing happens. Are we sure we have it wired up and just updating the value? This one still works, this one does something, and that one seems to work but isn't giving an error — so what's up with that? Can we check what happens when we select a different EEG sub ID — when do we actually plot the 50-second time window instead of the whole window? Looks like it understands the context.

Shall we do an overview with GPT-4? By the way, this session is meant to be interactive, so if anyone wants to join in for a chat, you can do it via the text chat. And if anyone wants to team up for this competition, we can do that as well. I've only — I never actually submitted anything... no, I did once, but I was way down the leaderboard. We'd probably have to build a separate tool just to understand how the score is being calculated. What normally happens is that these scores get saturated towards the end, because people just keep bombarding it with submissions — some doing 200 entries, hoping for the best — probably just tweaking parameters slightly, resubmitting, and doing it again and again. I think you can submit five per day. The leaderboard is calculated on approximately 35 percent of the test data; the final result will be based on the remaining 65 percent, so the final standings may be different. And this is the other thing: how much data is in the training set, as a percentage of all the data? I don't know. There is a training set, and then there is a public test set, which has only one file in it — that's for your testing, and it's not labeled. That's for
you to submit your algorithm and get a score; later your algorithm will be run on a much larger data set to get the final score. And unfortunately, we obviously don't have access to that data set. I don't know if it will be made available at the end of the competition — probably not.

So, currently we have this display where you can select your EEG ID. This one looks like it's having seizures, and something — I'll make sure. We're going to make sure that when we select a different ID, the expert consensus is updated — actually, it should be the patient ID being updated... no. We get a lot of noisy stuff — oh, it just has one noisy channel. That's probably muscle — probably EMG in there; it looks like classic EMG. We'd have to look at its frequency content: EMG should have a higher frequency component. That's what we'll do later — not sure if today or not; we'll see how the GPT gods are behaving. If they behave well, we'll make much quicker progress.

Yes, it did understand — even though there was a misspelling in the voice-to-text transcription, it still understood the context. That is wrong; let's rewrite it. We're working with the JavaScript at the moment; the code is being fetched. Okay, this is the 50-second time window. Can we check what happens when we select a different EEG sub ID — when do we actually plot the 50-second time window instead of the whole window? Yeah, that's not very clear; let's see if it understands the context. For that label we'll just have to modify it — I've changed it into a display instead of an input; just do it as an output, so when you select a specific 50-second label it will just show you what the label is. Right, it quickly selected... okay, it's pulling out the parameters and calling fetchAndUpdateChart after checking that the necessary drop-downs are populated and the selections are valid. Right, so let's see what the expectation is.
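That frequency check — scalp EMG sits mostly above roughly 30 Hz, while most EEG rhythms sit below it — plus the crude "discard anything with absurd amplitude" screen we keep mentioning, can be sketched like this (thresholds and test signals are illustrative, not clinically validated):

```python
import numpy as np
from scipy.signal import welch

FS = 200  # Hz, this dataset's sampling rate

def high_freq_fraction(x, split_hz=30.0):
    """Fraction of signal power above split_hz — a rough EMG indicator,
    since scalp EMG lives mostly above ~30 Hz."""
    f, pxx = welch(x, fs=FS, nperseg=2 * FS)
    return pxx[f >= split_hz].sum() / pxx.sum()

def looks_broken(x, amp_uv=500.0):
    """Crude amplitude screen for channels swinging to implausible
    values (like the +/-2000 uV excursions in this dataset)."""
    x = np.asarray(x, dtype=float)
    return bool(np.abs(x - x.mean()).max() > amp_uv)

t = np.arange(0, 10, 1 / FS)
eeg_like = 20 * np.sin(2 * np.pi * 5 * t)    # 5 Hz rhythm, ~20 uV
emg_like = 20 * np.sin(2 * np.pi * 60 * t)   # 60 Hz muscle-band tone
square = 2000 * np.sign(np.sin(2 * np.pi * 0.2 * t))  # square-wave junk

print(high_freq_fraction(eeg_like) < 0.1)   # True: mostly low-frequency
print(high_freq_fraction(emg_like) > 0.9)   # True: mostly high-frequency
print(looks_broken(square))                 # True: discard outright
```

This matches the plan of discarding rather than filtering the obviously broken segments: filtering a square wave or a 2000-microvolt swing just smears the artifact around.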
This one is really bad — there's something in the ECG too. Obviously something like that can just be ignored outright; it's swinging plus or minus 2000 as well — get rid of it. And it became slower. Yeah, you can tell there are a lot of problems with this dataset: a lot of baseline drift. Is that just bad recording? Surprisingly the ECG is good here — it's actually there, looks legit — but what on earth is going on in the EEG? You get square waves; this is obviously not what the brain produces. So this will be 50 seconds — that's correct, so at least the timing makes sense. We'll see. I have some sort of filtering in mind, but you don't want to try to filter something like that — you just want to get rid of it; it's just really bad.

Why did it become much slower? That would mean some sort of memory leak, which would be really bad. Let me check top for a sec... yes, we expect this fetch. Why are the start and end NaN? That's not cool — surely it should just understand the context. A not-a-number issue in the calculation of the start and end. The same function calculates the start and end based on the offset and the sampling rate — so why is this calculation not being used? This function should be accessed somewhere. Is it even needed? Let's log the data and figure out why. Okay, now the data is undefined — how is that not throwing an error there? What, is it refusing to write code? Anyway, the data is undefined, which is not cool. Quick question: is the start index anything?

Let's pop it also into GPT-4, so we can get the bot to explain the code and everything. Start a new chat, quickly: we have the script.js, the Flask app, and the data utilities — that last one's probably not really needed. "Hey, can you explain the whole code and give a broader context for the application — how it can be used, what it is for? And more specifically, we're
still trying to modify the code to make it better. The current problem is that when we select an EEG sub ID, the offset seems to only change the x-axis label but not actually move the data — it doesn't actually fetch a new 50 seconds of data. We need to test that." Yep. This is the summary by ChatGPT — I'm still listening to it, but I want to take a quick break:

"Got it. You've got a web application for viewing EEG (electroencephalogram) data. It looks like it's designed for analyzing and visualizing EEG data, probably to assist in identifying harmful brain activities like seizures or other abnormal patterns. The app lets users select various parameters — EEG IDs, sub IDs, consensus on diagnosis from experts, electrodes, and patient IDs — and this selection dynamically updates the displayed EEG data chart. Pretty neat for researchers or clinicians needing a quick look at EEG data.

Right, now about your problem: when a sub ID is selected, it only changes the x-axis labels of the chart, but it doesn't fetch the new 50-second data segment associated with that sub ID. That's a key feature, for sure, because each sub ID likely represents a different segment of EEG data, and you need to display the correct segment to make accurate assessments. Here's what you can do to fix this.

One: adjust the fetchDataAndUpdateChart function. When a new sub ID is selected, your fetchDataAndUpdateChart function should be triggered. This function needs to fetch the EEG data for the selected sub ID; right now it seems like it's not correctly updating the data in the chart.

Two: check the data-fetch logic. Ensure that the logic for fetching data based on the sub ID is correct: it should use the EEG ID, electrode, and sub ID to fetch the right 50-second data segment from your server.

Three: update the chart data properly. Once you have the correct data, make sure that your chart-update logic (the updateChart function) is correctly replacing the old data with the new data. You might have to thoroughly check how the data is bound to the chart
and ensure that it refreshes entirely with the new data set.

Four: offset logic. The offset value you get from the selected sub ID should be used to calculate the correct range of data points (samples) to fetch. For instance, if your offset is 10 seconds and your sampling rate is 200 hertz, you should start fetching from the 2000th sample.

Five: testing. After implementing these changes, you need to test the application to ensure that selecting a different sub ID correctly updates the chart with the new data segment. Check various EEG IDs and sub IDs to cover different cases.

In a broader context, this application can be incredibly useful in clinical settings or research facilities where EEG data is routinely analyzed. It can speed up the process of reviewing EEG data, assist in diagnosing neurological conditions, and potentially help in training machine learning models by providing a way to visually inspect and categorize EEG data. Finally, remember: EEG data interpretation is complex, so the tool's accuracy and ease of use are crucial. Always validate with actual clinical data, and possibly integrate feedback from EEG technicians or neurologists for further improvements."

That was the GPT-4 summary, with the new voice they have — I forget what it's called, but as you can tell, that robot is pretty incredible. And I just realized it actually does work — okay, I just realized it potentially does work. "Here are a few screenshots to show you what the front panel looks like. The first image has sub ID 0 with offset 0, and the second image has sub ID 1 with an offset of 28. Can you review the two images? And also — we are currently live streaming, so, if you can, would you like to host this stream? To do that, you will need to give a broader perspective of what this whole application is all about and an overview of the code I provided earlier. Keep it light." But let's see if GPT-4 can just take over the stream, keep developing the application, maybe even winning the Kaggle
competition, while I go grab some coffee — though I'm still listening to what it says.

"All right, taking a look at the screenshots you've shared, it seems like the application is indeed updating the data being displayed based on the selected sub ID and offset. In the first image we have sub ID 0 with offset 0, which gives us a data set starting at zero seconds, and in the second image we have sub ID 1 with offset 28, shifting the starting point of the data set to 28 seconds. The x-axis of the chart reflects this change, showing the shift in time, which indicates that it's displaying a different segment of EEG data. This is exactly what you're looking for.

So, for the audience tuning into the live stream, let's break down what we're diving into. We're working with an application that's a bit like a treasure map for the brain's electrical activity. Imagine you're on a ship sailing the electrical storms of the neural sea: this EEG data viewer is your compass and map. It's not just any map — it's interactive, allowing researchers and clinicians to navigate through the brain's electrical patterns as they hunt for buried treasures of insight into neurological conditions like epilepsy.

The code behind this application is like the crew of the ship, doing all the heavy lifting in the background. It fetches data for different segments of an EEG recording, which represents the electrical activity of the brain, typically used to diagnose various neurological conditions. Each EEG ID represents a different patient's journey, and within those IDs, the sub IDs are like specific chapters of the story, capturing particular events or moments in the brain's activity. The offset is the specific timestamp where you want to zoom in — it's like choosing exactly where to drop your anchor and dive in for a closer look.

What's cool here is how this application brings together a cross-section of expertise: neurologists, data scientists, and engineers all get common ground to collaborate. It's designed to make the complex
The viewer also holds promise for teaching. Imagine a medical student, eager to learn, who can now dive into this sea of data not with a textbook but with a tool that shows them the real deal: live, dynamic, and a bit unpredictable, just like the brain. Now, if you're into the technical side of things, the application's back end is a well-oiled machine. It pulls specific EEG data segments and displays them dynamically as you choose different parameters. It's built to scale, too, so as more data comes in it can handle the load, providing valuable insights into patient-specific brain activity. For those tuning in who are not tech wizards, no worries: all you need to know is that we're using tech to peek into the brain's electrical dance, and this tool helps us read the rhythm and patterns of that dance. It's a bit like translating an alien language: complex but fascinating, and filled with potential discoveries. The big picture here is a fusion of tech and medicine, each improving the other, to help us understand one of the most mysterious organs in the body: the brain. Can you explain the data structure, the different files, and how they're loaded in the code? Also explain the different labels and what they actually mean; I'm particularly focused on the difference between the seizure and other labels, and also explain the label "other". And make a suggestion as to how to load the spectrogram files to be viewed at the same time as the data. Sure, let's break this down. Data structure and files explanation. train.csv: this is like the main ledger in the data set. It contains metadata about the EEG recordings, like the unique identifiers for the EEG data (EEG ID), the specific 50-second-long samples (EEG sub ID), and the offsets in seconds (EEG label offset seconds). It helps you pinpoint which parts of the larger EEG recording you should be looking at. The expert consensus gives you a professional's take on what's in the EEG: whether there's a seizure, for example, or some other kind of brain activity.
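The "main ledger" idea above can be sketched with pandas. The snake_case column names are my assumed spellings of the spoken "EEG ID / EEG sub ID / EEG label offset seconds", and the ID values are invented stand-ins; the two offsets (0 and 6 seconds) are the ones read out earlier on stream:

```python
# Toy stand-in for train.csv: how an (eeg_id, eeg_sub_id) pair pinpoints
# one labelled window. Values are illustrative, not from the real file.
import pandas as pd

train = pd.DataFrame({
    "eeg_id": [111111, 111111, 222222],          # invented IDs
    "eeg_sub_id": [0, 1, 0],
    "eeg_label_offset_seconds": [0.0, 6.0, 0.0],  # offsets 0 and 6 from the stream
    "expert_consensus": ["Seizure", "Seizure", "LPD"],
})

# Look up the metadata for one labelled window.
row = train[(train.eeg_id == 111111) & (train.eeg_sub_id == 1)].iloc[0]
print(row.eeg_label_offset_seconds)  # 6.0
```

The viewer's job is essentially this lookup followed by fetching the matching slice of the parquet file.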
EEG ID and spectrogram ID: these IDs are your keys to individual patient data. The EEG ID unlocks the EEG recording files, and the spectrogram ID gets you into the spectrograms, which are visual representations of the EEG data. Parquet files in train_eegs/ and test_eegs/: these files are packed with raw EEG data. They hold the actual readings from the electrodes stuck to the patient's scalp, with each column being a different electrode site, except for the EKG column, which records heart activity. To view these, your code would load the corresponding .parquet file based on the EEG ID. train_spectrograms/ and test_spectrograms/: spectrograms give you a visual look at the frequencies in the EEG data over time. Each file here matches up with an EEG ID, showing the spectral data for the corresponding EEG. Labels explained. Seizure: indicates that the expert found patterns typical of a seizure, which could include rhythmic spikes or sharp waves across the brain. LPD (lateralized periodic discharges): these are repetitive sharp waves found on one side of the brain; they might show up with conditions like strokes or tumors. GPD (generalized periodic discharges): similar to LPDs, but these are seen all over the brain, not just on one side; they can signal a more generalized brain dysfunction. LRDA (lateralized rhythmic delta activity): this is about slow rhythmic activity on one side of the brain, often seen in focal brain damage. GRDA (generalized rhythmic delta activity): just like LRDA, but it's not just one side; it's generalized across both hemispheres. Loading and viewing spectrogram files. To load and view the spectrogram files alongside the EEG data, you would typically: one, read the parquet file. Use a data analysis library like pandas in Python to read the .parquet file into a data frame; this loads the EEG data. Two, load the spectrogram. You'd also load the corresponding spectrogram, using an image processing library, or even just a simple plotting tool if it's saved in a graphical format.
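The label vocabulary just listed can be kept as a simple lookup table in the viewer. The descriptions paraphrase the explanation above; the "Other" entry is my addition based on the question about that label, so treat its wording as an assumption:

```python
# Label vocabulary as a lookup table, paraphrasing the descriptions above.
LABELS = {
    "Seizure": "patterns typical of a seizure: rhythmic spikes or sharp waves",
    "LPD": "lateralized periodic discharges: repetitive sharp waves on one side",
    "GPD": "generalized periodic discharges: periodic discharges over both sides",
    "LRDA": "lateralized rhythmic delta activity: slow rhythmic activity, one side",
    "GRDA": "generalized rhythmic delta activity: slow rhythmic activity, both hemispheres",
    "Other": "activity that does not fit the five named patterns (assumed meaning)",
}

def describe(label: str) -> str:
    """Human-readable description for a consensus label."""
    return LABELS.get(label, "unknown label")
```

A table like this is handy for tooltips in the front panel, so the abbreviations don't have to be memorized.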
Three, synchronized views: when displaying them, you'll want to synchronize the EEG data view with the spectrogram by using the offsets and sub IDs. This way, when you look at a particular moment in the EEG, you're also looking at the matching moment in the spectrogram. Four, interactive display: ideally you'd want to make this display interactive, so that as you navigate through the EEG data, the spectrogram updates to show the corresponding visual representation. You might use a web framework like Flask, which you seem to be using, to create this interactive display. To ensure you're looking at the right data, your app would use the EEG sub ID and EEG label offset seconds from the train.csv to load the exact segment of EEG data and the corresponding section of the spectrogram. That's the core of the code you've got: making sure you're always looking at the right slice of data for the questions you're trying to answer. Yeah, some of the files, the EEG files, might get the whole tool freezing up. It's probably because of some NaNs coming through from the server, and the front end doesn't know how to deal with them. Okay, now it did indeed unfreeze. Yeah, they're becoming quite slow, and there's a violation, it's taking too long; we don't like violations, do we? So that's some sort of Chart.js issue. Yeah, it's doing something it shouldn't. Shall we kill the browser? It's funny how Ubuntu works when something like this happens: I still have control of the machine, but for some reason the mouse doesn't work. So obviously I'm still streaming and everything works, but the mouse... now it's back. Stop it from there. It's the browser; yeah, the browser had 350 megabytes loaded in it. That's obviously not ideal. Yeah, now my CPU is back. Yeah, so how do we go about it? So it's loading okay on the parquet side. Let's keep this up; there might be some memory leakage. It's using 106 megabytes; let's start monitoring resources as well. And that's the Chrome tab with the 200 megabytes. That must be it.
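Step three, synchronizing the two views, comes down to converting the same offset into two different index spaces. A sketch under stated assumptions: the 200 Hz EEG rate is from the stream, but the "one spectrogram row every 2 seconds" figure is my assumption and should be checked against the actual parquet files:

```python
# Align an EEG sample range with spectrogram rows using one shared offset.
# 200 Hz is from the stream; 2 seconds per spectrogram row is an assumption.
EEG_HZ = 200
SPEC_SECONDS_PER_ROW = 2
WINDOW_S = 50

def aligned_indices(offset_s: float) -> dict:
    """Index ranges for the same 50 s window in both representations."""
    eeg_start = int(offset_s * EEG_HZ)
    spec_start = int(offset_s / SPEC_SECONDS_PER_ROW)
    return {
        "eeg": (eeg_start, eeg_start + WINDOW_S * EEG_HZ),
        "spectrogram": (spec_start, spec_start + WINDOW_S // SPEC_SECONDS_PER_ROW),
    }

print(aligned_indices(28))  # eeg (5600, 15600), spectrogram rows (14, 39)
```

With both ranges derived from the one offset, scrolling the EEG can drive the spectrogram view and vice versa.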
Yeah, I mean, the Kaggle competition website is taking 100 megabytes, and why? Yeah, 200 megabytes. What if I try to replicate that error? It's not an error, it's just that when there seems to be too much data, the browser is freezing. Yeah, that looks more like EEG. Hopefully we can start moving forward with this tool. Trying to replicate the hang; now it's becoming less and less responsive. So, the other thing: I'm going to guess you have some tools for reviewing EEG, and some of them have options this one would like to improve on, so I might show this anyway. This is 16-channel implantable electrodes, and this is Plotly, not Chart.js, so it has more control and everything. But all of this control... sorry for the ads; hopefully if you are visiting the website you actually get some more relevant ads. Yeah, so this one lets you group the channels by left and right, which in this case would be the odd and even numbers there. So this could be nice. You have an option to overlay them all and select the actual window. We won't need that for this, because, as we said, we're always showing a 50-second window, and that's how the data was labeled. It's just becoming unresponsive, and I don't know why; it's when you scroll to the bottom of this EEG ID list. Yeah, the offset is only two seconds in this case; I don't know why the data is funny this way. It's Harvard Medical School, so they must have done it correctly, because it's Harvard, of course. Let's just... yeah, the data is a bit odd, and I already went over a few reasons; we can have some more. Actually, we're not displaying everything in here. I don't know if we should be making this tool more and more complicated or just start a new one. Probably start a new one; we'll have more options. A couple of obvious things: we need to make sure the patient ID is being updated. With the workspace handle it's searching everything; let's just try giving it the names and see if it can explain the context of them. It's nice that it's giving links that actually work.
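The left/right grouping mentioned above has a mechanical rule in the 10-20 electrode naming convention: odd-numbered electrodes sit over the left hemisphere, even numbers over the right, and "z" electrodes on the midline. A small sketch; the channel list is illustrative, not the exact montage of this data set:

```python
# Group 10-20 channel names by hemisphere: odd suffix = left,
# even suffix = right, "z" suffix = midline.
def group_channels(channels: list[str]) -> dict[str, list[str]]:
    groups = {"left": [], "right": [], "midline": []}
    for ch in channels:
        suffix = ch[-1]
        if suffix.lower() == "z":
            groups["midline"].append(ch)
        elif int(suffix) % 2 == 1:
            groups["left"].append(ch)
        else:
            groups["right"].append(ch)
    return groups

print(group_channels(["Fp1", "Fp2", "F7", "F8", "Cz", "T3", "T4", "O1", "O2"]))
```

This is the same odd/even split the Plotly tool uses for its left/right overlay option.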
This one is not a link; I don't know why. Patient ID: so those are the unique IDs. So instead of giving us unique IDs... yeah, originally the idea was that you could filter the data based on all the different parameters. So, GPT-4, one sec. Okay, so looking at the code, and as specific advice, looking specifically at this file, the train.csv: the different column labels, are we actually using them all? Well, the answer is no, but the original idea for the data viewer was to filter based on patient ID or the expert consensus label type, and that doesn't seem to currently work. Can you try and explain? Look at the code provided above, all of it. Keep it light. Yep... yes, the patient ID is not currently being updated. So when you select, for example, an EEG ID, is the patient ID being updated to show what the patient ID is for... it sent the message without me finishing. So, no, the ID, the patient ID... well, yeah, no, they're different: in the fetching function, the EEG ID is used to fetch the EEG, and the patient ID is... okay, I was actually asking if one is updating the other. I was asking if one is updating the other; the text-to-speech is hopeless. Am I using the correct microphone? Okay, let's do a quick tour of what's happening with the train.csv in your current setup and how it ties into the data viewer. From your description and the code, it seems like train.csv is the backbone. It contains essential information like patient IDs, the expert consensus on what type of brain activity is occurring, and identifiers for EEG data segments. These details are like the index to a vast library of brain activity. Now, your app aims to be a filter-savvy librarian that can pull the exact book, or EEG data, you're looking for, whether it's based on a patient's story or the expert's opinion on the plot (the consensus on EEG patterns). But it seems like your librarian is a bit sleepy and isn't fetching the books as expected when asked for a specific story or opinion. Looking at your code, you have dropdowns set up for selecting EEG IDs, consensus, sub IDs, electrodes, and patient IDs.
These dropdowns should let users sift through the library seamlessly, but if some aren't affecting the data pulled up, it's likely because of a few reasons. One, missing links: there might be a disconnect between what you select in the dropdowns and how the data is fetched. For instance, you have the patient ID and expert consensus as options, but if the change events for these dropdowns aren't tied to functions that fetch and update the data, nothing will happen. Two, your data fetching functions need to reference the train.csv fields. They should use the selected patient ID or consensus to filter the EEG data; if they're not considering these fields when fetching data, then changing these fields won't have any visible effect. Three, event handlers: you need event handlers in your JavaScript that listen for changes in the dropdowns for patient ID and expert consensus. When a user selects a different option, these handlers should trigger the fetching and updating of data. Four, back-end routes: your Flask app needs to have routes and functions set up to handle requests for data filtered by patient ID and expert consensus. These routes would interact with the data and return the filtered results back to the front end. Here's a friendly suggestion: if filtering by patient ID or expert consensus isn't working, double-check the JavaScript functions tied to those dropdowns and the Flask routes they correspond to. Make sure they are actively filtering the data and that the chart gets the memo to update with the new data fetched. In the spirit of keeping it light, think of your app as a DJ mixing tracks (EEG data). Right now the DJ is only playing songs from one album (EEG ID). You want to give the crowd (users) the power to request songs from different albums (patient IDs) or genres (expert consensus). To do this, you need to make sure the DJ's mixer (the code) is hooked up to all the turntables (the data filtering logic) and not just one.
And not just one. Yeah, we can modify this code. Yeah, so far it's only the EEG ID that is updating the data, and even that is a bit slow, especially when you go to the bottom of the list. Yeah, they have those example files, so ideally we will plot the spectrogram as well on the same page. They have these example figures, and we see a few. As you can tell, right, this is zero time, minus five and plus five. See, it doesn't seem to correspond to the other one, and why are those PDFs as well? And the question... all right, I get it. So this is the left lateral, right lateral, left posterior, right posterior. I want to reproduce those myself. They don't seem to correspond to one another; there seems to be something happening there that, yeah, I don't see here. Right, so take this example. Yeah, what's up? So there's obviously... yeah, why don't you see it on the plot? This is 10 seconds; the EEG is 50 seconds. I just don't trust that those are actually coming from the same EEG, and obviously that's some sort of averaging of different channels. Yeah, I don't like them at all. Some sort of artifact. Yeah, the EEG is a mess, there's essentially no EEG; therefore, yeah, you can probably tell that that's some sort of noise. Yeah, that EEG is weird as well. Okay, we have an EEG there, some slow low-frequency components in the spectrograms. Yeah, I wanted to do the spectrograms myself. Shouldn't we? We have done this before; hey, we have a tool for it. Yeah, this is also not the best spectrum, but you want to see things like this. You want to do a sanity check with your spectrograms, to see where the seizure starts and where it should end. You don't see that in there; nothing obviously corresponds. There's something happening, something different there in the middle; that's some sort of noise, a low-frequency component, but it's not in the time series, it's not in the time domain. That's why I don't necessarily trust it. You want to see them one above the other, so that you can actually see that something in one is corresponding to the other. In here, for example, we'd expect a large low-frequency component there, and I don't see it.
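The "do the spectrograms myself" sanity check can be sketched with plain numpy: a crude short-time Fourier transform, so that a burst appearing in the time series shows up at the matching time bin in the spectrogram, which is exactly the correspondence being looked for above. Window size and the test signal are illustrative:

```python
# Crude short-time FFT for a spectrogram sanity check.
import numpy as np

def crude_spectrogram(x: np.ndarray, fs: int, win_s: float = 1.0) -> np.ndarray:
    """Rows = non-overlapping time windows, columns = frequency bins (magnitude)."""
    n = int(fs * win_s)
    frames = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    return np.abs(np.fft.rfft(np.array(frames) * np.hanning(n), axis=1))

fs = 200  # the data set's sampling rate, per the stream
t = np.arange(0, 10, 1 / fs)
sig = np.sin(2 * np.pi * 3 * t)                      # ongoing 3 Hz activity
sig[5 * fs:] += np.sin(2 * np.pi * 20 * t[5 * fs:])  # 20 Hz burst after 5 s

spec = crude_spectrogram(sig, fs)
# With 1 s windows, bin k corresponds to k Hz, so the 20 Hz column should
# light up only in rows 5..9, matching where the burst sits in the signal.
print(spec.shape)  # (10, 101)
```

If a feature in the provided spectrograms has no counterpart when recomputed like this from the raw trace, that supports the suspicion that the two don't come from the same recording.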
Well, this is during a seizure, so you kind of see it here, but yeah, you want to see something that corresponds to your signal, as a sanity check. Yeah, like this one, for example: when you see a low spike like that, which is probably some sort of noise, not sure, you want to see a corresponding thing in the spectrum happening at the same time. It's kind of important, and I don't see it happening there. First, the scale is different. Let's open another one. Now you see, it's rubbish. So supposedly the scale is the same; you have a zero time, minus five, plus five, whatever. It's a bit odd, but then you have this low-frequency component at the end here, and you don't see it in the time series. So either it's being heavily filtered, but then why is this not showing the filtered data, or there is something wrong. More likely there is something wrong. Yeah, the time series is essentially not telling you what's happening in the frequency domain, or the other way around. Yeah, here we have a good ECG. Yeah, this one is more like it, but then you see you have a spike and a dip there before zero time, which you would think is that one; then you have one after zero time as well, which is not in the time series. Yeah, let's read the description again: spectrogram of the 10 minutes. Okay, this is seven seizure, six LPD, 15 other... one, two, three, four, five, six, seven, eight... 20 seizure. Okay, so obviously there's no 20 seizures there; that's just one event, with something like 20 labels. Okay. Yeah, I'm just not convinced by this data set. Right, so the column names indicate the frequency in hertz and the recording regions of the EEG electrodes. The abbreviations: LL, left lateral; RL, right lateral; LP, left parasagittal; RP, right parasagittal. So essentially left front side, left back side; this maps onto an EEG montage traditionally used in epilepsy. Yes, we have a montage like that. So this one will be Fp1 minus F7, then F7-T3, T3-T5, T5-O1, and the same chain starting from Fp2 on the right.
By the way, we don't necessarily have to do it that way; those channels are being digitally subtracted from one another. Just look at the actual recording. Yeah, I never understood why we need to actually do that. You do it to reduce the number of channels displayed, but then you actually don't, because we essentially end up displaying just four things, the four sides of the brain. But, you know, if you're doing big-data analysis, whatever, it's not necessary. The thing about the patient IDs: ideally we'd have some sort of patient description. The patients will have different types of seizures, and they will originate from different parts of the brain, so you would need the description for each patient, where the seizures are coming from. That could be useful. Yeah, anyway. So first, I don't necessarily trust those spectrograms; we might try and recreate them ourselves. I don't think we can, yeah; we don't see those bits in the data, because do we have the 10-minute recordings? No, we don't. We'll have to look at the spectrograms first; we would not be able to generate them ourselves. Yeah, so the experts labeled, say, 7 LPD, 7 LRDA from this 10-minute window, so we won't be able to display it like this, because the original data is only 60 seconds; it's much shorter. Yeah, that's the problem; the data set is not very transparent. So I understand they have more: once you submit your algorithm, they test it on a hidden, non-public set as well. But then, even with the data that is provided, definitely more can be done. Yeah, our problem is that even if we display the spectrogram here, it will not be in the same format in terms of the time window: this will remain 70 seconds, while the spectrogram will be 10 minutes. It sees things that look more like seizures. Yeah, it's not... again, I don't believe that's the spectrogram for this recording. There's something wrong. There is something wrong. So this essentially means that the left side should have a lot of slow-wave activity, and I just don't see it.
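The digital subtraction just described, building the left lateral bipolar chain (Fp1-F7, F7-T3, T3-T5, T5-O1) out of referential channels, is a one-liner per pair. A minimal sketch; real channel data would come from the parquet files, and the numbers here are toy samples:

```python
# Build bipolar montage channels by subtracting electrode pairs.
LEFT_LATERAL_CHAIN = [("Fp1", "F7"), ("F7", "T3"), ("T3", "T5"), ("T5", "O1")]

def bipolar(channels: dict[str, list[float]], chain) -> dict[str, list[float]]:
    """Subtract each electrode pair sample-by-sample."""
    out = {}
    for a, b in chain:
        out[f"{a}-{b}"] = [x - y for x, y in zip(channels[a], channels[b])]
    return out

toy = {"Fp1": [1.0, 2.0], "F7": [0.5, 1.0], "T3": [0.0, 0.5],
       "T5": [0.2, 0.1], "O1": [0.1, 0.0]}
print(bipolar(toy, LEFT_LATERAL_CHAIN)["Fp1-F7"])  # [0.5, 1.0]
```

Since the subtraction is purely digital, the viewer can offer both views, the raw referential channels and this derived montage, from the same file.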
They do want you to be able to switch between this montage and how it actually was recorded. So essentially you have this bit minus that bit, some sort of weird convention; I don't really see why they need to do it. Then you have this bit minus that bit, and then the bit in the middle, just because it was done like that for many years. You don't have to do it the same way. And yes, you have four, and even, depending on the side, the ones in the middle... that should be telling you something about the interaction between the left and right hemispheres. But again, this is Harvard Medical School, so it must be correct. All right, I see where we're at. You have the fundamental structure in place, and now you're looking to ensure that your EEG data viewer can filter EEG data based on different criteria, like patient ID and expert consensus. From what I can gather, you're looking to accomplish a couple of things. One, filter data based on patient ID or expert consensus: you've got dropdowns for these, but they might not be hooked up to change the display. What you need is a way to tie the selection of these dropdowns to the data being shown. Two, fetch and display the filtered data: when a user selects a different patient ID or expert consensus, the displayed data should update to reflect this. To get these filters working, you'll need to do a couple of things. Front-end JavaScript changes: for the dropdowns that are not currently affecting the data, like patient ID and expert consensus, you'll need to make sure that changes in their state trigger a new data fetch. Right now you have event listeners set up for changes on these dropdowns that call fetchDataAndUpdateChart; ensure that fetchDataAndUpdateChart actually uses the selected patient ID and expert consensus to fetch new data. You might need to modify your API endpoints to accept these as filter parameters, and then adjust the front end to pass these parameters when making requests.
Here's a quick sketch of what the updated function might look like. JavaScript:

    async function fetchDataAndUpdateChart() {
        const patientId = document.getElementById('patientId').value;
        const expertConsensus = document.getElementById('expertConsensus').value;
        // Adjust the API endpoint to include filters for patient ID and expert consensus.
        const fetchUrl = `/fetch_eeg_data?eeg_id=${eegId}&electrode=${electrode}` +
            `&start=${startIndex}&end=${endIndex}` +
            `&patient_id=${patientId}&expert_consensus=${encodeURIComponent(expertConsensus)}`;
        // Now use fetchUrl in your fetch call.
        // ... existing fetch logic ...
    }

Back-end Flask changes: on the back end, you'll want to update the /fetch_eeg_data route to handle the new filters. Python:

    @app.route('/fetch_eeg_data', methods=['GET'])
    def fetch_eeg_data():
        eeg_id = request.args.get('eeg_id')
        electrode = request.args.get('electrode')
        patient_id = request.args.get('patient_id')              # new filter
        expert_consensus = request.args.get('expert_consensus')  # new filter
        # ... existing code ...
        # Fetch the data and filter based on patient ID and expert consensus.
        df = df[(df['patient_id'] == patient_id) & (df['expert_consensus'] == expert_consensus)]
        # ... remaining code ...

Make sure that your data_utils CSV-reading and parquet-reading functions can handle these additional filters, returning only the data that matches the selected patient ID and expert consensus. Testing: after implementing these changes, thoroughly test the application. Select different patient IDs and expert consensus options to ensure that the displayed data updates accordingly. Keep in mind that, to keep things efficient, you might want to limit the amount of data being transferred. If your EEG data sets are large, consider loading summaries or slices of the data that can give users the insights they need without overwhelming the browser. These changes should get your EEG data viewer's filters up and running, letting users navigate the neural seas with a more precise compass.
The funny thing about this data set is that the experts were presented with 50-second time windows that are essentially identical. So in this case the windows are 50 seconds, but the offsets essentially mean that they are always overlapping, so essentially the same. I think that's probably how the data was extracted, how the labels were pulled from the original labeling, which was done normally. So they essentially labeled, in this case, 64 seconds as a seizure, and for whatever reason four or five labels were extracted out of it, five 50-second labels out of this 64 seconds of seizure, which is a bit odd. Can't change the data set. Yeah, it looks like some spikes... spikes on some channels, not others. This one's really odd. The ECG could have been just coming off. Ideally, with this data, you would also have impedance measurements; they normally help you clean up the data. There obviously was some data in there that has a lot of noise, and it's probably not this type of noise; this type of noise probably suggests that something else is wrong. I bet the ECG for this is also not... no, I was wrong. The ECG is there; looks good. Some DC offset, but it's okay. But come on, the data... yeah, something is up, something is wrong. Yeah, I need to make sure the seizure label is actually updated correctly. We're reaching the four-hour mark, so I might do some more offline. Let me know if you want to collaborate, team up for this competition. I'm not fussed about actually winning; it's just for learning. So don't worry about, you know... I don't even normally look at the leaderboard. Yeah, if you manage somehow to get up there, it's a bonus; otherwise, you learned a lot. And yeah, go check out by chaos.com; it has a lot of relevant tools for ECG and EEG processing, in this case EEG. Now, I'll see you.
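The overlapping-window arithmetic puzzled over during the stream can be checked mechanically. With 50-second windows, a set of label offsets implies a total labelled span of max(offset) + 50 seconds, and consecutive windows overlap by 50 minus the difference between their offsets. The offsets here (0 and 18) are the pair read off for one EEG ID earlier:

```python
# Check the overlapping 50 s window arithmetic from the label offsets.
WINDOW_S = 50

def labelled_span(offsets: list[float]) -> float:
    """Total seconds of recording covered by the labelled windows."""
    return max(offsets) + WINDOW_S

def overlaps(offsets: list[float]) -> list[float]:
    """Seconds of overlap between consecutive windows."""
    o = sorted(offsets)
    return [WINDOW_S - (b - a) for a, b in zip(o, o[1:])]

print(labelled_span([0, 18]))  # 68, matching the "50 plus 18" check earlier
print(overlaps([0, 18]))       # [32]: the two expert windows share 32 s
```

Run over a whole eeg_id group from train.csv, this makes it easy to spot the near-duplicate windows the experts were shown.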