I'm going to give you a bit of an interdisciplinary talk that combines computer science and biology, since I did my undergrad in both CompSci and Bio. My final project was in biology, and I had to do some tasks that were fairly repetitive. Since I'd been exposed to the joys of programming, I just couldn't sit there and do that. It was very frustrating for me. So I went ahead and tried to automate that process and the analysis of the data in question.
So why did I choose Python? Well, Python has a huge array of libraries, and since my task involved scientific computing, there was, of course, the library SciPy, which suited my purposes. I didn't want to reinvent the wheel, so Python was the way to go.
So I'm going to give you a bit of an intro to what my biology project entailed. Basically, we were interested in studying protein dynamics, as in how fast the protein is going into the nucleus and out of the nucleus. So this little sphere here is the nucleus, here is what we call the cytoplasm, and this is the cell. Right now, the brighter spots represent the areas where the protein is most present. So what we can tell from this image is that the protein is more abundant here than in the cytoplasm. But we don't really know how fast it's going in and out of the nucleus.
So one way to measure how fast this is happening is to use a really, really strong laser and just blast it here. That's what happened. Since there's a constant shuttling of this protein in and out of the nucleus, the rate of recovery of this area is indicative of how fast the protein is going in, or rather the net movement of the protein, because, of course, it's going both in and out. So if this recovery is slow, it means the transport is slow.
So how exactly did I measure this? I measured two areas from the nucleus, two areas from the cytoplasm, and two areas from the background, just to subtract the noise.
So the issue with this is that this is the kind of output I get. It's just huge Excel sheets. And I want to bring your attention to this number, which is very worrying for me. Sorry? Yeah, that's the problem. Each time I opened the Excel sheet, I waited 10 minutes. So it was getting pretty much out of hand. And you need to imagine this: like 300 time points, times six because I'm measuring six regions, times 369 movies and counting. So for me, this wasn't really an option, and I just turned to programming to structure the data in a more efficient manner.
One other issue is that I need to keep track of metadata. So which day did I take this data on? What mutant am I looking at? By the way, I'm actually measuring the dynamics of mutant proteins, so there are different versions of this protein that have just one area that is changed compared to the original template. Also, whether I'm blasting the nucleus or the cytoplasm, so that's the type of FRAP. Oh, by the way, FRAP stands for fluorescence recovery after photobleaching. Now that I say it, it makes sense, right? Okay. And many more things.
Also, what if I change the way I analyze this raw data? What if I change how I normalize the recovery curves? What if I change how I want to fit the data to a certain function? If I did that in Excel, I'd have to change a bunch of cells, and it would take forever. So this was me at some point.
So enter Django. Django is a web framework for perfectionists. Why was Django suitable for my purposes? Well, first of all, it's written in Python, and like I said, there's a bunch of libraries that can come in handy. Second of all, it's pretty readable for me. It's a concise language, in my opinion.
Furthermore, it's a modular kind of framework, so that if you want to change the way the data is presented, for example, if you want to change the template, you don't have to go back and look at the code and change your database schema and all that. So that was also attractive for me.
Okay. So now I'm going to just show you. By the way, I guess a lot of you are familiar with Django. Who is familiar with Django here? Okay. Yeah. So this must be a review for you. But I guess it gives an idea of how Django can be used for a task that you're not really used to seeing it used for.
So, for example, I have a class for each movie, and the metadata problem is resolved since I can assign the metadata to different fields of the class. So here's the date. Here are some comments. I can also link it to another class, which is Mutant, so here you have a foreign key and stuff like that. One other thing that came in handy is that each time I bleached the nucleus, the exact frame at which the level was zero was variable, since frames were taken at 200-millisecond intervals. So this is a way for me to define the bleached frame index that is better than going through the Excel sheet. Also, some movies were faulty, so this is a way for me to have a flag to tag things as valid or not.
This is one example template. You see how it's very clean. You can do pretty much anything with this.
One other thing is that I personally am not versed in SQL or any other database language. So for me, Django was a way to abstract that away. I didn't need to know SQL or anything to be able to use a database in a structured manner and have it run efficiently. The Django administration interface was also a plus, and I'll show you later how that comes into it. This is another example model.
So this is a mutation, where we have different fields that describe it. And more importantly, a mutant is actually an assortment of mutations, because a mutant might have many mutations. So here I also have a many-to-many field. And then I can have nifty things: if I want to know, for example, the number of import movies for a particular mutant, I can use one single query. And this is very efficient because it hits the database only once. So you filter for this mutant, for the import category, and for only valid movies. So that's just one way.
This is an example view. This is the kind of view that you would pass to the template we saw earlier. We'd pass the template a bunch of movies, and then the template would use that information to render them.
All right. So this is the Django web administration interface I talked about. I blurred this because it's unpublished data, but these should be actual codons. Anyway, this is a dummy mutation. These are mutants, and a mutant is an assortment of mutations. You can add stuff here. This mutant one has only mutation one. And here are the parameters associated with that mutant that I fit. This is a movie. And you can also filter, which is really nice. So you can see how this is much better than the Excel alternative.
Okay, well, these are... Yeah, this is using Bootstrap. So here I have all my experiments, and I have them green if they're valid, otherwise red. And I have some notes, and I can look at them. I don't know. Let's get a good one. So, for example, this one. Yeah, so this is all the data I showed you. This is a nuclear region, the cytoplasm region, and the background region. And then it calculates the raw ratio and normalizes it. This is the raw data. And then it plots a curve like this, where you go from zero to 100.
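The talk does not spell out the exact formulas, so here is a minimal sketch of the ratio and normalization steps under stated assumptions: each region is already reduced to one intensity per frame, the raw ratio is the background-subtracted nucleus over the background-subtracted cytoplasm, and the curve is rescaled so the bleached frame maps to 0 and the average of the pre-bleach frames maps to 100. The function names and the toy numbers are made up for illustration.

```python
from statistics import mean


def raw_ratio(nucleus, cytoplasm, background):
    """Background-subtracted nucleus/cytoplasm ratio, one value per frame."""
    return [
        (n - b) / (c - b)
        for n, c, b in zip(nucleus, cytoplasm, background)
    ]


def normalize(ratio, bleach_index):
    """Rescale so the bleached frame is 0 and the pre-bleach mean is 100.

    Assumption: frames before `bleach_index` are pre-bleach frames.
    """
    baseline = ratio[bleach_index]
    pre_bleach = mean(ratio[:bleach_index])
    span = pre_bleach - baseline
    return [100.0 * (r - baseline) / span for r in ratio]


# Toy intensities, not real measurements: the nucleus is bleached at frame 2.
nucleus = [110, 110, 20, 50, 80, 100]
cytoplasm = [60, 60, 60, 60, 60, 60]
background = [10, 10, 10, 10, 10, 10]

ratio = raw_ratio(nucleus, cytoplasm, background)
curve = normalize(ratio, bleach_index=2)
```

Keeping the bleached frame index as an explicit argument mirrors the idea of storing it per movie rather than hunting for it in a spreadsheet.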
So there are problems. For example, in some movies, the first frames were noisy, such that you'd have a relatively high intensity, and then lower intensity, and then higher intensity. And that caused a problem in terms of normalization, because I want to set at zero the first frame that has low intensity. So in this case, one solution I came up with was to take the average of a few base frames. So like this, I can say I want this point here to be considered as a base frame. So I clicked here. So this is 5.1, 5.1. And then I can refit this, and it updates the curve, and it updates the parameters here. And it's all nice.
One other cool thing is that I can also analyze data in bulk. So I can use all this data and aggregate it and have a nice visualization out of it. Also, I can set things as invalid, like this one. I don't want it, so I refresh it. Also, this one. Yeah, this is another example of a summary kind of graph.
All right. So for the next part of my talk, I'm just going to briefly mention how I fitted the curves for the different mutants using SciPy. I used least squares. So this here is the function that represents the recovery curve. I can talk through it. I can just explain it in terms of the math here. So this is the exponential of negative x, or actually negative kx. What happens if you do 1 minus this? As x goes to infinity, this approaches 1. So I multiply this by 100, and then I get a curve that goes from 0 to 100. So this is my model, basically. But then I noticed that some curves looked like mixtures of two such curves. So this is why I have a fast phase, times the expression we saw before, and then a slow phase. And this basically just means we have a mixture. And then I defined a residuals function.
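The code from the slides is not in the transcript, so what follows is only a sketch of the model, the residuals function, and a least-squares fit as described. It assumes the mixture is parametrized by a fast fraction and two rate constants, and it uses `scipy.optimize.least_squares`; the talk does not say which SciPy routine was actually called. The function names, parameter names, and synthetic data are all illustrative.

```python
import numpy as np
from scipy.optimize import least_squares


def two_phase_association(t, frac_fast, k_fast, k_slow):
    """Recovery curve from 0 to 100 as a mixture of a fast and a slow phase."""
    fast = frac_fast * (1.0 - np.exp(-k_fast * t))
    slow = (1.0 - frac_fast) * (1.0 - np.exp(-k_slow * t))
    return 100.0 * (fast + slow)


def residuals(params, t, observed):
    """Difference between the observed curve and the model prediction."""
    return observed - two_phase_association(t, *params)


# Synthetic, noise-free curve standing in for one normalized movie:
# 300 frames taken 0.2 seconds apart.
t = np.arange(300) * 0.2
true_params = (0.6, 1.0, 0.05)  # made-up values for illustration
observed = two_phase_association(t, *true_params)

# Starting point for the optimizer; this is just a guess.
initial_guess = (0.5, 0.5, 0.01)
fit = least_squares(residuals, initial_guess, args=(t, observed))
frac_fast, k_fast, k_slow = fit.x
```

Because the residuals function takes the model as its only dependency, changing how the curves are fit means swapping one function instead of editing spreadsheet cells.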
And this function measures the difference between my actual data and my expected data based on the parameters I fit. So this is just a way to measure how far my model is from the actual data. And then, once you define these two functions, and note that the residuals function uses the two-phase association model, you use least squares. So this is my actual data. This is just an initial guess. And then you can call least squares and pass in the residuals function. And from there, you get all the parameters you need. So this is how I fit the parameters in my web app.
So, yeah. Bottom line: automation is great, but make sure it's worth it. If I had to count the time I spent automating things, it might have taken the same amount of time as actually doing things manually. But the good thing is that an undergrad is taking on the project this semester, so he's going to be able to use this, and it's all good for him. It's better for the future. I hope I don't get here. All right. So, thanks.