So, good afternoon everybody. We want to finish the naive Bayes approach with one example. And I want to keep this lecture short, because I'm getting a lot of heat from the university about why nobody is doing the course evaluation. I said, I guess they are waiting for the midterm exam to see how that goes, and then maybe they'll do it tomorrow. Nonetheless, according to the policy, I have to give you time today to do it. It doesn't matter if you do it today or not, but please do it by tomorrow. At the moment our participation is very low; we are officially the lowest-evaluated course, for whatever reason. So please take the time to do it, either today or tomorrow.

Okay, so we want an example of how we use naive Bayes, and that's the car theft example. I picked it up somewhere; I believe it came from a colleague named Eric Meissner, and I'm sure if you search, you will find it. It's a good example because it's simple, you can do it on the board, and it shows the principles and how the naive Bayes assumption can really simplify things. Like anything else, we need a training data set, so I have to draw that table again. We are looking at different cars, and the question is: will they be stolen? Not even "how likely," because, as we said, the classification we are after is Boolean. Even if you calculate a probability, at the end of the day people don't want a probability from you. They say, okay, give me a yes or a no, make a decision for me: if I buy this car, will it be stolen or not? Most of us don't ask that question; most likely insurance companies are interested in it, because they want to adjust the premium according to the risk. And then we have some features, some attributes, for each car. Of course, the color of the car is important.
Believe it or not, red and yellow cars get stolen more frequently, based on the last thing I read. Then you have the type of the car: sports cars, I would say, are more likely to be stolen; they are chic, they are hot, so people steal them. And we have SUVs. Before I continue, notice a trend here: every attribute has only two values. The car is either red or yellow, and the type is either sports or SUV. If that's true, if I have not manipulated the data, I love this data set, because that makes my life so much easier: the prior estimate for every attribute value is 50%, which simplifies the calculation. Then, of course, the origin is important. I don't want to discriminate against our southern neighbors, but if I had the choice to steal a Ford Escape or a BMW X3, I would go with the second one. So the origin apparently matters: domestic or imported, again only two values. That's nice. So let's say you have only three attributes. That doesn't necessarily mean it's too easy, but it's not viciously difficult. Then the question is: was it stolen, a question about the past if it is training data, or will it be stolen, the question the insurance companies are interested in, because I want to charge more premium if I guess that this is the type of car that people will steal. Written out, the training data is:

   #   Color    Type    Origin    Stolen?
   1   Red      Sports  Domestic  Yes
   2   Red      Sports  Domestic  No
   3   Red      Sports  Domestic  Yes
   4   Yellow   Sports  Domestic  No
   5   Yellow   Sports  Imported  Yes
   6   Yellow   SUV     Imported  No
   7   Yellow   SUV     Imported  Yes
   8   Yellow   SUV     Domestic  No
   9   Red      SUV     Imported  No
  10   Red      Sports  Imported  Yes

The first one is a yes: red sports domestic, not a very smart thief, but okay. Don't mess this up. One thing I never do: you get the data, you have to copy and paste it somewhere, you have to clean it up, but don't start retyping it. You just need to mess up one of these rows and the data is falsified and you'll confuse the classifier, because a row that was supposed to be a no becomes a yes while you were preparing the data. Don't touch the data. Prepare it, but don't recreate it, because that is a source of a lot of errors. I'm saying that from painful experience.

So then somebody will come and say: guys, okay, you took a course on AI, so what is this? I have a case, and my question for your super smart AI technique is: will a red domestic SUV be stolen? Somebody comes to me and says, look, this is my experience, and I have a customer right now that I want to give a quote on insurance, and she has a red domestic SUV. Will such a car be stolen? Can you answer me? Please note that there is no red domestic SUV in the data set. If there were, it still wouldn't mean it's easy, but it would make things a lot easier. There is no such instance here; you have never seen a red domestic SUV. So based on the past data, you have to learn something that enables you to make a statement about a case that you have never seen. Would that constitute intelligence in your perception or not? I would say yes. Okay, so: red domestic SUV. Of course, this is not in the table; it doesn't exist. If it did, it would be a ridiculous exercise, or at least you would have to exclude it from the table to train something.
Because if I trained my technique, whatever technique that is, with that exact instance, big deal: you have seen a case like this before. At least with only three features, that would be too easy. Okay, so what do we need to calculate to do this using Bayes? We need the probability that the color is red given that stolen is yes; the probability that the type is SUV given stolen is yes; and the probability of being domestic given stolen is yes. And the same thing for no: the probability that it is red but it was not stolen. You have to figure that out, and it is confusing: there are some red cars that are stolen and some red cars that are not. It's not as simple as my prejudice that red cars get stolen more; sometimes not, sometimes people steal brown cars. And of course we also need the probability of SUV given no and the probability of domestic given no.

If you recall the equation that you must have in your notes, we made the assumption of conditional independence, and we said: now I can write the likelihood down as the product of the individual probabilities. That is why I am calculating separate probabilities. I'm assuming being red has nothing to do with being an SUV, and being an SUV has nothing to do with being domestic. Are they conditionally independent? A car that is red could be an SUV or a sports car, true. A car that is domestic could be an SUV or a sports car, true. Well, it seems naive Bayes is not that naive here; these attributes really are close to conditionally independent, so if I calculate these numbers, I can just multiply them and come up with a conclusion, an inference.

Okay, for the estimator we need n, the number of cases with v = v_j, and n_c, the number of cases with v = v_j and a = a_i. Here n is 5 for each class, and the prior estimate p is 0.5 for all attribute values. This is my luck; I don't need extra work. Each attribute has only two values, red or yellow, sports or SUV, so the prior estimate is 50% everywhere.
I want to go back to the m-estimator that I wrote at the end of the last lecture, and I want to use it because I want to be on the safe side: in case some probability is zero, I don't want the whole product to become zero. So the n we wrote, the number of cases per class, is 5 for all of them; the prior p is, luckily, one half for all of them; and m we set to 3. Why? Arbitrarily. I just like the number three. What happens if it doesn't work? Okay, I try five. Many of us have difficulty with the concept of empirical research, but a lot of what we have learned comes from empiricism. We do things by trial and error, and then we find out, painfully, oh, three is not a good number, 2.79 is a good number. Okay? Then use 2.79. So m is set empirically; there is no deep logic behind it. Just: let's start with three and see what happens.

Which means I still have to give you n_c. What is n_c? n_c is the number of cases with v = v_j and a = a_i: the classification is this, and the value of the attribute is this. Color is red and stolen is yes: how many times do I have that? That number is clearly different for each combination. So n, p, and m in that last equation are the same for all the cases; n_c is different, of course. So we count n_c for red, SUV, and domestic, for yes and for no.

For the first one, n_c is three. Red one, red two... do I have three? One, two... I must have three. Maybe I made a mistake. How many red cars do I have that were stolen? One, two... do I have one more? Number ten, okay, thank you. You see, you make mistakes; that was a bug in my counting. And automatically I have two reds that were not stolen. For SUV and yes, n_c is one, and for no it is of course three. And for domestic I have two for yes and three for no. That's what the data gives me. I don't fully understand it, but I have to work with it.

Okay, so these are the numbers, and now I can calculate the m-estimates. Don't forget that with the m-estimator equation, when we ask for the probability of red given yes, we don't just take the raw frequency, three divided by five, even though in three of the five stolen cases the color was red. I'm doing it with the estimator; it gives a somewhat different number, but a safer one. So it is (3 + 3 x 0.5) / (5 + 3); just look up the m-estimator equation from the last lecture. That gives 0.56. So the estimated probability that a car is red, given that it is stolen, is 56%. Well, okay, my prejudice was not that wrong. What is the probability that it is red given that it is not stolen? That is (2 + 3 x 0.5) / (5 + 3), and the denominator doesn't change, which gives 0.44. So there are cases where a red car is not stolen, but in the slight majority of cases, red cars are stolen. I don't want to over-analyze this, but as I go along I try to understand the data, because understanding the data helps to interpret the result of the classification or inference. Okay, so what is the probability of SUV given yes? That would be (1 + 3 x 0.5) / (5 + 3), again 5 + 3 because those numbers are the same for everybody, which is 0.31.
So SUVs are not stolen that frequently, about 31% of the time. And what is the probability that an SUV is not stolen? That is (3 + 3 x 0.5) / (5 + 3), which gives the same value as red given yes: 0.56. These are individual probabilities that I am calculating, and on their own, their substance for making a statement is very limited. You cannot make a decision based only on the color or only on the type of the car. You need to put them together, and that is why we talk about inference in different forms: we had fuzzy inference, and now we do a sort of Bayesian inference using Bayes' theorem. The next one: what is the probability of domestic given yes? That is (2 + 3 x 0.5) / (5 + 3), like everybody else, which gives 0.44. And the probability of domestic given no is (3 + 3 x 0.5) / (5 + 3), which gives, again, 0.56. That number keeps repeating itself. If you have a large database and a number repeats itself, something could be wrong; if you have a small database and something repeats, that is just because your numbers are limited. Not necessarily wrong, because my data may not be complete; I may not have enough observations for SUVs, say.

Which one, five plus three? Yes, the denominator of the m-estimator is n + m, where n was the number of cases in the class. And in the numerator it is n_c plus m times p; I just simplified the writing, because m times p is 1.5 for all of them. Sorry, okay, thank you. You have to shout when you see a mistake, okay? So then we have to calculate the class priors: the probability of yes, which is 0.5, and the probability of no, which is 0.5.
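Since all the numbers are now on the table, the whole calculation can be written down as a short Python sketch. This is only an illustration of the method, not something provided with the course: the data set is the one from the board, and m = 3 and p = 0.5 are the choices we made above.

```python
# Naive Bayes with the m-estimator on the car-theft data.
# m-estimate of a conditional probability: (n_c + m*p) / (n + m),
# here with m = 3 and prior estimate p = 0.5 (two values per attribute).

data = [
    # (color, type, origin, stolen)
    ("Red",    "Sports", "Domestic", "Yes"),
    ("Red",    "Sports", "Domestic", "No"),
    ("Red",    "Sports", "Domestic", "Yes"),
    ("Yellow", "Sports", "Domestic", "No"),
    ("Yellow", "Sports", "Imported", "Yes"),
    ("Yellow", "SUV",    "Imported", "No"),
    ("Yellow", "SUV",    "Imported", "Yes"),
    ("Yellow", "SUV",    "Domestic", "No"),
    ("Red",    "SUV",    "Imported", "No"),
    ("Red",    "Sports", "Imported", "Yes"),
]

M = 3          # equivalent sample size, set empirically
P_PRIOR = 0.5  # prior estimate for each attribute value

def m_estimate(attr_index, attr_value, label):
    """m-estimated P(attribute = value | stolen = label)."""
    rows = [r for r in data if r[3] == label]      # n = cases in this class
    n_c = sum(1 for r in rows if r[attr_index] == attr_value)
    return (n_c + M * P_PRIOR) / (len(rows) + M)

def classify(color, car_type, origin):
    """Score both class values; the larger product wins."""
    scores = {}
    for label in ("Yes", "No"):
        prior = sum(1 for r in data if r[3] == label) / len(data)
        scores[label] = (prior
                         * m_estimate(0, color, label)
                         * m_estimate(1, car_type, label)
                         * m_estimate(2, origin, label))
    return scores

scores = classify("Red", "SUV", "Domestic")
print(scores)  # Yes scores about 0.038, No about 0.069, so the answer is No
```

The six conditional probabilities it computes are exactly the 0.56, 0.44, and 0.31 values from the board, and the final comparison is the inference we are about to do next.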
That makes it really easy; everything is straightforward. So now I want to do the inference. I have all the numbers, and I want to plug them into the naive Bayes equation: take the class value that maximizes P(v_j) times the product over the attributes of P(a_i | v_j). So I calculate the probability of yes, times the probability of red given yes, times the probability of SUV given yes, times the probability of domestic given yes: 0.5 x 0.56 x 0.31 x 0.44. You get a really small number; this is about 0.038, if I have not made any mistake. If you remember, this is the prior of the class value, the class membership, times the conditional probabilities, which, because they are independent, is just the multiplication of the individual probabilities of every attribute. So the score for yes is about 0.038, roughly 4%. What is the score for no? I have to calculate the probability of no, times the probability of red given no, times the probability of SUV given no, times the probability of domestic given no: 0.5 x 0.44 x 0.56 x 0.56, which is about 0.069, okay? That is, of course, greater than 0.038, which means no: the red domestic SUV will not be stolen. Okay, buy it. A very simple example, of course, but it works nicely and shows that we can actually do inference based on the values we have.

Okay, I just want to mention one other method before we close up, because in the context of optimization it may be useful: the concept of swarm intelligence. It is fundamentally population-based stochastic optimization. I just want to give the example; we will not go into details. So, we are done with naive Bayes. Two examples of swarm intelligence: ant colony optimization, ACO for short,
and particle swarm optimization, PSO. These are mostly inspired by nature: you look at social insects and animals acting in groups, and the group behavior brings intelligence. Individually they may not be very smart or very capable, but they join forces and get smarter, which is, I would say, exactly the opposite of humans. We are really good when we work alone; when we work together, we usually mess things up, and we mess them up badly, especially when there is a large majority in a country and they want something, they will do some bad stuff. So it's not for us; this is just for insects.

ACO, for example, was inspired by ants. The current ant population is around 10^16. I have no idea who counted them, but 10^16. Ants are one of the most successful species on the planet; evolutionarily speaking, they are stronger than the dinosaurs, because they survived. They are tiny, they don't need much, but most importantly, they act together. They use a concept, and that's one of the things I wanted to mention, called stigmergy. For many years, we didn't know how they do it. There are fantastic videos you can watch online where you see how smartly ants act when they are after food and there are obstacles that appear simply impossible for such small beings. Stigmergy is indirect communication used by social insects to coordinate their activities. I don't know if you have done this when you go to the cottage; I remember when I was a child and I would go to my grandpa's village, I would watch ants for hours. There was nothing else to do, so you just observe animals.
So you would see that here, let's say, is the food, and here is the nest, and here are the ants. How do they find the food? Probably some of it is like reinforcement learning, trial and error: explore, then exploit. If you look at them after, I don't know, ten minutes, you would see something like a scribble of wandering paths, and maybe one of them has made it to the food. Maybe. Then you go have lunch, come back, and let's see what the ants are doing: you see that they are now going back and forth between the food and the nest. They have found it, and now they are really going in a line, super efficient, linear behavior: go there, pick it up, come back, don't waste any time. Doesn't that require intelligence? How do they do that?

Why is that of interest to us? I'm sure the sick mind of the human being will find some crazy things to do with it in the military and so on. What we want to do is this: assume the food is the optimal answer to my problem. Can I build an abstraction of that, where I create some candidate solutions, let them go, and they somehow, magically, find the solution? That's a population-based approach. You need a population. A neural network is not such a thing, reinforcement learning is not such a thing, fuzzy systems are not such a thing. Usually, whoever is looking for a solution starts searching alone. One guy is searching. The entire neural network is one guy, represented by the gradient descent of backpropagation; that's one guy looking. But here we are saying: maybe we send a hundred searchers. Go find food. A hundred... first of all, who should compute that? Well, because they are relatively independent, perhaps we can use some sort of parallel programming. Come on, we could do that.
So now I want to do one more cruel, abstract animal experiment, and then maybe we leave it at that. Again, you would look at this: here is the food, and they are going back and forth. The poor animals are just going about their life; they have no idea that Homo sapiens can come up with really cruel experiments. So we come and we put a stick across their path. Have you done cruel animal experiments when you were kids? Okay, so let's see if they figure it out. And you see the confusion. Say the way around one end of the stick, to the south, is shorter, and the way around the other end, to the north, is longer. After a while, you see that some of them are going one way and some the other way, and of course the ones going north are taking much longer. They should go south; they should not go north. From an optimization perspective, that extra distance is the cost of a poor solution: how long does it take you to figure out the shortest path? We understand the concept of the shortest path. And they figure it out. You look at them after a while, with the stick there and the north side longer, let me exaggerate it, and you see: oh my God, they figured it out. They are all going south, nobody is going north, back and forth. How do they do that?

Actually, automobile companies were among the first to start this kind of thing. They created learning-from-animals groups and would send a bunch of engineers to Africa to watch how lizards move in the desert, to learn how cars could drive better under certain conditions. So learning from animals is not something new.
And the point is the way they do it: they use a chemical called pheromone, and that's where the intelligence is. Wherever they go, they leave pheromone behind, a little bit of a chemical that each of them can pick up. Now, like any other chemical, pheromone evaporates over time. But if you have a lot of traffic in a region, the pheromone concentration persists or even becomes much more intense. The reason they all converge on one path is that because everybody is using it, there is a lot of pheromone on it. You don't see it as a child, unless you have a device to measure pheromone. Do you have a device to do that? I didn't, so I said, oh my God, it's amazing, they can figure it out.

Okay, what I want you to do is, whenever you have time, take a look; I will upload something. Look at ACO as another metaheuristic optimization inspired by nature, and then ask: what are the concepts? First of all, it's population-based. But so was the genetic algorithm: we had chromosomes, but there the learning happened through fitness; the members did not do anything, they did not move. Here the members, the solutions, move. Okay, I don't know yet how to implement that; I have to think about it. But then I also need some sort of abstraction of pheromone. We talked about error, we talked about the reinforcement signal, we talked about fitness, but what is pheromone? I need to somehow model pheromone concentration: if you are moving in the vicinity of a solution, you create a buffer, and every time a solution comes close to it, you count it up, you increment it. Well, I can simulate what pheromone is, but then how long should I wait for the algorithm to converge? These are questions that usually cannot be answered easily unless you sit down and really design it.
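To make the brainstorm a little more concrete, here is one way the pheromone abstraction could be modeled. This is only a sketch under my own assumptions: the class name, the evaporation rate rho, and the deposit amount are arbitrary choices, not part of any standard algorithm.

```python
# A minimal pheromone model: deposit on use, evaporate each time step.
# rho, the initial level, and the deposit amount are arbitrary choices here.

class PheromoneTrail:
    def __init__(self, rho=0.1, initial=1.0):
        self.rho = rho        # evaporation rate per time step
        self.initial = initial
        self.level = {}       # edge -> pheromone concentration

    def deposit(self, edge, amount=1.0):
        # An ant crossed this edge: its concentration goes up.
        self.level[edge] = self.level.get(edge, self.initial) + amount

    def evaporate(self):
        # Every time step, all trails fade: tau <- (1 - rho) * tau.
        for edge in self.level:
            self.level[edge] *= (1.0 - self.rho)

trail = PheromoneTrail(rho=0.5)
for _ in range(3):               # heavy traffic on edge A-B
    trail.deposit(("A", "B"))
trail.deposit(("A", "C"))        # light traffic on A-C
trail.evaporate()
# A-B now holds more pheromone than A-C, so later ants would prefer it.
print(trail.level[("A", "B")] > trail.level[("A", "C")])  # True
```

The key property is that evaporation forgets unused edges while repeated traffic keeps reinforcing good ones, which is exactly the behavior described above.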
Okay, so one last thing: ACO applied to TSP, ant colony optimization applied to the traveling salesman problem. I guess most of you still remember the traveling salesman problem: I have some cities, and these cities are connected to each other. Is this like a Boltzmann machine? Not necessarily, because we don't have connections from all cities to all other cities; there are only some connections. Then the question is, and I'm sure you remember this: here is your source, you start from here, and you want to end here. How do I find the shortest path to get there and come back? If I go here and here, and I have to visit everybody... here, here, here, I already visited those. Okay, that's not good. From here I can go here, but then I am not visiting this one, so I have to come back, and now I have visited it twice. And then come here, and come here. Not very optimal: I visited two cities twice, and I still don't know the distances. Are there weights associated with the edges?

So keep that in mind: the traveling salesman problem. TSP is a tough, tough optimization problem. We have solved it conventionally up to, I don't know, a few hundred cities, and that is the end of it. Merry Christmas: you cannot do anything more; you cannot solve this for a million cities. Why should we do it for a million cities? Well, if these circles represent ISPs on the internet, then we have more than a million cities. How can I use this for optimal bandwidth allocation? That's intractable, you say. Sure, but there were other problems, like training seven-layer networks, that were also intractable, and we came up with tricks: restricted Boltzmann machines, Gibbs sampling, greedy layer-wise training. Can we come up with tricks like that here? What population-based approach would work? Artificial ants, agents, moving from city to city.
You see, I want to find an application that excites people: moving from city to city on a TSP graph. Why "traveling salesman"? There is a salesman; he or she wants to go from city to city and sell their goods, but of course they have to pay for gas, they don't want to visit any city twice, and they want to take the shortest path to be efficient. So there is a cost and there is a time: a perfect optimization problem. Make the problem big and it becomes really challenging. So what happens if I create 50 ants and let them go? Go onto the traveling salesman graph; let's see what you can find. And on each edge they cross, they will deposit pheromone. If an edge is being used by many, many ants, you get a very strong concentration of pheromone, which is like an accumulated reward; it's just a number in a matrix. Cities connected via pheromone-rich edges are, of course, preferable. I just want to plant the seed in your mind of where you can go with this type of idea. But still, of course, you have to sit down, come up with the pseudocode, think about the details, think about the initialization. This is the way new ideas are born, and suddenly you come up with something called ant colony optimization, still one of the best metaheuristic techniques we have.

So, okay. I want to stop, but let me tell this one part a little bit more. How do you make sure that the optimization is progressing? Again, we are brainstorming; I will brainstorm and leave it to you to really design it, by providing some information. We learn from animals, and we look at one of the difficult problems we have in computer science, the traveling salesman problem. Just try to do this with exhaustive search: you will be done at around 40 cities. Fifty cities? Even 40 is difficult.
At 40 cities, merry Christmas, it doesn't work. So you need metaheuristics; you cannot do this with a deterministic algorithm, you need some stochasticity in the game, and ant colony optimization is one such technique. But how? How do I make sure that things make progress, that things go forward? If I just release 50 ants on a big graph, how do I know that they will get anywhere? What is the guarantee that they really find the food, which is the destination? And not just find the food, but find it efficiently, on the shortest path. Okay, that makes it even more difficult.

To do that, we do two things, and I'll leave it at that. One is the local trail update. When you go from node to node, that's a local trail: going from here to here. So what is the pheromone concentration there? Call it p(t), the pheromone as a function of time. You have to update the edges; if you don't update the edges, nothing is going to happen, because we said the concept of stigmergy is indirect communication. Ants do not talk to each other; they don't send each other text messages. What they do is very sophisticated and very compact: "I was here, this is a good place to be, follow me." That's all you need to communicate. So you have to do the local trail update, and second, you have to do a global trail update. How do I do the trail update globally? That's the tricky one. The local one is clear, yes? When you cross an edge, you just increase its pheromone: I was here; the next one crosses the same edge: I was here too. Okay, that's easy, not a problem. But the global one: if the global trail is the red route that I drew, how do I know whether it was a good trail or a bad trail, a good trajectory or a bad trajectory?
Here is the rule: when all ants have completed a tour, the ant that made the shortest tour modifies the edges of its tour by adding pheromone inversely proportional to the tour length. So what is a tour? You start from the source and you get to the destination: that is one tour. Then, of course, who is allowed to change things? The winner, the one who got there first. Okay, ant number 125, you can make the changes, because you won. So we have to determine who makes the changes. You see, this is where most of the time we get inspired by nature, and then we come to a certain level and deviate from nature. There is no such election in nature, but we have to do it to make the abstraction work. There are things in stigmergy that we do not capture, and therefore we have to add mechanisms like this to make the abstraction work. Just be clear about that; everybody does it. So the ant that made the shortest tour modifies the edges of its tour, not the other edges: I know this path, it was a good path, and I will reinforce it by adding pheromone inversely proportional to the tour length. If it was a very short tour, I add a lot of pheromone; if it was a long one, I add little, but I am still the first one. Yes, along the edges: how many edges did it take to get there? One, two, three... eleven. Those are the details you have to figure out. Okay, there is a lot more that we don't have time to go into. I will upload some material on ACO and PSO; you should know how they work and how the algorithm design is done.
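Putting the two updates together, a bare-bones ACO for TSP could look like the sketch below. This is my own toy reconstruction, not the canonical published algorithm: the next-city rule, the parameter values, and the update amounts are simplified assumptions. But it does do a probabilistic, pheromone-weighted walk with a local trail update on every step and a global update by the best ant.

```python
import math
import random

# Toy ACO for TSP: local trail update while walking, global update by the
# ant with the shortest tour, inversely proportional to its tour length.
# rho, q, n_ants, and the local deposit of q * 0.1 are arbitrary choices.

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def aco_tsp(dist, n_ants=20, n_iters=50, rho=0.1, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]      # pheromone on each edge
    best_tour, best_len = None, math.inf

    for _ in range(n_iters):
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            while len(tour) < n:
                here = tour[-1]
                choices = [c for c in range(n) if c not in tour]
                # prefer pheromone-rich and short edges
                weights = [tau[here][c] / dist[here][c] for c in choices]
                nxt = rng.choices(choices, weights=weights)[0]
                tau[here][nxt] += q * 0.1    # local trail update
                tour.append(nxt)
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporation, then global update: the best ant reinforces its own
        # edges, inversely proportional to its tour length
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for i in range(n):
            a, b = best_tour[i], best_tour[(i + 1) % n]
            tau[a][b] += q / best_len
            tau[b][a] += q / best_len
    return best_tour, best_len

# four cities on a unit square: the optimal tour is the perimeter, length 4
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
tour, length = aco_tsp(dist)
print(round(length, 3))  # 4.0
```

Even on this tiny instance you can see the division of labor: the local update and evaporation keep exploration alive, while the global update slowly concentrates pheromone on the edges of the best tour found so far.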
So I want to stop here, because I want to give you time, and I know you need the time not just for the course evaluation but also for the midterm. Please, if you can do the evaluation now, do it; otherwise do it tomorrow, and hopefully the university will calm down once we have more evaluations. I will see you at seven.