Hi, I am Omar Gutierrez. I want to talk about nonparametric Bayes. Most of what I will say is based on a tutorial by the researcher Samuel Gershman. Actually, I won't tell you too many things: we will recap, in a nutshell, the definition of a model, and I will show the simplest example of nonparametric Bayes, which is clustering. I will show you how we usually do clustering and then the alternative approach, which is the nonparametric model. And I will try to tell some jokes; I hope you understand them.

So, okay. What we do in machine learning is, how can I say it? Not the only idea, but the main idea, is to modify the values of some parameters in a training process. This is almost the default approach in machine learning. It's not common to hear the term "parametric models," but almost all the models we are using right now are parametric. There is, however, another group of machine learning models that are nonparametric. For me they are very interesting, and I think we need to discuss the idea.

Just to remember: in linear regression, our parameters are our beta values, so we can move a line in the plane, or a hyperplane. In neural networks it's almost the same. Oh, well, this slide shows a perceptron, I confused the image, but it's the same: we modify the weights, and so on. Also, I don't know, hidden Markov models: we have some kind of weights, the hidden values, and we modify them. But I think you already know a lot about machine learning.

So, I want to ask you something. For you, which one is the best model? Think of the blue line as, I don't know, a kind of time series, and the red line as the model. How many of you think that model A is the best one? And how many of you think that model B is the best one? Okay, well, I set a trap, because I didn't show you the complete data. In the last step we can see that the blue line drops dramatically, so in this case the best model is the red line, oh, no, sorry, the one on side B. So it's not as easy as just modifying parameters in a training process. Data science is sometimes an art rather than a science.

Let's think, I don't know, is there someone from Gease here? Those guys work on the stock market. Imagine that we are buying stocks, but at some moment you need to stop buying because it could be risky. The model on side B is a good model in that kind of scenario.

Maybe you are asking why; this reminds me of Bertrand Russell and the inductivist turkey. Do you know the story? It's very funny. Imagine a turkey that is smart: this turkey has an inductivist philosophy, so it draws conclusions. Every morning it receives food from its owner. It starts saying, oh, okay, I'm receiving food every morning: on rainy days, on sunny days, on weekends. So it concludes that it will keep receiving food for a long time. But that is not the case on Christmas Day, because the next day the owner will set the trap for this turkey. So that is why a model like the one on side B is good in that kind of scenario. I like that story from Bertrand Russell.

So, I think you know this formula: of course, Bayes' rule. I will go over what it means. Let's start with the blue term. That is our prior knowledge; imagine it as the conclusions the turkey is drawing every day. But this belief will change, and it will change because of the observations.
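In symbols, with H our hypothesis and D our observed data, the rule on the slide reads:

    P(H | D) = P(D | H) · P(H) / P(D)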
Think of D as the dataset, I don't know, or, yeah, the observations, and H as our hypothesis or our parameters; P(D | H) is the likelihood of the observations given our hypothesis. And at the end we will have a new belief, which is called the posterior value, the one in red.

So, well, we already defined what a model is, and now we are talking about Bayesian reasoning. This is all very old, I think, but it is still very useful and very popular. All of this was before the machine learning revolution and the data science revolution. The idea is that by manipulating probabilities we can make inferences.

So, let's see this other formula. We want to get the argument of the maximum in Bayes' formula; this is called maximum a posteriori. For example, from step two to step three, do you know why we delete this term? You know, it is present here, but not here. Can you tell me? Because there are, like, 24 formulas like this? No, just kidding. Well, we can see that this term does not depend on the hypothesis, so it does not affect the maximization and we can delete it: the argmax over H of P(D | H) P(H) / P(D) equals the argmax over H of P(D | H) P(H). And if we have a flat prior, we can say that the probability of every hypothesis is the same, so we can delete that term as well. We finish with maximum likelihood estimation: the argmax over H of P(D | H). That is a very important formula, and it's not so hard. We will see maximum likelihood estimation behind almost every algorithm; if you want to argue that your algorithm is sound, try to match it with maximum likelihood estimation. I will come back to this later. I just want to mention that there is a very nice paper that studies the history of maximum likelihood estimation. It's nice to see the progress. I don't remember the title, but you can Google it if you are interested.

So now we know more about Bayesian reasoning, and we know that it's present in many machine learning algorithms. Let's think about some problems with data. Data is always evolving. For example, imagine Wikipedia in its first years. I don't know how many articles there were, or what their topics were, but let's say they were just biology and chemistry. Then, over the next years, Wikipedia started to have articles on sports, or biographies of artists. The data was evolving. The same with the species on the planet: every week, or every few days, biologists discover new species, so they sometimes need to modify the taxonomy. The data is evolving. And think of social networks: they are evolving every second, for example, the hashtags on Twitter.

Well, how do we usually address this problem? For example, in clustering there is a common way to do it, one classic approach. Let's say we want to use Gaussian mixture models, and let's think of the Gaussian mixture as something grounded in maximum likelihood: it is not equivalent, but it fits within the maximum likelihood framework. This is a claim without proof, I won't prove it. But then, in the Gaussian mixture model there is Bayesian reasoning. So, yeah, we can do some clustering, maybe with a Gaussian mixture. But then some questions arise: how many clusters do we need if my data is evolving? We usually try many numbers of kernels and then do a comparison, I don't know, maybe with the Bayesian information criterion, or the silhouette score; there are a lot of metrics. In this case, the best clustering had five kernels. I don't know if you can see it. This one, yeah, it seems like a good clustering, but actually number three does too. So that is what we usually do.
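To make that concrete, here is a minimal sketch of that classic workflow in Python, assuming scikit-learn is available; the synthetic blobs and the candidate range are just stand-ins for the data and settings on the slides:

```python
# Classic parametric workflow: fit a Gaussian mixture for each candidate
# number of kernels (components) and keep the one with the lowest BIC.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data standing in for the dataset on the slide.
X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

bic_scores = {}
for k in range(1, 10):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic_scores[k] = gmm.bic(X)  # lower BIC = better fit/complexity trade-off

best_k = min(bic_scores, key=bic_scores.get)
print("best number of kernels:", best_k)
```

The loop is exactly the "try many kernels and compare" step: the BIC penalizes the extra parameters of the larger mixtures, so it trades off fit against complexity.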
But let's think about another approach. We have seen the parametric approach; let's now think about the nonparametric approach. The name "nonparametric" can be confusing, as if there were no parameters at all, but actually the number of parameters is infinite, though that statement is also a simplification. The idea is that we have infinitely many empty clusters, and we start filling them with our data. In this way we can solve that earlier problem, because our models can adapt to the complexity of the data itself.

Well, let's see this diagram. We have the Bayesian models, and some of them are nonparametric. The names are funny, you know: Chinese restaurant and Indian buffet. I don't know, sometimes science is funny. You know, I am from Mexico, and I am thinking that if I study the Dirichlet process enough, I could come up with another similar model, I don't know, the Mexican taqueria process, maybe.

These models are based on the Dirichlet process, and I will explain the Chinese restaurant process because for me it's very intuitive. Imagine that we are in a Chinese restaurant, and usually the Chinese restaurants in California, I think, are huge. One scientist noticed that and said, oh, well, the Chinese restaurants are huge, so you can go there, and if the restaurant is empty, you can choose any table. Then the second customer will choose another table, or the same one, with some probability: each new customer joins an occupied table with probability proportional to how many people are already sitting there, or opens a new table with the remaining probability. In the end, we realize that what happens when a lot of customers go into the restaurant is a kind of clustering, where each table is one cluster. And, well, that's pretty cool. That is the idea of the Chinese restaurant process: one model that is evolving. There is a small sketch of this process below.

So, this is the clustering for the previous data we saw, but with the infinite Gaussian mixture model, which is a Dirichlet process mixture. We can do everything you can do with a parametric model, for example digit recognition or topic modeling.
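Here is the sketch of the seating process I mentioned; the function and its defaults are just illustrative, with alpha as the concentration parameter that controls how often new tables open:

```python
# Toy simulation of the Chinese restaurant process. With n customers
# already seated, the next customer opens a new table with probability
# alpha / (n + alpha), or joins an occupied table with probability
# proportional to how many customers already sit there.
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    rng = random.Random(seed)
    tables = []  # tables[i] = number of customers seated at table i
    for n in range(n_customers):  # n customers are already seated
        if rng.random() < alpha / (n + alpha):
            tables.append(1)  # open a new table (a new cluster)
        else:
            # pick one of the n seated customers uniformly, join their table
            r = rng.uniform(0, n)
            total = 0
            for i, size in enumerate(tables):
                total += size
                if r < total:
                    tables[i] += 1
                    break
    return tables

print(chinese_restaurant_process(100, alpha=1.0))
# typically a few large tables (clusters) and a handful of small ones
```

Notice that the number of tables is never fixed in advance: it grows, slowly, with the number of customers, which is exactly the "model that evolves with the data" idea.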
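And as a practical counterpart, something like the infinite Gaussian mixture clustering on that slide can be approximated with scikit-learn's BayesianGaussianMixture, which implements a truncated variational version of the Dirichlet process mixture. A minimal sketch on synthetic data:

```python
# Truncated variational approximation to a Dirichlet process mixture:
# give the model a generous upper bound on components and let it assign
# negligible weight to the ones the data does not need.
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture

X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

dpgmm = BayesianGaussianMixture(
    n_components=20,  # truncation level, not the final number of clusters
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

# Count the components that actually carry weight.
print(sum(w > 0.01 for w in dpgmm.weights_))
```

The point is that we never commit to a fixed number of clusters up front; the truncation level is only an upper bound, and the data decides how many components are really used.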
Actually, I am at the conclusion part, so let's recap. In the traditional approach, sorry, I mixed up some letters on this slide, the number of parameters is fixed and we have some distribution over those parameters. In the other case, the nonparametric models, we assume that we have an infinite number of clusters; our data can be adapted to... no, our model can be adapted to the data.

There are some libraries in Python. Of course scikit-learn, sorry, but it only has, like, one or two Dirichlet algorithms. The best one, in my opinion, because it's closer to the research, is this one, datamicroscopes. I don't like it so much because it is only available through conda and not in the official Python package repository, but actually, yeah, it is the best library.

If we want to know more about this, we can study what the beta distribution is; in a nutshell, the beta distribution is just a probability over probabilities. Then we have the Dirichlet distribution, which is a generalization of the beta distribution, so in that sense it is a distribution over distributions. And finally, the Dirichlet process. If you also want to read more about it, well, my favorite book on machine learning is the one written by Tom Mitchell, and this tutorial is really nice. And the library I mentioned also has very nice tutorials; you can check them.

Well, that's all. Thank you. So, do you have questions? Okay. Can you use the microphone and speak directly to him?

So, you mentioned the Gaussian mixture model. Is that considered a Bayesian model then as well?

Maybe; the Gaussian mixture model is not strictly a Bayesian model. But the algorithm that we use to do the clustering is expectation-maximization, and it uses maximum likelihood. The maximum likelihood is derived from Bayes' rule, so somewhat we can say that it's part of Bayesian reasoning, a Bayesian model. And actually there is some discussion that most of what we know as Bayesian models are not exactly Bayesian models, but just statistical learning. It's like thinking that if we use calculus, then everything we are doing comes from Newton; it's not exactly like that. Newton was important in calculus, but other mathematicians also contributed to it. It's almost the same with Bayesian things: sometimes it's not exactly Bayesian, but it has Bayes' rule behind it.

Okay, more questions? We have plenty of time. Do you have questions? Okay. Oh, it's the same person again. Oh, great.

Yeah, I can ask a second question, maybe. I'm just trying to understand these Bayesian models. Is it a category of models, or is it a way of using different other models? Because, I mean, I know the name Gaussian mixture model, and I'm trying to understand Bayesian models: is it also a category of models, or is it a way of using a model?

Yeah, maybe. There are some models that are strictly Bayesian. For example, Bayesian networks: there we have, strictly, probabilities that are not independent; they depend on each other. So we can say that Bayesian networks are strictly a Bayesian model. And there are other ones we can think of as Bayesian models, like hidden Markov models, because, yeah, we have some nodes and they have dependencies, and those dependencies are probabilities. But strictly, we could then discard the Gaussian mixture model as a Bayesian model. Still, as I said, it has some Bayesian things inside, very deep. I don't know if that was a good answer.

So, maybe I can finish. I am more like a software developer, but I am not doing software development anymore because it's kind of boring. So I decided to do a master's in computer science. I'm doing a lot of statistics now, but I am also not a statistician. So, as Bertrand Russell said: I am not a philosopher, I am not a mathematician either, so I am a centaur. It's the same for me.

Last question? I don't see hands up. So, thanks again to Omar.