that make me feel I should make some disclaimers to begin with. First of all, it's my opinion, not my employer's. I'm really too lazy to create slides that look okay. I found this, I liked it, I took it. Pretty much all of this talk was created by having multiple bad pizzas at multiple places that had reviews saying it was the best pizza in town, whatever. So that's how it started.
Who am I? I'm a Brazilian Pythonista who moved to Scotland for the weather. I'm from São Paulo, a city of 20 million people. So I moved from this to Dundee, 140,000 people. The fourth largest megalopolis in Scotland. Home of the Dundee cake and the birthplace of marmalade. So again, from this to this, which is a two-minute walk from my place, just around the corner.
I'd like to define what a good place to eat is, but I'm not going to. We all know what it is. This talk is not about that. You can go to the first place on any restaurant review site and it's going to be nice. But I want to know what the 119th or the 314th places are like. And that's what this talk is about: how to find them.
The first idea would be stars and ratings. And if you get something like this, what does it tell us? Well, pretty much nothing. Is the first restaurant a McDonald's or a very good, picturesque place? We really don't have a way to judge that. Also, if it is a McDonald's and it gets five stars from a lot of people, it might be an interesting McDonald's, you see, it might be the best one. So what does that tell us? Just that Scarrow likes restaurant A a little bit more than Peter Sam likes restaurant B. Ratings alone are just not that good. And all the rating sites found that out really early on, as any kid just out of kindergarten would tell you that plain ratings are not going to work. You have to use the lower bound of the Wilson score confidence interval for a Bernoulli parameter, of course. So another good metric that you might try is the distribution of the ratings.
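The Wilson score remark above can be made concrete with a short sketch. It assumes each rating has first been reduced to a yes/no vote (for example, 4 or 5 stars counts as positive), which is an assumption of this example and not something the talk specifies. The function name `wilson_lower_bound` and the sample counts are hypothetical.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    positive: number of positive votes (assumed: ratings of 4 or 5 stars)
    total:    number of votes overall
    z:        normal quantile, 1.96 for roughly 95% confidence
    """
    if total == 0:
        return 0.0  # no votes, no evidence
    p_hat = positive / total
    denominator = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * total)) / total)
    return (centre - margin) / denominator


# Hypothetical counts: four reviews, all positive, versus
# one hundred reviews of which ninety are positive.
few = wilson_lower_bound(4, 4)
many = wilson_lower_bound(90, 100)
print(f"{few:.3f} {many:.3f}")
```

Under these assumptions the restaurant with ninety positive votes out of a hundred gets a higher lower bound than the one with four out of four, which matches the point that a handful of perfect ratings is not much evidence.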
So if you get that, it's good. For the first restaurant, you have pretty much four reviews. It might not be good, but that's not nearly enough to know. For the second restaurant, you get what's called a J-shaped curve, high at the edges, where you have a lot of terrible reviews and a lot of excellent reviews. The thing is, people tend to vote for the very good and the very bad, and for the average you just won't vote. It was all right. So that helps, but it doesn't solve the problem.
So now for something completely different. Linear algebra. When is that true? Yes, dirty clothes. Washing then drying and drying then washing dirty clothes don't commute. That is from Wikipedia. They made me write that. But yes, matrix multiplication also doesn't commute. So a funny thing about matrix multiplication, and that's pretty much all the math that you're going to need for this talk, is that depending on the order in which you do it, you get different results with different dimensions. So the trick here is that in order to multiply two matrices, the middle values, that is, the columns of the first matrix and the rows of the second one, have to be the same number. And the thing is, when you multiply them, they pretty much vanish. You might as well use an emoji for that dimension and it's going to disappear. So that's something really nice.
Back to São Paulo. São Paulo had a lot of immigrants. At the turn of the 19th century, of the population of the city, 35% were Italians and 11% were Portuguese. That's almost 50% of immigrants in the center. When we got close to 1940, there were almost 700,000 Italians, which is a lot. That second number is for the state; I couldn't find it for the city, and I did search around for quite a while. That's nice. But the population of São Paulo at that time was less than one million people.
So there were a lot of Italians there and a lot of immigrants there. They created a food subculture pretty close to what you have in Italy, and in New York as well, like the New York-style pizza. We do have a São Paulo-style pizza. And that's important because it leads to my hypothesis that people who have the same background judge things, judge food, using the same standards. If you all have the same culture of something, you're going to judge it the same. If you don't have it, you're going to judge differently.
And here's me trying to prove it. That place, the first one, is not even one of the top places, and I've never been there, but look how tight the Gaussian is. People agree quite well that it's a good pizza. If it was a bad pizza, it would be lower, but it would be just as tight. The second one is the best pizza place in Dundee. And it's awful. But look how loose the Gaussian is. Dundee has 10% of people from abroad. Each one has a different idea of what a good pizza is, and there's no pizza that's defined by Dundee.
The same thing happens here in Bilbao. The first one is not even a top one; again, I like the middle ones. This is the first restaurant that I went to here in Bilbao. And it's a solid four, but again, a very tight Gaussian. A lot of people know that's a four and they vote for four. The other one is in Edinburgh. It's a fish and chips place, and people there know a good fish and chips. And it's also a very tight Gaussian, which you would not get from people who don't. Thus, proving my hypothesis.
Well, it's not really a talk about data science. It is just a yak shave. I don't want to go on a Gaussian rampage, an analysis rampage. So this is just based on a hunch. And the hunch is that what constitutes a good place to eat is based on an individual's background.
And because that's deeply based on the individual, if you agree with multiple reviews from someone, the chance that you're going to like another restaurant that they like is much higher.
And then we go back to linear algebra. No food. Suppose that I have this; I got it somewhere, from a friend. A huge list of users who gave a number of stars to a restaurant. And I have a huge list of those, say for the 500 top restaurants in Bilbao, for example. What could I do with that? Of course, load it into a matrix. So I create a matrix with a restaurant for each row and a user for each column. And the value at each position is just how many stars they gave it. We call it M, and M has dimensions restaurants by users.
So the same way that we could remove dimensions, we can also create them. If I have a matrix and I want to create two new matrices out of it, as long as I can make that equality hold, I can pick any size that I want for this extra dimension that I'm creating. And that's quite useful. Suppose that I create a matrix C that is an approximation of the matrix M that I got from real data. But this one is theoretical. I created it. And it is the product of a matrix of restaurants with some categories that I choose, and a matrix of users with the same number of categories. If I can multiply them and get a matrix C that is a good approximation of matrix M, I can pretty much classify the restaurants and the users into these categories.
And that's pretty much what non-negative matrix factorization is. It's a way to create weights for automatically generated categories. You don't know what each category is beforehand. Sometimes they don't even make sense, but sometimes you can get a rough idea of what they are.
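The shapes involved can be sketched in a few lines of plain Python. The sample ratings, the helper names (`build_rating_matrix`, `matmul`, `shape`) and the hand-picked R and U below are all hypothetical; they only illustrate that a restaurants-by-categories matrix times a categories-by-users matrix has the same shape as M, with the category dimension vanishing in the product.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_rating_matrix(
    ratings: Sequence[Tuple[str, str, int]],
) -> Tuple[Matrix, List[str], List[str]]:
    """Turn (user, restaurant, stars) triples into M: one row per restaurant,
    one column per user, 0.0 where a user has not rated a restaurant."""
    restaurants = sorted({restaurant for _, restaurant, _ in ratings})
    users = sorted({user for user, _, _ in ratings})
    row = {name: i for i, name in enumerate(restaurants)}
    col = {name: j for j, name in enumerate(users)}
    m = [[0.0] * len(users) for _ in restaurants]
    for user, restaurant, stars in ratings:
        m[row[restaurant]][col[user]] = float(stars)
    return m, restaurants, users


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; the inner dimensions must agree."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first must equal rows of the second")
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(r, c)) for c in columns] for r in a]


def shape(a: Matrix) -> Tuple[int, int]:
    return len(a), len(a[0])


# Hypothetical ratings, not the data used in the talk.
ratings = [
    ("ana", "pizzeria", 5),
    ("ana", "tapas bar", 4),
    ("bob", "pizzeria", 1),
    ("carla", "fish and chips", 5),
    ("dan", "tapas bar", 3),
]
M, restaurants, users = build_rating_matrix(ratings)

# Hand-picked factors with 2 categories, only to show the shapes.
R = [[0.0, 2.0], [2.0, 0.0], [1.5, 0.5]]            # restaurants x categories
U = [[2.5, 0.5, 0.0, 0.5], [0.0, 0.0, 2.5, 0.0]]    # categories x users
C = matmul(R, U)                                    # restaurants x users

print(shape(M), shape(R), shape(U), shape(C))
```

With real data the factors would not be picked by hand; they would be fitted so that C stays close to M.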
And our result is pretty much a matrix for restaurants that can tell me, oh, this restaurant has these categories with different weights. And I can try to match those to other restaurants and try to find restaurants that I might like based on one that I like. Oops, I should have done that.
So the non-negative part of it, and that's something I should have said: if you keep everything positive, greater than or equal to zero, it helps a lot of the ways of generating C, because one of the most usual ways of doing that is least squares, and that's easier if you work only with positive values. And it's a very good fit because stars go from zero to five.
So how do you generate C? This is taken from this book, Programming Collective Intelligence. It's a slightly older book. I guess it's 2010, probably. But it's one of those great books where, when you return to them from time to time, you find different stuff. It's all done in Python. It's probably one of the most fun books you're going to have on your bookshelf. So this is pretty much his algorithm for that, and I just left the comments because that's pretty much all you need. First you start R and U with just random values and then you start iterating. For each iteration you calculate how different C and M are, and if they are the same you just exit, but that doesn't happen with real-life data. Then you fix one of the matrices and update the weights for the other one. Then you fix the other one and update the weights for this one. Calculate again, try to improve it, and see what you get. And that's pretty much how I got them.
This is the hard part because I have no view of the screen from here. So this is pretty much a proof of concept. This just loads a bunch of imports. Is my focus in the right place? And this loads the data that I got. No, it didn't. Okay.
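The iteration described above (random R and U, measure the difference, hold one matrix fixed while updating the other) can be sketched as follows. This is a minimal sketch using the multiplicative update rule for non-negative matrix factorization, written with plain lists so it has no dependencies; it is not the code shown in the talk. The names `factorize`, `difference` and `scale`, the small `eps` guard, and the toy matrix are assumptions of this example. For a matrix the size mentioned in the talk a numerical library would be the practical choice.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(column) for column in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(r, c)) for c in columns] for r in a]


def difference(m: Matrix, c: Matrix) -> float:
    """Sum of squared differences between M and its approximation C."""
    return sum((x - y) ** 2 for rm, rc in zip(m, c) for x, y in zip(rm, rc))


def scale(target: Matrix, numer: Matrix, denom: Matrix, eps: float = 1e-9) -> Matrix:
    """Element-wise target * numer / denom; eps avoids division by zero."""
    return [
        [t * n / (d + eps) for t, n, d in zip(row_t, row_n, row_d)]
        for row_t, row_n, row_d in zip(target, numer, denom)
    ]


def factorize(m: Matrix, k: int = 10, iterations: int = 50, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Return non-negative R (rows x k) and U (k x columns) with R @ U close to M."""
    rng = random.Random(seed)
    r = [[rng.random() for _ in range(k)] for _ in m]
    u = [[rng.random() for _ in m[0]] for _ in range(k)]
    for _ in range(iterations):
        if difference(m, matmul(r, u)) == 0:
            break  # exact match; unlikely with real data
        # Hold R fixed and update U: U *= (R^T M) / (R^T R U)
        rt = transpose(r)
        u = scale(u, matmul(rt, m), matmul(matmul(rt, r), u))
        # Hold U fixed and update R: R *= (M U^T) / (R U U^T)
        ut = transpose(u)
        r = scale(r, matmul(m, ut), matmul(matmul(r, u), ut))
    return r, u


# Toy 3 x 4 rating matrix (hypothetical), factorized into 2 categories.
m = [[0.0, 0.0, 5.0, 0.0], [5.0, 1.0, 0.0, 0.0], [4.0, 0.0, 0.0, 3.0]]
start = difference(m, matmul(*factorize(m, k=2, iterations=0)))
r, u = factorize(m, k=2, iterations=200)
end = difference(m, matmul(r, u))
print(start, end)
```

Because every update multiplies non-negative numbers, R and U stay non-negative throughout, which is the reason the random starting values are drawn from the interval between 0 and 1.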
If I want to run the non-negative factorization, it just runs here. But I've just run it. So I got an M matrix that is 11,000 users long and 500 deep. From that I got the restaurant matrix. This is just the bottom of it, with the categories and the weights that they have. And also the weights for the users. So if you look here, you see that user zero voted for restaurant zero, and so there's a five here. This is a very sparse matrix. And the more connectivity you have, the better the results. And it makes sense: you're trying to find people who vote alike across the restaurants. If someone just voted once, that's not going to help your data that much.
Here I do a little bit of prettifying of that. So, okay, a pretty matrix. This is the restaurant one. I have the restaurants here, each of the factors, and the row label is the URL. And that goes on for the 500 restaurants. Here I have the users. And I've just transposed the matrix, so the two look alike. How are we on time? Okay, okay. Nice. I didn't start my timer. So here again are the users' ratings. A lot of them are just people who voted once, so they don't really influence stuff. There's someone with an empty name that I found, who just goes here. And here I do some pandas magic to get people and show them. I can get back to this in a bit, but the thing is I get the restaurant that I like. And this is how they look. Ah, no, I should have disabled controls, yes. Okay, so here's how they look. It has a huge factor four, which we don't know what it is. A large factor zero and factor three. And that's pretty much it.
So we might want to find similar restaurants. This is just the basic find-similar that I wrote. It's very trivial. It's not even the best way. The best way is probably using some linear programming to find the one that minimizes the difference. But here I have INC, which is a solid four.
It has a little bit of zero here, but you can barely see it because the four is like 40. So it's a huge four. And from that you can get an idea of what that category might be, because they sell cold tapas, cold pintxos, and croquettes on top of toast. So that might be an indication of a croquettes-on-top-of-toast kind of market. Casco also has a huge four and a large zero. Well, that might be an interesting restaurant to see. It doesn't have that much F3. The same restaurant, that's a good one. And La Deliciosa. That's a weird one, because when I first looked at it, it didn't seem that interesting, but it has a huge four. It has a good F0, some F4 and F2. So this program, this way of thinking, just suggests a new place to look at, El Huevo Frito, that I didn't like. This is just searching for similarity between the curves in the restaurant data.
You can also do that through the users. You have users here who are correlated with some factors. So if I find who voted a lot for factor four, those might be people who like this kind of factor-four restaurant, and it might be interesting to find what they like. So I go to the users, this next thing. This P63 votes a lot for F0, F3, and F4. That might be an interesting person, to see what they have voted for. It is an eight and it's kind of tough. And then from them you can try to find similar restaurants that people like. And again, INC. Café Bar Bilbao, that's a different one. Soco Beria, Casco again, VWogia. And that's it. Oh no, no, no. It went back to the beginning. It's changing. It stopped following. Thank you. Oh yeah, no, it's the right place, sorry. And that's it, thank you. Questions?
Hi, you ended up with nine categories in your example. Was this by design, or what was this?
Sorry?
You ended up with nine categories or so.
Oh, it's F0 to F9, so it's 10 categories. I chose 10; it's just an arbitrary value. I tried with 20. It got results.
The more categories you add, the more similar the two matrices get. But it also makes the search harder, and each category makes less sense. You have to strike a balance. I tried with five and I tried with 20. And pretty much 10 was the sweet spot, and that's what I used. When you run it, I don't know if I showed that, I was about to. Oh, sorry. So yeah, I can get to the next one. So this is 13 iterations, 15 iterations, sorry. Oh, it's not that, it's just 10, totally not that. So it is calculating. I should be able to show something in a while. So it has started, so let it run while we take more questions. Thank you. Oh, sorry, there's one question.
So having done all that, is it true that you can sort of tell how much you're going to like a certain restaurant just by looking at the shape of the distribution, if you just go on Google reviews or something like that? I know that I've been walking with you and I've seen you do that. You say, oh yeah, this one looks good, just by quickly looking at it. You get a feel for it.
Yeah, yeah, the general shape can tell you a lot. From looking at the shape, you can clearly see if you've got the J-shape, and depending on how large the terrible bar is, it could be an indication that there are a lot of average and poor ratings that you're not seeing. So that's something that I try to avoid. Something as close as you can get to a tight Gaussian is usually a good sign. A wide one, though, might just mean that the restaurant is not really stable, that it doesn't always do stuff to the same standard. So it might or might not be an interesting place to see, but it does tell you some stuff.
It is still running. Okay, thank you. Over here. So here's where the results come from. This is the difference, and as it goes, it reduces until it doesn't reduce anymore.
And that's pretty much the amount of difference that I have between my C matrix, the theoretical one, and the actual one. With five categories, that number doesn't go that low. With 20, it goes lower, but not by that much, and it doesn't improve the results that much from what I saw. Again, this is not data science, it's just a guy with a computer trying to find a place to eat. Thank you.