Okay, hello everybody. It's a pleasure to be here. I'm sorry if I'm not looking at the camera; most of the time I will be looking at the screen, but I will try to look at the camera when I remember.

As Matteo said, we are starting the second part of the course on data mining the many-body quantum problem, or let's say the many-body problem, and I'm in charge of the data mining part. So today we are going to talk about machine learning. Concretely, we are going to do something that is probably not what you have in mind when you think about machine learning: unsupervised machine learning. Before we start talking about that, let's try to get an idea of what unsupervised machine learning is, and for that we need to start with what supervised machine learning is.

In supervised machine learning you have, ideally, an input, let's call it x, associated with a response, y; you know everything about both. From these two things you obtain nothing more and nothing less than a functional form and a set of parameters, which we will call ω, that define the response of the model. The idea is to learn these parameters from the data in such a way that when I feed in new data, let's say x′, I can process it and obtain its associated response. Neural networks, and any supervised machine learning method that you hear about nowadays in all fields of science, are based on this basic idea. If the model is linear, this is nothing else than a linear regression; if the model is not linear it may be much more complex, but the scheme still holds.

So how do you do that? Let me denote the model as f(x; ω), which depends parametrically on the parameters ω and takes x as input. What we do is optimize these parameters so that they minimize what is called the loss function. Probably the most common loss function is the squared loss: I optimize my parameters so that the sum over the input data of the squared difference between the real response and the predicted one, Σᵢ (yᵢ − f(xᵢ; ω))², is minimal. (There is a small numerical sketch of this at the end of this introduction.)

One example will allow us to introduce unsupervised machine learning. Imagine that you have a data set of millions of images of 1000 × 1000 pixels, and you train a neural network to predict, from those million pixels, the age of the person in each photo. Each evaluation of the model reduces all this complexity to a single number: an extremely important dimensional reduction. Now I wonder: is that all the information that is in this data? It is not. The same data, with the same type of model, could be used to predict, I don't know, for instance the profession, or to tell you whether the photo was taken outdoors or indoors; seemingly infinite kinds of classifications can be built from a given data set. What happens is that this is not true: in reality the possible classifications are limited, because many of them are correlated. Continuing with the same example of people: if you want to predict the age, or if you want to predict the hair color, the predictions are correlated, because older people tend to have white hair. It's not that they are the same thing, because for sure there are old people with no white hair and there are young people that do not have any hair at all. So the number of independent classifications that can be done from a data set is limited; it is usually called d, and it is a number that is usually much smaller than the million pixels, your total number of features.
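Here is the minimal sketch of the supervised scheme mentioned above: fitting a linear model by minimizing the squared loss. The toy data and the closed-form least-squares solution are my illustrative assumptions, not part of the lecture material.

```python
import numpy as np

# Toy supervised problem: inputs x_i with responses y_i = 2*x_i + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(100)

# Linear model f(x; w) = w0 + w1*x. Minimizing the squared loss
# sum_i (y_i - f(x_i; w))^2 has a closed-form least-squares solution.
X = np.column_stack([np.ones_like(x), x])   # design matrix with a bias column
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print("learned parameters:", w)                   # close to [1.0, 2.0]
print("squared loss:", np.sum((y - X @ w) ** 2))  # small at the minimum
```

With a new input x′, the trained model predicts w0 + w1·x′, which is exactly the "apply to new data" step of the scheme.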
Coming back to the images: that limited number d appears because the correlations introduce a structure in your data, in such a way that the data lives on a manifold whose dimension is much smaller than the embedding dimension, the dimension of the data itself.

So you might ask me: okay, nice introduction, but what does this have to do with unsupervised machine learning? Unsupervised machine learning, instead of having a response function, tries without one to learn this underlying structure of your data. And what is this underlying structure? One of the things we would like to have is a map: a map that tells us how these 10^6 features, these coordinates, map onto the manifold. This task is what is called dimensional reduction, and it is what we are going to do today and on Wednesday. Also, knowing this d by itself is a task of unsupervised machine learning, called intrinsic dimension estimation, which we will touch on in this course. Finally, another task that is done with unsupervised machine learning, and that we are also going to explore, is how the data is distributed on the manifold: what does p(x) look like? It may happen, and it usually happens, that your data is not uniformly distributed on your manifold, so to explore how it is distributed we will use clustering, which is a way of obtaining the modes, the peaks, of your distribution function.

Are there questions from the audience? I can read the chat if you prefer to write there and stay muted. If there are no questions, I will go ahead.

"Sorry, I have a question. When you say manifold, do you mean it in a geometrical sense, like a differentiable manifold?" You can think about it in the geometrical sense, but in this case we are restricted to our sample, to our data: we don't have the points in between, only the sampled ones. "Okay, thank you." You're welcome.

Since we are talking about the many-body problem, I want to give one example coming from physics. Imagine that you have a simulation of atoms in three dimensions. During your simulation, what you do is solve the equations of motion, and at the end you have a set of configurations, each giving the coordinates of all the atoms. So each vector that defines a configuration has 3N components, with N the total number of atoms: the x, y, z of each atom. Imagine that your simulation has 1,000 atoms, which is pretty small; we are already talking about a space of 3,000 coordinates.

But let's do it in a much easier way. Let's take N = 2. If my two particles are free, I just need six coordinates: x1, y1, z1 and x2, y2, z2. With these six coordinates I can describe my two atoms in each configuration. Now imagine that I introduce a rigid bond connecting these two atoms. The number of coordinates needed to define a configuration is the same; however, when I compare different configurations, the rigid bond changes the shape of the manifold that I sample, because the atoms cannot be further apart than the bond length.
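As a side note on the data layout: a trajectory like this is typically stored as a matrix with one configuration per row and 3N columns. This is a hedged sketch with assumed sizes and array names, just to fix the picture.

```python
import numpy as np

n_configs, n_atoms = 500, 2     # assumed sizes, matching the two-atom example
rng = np.random.default_rng(1)

# positions[i, a, :] holds the (x, y, z) of atom a in configuration i
positions = rng.uniform(0.0, 10.0, size=(n_configs, n_atoms, 3))

# Flatten each configuration into one row: the data matrix is (n_configs, 3N)
X = positions.reshape(n_configs, 3 * n_atoms)
print(X.shape)   # (500, 6): six coordinates per configuration
```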
So, it's just putting a rigid bond between two atoms. Okay, now imagine that you are a computer. Is there a way in which, if I give you one set of configurations of two atoms that are free in space, and the same number of configurations of two atoms that are not free but bonded, you can distinguish them?

Well, as chemists we can say yes. Instead of using the x, y, z of each atom, you can change your coordinates and say: okay, I'm going to use the x, y, z of the center of mass, then the distance between the two atoms, and then the two angles that define the orientation of the pair on the sphere. The details of the angles are not important. What is important is that if I do this transformation, I obtain a big difference between my simulation in which both atoms are free, where all these coordinates matter when I compare a pair of configurations, and my simulation in which the bond is frozen, because if the bond is frozen, this distance d12 is equal to the same value for all my configurations, right? So by introducing this correlation, the frozen bond, I change the space of the configurations; I change the manifold that is explored by my configurations. And with this change of coordinates, if I take a look at the configurations, I would immediately realize that this one coordinate is not changing, so it tells me something about the physics of the system.

That is something I can do in this really easy case because I know what is going on: I know the constraint that I put in my simulation, I know everything.

"Alex, tell me." "There is a question in the chat." Okay, I think it's the one that I already replied to. Sorry; thank you, Matteo.

So the idea is that in this case, which is a physical case, I can do this dimensional reduction: I can pass from six coordinates to five, because I can ignore the bond length, just because I know that I fixed it. But what happens when you don't know it?
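Before answering that, here is a sketch of the change of coordinates just described for the two-atom case. The function name and the particular choice of spherical angles are my own illustrative assumptions.

```python
import numpy as np

def to_internal_coords(r1, r2):
    """Map the Cartesian (x, y, z) of two atoms to the center of mass,
    the bond length and the two angles of the relative vector."""
    com = 0.5 * (r1 + r2)              # equal masses assumed
    rel = r2 - r1
    d12 = np.linalg.norm(rel)          # distance between the two atoms
    theta = np.arccos(rel[2] / d12)    # polar angle of the bond direction
    phi = np.arctan2(rel[1], rel[0])   # azimuthal angle
    return np.concatenate([com, [d12, theta, phi]])

# For a rigidly bonded pair, d12 comes out identical in every
# configuration, while the other five coordinates keep changing.
r1 = np.array([0.0, 0.0, 0.0])
r2 = np.array([1.0, 0.0, 0.0])
print(to_internal_coords(r1, r2))   # [0.5, 0, 0, 1.0, pi/2, 0.0]
```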
What happens if I just gave you these coordinates for both simulations and you had to understand what's going on? It is in these cases that you can turn to unsupervised machine learning, to dimensional reduction, and try to have the computer figure out what's going on.

One thing that usually happens is that this bond is not rigidly fixed, but constrained by some elastic constant, a nearly quadratic potential. So if you plot this coordinate, the bond length, as a function of any of the other coordinates, let's say the x of the center of mass, you would obtain a cloud of points that is nearly flat along the bond-length direction. The constrained variable, the variable that is less important because it is almost fixed, has a really low variance compared with the other variables: the variance of d12 would be tiny, while the variance of the others would be big (there is a small numerical sketch of this at the end of this exchange). That is important, because this idea is the one that we are going to use for the first method that I'm going to explain to you, which is principal component analysis. Before going to principal component analysis, I think we can stop for a few questions.

"Professor?" Yes. "In this case, if we consider that we have one particle, we can easily follow the particle in this space; that means at each time we can take the position of the particle." Yes. "And if we consider that we have two particles and they are connected..." Yes; you can think about two independent simulations: in one the atoms are independent, and in the other one they are connected. "It means every time we check the positions of those particles... what is the place of machine learning here? Where are we using machine learning?"

The point is that so far we are still not using it. In this case we used, let's say, human learning: we did a projection of the x, y, z of both atoms into a coordinate system that allows me to identify that there is one coordinate that is not changing. But that is something that I did by myself, because I know that in this system I have two atoms and I have a bond; my change of coordinates was guided by my knowledge, right? But what would happen if I didn't know it? Yes, I see the other questions in the chat, but let me finish this one. If I had just the x, y, z columns for both simulations, they would be just columns of numbers, and looking at them I myself would not understand whether there is a difference between the two simulations. I understand the difference once I do this change of coordinates. And it is just a change of coordinates; don't worry about the details. It is a change of coordinates that allows me to see that this bond is fixed, or constrained in this case. Is it clearer now?

"Yes, a little bit. It means we are just looking for the coordinates which are not changing in the system." Yes, that's what we are looking for. "Okay, thank you."

"May I also ask a question, please? Does it mean that if we do not fix a parameter, we cannot use unsupervised learning?" No, it doesn't. The point is that until now I used my knowledge to understand what is going on; I haven't used any technique of unsupervised machine learning yet. I just used human learning to perform a dimensional reduction.
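Here is the numerical sketch promised above: sampling configurations of a pair with a stiff harmonic bond and comparing the per-coordinate variances in the internal coordinates. The ranges and the bond stiffness are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Internal coordinates of a two-atom system with a stiff harmonic bond:
# the center of mass and the angles explore their full ranges, while the
# bond length d12 only fluctuates slightly around its equilibrium value.
com = rng.uniform(0.0, 10.0, size=(n, 3))   # free center of mass
d12 = rng.normal(1.0, 0.01, size=(n, 1))    # stiff bond, width assumed
theta = rng.uniform(0.0, np.pi, size=(n, 1))
phi = rng.uniform(-np.pi, np.pi, size=(n, 1))

X = np.hstack([com, d12, theta, phi])
print(np.var(X, axis=0))
# The fourth entry (d12) is ~1e-4, orders of magnitude below the others:
# exactly the imbalance that PCA is designed to detect automatically.
```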
Again: I just knew that there was a bond, and then I changed my coordinates in such a way that one of them can be ignored, because when I compare two configurations, in the case that this bond is fixed, I can ignore that coordinate: it is the same in all my configurations. "Thank you, sir." So no, we are still not using unsupervised machine learning; I'm just using my own learning to understand the physics. The idea is to make you see that this dimensional reduction, which I did with my knowledge, can be done automatically by the computer. "I get it now, thanks."

Okay, let me read the questions in the chat. "Sorry, could you repeat what was the importance of transforming the feature space?" Well, in this case it allows us to distinguish between a simulation in which there is a bond between the two atoms and a simulation in which this bond does not exist, because by transforming the space in this way I can see that this variable is not varying across configurations, or, on the contrary, in the case that there is no bond, that this variable changes like the other ones. That is the physics of why we are using dimensional reduction in this case; but, yes, I did the dimensional reduction by hand, so to speak. I did it by myself.

Okay, there are other questions regarding these coordinates. It's true, this is not the only change of coordinates that I could have done; I chose this one because it was very well suited for explaining this case. The idea is that if you change your coordinates in such a way that you see some imbalances, these imbalances are usually induced by the correlations, in this case among your particles, among your coordinates. And regarding the other question: it's not true that the feature space is linear in general; in the case that we are going to explore today, it is.

So let me continue. The goal of the method that I'm going to explain now is to arrive at a situation like this one, in which I have one coordinate, or several coordinates, whose variance is much lower than the other ones. I will transform my space so as to obtain a new set of coordinates in which the variance of some of them is maximal, while the variance of the ones I will ignore is minimal. This is what is fundamentally done by PCA, principal component analysis.

Before I start doing that, let me fix some notation, because it can get a bit messy. The first thing is that we can assume that all the vectors are centered. Imagine that for each configuration i I have a vector x_i with d features. What I mean by centered is that the sum over i of x_i is equal to zero. I can assume that without loss of generality, because once I have my data I can compute this sum, take the average, and subtract it from each coordinate; in this way my data is centered. I am doing this just because it makes the math much easier, so keep it in mind.

"Excuse me, professor: the sum is over i?" The sum is over i, yes, over all the configurations: each of the features is centered across the data. It's not that each vector is centered in itself; you center over all the data that you sampled.
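A minimal sketch of this centering step, assuming a data matrix X with one configuration per row:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 10.0, size=(500, 6))   # one configuration per row

# Center each feature (column) over all configurations: subtract the mean,
# so that sum_i x_i = 0 holds for every coordinate, as assumed above.
X_centered = X - X.mean(axis=0)
print(X_centered.mean(axis=0))   # ~[0, 0, 0, 0, 0, 0] up to rounding
```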
"Hello, please, could you repeat the last part?" Okay. The idea is that each data point, indexed by i, is defined by a vector x_i which has d dimensions, and each of these dimensions is centered. For doing that, you compute, for each coordinate, the average over all the data, and you subtract it; then your data is centered.

Now, my ansatz is that my new coordinates, let me denote them y, are a linear transformation: each of the components of the new vector is a linear combination of all the components in the original space. This is not true in general, but I'm making this assumption. The idea, as I told you before, is to find the transformation such that the variance in the transformed space is maximal. I want my new set of coordinates to have a huge variance, a big variance, because I don't want the ones with the small variance. It is as in the transformation before, when I passed from the Cartesian x, y, z of each atom to the Cartesian coordinates of the center of mass and all that stuff: what I said was, okay, to describe a configuration I will ignore the distance between the two atoms, because it's almost constant; it has a really low variance. So what I will do is transform my data into a new representation in such a way that I maximize the variance.

Let's do it in one really easy case, and then I will try to explain it with the mathematics; I don't think we are going to have time, but let's start with the easy case. Imagine that you have data in two dimensions, with coordinates x1 and x2, and you can see that the variance is almost the same along both of them. What I want to do is obtain a new set of coordinates, one along this direction and the other along that one, and I will retain only the coordinate that has the big variance, ignoring the other with a small error. This is a way of doing dimensional reduction with PCA.

"Is it similar to the clustering method?" Not yet, not yet. In clustering, imagine that you have two clouds of points: what you are trying to do is separate them into groups. Here we are just trying to know which coordinates are important for describing our system. That's a bit different; we will see clustering in the next lectures.

"Tell me." "Here we already have many positions for the particle, according to the graph you plotted?" No, those are not particles; those are two features describing your data, your configurations. "And the dots are the particles, or the positions?" No; forget about the physics, they are just points in feature space. They may come from whatever transformation of coordinates you can think about. In this case you have two features, but in general, for instance in the case of the Ising model that was explained by Marcello, if you have an Ising model in 2D, you will have as many coordinates as the number of spins in your system.
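To make the linear ansatz concrete: the sketch below applies y = W x to centered 2D data, with W an assumed 45-degree rotation (a choice suggested by the symmetry of this toy example, not a general recipe), and compares the variances before and after.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Correlated 2D data: a shared component plus small independent noise,
# so the variance is almost the same along x1 and x2.
shared = rng.standard_normal(n)
X = np.column_stack([shared + 0.3 * rng.standard_normal(n),
                     shared + 0.3 * rng.standard_normal(n)])
X -= X.mean(axis=0)
print("variance in original basis:", X.var(axis=0))

# Linear ansatz y = W x, with W a rotation by 45 degrees
angle = np.pi / 4
W = np.array([[np.cos(angle),  np.sin(angle)],
              [-np.sin(angle), np.cos(angle)]])
Y = X @ W.T
print("variance in rotated basis:", Y.var(axis=0))
# Almost all the variance now sits in the first new coordinate.
```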
Right. So in this case I'm simplifying to two dimensions, because that is what we can draw, but the ideas are the same: they are applicable to many-dimensional cases, and indeed that is when it becomes interesting. The problem is that I cannot draw in 100 dimensions, so I'm trying to explain the concept in the easiest way, which is with two dimensions. "Okay, thank you."

"Yes, I have one question. You said that the variance has to be large. Is that part of the method, or is there some specific reason for it?" I'm going to explain the method now, but the idea is that what I will do is perform a rotation of my basis. Okay, this is your original basis, right, with axes x1 and x2. If you perform a rotation of it, let me draw it with a different color so that it's clear, now you have a new basis, y1 and y2. Your data stays where it is; you are just rotating your basis. But in this new basis, the variance along one of the components is maximal. So, by maximizing the variance in this way, what you can do is say: okay, I'm projecting all my points, which were in two dimensions, onto just one, the one that has the big variance. I plot all my points along that dimension, and so I pass from two dimensions to one with minimal information loss, because I'm losing only this small-variance coordinate. "Okay, thank you."

"Yes, I have a question. Maybe the example is a little bit misleading for me, but what if the data is distributed in some peculiar way, like three different clusters? I don't see how we can give a sense to the projection then." If there are clusters, that by itself is not a big issue; I mean, there is nothing against the data being distributed like that. The problem comes, and I want to explain it to you later if we have time, when the structure of the data is not linear. You are right: if the data is not distributed linearly, let's say on a hyperplane, the method will not work properly. But it's important to know this method, because it is at the basis of all the dimensional reduction methods. This is why I want to explain it to you today; then we will see the problems that it has and all that stuff. "Okay, thank you." You're welcome. More questions? If there are no more questions, let's do it in this case.

Okay, in this case, how do you compute the covariance matrix? Since the data is centered, the covariance between two variables a and b is just the average over all your points of the product of the two coordinates: C_ab = (1/N) Σᵢ xᵢᵃ xᵢᵇ. Because I already centered the data, this is the covariance. Let's compute the covariance for these points, or let's assume it comes out to be something like the matrix I prepared as an example, with 1.09 on the diagonal. You can easily diagonalize that, right? If you diagonalize it, since it is symmetric, you will obtain two real eigenvalues, and two eigenvectors: one eigenvector associated with the larger eigenvalue, and a second, orthogonal one associated with the smaller eigenvalue.
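Here is that diagonalization in code. The diagonal entries 1.09 are from the lecture; the off-diagonal 0.8 is my reconstruction, chosen so that the eigenpairs reproduce the numbers quoted on the board below (λ1 = 1.89 and the eigenvector (0.7, 0.7)).

```python
import numpy as np

# Covariance matrix of the 2D example: 1.09 on the diagonal as in the
# lecture; the off-diagonal 0.8 is assumed to match the quoted eigenpairs.
C = np.array([[1.09, 0.80],
              [0.80, 1.09]])

eigvals, eigvecs = np.linalg.eigh(C)   # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]      # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)                      # [1.89, 0.29]
print(eigvecs[:, 0])                # ~[0.707, 0.707], max-variance direction
print(eigvals[0] / eigvals.sum())   # fraction of variance explained, ~0.87
```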
What you get is this: the eigenvalues correspond to the variances, this larger one corresponds to this direction and the smaller one to the other, and the eigenvectors correspond to your new basis; from them you can easily obtain your transformation. Once you have that, the fraction of the variance that your new coordinate explains is given by λ1 divided by λ1 plus λ2; let's call it the fidelity of the dimensional reduction. In this case I'm performing an extreme dimensional reduction, going from two coordinates to one, and my new coordinate, expressed in this basis, explains 1.89 out of the total, so it's a really good coordinate. I'm losing a bit of information, but the amount of information that I'm losing is really, really low; let's say my data will fit really well on that line.

Once I have the diagonalization, I can easily project my data onto my new eigenvector, just by taking 0.7 times the value of x1 plus 0.7 times the value of x2. This is my transformation, the formula that I use to obtain a single coordinate from the original ones. I took an easy case in two dimensions, but it can be generalized to many dimensions, and I will write you the formulas; I will not derive them, because we don't have time, and next day we will continue. What we do is compute the covariance matrix, note that it is symmetric, and diagonalize it to obtain a spectrum of eigenvalues; then, when I pass with PCA from one space to the other, I compute the sum of the retained eigenvalues divided by the sum of all of them, so that I have an idea of the total amount of variance that is preserved by my projection (there is a sketch of this general recipe after the questions below).

Okay, I think I will stop here. On Wednesday I will continue explaining PCA and we will finish with that. I want to know if there are questions.

"Excuse me, sir. I have a question which might sound a little bit silly, because the whole machine learning thing is very new for me, but I would appreciate it if you answered it. Does it mean that in unsupervised learning the classifications that we do over the whole data, the different clusters, are not completely independent; that they are somehow related to each other?" Today I only said a single word about clustering; we were just talking about how to transform the coordinates from the original space to another one of lower dimension. Clustering we will see in the following lectures. "Then my main question is about the difference between the supervised and unsupervised learning techniques." In supervised learning you have a ground truth: you know something about your data, you have a response function that comes along with your data and allows you to train your model. "So it doesn't mean that we only do linear regression for supervised learning; can we also do that in unsupervised?" Well, linear regression is a case of supervised learning, because you have some data and a response function: you have an x and a y, right? "Yes." In the case of unsupervised learning you just have your data; you don't have any response, so we have to set some criteria ourselves. And let me finish with an example: imagine a set of configurations of atoms in space, and you have energies associated with them, okay?
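Before that example, here is the general-d recipe written out as a hedged sketch (the function and array names are mine): center the data, compute the covariance, diagonalize, sort, and project onto the top k eigenvectors while tracking the preserved variance.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components.
    Returns the projected data and the fraction of variance preserved."""
    Xc = X - X.mean(axis=0)               # center every feature
    C = (Xc.T @ Xc) / len(Xc)             # covariance matrix C_ab
    eigvals, eigvecs = np.linalg.eigh(C)  # symmetric -> real spectrum
    order = np.argsort(eigvals)[::-1]     # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    preserved = eigvals[:k].sum() / eigvals.sum()
    return Xc @ eigvecs[:, :k], preserved

# Toy usage: 6D data that really varies only along two latent directions
rng = np.random.default_rng(5)
latent = rng.standard_normal((1000, 2))
X = latent @ rng.standard_normal((2, 6)) + 0.05 * rng.standard_normal((1000, 6))

Y, fidelity = pca_project(X, k=2)
print(Y.shape, fidelity)   # (1000, 2) and a fraction close to 1
```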
To finish the example: supervised learning would take all the configurations and all the energies, and if you give it a new configuration, it will allow you to predict the energy. This is supervised, because I have a response function. In unsupervised, I would not have the energies; I would make groups, or try to understand the structure of the data space itself. "Okay, thank you, professor."

"I have a question. In this last case, in which we went to a general dimension d, do we project on a hyperplane?" Yes, we are doing a transformation onto a hyperplane. I mean, you can apply it to whatever data you want, but it is strictly correct only if your data lies in a hyperplane. "Okay, thank you."

"Hello. When you talked about the smaller eigenvalues: smaller compared to what?" You sort them from the biggest one to the smallest one. "Yes, but then you specified the sum; how many eigenvalues do we take?" Okay, that is something that we are going to discuss next day; we don't have time now. "Okay, thank you."

And there are questions in the chat asking about references for the lecture. I didn't bring any reference today; as we told you, we are going to give you the LaTeX notes for this lecture, and I will put a reference in the Matrix room, on Element. I don't have it with me today, but I will give it to you next day, on Wednesday. That's all; I think we can stop here and see you on Wednesday. If there are questions about the link, please ask Matteo. "Matteo, could you please put the Matrix link here?" "I already put it." Matteo, I think we can stop recording.