So, on the blackboard: the world is a big set, and inside this set there is a circle, which is artificial intelligence, and we don't care about everything that is outside this circle. Inside artificial intelligence there is another circle, which is machine learning. And machine learning is usually divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. I started here with supervised learning, where one of the most common families of methods are neural networks. So there are various forms of neural networks. The simplest form is the feed-forward neural network. Then there are other types of networks we are going to talk about, convolutional neural networks, or there are more complex networks in which the information goes back and forth between the neurons. These are called, for example, recurrent neural networks. So this should give you a bit of an idea of how the field is categorized. Actually the distinctions, the boundaries between what is supervised, what is unsupervised, and what is reinforcement learning are not so strict. But this is just to give you an idea. And essentially in supervised learning, as I was saying, you have a set of data with associated labels. What distinguishes it from unsupervised learning is that in unsupervised learning you just have the data; you don't have the labels. And I'm going to talk about that now.
And reinforcement learning is a bit different. You have some information coming to, let's say, the agent, but not in the form of data, and we will talk about that even later on. Oh, yes. Deep learning is nothing else than feed-forward neural networks in which the number of hidden layers is relatively large. I will explain it better later on. It's supervised learning, it's feed-forward neural networks, let's say. Or, whenever you add the adjective "deep" in front of any of the neural networks, it means that there are more hidden layers of neurons. Any other questions? OK, so let's continue with unsupervised learning. So as I was saying, this category of machine learning tools is characterized by the fact that there are no labels in your data, just data. And the goal is essentially to find informative patterns in your data. By this, I mean things like the example over there. So you might be given images of the type on the left-hand side there, so apples, dogs, bats, whatever. And you might hope that, giving just those images to your machine learning algorithm, it will somehow end up dividing them into clusters. Of course, there are no correct clusters here; this point is very tricky. But you might think that, OK, in all the pictures in which there are animals, there are two spots that are not too far apart, one from the other, OK? The eyes. So at some point, you might think that the algorithm might detect the fact that there are eyes in certain images and there are no eyes in other images, so it might divide these two sets, one from the other. And it's useful to have this type of classification, because when you then have a new image, like, for example, this one, the algorithm might recognize it immediately as an animal, without needing to test whether it is an animal or not, so without needing to test whether it breathes, whether you have to feed it, et cetera. And this, of course, can be useful.
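The remark that "deep" just means many hidden layers can be sketched in a few lines of code. This is only an illustrative toy with random, untrained weights and made-up layer sizes, not a model from the lecture:

```python
import numpy as np

def relu(x):
    # Element-wise nonlinearity applied after each layer
    return np.maximum(0.0, x)

def feed_forward(x, weights, biases):
    """Pass input x through successive layers; "deep" just means many of them."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]  # input, two hidden layers, output (arbitrary choices)
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

y = feed_forward(np.ones(4), weights, biases)
print(y.shape)  # (2,)
```

Adding more entries to `sizes` is exactly what makes the network "deeper"; nothing else in the forward pass changes.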
You can imagine this in whatever application you like, but for example in medicine. So there are various methods here. Since I listed some just now for supervised learning, let me list some of them here. One that we're going to see in detail is principal component analysis, abbreviated as PCA. But then there are others which are pretty common. One that, if you want, you can think of as being inspired by statistical physics: these are called restricted Boltzmann machines, RBM. You can also use neural networks for unsupervised learning, and in particular what are called generative adversarial networks. So you can already see that the boundaries between these sets become quite blurry, because I'm using neural networks to do something in unsupervised learning; you can use neural networks essentially in everything here. There are also autoencoders; if I explain the details of these it's going to take too long, so let me skip them and go directly to the one that I wanted to explain, which is principal component analysis. One thing for which we can use unsupervised learning is what is called dimensionality reduction. What is dimensionality reduction? Your data comes, in general, as very large vectors, but maybe not all the components of your vectors encode interesting information for you. So you might want to reduce, essentially, the dimension of your vectors. You want to find the important degrees of freedom there, in case you want to use a more physical language. And there are various ways of doing it. For example, autoencoders can do that, but we're going to see how we do dimensionality reduction with PCA.
Principal component analysis is something that, let me say it like that, might sound quite strange at the beginning, at least it did to me, with a physicist's formation; in the end it is diagonalizing a matrix and truncating away the eigenvalues with small values. That's all it is. So I can explain it pretty easily here. We have, as usual, our set of data in the form of vectors x_i, and each of these vectors has p real components. These p components can be called, and usually are called, features. So those are features of your vectors. You can imagine some data about me: one feature can be how tall I am, another feature my weight, et cetera. And you can have many people, not just me, and the index i denotes the various people. And you can store all this data in a large matrix, let's call it Z, whose columns are given by my vectors x_1, x_2, et cetera, and I have n of them. So the columns represent the single samples, the single individuals like me, and the rows represent the single features. So in the first row you have the height in centimeters of all the people in my example, in the second row you have the weight, et cetera. And the main goal of PCA is that it returns the most informative basis, in a sense. What do we mean by basis here? Well, this is a matrix, so a basis is associated to it. The one that is given to you directly by your data is called the naive basis. And the most informative basis will be given by a combination, in the easiest case a linear combination, of the most informative features. So how does it work? Well, let's see first maybe an example. Ah, there it is. This is a nice example because it's very much related to Northern Ireland. You can see that here there are, let's say, four individuals, the four nations of the UK: England, Wales, Scotland, and Northern Ireland.
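The layout of the data matrix Z described above (rows are features, columns are samples) can be made concrete with a tiny sketch; the numbers here are invented purely for illustration:

```python
import numpy as np

# Toy data matrix Z: rows are features, columns are samples (individuals).
# All values are made up for illustration.
heights_cm = [180.0, 165.0, 172.0, 158.0]   # feature 1: height of each person
weights_kg = [75.0, 60.0, 68.0, 52.0]       # feature 2: weight of each person

Z = np.array([heights_cm, weights_kg])       # shape (p features, n samples)
print(Z.shape)   # (2, 4)
print(Z[0, :])   # first row: heights of all individuals
print(Z[:, 0])   # first column: all features of the first individual
```

So the i-th column is the vector x_i of one sample, and each row collects one feature across all samples, exactly as in the lecture's convention.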
And you have the food consumption, on average per person per week, in each of these four nations, divided into various categories of food. So the first, I'll call it, is drink, then beverages, et cetera. And OK, you see this data and you have to start wondering, OK, what's the difference between here and there? What's the difference between Northern Ireland and England? In this case it might not be too difficult to spot that, for example, in Northern Ireland the consumption of fresh fruit is pretty tiny compared to the consumption of fresh fruit in England, in Scotland, and in Wales. And you can see that, of course, it is the other way around when you talk about potatoes, because Ireland is famous for potatoes. So in fact there is a high consumption of potatoes in Northern Ireland, considerably higher than in the rest. So for this type of very small data set it might be extremely easy to, let's say, find important features, but in general that is surely not the case. So if you use PCA, which I'm going to describe in a second, on this set of data, you will find a principal component that is composed essentially of potatoes, fruit, cheese, and alcohol, so a linear combination of these four features with different weights. And this is now going to be one single axis, and if you plot the four nations on this axis you can immediately see that a cluster appears: Wales, England, and Scotland are clustered together, whereas Northern Ireland is far apart. So this is the axis, the basis element, on which the largest variance can be found in this data set. The largest variance in which sense? In a specific sense. To immediately understand this, let's give again a silly but physical example, in which you have a spring moving and you have a camera taking snapshots of the spring moving. Now, the camera is not positioned perfectly perpendicular with respect to the spring.
So if you plot, in this naive basis given by the camera, the various points in which the ball is, you will find something like this. Of course, we are physicists, we recognize that this is a one-dimensional motion, so there must be just one important degree of freedom, one important feature. And it's simply the one that you get if you rotate your camera. Well, performing principal component analysis for this type of easy problem means actually doing a rotation. And this is the rotation: those are exactly the same points, but rotated. And we say that this horizontal one is our principal component, and you can see that there is a large variance of the data on this principal component, much larger than the variance that you have on the second component. OK, let's stop here and explain the mathematics behind it. And you will do a tutorial exercise on this, on the computer, with some real data actually coming from photonics. So here we have our matrix. Essentially, we want to remove redundancy from this matrix. And one first point is that the data have to be centered to their mean. So first center the data for each feature, such that the mean of each feature is zero. Once you do that, you can immediately construct a mean-free correlation matrix related to the matrix Z, say C = Z Z^T / n, where we divide by the number of data points that we are given. If you construct this matrix, you can immediately convince yourself that its ij-th element is the average of feature i times feature j, so just the correlation between the features. So if you want to find the combination of the most relevant features, what you do is diagonalize this matrix. When you diagonalize it, you might plot your eigenvalues, and you might see that some of them are, let's say, particularly large, and the others are relatively smaller.
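The whole procedure described above (center, build the correlation matrix, diagonalize) can be sketched on the spring-and-tilted-camera example. The tilt angle, noise level, and number of snapshots are all invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Spring example: a truly one-dimensional motion, recorded by a tilted camera.
t = rng.uniform(-1.0, 1.0, size=200)            # position along the spring axis
theta = 0.5                                      # camera tilt angle (arbitrary)
X = np.array([t * np.cos(theta), t * np.sin(theta)])
X += 0.01 * rng.normal(size=X.shape)             # small measurement noise

# Step 1: center each feature (row) so that its mean is zero.
Z = X - X.mean(axis=1, keepdims=True)

# Step 2: mean-free correlation matrix C = Z Z^T / n.
n = Z.shape[1]
C = Z @ Z.T / n

# Step 3: diagonalize; the eigenvectors are the principal components.
eigvals, eigvecs = np.linalg.eigh(C)             # eigh returns ascending order
order = np.argsort(eigvals)[::-1]                # sort descending instead
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)  # one large eigenvalue and one tiny one: the motion is 1D
```

The first eigenvector recovers (up to sign) the direction of the spring's axis, i.e. the rotation the lecture describes, and the large gap between the two eigenvalues is exactly the signal that only one degree of freedom matters.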
And if you are lucky enough, you find a significant gap. When you find that, you can say, OK, in this case these four eigenvalues are the ones that correspond to what we can call the most relevant eigenvectors, and in this field these most relevant eigenvectors are called principal components. Obviously those are given by a linear combination of the original features, and for some problems linear combinations are not enough, so there are variations of this algorithm. The variation for non-linear features is called kernel PCA, in which you use a non-linear kernel to perform something very similar. [Question from the audience.] Yes, so the question is that my data can have very different units, and you have to take that into account. I would say that one can normalize the data: apart from centering them, you can divide by the variance within each feature. Yeah, you can use that, but I think that one really needs to put their hands on the problem, and maybe also have some initial intuition of what can actually be the most important features. Because if you normalize everything by the variance, then you can screw things up a lot. So you might think from the very beginning, OK, the consumption of New Zealand fruit in the whole UK is not going to be relevant in any case, maybe it's all consumed in Wales. Yeah, it needs some subjective initial filtering and interpretation. Other questions? I'm going to give you again a couple of examples from quantum information, similar to the ones that I gave you before. So for example, here we have performed... it's another silly example: you can take a Werner state, same as before. Of course there is only one parameter, but if you rotate it, like I did here, this single parameter is not immediately visible from the data. You diagonalize your matrix and you see that, in this case, there is a gap of scale one, of length one, between the first eigenvalue and the other, less significant eigenvalues.
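The normalization discussed in the question above is a one-line operation once the data matrix is set up; here is a minimal sketch, with invented values chosen so that the two features have wildly different units:

```python
import numpy as np

# Features in very different units (values invented for illustration):
# row 0 is height in centimeters, row 1 is weight in tonnes.
Z = np.array([[180.0, 165.0, 172.0, 158.0],
              [0.075, 0.060, 0.068, 0.052]])

# Center each feature, then optionally divide by its standard deviation so
# that no feature dominates the correlation matrix just because of its units.
# As discussed in the lecture, whether to do this is a judgment call.
Zc = Z - Z.mean(axis=1, keepdims=True)
Zs = Zc / Zc.std(axis=1, keepdims=True)

print(Zs.std(axis=1))  # each feature now has unit variance
```

Without this step, the centimeter-scale feature would dwarf the tonne-scale one in C = Z Z^T / n; with it, both contribute equally, which, as the lecturer warns, may or may not be what you want.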
And if you add a bit of noise to this data, so, sorry, you add some white noise to all the components of this matrix, then this is not going to be exact, and these are going to be slightly blurred. And once you plot your data on the first and second principal components, so the first two eigenvectors, the ones corresponding to the eigenvalues of largest value, you can still see a high variance on the first principal component and a much smaller variance on the second principal component. Please note the scale: the scale of the horizontal axis is much larger than the scale of the vertical axis. And here, these two different colors are not obtained with PCA; it's just to give you an idea that once you have found your principal components, then you can further analyze your data. For example, you can distinguish in this case whether the state is entangled or not, and it's going to be much easier to then use, for example, some other supervised learning method to distinguish the states if you have found just one or two relevant degrees of freedom, rather than the full matrix. And this is shown in this picture. A slightly more complex example is when we use two qubits. If I take again the X states, you can see that, as I said before, the X states have three degrees of freedom, and if you plot the X states on the axes determined by these three degrees of freedom, for example here the expectation value of sigma x times sigma x, here the one of y, and vertically the one of z, you can see that these X states lie in this tetrahedron. And here I distinguished the inner part, in which the states are separable, from the outside part, in which they are entangled, just to say that there is a certain structure here; we know that there is a certain structure once you have identified these three axes. But if you rotate these qubits, this structure is again blurred and you cannot see it. However, if you first do principal component analysis,
you see, this is an example again of these rotated X states in which there is some noise added, so these three eigenvalues are not exactly the same, but still one can see a clear gap. And if one therefore rotates accordingly, or plots, let's say, the original data according to these new bases, these principal component bases, the structure that I was mentioning here, this geometric structure, reappears. Now, here it's a bit difficult to see; maybe, yes, here it's a bit easier because you can move it around. And again, to make it clearer to your eyes, I colored in blue the entangled states, and in this other color, I don't know what color it is, I'm slightly color blind, the separable ones. So at the beginning these X states have, I don't know, 3 plus 4 plus 4, so 11 degrees of freedom, but once you find the three relevant ones, then you can use them to further analyze your data in an easier way. So that is a possible application of principal component analysis, and other examples are, in general, quantum tomography, and you will see Luca give you an example of this in the tutorial, or, for example, entanglement detection and things like that. Questions?
If not, I can start with reinforcement learning. So this is the last of the three main categories, the one that I was mentioning here. You can think of reinforcement learning as a set of machine learning methods in which, for example, you want to teach a robot how to walk in an unknown environment. You cannot really use the other methods, because the environment is unknown, so you cannot train the robot on an environment for which you have a set of data and a set of labels, like "don't go there because there are stairs and you will fall down". [Question from the audience.] The question was whether this is the same PCA used in social science. I'm not aware of that; it might be exactly the same, I really don't know. Thanks. So, I was saying that you can use supervised learning to teach your robot to walk here in this environment, but it will tell nothing to your robot if you now put it in the Adriatico guest house. So you cannot really use these tools. The generic setting is the following. There is an environment, which will be in a certain state, and I will indicate this state as s. Then there is an agent, your robot for example, or let's say the brain inside your robot, that can observe the environment, there will be some observation, and based on that observation it can perform certain actions on the environment. Actions on the environment include actions on the robot itself, because the robot itself is part of the environment; the agent is the machine learning algorithm that teaches what to do to whatever is inside the environment. And the agent usually gets information not just from the observations, but also from some rewards, which can simply be: I managed to walk around an unknown room without falling down for a certain amount of time, the largest amount of time possible. So you can give a reward for every 10 seconds that the robot stays up. OK, the setting therefore is very different with respect to the setting of the other two cases, because you start with no labels and no data. The environment itself can be dynamical; in particular, your
robot can move, or someone can move the objects in the room. The agent can observe the environment, can perform actions, and can get rewards, and the goal here is to learn the best sequence of actions that maximizes the rewards. The rewards can be very sparse; for example, the rewards can come just at the end of your actions. So the question that I was asked before, about how to treat situations in which you have a really small amount of information, this can be such a setting. This is a typical example of this setting. The state here, the state s of the environment, is given by the full map and the position of a player, this red thing, that wants to move around and collect as many treasures, these green spots, as possible. So the actions will be the moves, and the rewards are the treasures collected. You can imagine applications to self-driving cars, medical applications, et cetera. There are various methods for reinforcement learning. Some of the most used ones are policy gradient, which I am going to explain in some detail, and Q-learning. There is also another one that I want to mention, projective simulation, and I am mentioning it because it was not invented by computer scientists: it was invented by Hans Briegel and collaborators, and he is a theoretical physicist working on quantum optics, quantum information science, et cetera. They first developed it in the classical setting because they wanted to apply it to the quantum setting, so there is already a full quantum variation of it. But let's stop here, because this would be a nice moment to stop. So, questions? See you then at half thirty.
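[Editor's note: the agent-environment loop described in this last part can be sketched in a few lines. This is a deliberately tiny toy, a one-dimensional "corridor" with a single treasure at the end; the state encoding, reward values, and policies are all invented for illustration, not part of the lecture.]

```python
import random

GOAL = 4  # position of the treasure in a 1-D corridor of states 0..4

def step(state, action):
    """Environment dynamics: the action is -1 (move left) or +1 (move right)."""
    new_state = max(0, min(GOAL, state + action))
    reward = 1.0 if new_state == GOAL else 0.0   # sparse reward, only at the goal
    done = new_state == GOAL
    return new_state, reward, done

def run_episode(policy, max_steps=50):
    """The generic loop: observe the state, act, collect rewards."""
    state, total_reward = 0, 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent observes and acts
        state, reward, done = step(state, action)
        total_reward += reward
        if done:
            break
    return total_reward

random.seed(0)
random_policy = lambda s: random.choice([-1, +1])   # untrained agent
greedy_policy = lambda s: +1                        # the best sequence of actions

print(run_episode(greedy_policy))  # 1.0
```

The goal of methods like policy gradient or Q-learning is precisely to go from something like `random_policy` to something like `greedy_policy` using only the observed rewards.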