Let's now take a look at an example of factor analysis, which should make the technique easier to understand. Factor analysis is a technique that allows you to find underlying dimensions in data. It answers the question of what a set of variables have in common. There is exploratory factor analysis, and then there is confirmatory factor analysis. Exploratory factor analysis is the simpler one, and it is what you use to look for the underlying dimensions. Confirmatory factor analysis is the more complicated technique. This video is about exploratory factor analysis.
To do factor analysis we need data, and our data are here. The data set comes from R, which we use for the analysis, and it contains the scores for the decathlon in the men's 1988 Olympics. Decathlon is a sport where you have to do 10 individual sports: 100 meter run, long jump, shot put, high jump, 400 meter run, 110 meter hurdles, discus throw, pole vault, javelin throw, and 1,500 meter run. We are going to factor analyze these data.
But before we do the factor analysis, let's take a look at what the data look like. These are the first 15 observations. Observations are on the rows, and the variables are on the columns. To understand what these numbers mean, we need to understand the units. The running sports are in seconds: less is better, and more means slower and therefore not as good. The throwing sports and the jumping sports are in meters, and more is better. So more meters is better, and fewer seconds is better.
Okay, let's factor analyze the variables to see what they have in common. We will take two factors first. The factor analysis results are here, and I have ordered the sports according to their factor loadings.
We can see that all the running sports, particularly those that are about short-distance running, less than 400 meters, or that involve short sprints, like long jump or pole vault, belong to the first factor. We could label this first factor running speed, or just running, because that is what the items that load on it are about. On the second factor we have shot put, javelin throw, discus throw, and pole vault. So we could say that this is upper body strength. All the throwing sports are there.
So why does pole vault load on both running and upper body strength? Well, consider what the sport is about. You first sprint to gather speed, then you plant the pole in the hole, you hold on to it, and then you must use your upper body strength to get yourself over the bar. It requires both running and upper body strength, and therefore it loads on both of these factors.
We can also extract more factors. If we take three factors, we get even more dimensionality from the data. We can see now that the first factor is actually running speed. Previously we had the 1,500 meter run loading on the first factor, but now it does not load on the first factor anymore, so that factor is only about speed now. The second factor still contains the upper body strength sports, and the third factor contains running stamina. You need stamina for the 1,500 meter run, which is the main item that loads on this factor, and the others don't load as highly. The 400 meter run loads to some extent, because you need more stamina for that than for, say, the 100 meter run, which does not really load on running stamina. So when we extract more and more factors, factor analysis quite often takes existing factors and splits them into sub-factors.
Clearly, if we think about these different sports, running fast and being able to maintain your speed for a longer time are two different capabilities that a person could have. So it would make sense to take three factors from these data.
We can also take more factors. If we take four factors, what is different is that the fourth factor simply contains high jump and nothing else. High jump is quite a unique sport: it's not about running speed, it's not about stamina, and it's not related to upper body strength. It's simply about how high you can jump.
When we start taking more and more factors from these data, eventually each sport will belong to its own individual factor. If we have a 10 variable set like we have here, we can extract 10 different factors, because there's always some uniqueness to each sport. But quite commonly we start by extracting two or three factors, depending on our theory, and then we stop when we consider that adding more factors would not add any value. Beyond that point we just start getting these individual sports, and saying that all sports are different is pretty obvious. It does not answer the question of what the sports have in common.
We can take a look at the correlation matrix to see what the factor analysis does. These are the correlations between the sports, and factor analysis basically finds those combinations of sports that are highly correlated with one another. Here we have the running sports. They are highly correlated, and they are less correlated with other sports than they are with one another. So we could say that all these sports measure your running speed. Then we have the upper body strength factor here, where all the throwing sports belong. They are correlated highly with one another and less with the running sport items. Then we have the running stamina factor. We have the 1,500 meter run here and the 400 meter run here, which are correlated because they require stamina.
For some reason long jump is here as well. Maybe the athletes who are good at long jump are also good at these stamina sports. And we can also see that the fourth factor here, high jump, is very unique. It's not highly correlated with any of the other sports, so it depends on a different set of skills than the others.
Of course, we could simply interpret this correlation matrix directly, but it's a lot easier to do it with a factor analysis on a computer, because that simplifies things. You have fewer numbers to look at, particularly if the number of variables grows larger than 10.
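To make the link between the correlation matrix and the factor loadings a bit more concrete, here is a minimal sketch in Python with NumPy. It is not the R analysis shown in the video and it does not use the decathlon scores. The data are synthetic, the variable names (`run_a`, `throw_a`, and so on) are made up, and the extraction method is an assumption: a simple principal-component extraction followed by a varimax rotation, which is one common choice and may differ from what the software in the video does.

```python
import numpy as np

# Synthetic stand-in data (NOT the decathlon scores): two hypothetical latent
# abilities, each driving three made-up event scores plus independent noise.
rng = np.random.default_rng(0)
n = 2000
speed = rng.normal(size=n)
strength = rng.normal(size=n)
names = ["run_a", "run_b", "run_c", "throw_a", "throw_b", "throw_c"]
X = np.column_stack(
    [0.9 * speed + 0.45 * rng.normal(size=n) for _ in range(3)]
    + [0.7 * strength + 0.7 * rng.normal(size=n) for _ in range(3)]
)

# Correlation matrix: variables that share a latent ability correlate highly
# with one another and only weakly with the other group.
R = np.corrcoef(X, rowvar=False)


def extract_loadings(corr, k):
    """Principal-component extraction: top-k eigenvectors scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]       # indices of the k largest
    return vecs[:, top] * np.sqrt(vals[top])


def varimax(L, n_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    rot = np.eye(k)
    d = 0.0
    for _ in range(n_iter):
        LR = L @ rot
        grad = L.T @ (LR**3 - LR @ np.diag(np.sum(LR**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):          # criterion stopped improving
            break
        d = d_new
    return L @ rot


loadings = varimax(extract_loadings(R, 2))

print(np.round(R, 2))
for name, row in zip(names, loadings):
    print(f"{name:8s}", np.round(row, 2))
```

In this made-up example the three `run_*` variables should load mainly on one factor and the three `throw_*` variables on the other, mirroring the blocks of high correlations in `R`. The sign of a factor is arbitrary, so it is the absolute size of the loadings that matters when reading the output.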