 In the first video about structural equation models, I gave some background to what structural equation modeling is, the historical paths that led to its development, some of the key ideas and the ways that it can be applied in social science settings. In this video, I'm going to talk about some of the key ideas, terms and concepts in structural equation modeling. This is important because SEM is rather different to other areas of statistics. Some of the ideas that are important in understanding and applying SEMs are quite unfamiliar, and so it's important to have a grounding and a familiarity with these ideas before we move on to other applications. So in this video, I'll be talking about path diagrams, the way that we represent equations and theories in the form of diagrams in SEMs. I'll talk about the difference between exogenous and endogenous variables. I'll talk about the way that structural equation modeling analyzes not the raw data, but the covariance matrix of the variables that we're interested in. I'll talk a little bit about how parameters are estimated using maximum likelihood in structural equation modeling. I'll also go over how we apply what are called parameter constraints, how we don't always estimate every parameter in the model. Some of the parameters are fixed to values before we start fitting the model. I'll also talk about how we assess the overall fit of a model in structural equation models and the importance of the idea of what are called nested models for assessing model fit. I'll also talk a bit about identification of structural equation models. Again, that's something which is linked to model fit and something that we don't encounter so much in regression context that many people are familiar with. The first thing I'm going to talk about are path diagrams. Path diagrams are one of the reasons why structural equation modeling is very appealing to many social scientists in particular. This is because social scientists don't always have such a strong grounding in mathematics and are less comfortable with reading complex equations and so on. Path diagrams are another way of presenting the same information as we can get in an equation, but they do this visually. That's often a clearer way of seeing what is being presented in an equation compared to Greek letters and symbols and so on. If we write our path diagrams correctly, then we can read directly between an equation and a path diagram. They tell us exactly the same thing. In this example here, we could write a bivariate regression equation in the usual way where our dependent variable y is a function of our independent variable x. We are going to solve this equation using data and we're going to solve for the unknown parameter beta, what is the relationship between x and y. We can also write that same information down in the form of a path diagram, a simple path diagram in this case. So we have here y is now represented as a rectangle and x is also a rectangle. We have an arrow running from x to y in a single direction and we have a small circle pointing into y which represents the error term in the equation. You can see there that there is a b above the line to indicate that the parameter represented by the straight single arrow is a regression coefficient. This is quite clear visually in the sense that what the model is implying at least is that x causes y and that there is some coefficient beta which summarizes what that causal effect is and there is an error term in that equation. 
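For reference, the equation behind this simple path diagram can be written out as follows (in generic notation; the slide in the video may label the terms slightly differently):

$$ y = \alpha + \beta x + \varepsilon $$

Here the rectangles carry the observed variables x and y, the single-headed arrow carries the regression coefficient beta, and the small circle pointing into y carries the error term.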
There are conventions for path diagrammatic notation so that we use it consistently. There are some variations in how different conventions are applied and so on, but this is the general form, where a latent variable is represented in the form of an ellipse. An observed variable, a variable that we've actually measured in our dataset directly, would be represented, as in the last slide, using a rectangle. Error variances are small circles, and this is similar to the notation for a latent variable: it's still a circular form, but it's a small circle now, and this indicates that these error terms are also latent variables, although we don't actually label them. These are kind of unknown or residual latent variables. We also indicate the relationships between variables using lines with arrows. A curved line with an arrow at each end indicates a covariance between two variables. We sometimes call this a non-directional path or an unanalyzed relationship, because this is used to show that two variables are related to one another but our model does not specify anything about the direction of that relationship. It may be that it's not an important part of the theory, but we know that the two variables are associated. Lastly, a straight line with a single arrow at one end indicates a directional path, a regression coefficient. If we use a single-headed arrow, then we are indicating the direction of the relationship between two variables in our model. We can put these basic symbols together to form more complex models, but ones which have a clear meaning and which can indeed be translated back into the standard equation notation. Here are some examples of some quite simple path diagrams. Here we're just looking at measurement models; these are confirmatory factor models. We have here Eta1, which is a latent variable shown as an ellipse. Eta1 here is shown to cause three observed variables, X1 to X3. We can also think of that as Eta1, the latent variable, being measured by the observed variables X1 to X3. At the top of the diagram we have three error variances, E1 to E3. Those are the errors for each of those equations: Eta1 is predicting X1 with some error, it's predicting X2 with some error, and so on. So that's a simple path diagram for a factor model, and that could be written as an equation, but we are in this instance using a path diagram. We can extend this to make a slightly more complicated path diagram. Now we have two latent variables, Eta1 and Eta2. It's essentially the same diagram as we saw in the previous slide, but now we have two latent variables. And we have six observed variables, six variables in rectangles, each one of which has an error term. We've also added in here a curved line with an arrow at each end. This is to show that in our model the two latent variables are correlated with one another. We're not saying anything about the direction of the relationship between Eta1 and Eta2; we're just saying that we think there is some kind of relationship between them. In the next path diagram we've introduced a theoretical statement about the direction of the relationship between Eta1 and Eta2. So we no longer have this curved arrow, but we have a straight line with an arrow at one end. What we're saying here is that Eta2 is a cause of Eta1.
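To see how these diagrams translate back into equations, the one-factor measurement model just described can be written (using a generic lambda notation for the loadings, which the video does not spell out) as:

$$ X_1 = \lambda_1 \eta_1 + E_1, \qquad X_2 = \lambda_2 \eta_1 + E_2, \qquad X_3 = \lambda_3 \eta_1 + E_3 $$

The curved double-headed arrow in the two-factor diagram corresponds to an unanalysed covariance term, Cov(eta1, eta2), with no direction attached, while the final diagram replaces that covariance with a directional regression of Eta1 on Eta2.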
And this again would be similar to the first diagram that we saw, a bivariate relationship, a bivariate regression, with Eta1 regressed on Eta2, and we would then solve for the unknown beta coefficient above the straight line with the arrow at the end. But, as I said, this is now a bivariate regression of a latent variable onto another latent variable. When we are building path diagrams and systems of equations in structural equation modelling, we need to distinguish between two important kinds of variables: exogenous variables and endogenous variables. Now, an endogenous variable, as the name suggests, is something which is caused within the system. It's a variable that has, if you like, an arrow pointing into it. It is a dependent variable in one or more equations. An exogenous variable, on the other hand, is akin to an independent variable in that terminology. It's a variable that is not caused by anything within the system of equations that we are presenting as our SEM. That doesn't mean to say that we believe that exogenous variables are in some sense not caused by any other variables; it's simply that, within our model, they don't have any direct cause among the variables in the model. Now, an important part of SEM is that variables can be both exogenous and endogenous. So we can have an arrow pointing into a variable, making it endogenous, and that variable itself can have an arrow pointing at another variable, making it an exogenous variable in that limited sense, although it's now a different kind of variable because it has both an arrow pointing into it and an arrow coming out of it. And that's important because that kind of variable is a mediating variable: it's a variable through which another variable has an effect on a third variable. In this path diagram, which we've already seen, we can now distinguish what kinds of variables these are. We've got two exogenous latent variables here. They're exogenous latent variables because there is no directional path pointing into either of them. Neither of them therefore has an error term. This is just a correlation that we're seeing here. So these are both exogenous in the model. Again, we've seen this path diagram, but we've got a new distinction that we can apply to it now: Eta1 is endogenous and Eta2 is exogenous. Eta2 doesn't have any directional path going into it and it doesn't have an error term, whereas Eta1 has an error term pointing into it because it's got a directional path running from Eta2. So a fundamental advantage of using structural equation models is this ability to represent our theories as diagrams rather than using notation which many social scientists are less comfortable with. Another, if you like, unusual feature of SEM is that, in conventional practice anyway, we don't analyse the raw data of the observed variables, but we analyse the covariance matrix, which we will denote S, of those observed variables. This is kind of unusual, and somewhat surprising to people when they first come across it, that all the data that we need is just the set of covariances and variances of the observed variables. As we shall see in later videos, some structural equation models also use the means of the observed variables in addition to their variance-covariance matrix. So what are we doing with this covariance matrix? Well, in broad terms, we are trying to summarise S, the variance-covariance matrix of the observed variables, by specifying a simpler underlying structure.
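To make the point about the covariance matrix concrete, here is a minimal sketch, in Python with a made-up dataset and hypothetical variable names, of what the matrix S actually is in computational terms:

```python
import numpy as np

# Hypothetical raw data: 200 cases measured on six observed variables X1..X6.
rng = np.random.default_rng(seed=1)
data = rng.normal(size=(200, 6))

# S: the sample variance-covariance matrix of the observed variables.
# rowvar=False tells numpy that the columns, not the rows, are the variables.
S = np.cov(data, rowvar=False)

print(np.diag(S))             # the six variances (the main diagonal)
print(S[np.tril_indices(6)])  # the non-redundant lower-triangle elements
```

In conventional SEM, these variances and covariances (plus, for some models, the means) are the only summary of the data that the fitting procedure actually needs.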
So we're going to specify a model which is in some ways simpler than simply reproducing S. And our model, our SEM, in this sense the simpler underlying structure, will yield an implied variance-covariance matrix. What I mean there is that if our model is true, then the variance-covariance matrix that we observe should look like this: it should have these numbers in each of the cells. And again, as we'll see later, this implied matrix can be compared to the one that we've actually observed. And that comparison, if it's done properly, can tell us something useful about how well our model is accounting for the data. To the extent that the implied and the observed matrices differ, our theory, our structural equation model, is not doing a very good job of telling us how these data were generated. So here is a variance-covariance matrix. Probably most people will be more familiar with a correlation matrix, but here we're dealing with unstandardised variables. This matrix shows six observed variables, from X1 to X6, and they are in both the columns and the rows of this table. The diagonal, which is shown in bold, indicates the variances. The covariance of a variable with itself, in this case say X1 with X1, gives us the variance of that variable. So a covariance of a variable with itself is its variance, and those are shown in bold on the main diagonal. Then in the other cells of this matrix we see the covariances, which can be negative or positive. And you'll observe that the top part of the matrix is redundant with the bottom part, so we actually only need the lower part of this matrix. Now, an important aspect of any model fitting, and structural equation modelling is no different, is the need to estimate what the unknown parameters in our models are, the betas. What is the relationship between eta1 and eta2 in the population? Now, there are different ways of estimating these parameters; in standard regression modelling we would use ordinary least squares. In structural equation modelling, practice mainly centres around using a technique called maximum likelihood. Maximum likelihood estimates the unknown model parameters by maximising the likelihood, which we can denote L, of a particular sample of data. Now L, the likelihood, is a mathematical function which is based on the joint probability of the continuous sample observations. So in essence, maximum likelihood finds what the maximum value of L is for a particular sample of data, and it does that by iterating through different values for the unknown parameters until it finds the maximum of the likelihood. Once that maximum has been found, then we have produced the maximum likelihood estimates for the unknown parameters. Now, maximum likelihood is appealing because its estimates are unbiased and efficient. What those terms mean is that if we have a large sample, then our estimates of the unknown parameters will be correct: they will converge upon the true values in the population. They're efficient in the sense that no other way of doing this will give us more precise estimates of those parameters. Now, those two properties of being unbiased and efficient do themselves hinge on some other assumptions. One important one is that the data come from a multivariate normal distribution. Essentially, that requires us to be using continuous variables. So, maximum likelihood is less good when we have variables in our data set that are not continuous and that we have arrows pointing into.
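The video doesn't write the likelihood down, but for the curious, one standard textbook way of expressing the maximum likelihood criterion in SEM is as a discrepancy between the observed matrix S and the model-implied matrix. Here is a rough numpy sketch of that fit function (a generic formulation, not the exact expression used by any particular software package):

```python
import numpy as np

def ml_discrepancy(S, Sigma):
    """Textbook ML fit function for SEM:
    F_ML = ln|Sigma| + tr(S @ inv(Sigma)) - ln|S| - p,
    where Sigma is the model-implied covariance matrix and p is the
    number of observed variables. Minimising F_ML over the free
    parameters is equivalent to maximising the multivariate-normal
    likelihood of the sample."""
    p = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - p

# If the model reproduces the observed matrix exactly, the discrepancy is zero.
S = np.array([[2.0, 0.8],
              [0.8, 1.5]])
print(ml_discrepancy(S, S))  # ~0.0
```

Estimation then amounts to iterating over candidate parameter values, recomputing the implied matrix, and stopping when this discrepancy cannot be made any smaller.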
In situations like that, where the variables that we have arrows pointing into are not continuous, we need to use different estimators. But for now, I'll be focusing on the simpler case of multivariate normal data and maximum likelihood. Now, maximum likelihood is used in SEM not just for the estimation of the unknown parameters but also through the likelihood itself: if we take the log of the likelihood for the model, then we can use this to test how well our model fits compared to some more or less restrictive alternative. So, maximum likelihood is used in two ways in SEM. One is for estimation of the unknown parameters, and linked to that is the use of the log likelihood to assess how well the model fits the observed data. In most areas of statistics that social scientists are familiar with, the focus is very much on estimating the unknown parameters in the model. We want to know what the relationship between X and Y is in the population, or possibly what the conditional association between two variables is, and so we focus on estimating those unknown parameters. This is also true, of course, in SEM, but in SEM we have an additional focus, which is on fixing or constraining parameters to particular values before we estimate our model. And that's a bit unusual for many people. We can fix model parameters to any values, but it tends to be the case that we will be fixing parameters to the value zero or the value one. Those are the most common parameter constraints that we make in SEM, and I'll come back to why we do that later. But, in addition to fixing parameters to these values, we can also constrain model parameters to be equal to other model parameters. So, we will still estimate those equalised parameters, but they have to be estimated so that they are the same; the model applies that constraint to the parameters that are estimated. So, again, that's something which is quite unusual, and we don't really see it in many other statistical techniques that we might use in social science. The main thing that we are using these parameter constraints for is the purpose of model identification, and I will be saying some more about that soon. Now, I said that we can use the likelihood of our model to test how well it fits the data by comparing our model with another model. When we do this, the two models that we compare have to be what is called nested, one within the other. So, what do we mean by nested? Well, it is precisely this: one model is a subset of the other, or the parameters in one model are a subset of the parameters in the other model. Another way of saying this is that if we have two models, A and B, then model A is the same as model B but just adds some additional parameter restrictions. So, A is B plus parameter restrictions. To take an example, then, if model B has the form y equals a plus beta 1 x1 plus beta 2 x2 plus e, then model A will be nested if it has that same structure but applies a parameter constraint that the two beta coefficients are equal. So, we now have this property that model A is the same as model B with an added parameter constraint; it is therefore nested within model B. If we consider a third model, model C, though, and we now remove x2 from the model and add z2 instead, then model C is not nested within model B, because it isn't just model B plus some parameter restrictions: it has a new variable, z2, which is not in model B. So, these are, if you like, apples-and-pears models.
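Laid out side by side, the three models just described look like this (the exact symbols are generic placeholders rather than the slide's own labels):

$$ \text{Model B:}\quad y = a + \beta_1 x_1 + \beta_2 x_2 + e $$
$$ \text{Model A:}\quad y = a + \beta_1 x_1 + \beta_2 x_2 + e, \quad \text{with the constraint } \beta_1 = \beta_2 $$
$$ \text{Model C:}\quad y = a + \beta_1 x_1 + \beta_2 z_2 + e $$

Model A is model B plus a restriction, so it is nested within B; model C swaps in a new variable, so it is not nested within B and its fit cannot be compared with B's in this way.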
We can't really, in any sensible way, make comparisons between the fit of model B and of model C, because they include different variables. So, I've said something about model fit already: the fact that it's based on the log of the likelihood of the model that we've estimated, and that we can make this comparison of model fit when the two models are nested. This is because if we take the difference between the log of the likelihood for model A and the log of the likelihood for model B (more precisely, twice that difference), then that number is itself distributed as chi-square, and the chi-square distribution then has degrees of freedom equal to the difference in the degrees of freedom for model A and model B. We can therefore use this chi-square distribution to test the fit of the first model against the second model. Now, if our value of chi-square has a p-value greater than 0.05, then we will prefer the more parsimonious model, model A, because what we're saying in this situation is that the models are not different with regard to their likelihood values. We say that the likelihoods are essentially the same, and that means that we will prefer the model that is simpler and is estimating fewer parameters. So, where model B is, in this case, the observed data themselves, the variance-covariance matrix, then we're saying that there is no difference between the observed and the implied matrices, and our model therefore fits the data well. So, that's the essence of the assessment of model fit using chi-square in structural equation models: we can look at the likelihood for one model, compare it to the likelihood for a nested model, and make a statistical test of whether one fits the data better than the other. So, the last thing I'm going to talk about in this video is model identification. This is all linked with the things I've talked about already: with parameter constraints and fixing parameters to particular values, and with assessing model fit and so on. So, what is model identification? Well, in conceptual terms, we need to have enough known pieces of information in an equation to produce unique estimates of the unknown parameters. We need unique estimates, otherwise we don't know which ones to prefer. So, to give an example of what we mean here by the balance between known and unknown pieces of information, if we look at these two equations, the first of these is unidentified. We have x plus 2y equals 7. What we would want to do is to find the unique value of y that satisfies that equation. Now, because x and y could take on many, many different values and they would all, if you like, be true in terms of the equation being correct, that equation is unidentified, because it doesn't enable us to produce unique estimates. Now, if we change that equation slightly so that x is no longer an unknown and we fix it to 3, then there is only one value for y, which is 2, that will satisfy that equation. So that equation is identified. That is the essence of what we need to understand about identification: it's to do with the balance between the number of known and unknown pieces of information in an equation. Now, there is something else to know about identification, which is that it's a theoretical property of the model.
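To make the chi-square comparison described a moment ago concrete, here is a small sketch using made-up log-likelihoods and degrees of freedom for two nested models (the numbers are purely illustrative):

```python
from scipy.stats import chi2

# Hypothetical fitted values: model A is model B plus extra restrictions,
# so it has fewer free parameters and more degrees of freedom.
loglik_B, df_B = -1045.2, 8    # less restricted model
loglik_A, df_A = -1047.9, 10   # more restricted (nested) model

# Likelihood-ratio statistic: twice the difference in log-likelihoods,
# referred to a chi-square distribution with df equal to the difference
# in the two models' degrees of freedom.
lr_stat = 2 * (loglik_B - loglik_A)   # 5.4
delta_df = df_A - df_B                # 2
p_value = chi2.sf(lr_stat, delta_df)  # about 0.067

# p > 0.05: the likelihoods are essentially the same, so we prefer the
# simpler, more parsimonious model A.
print(lr_stat, delta_df, p_value)
```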
Identification is not really linked to the data as such, so we can figure out what the identification status of a particular model is without having any data or estimating any parameters, but it's also true to say that a model can be theoretically identified but empirically unidentified given a particular set of data. So, we are looking at the balance between the known and the unknown pieces of information in our equations, and in SEM the known pieces of information are the variances and covariances (and the means, if we are using means in our model) of the observed variables. These are the known pieces of information. The unknown pieces of information are the parameters that we want to estimate in the model. Now, models can have different identification statuses. A model can have, as we saw a moment ago, more unknowns than knowns. That means that it's unidentified: we can't produce unique values for the unknown parameters, so that's an unidentified model. Other models can be just-identified, where the number of knowns is equal to the number of unknowns. We don't have any of what we call over-identifying restrictions on the model, and therefore, for just-identified models, we don't have any degrees of freedom left over that we can use to assess the model's fit. Now, most of the models that people are familiar with using, again, ordinary least squares regression, those kinds of models, are just-identified. The third level of identification status is over-identified models, and that's usually what we are trying to get to and deal with in SEM. That's where the number of knowns is greater than the number of unknown parameters in the model. That means that we can assess the fit of the model as well as estimating the unknown parameters. There are different ways that we can assess the identification status of a model. A very simple one these days, with modern computers, is simply to run our model, and most software will tell us what the identification status of the model is even before we fit it to the data. It's quite easy compared to how things were done in the past, but nonetheless it's still useful to give some consideration to the identification status of a model, as it helps us to understand where things might be going wrong if we have a problem and our model is unidentified. Working through it in this way can help us to see why. Here's the counting rule that can be used. If s is the number of observed variables in the model, then the number of non-redundant known pieces of information, the variances and covariances, is given by this equation, which is a half of s times (s plus 1). Again, s is the number of observed variables, and t is the number of parameters that we are going to estimate in the model, the number of unknown parameters. So if t is greater than the answer to this equation, then our model is unidentified: we have more unknown parameters than we have non-redundant pieces of information. And if it's less, then we have an over-identified model. To give an example of that, here is the path diagram that we saw earlier, where we have eta1, a latent variable which is measured by, or causing, three observed variables, and each of those observed variables has an error variance. So if we want to find the number of non-redundant pieces of information, we can use our half s times (s plus 1) equation. Now, s here is equal to 3, so s times (s plus 1) is 3 times 4, that's 12, and if we take half of that it gives us 6 as the number of non-redundant pieces of information. Now, how many parameters are we trying to estimate with this model? Well, three error variances, one for each error term.
We've got three factor loadings: one of them, you'll see there, is constrained to one, so we're fixing that loading, and that is for identification of the model. So we're not estimating that factor loading, but we are estimating the other two. So we have two factor loadings to estimate, and then lastly we have the variance of the latent variable. So 3 plus 2 plus 1 is 6 parameters to be estimated, which is the same as the number of non-redundant pieces of information. So with this model we have 0 degrees of freedom. The model is just-identified, so we can estimate the unknown parameters, but we do not have any way of assessing the fit of this model, because it's just-identified, with no degrees of freedom. Now, something else that's important to understand about identification is that we, as the analyst, can control to some degree the identification status of our model. We can do this, for a model like the one that we just saw that's just-identified, or for a model that's under-identified, by adding more known pieces of information to the equation, or by removing some unknown pieces of information, that is, removing parameters that are to be estimated by adding constraints. So if we were to constrain two of the parameters in the model to be equal to one another, let's say we constrain two of the regression coefficients or the factor loadings to be equal, now we're only estimating one parameter where previously we were estimating two. So we've removed one unknown and gained one degree of freedom. Now, we can see this in this model here, where we have added an additional observed variable to the previous path diagrams. The model is essentially the same, but we've got a fourth observed variable, X4. We are now estimating an additional factor loading and an additional error variance, but we have gained more in terms of our known pieces of information. So now, if we use our half s times (s plus 1) equation, s is now four, so s times (s plus 1) is four times five, which is 20; we take half of that, and now we have ten non-redundant pieces of information in this model. And we have four plus three plus one parameters to be estimated, which is eight, so ten minus eight gives us two degrees of freedom. So, by adding that fourth observed variable, our model is now over-identified, and we can say something about the fit of that model to the variance-covariance matrix that we've observed. So in that example we changed the identification status of our model by adding in another known piece of information, another observed variable. Another way of changing the identification status is to remove unknown parameters.
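The counting rule and the two examples just worked through can be restated in a few lines of Python; this is simply the same arithmetic written out, not anything the video itself presents as code:

```python
def identification_check(s, t):
    """Counting rule from the video: with s observed variables there are
    s * (s + 1) / 2 non-redundant known pieces of information (variances
    and covariances); t is the number of free parameters to be estimated."""
    known = s * (s + 1) // 2
    df = known - t
    if df < 0:
        status = "under-identified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "over-identified"
    return known, df, status

# Three-indicator factor model: 3 error variances + 2 free loadings
# (one loading fixed to 1) + 1 latent variance = 6 free parameters.
print(identification_check(3, 6))   # (6, 0, 'just-identified')

# Four-indicator factor model: 4 error variances + 3 free loadings
# + 1 latent variance = 8 free parameters.
print(identification_check(4, 8))   # (10, 2, 'over-identified')
```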
Now, here in this next example, we are not estimating the factor loadings that we were estimating in the first example. You can see there's a number one next to each of the arrows for the factor loadings, so rather than estimating those, we're saying these are all equal to one. Now, this may not be a very theoretically meaningful thing to do, but that isn't the point at this particular juncture; what we're showing here is that you can change the identification status of the model by removing unknowns, so we're not estimating these any more. So we still have six non-redundant pieces of information, but we are now only estimating four unknown parameters, because we are not estimating any of the factor loadings, and so now this model is over-identified. So, in this video I have covered some of the important ideas and concepts that learners will need to take with them into later videos and applications. These are focused around the use of path diagrams for representing our theories and our equations; the fact that we analyse the covariance matrix of the observed variables rather than the raw data; and the fact that we use, for the most part, maximum likelihood estimation, which has some quite restrictive assumptions about multivariate normality but is nonetheless a very useful estimator: it gives us consistent, unbiased and efficient estimates of the unknown model parameters and allows us to do global tests of the fit of the model to our data. Those kinds of fit tests are mainly applicable in the context where models are nested, where we can say that one model is a subset of a second model, that is, the same as the second model with some additional parameter restrictions. And I've talked about identification of models: models can be under-identified, just-identified or over-identified, and we as the analyst can exert some control over the identification status of our model by removing unknown parameters or adding in more known pieces of information.