Hi, this is Dr. Justin Esarey, and this is week six of POLS 506, Bayesian, Computational, and Nonparametric Statistics. This week we're going to talk about measuring latent characteristics from observable behavior using item response theory in a Bayesian context. The first thing I'm going to do is talk a little bit about the problem we're trying to tackle: how to measure something that we know exists and that influences behavior we can see, but that we cannot directly observe. There are lots of examples of this in a variety of fields, including political science. The archetypal problem that item response theory was originally designed to solve was measuring learning ability, that is, measuring a skill such as mathematics or reading ability using responses to a multiple-choice test instrument. We can see how many questions a student gets right on a test, and we know that has something to do with how good they are at the skill we're testing, like math or reading. But the skill isn't necessarily directly related to the number of questions answered correctly. Some questions are easier than others, and so some are better at discriminating ability than others; some are so hard that nobody gets them. Questions also measure different sorts of skills: some are almost purely math, some are combinations of math and reading ability, like word problems, and other analytical skills may come into play as well. So we want to recover that skill information from a set of responses to a test instrument. That is the problem that IRT, item response theory, was originally designed to solve. But since then it's been applied to many different problems, among them how ideology informs voting behavior in legislatures. We know that members of Congress vote according to a whole bunch of different considerations, and probably one of them is their ideological beliefs. But we can't necessarily read their ideology straight off their voting records, for a variety of reasons. Still, we might want to try to recover that ideology from their voting patterns in some way. It's a problem analogous to the testing problem, and if the technique works in one case, it might work in the other. Another related problem to which this technique has been put in political science is recovering Supreme Court justice ideology from case decisions. This highlights a particular theory in the courts literature, attitudinalism, which says that Supreme Court justices (and other judges, but particularly Supreme Court justices) make decisions according to their ideological beliefs, deciding cases according to how they will impact policy in ways that benefit or harm their ideological interests. That's not the only game in town; it's not the only theory of how judges behave. But if that theory is true, then we should be able to look at their voting records and recover their ideology from their case-decision votes. The Supreme Court, as you know, has a certain number of justices, and they make decisions by majority rule. So if we look at their votes, if you can call them votes in these cases, we might be able to recover their ideology, and a number of studies have attempted to do that.
The Supreme Court justice case highlights the idea that all of these measurements are theory-bound, which is to say that the measurements are recovered by assuming that people are behaving according to a certain theory, and then estimating the characteristics that are consistent with behavior inside of that theory. So the measures we're going to produce today involve theory; they're not theory-free. Probably no measurement is entirely theory-free, but these are particularly bound to theory, and that puts some limitations on how they can be used. You can't use item response theory based measures to test the very theory of decision making that generated them. In other words, I can't measure something assuming a theory is true, and then use the measurement to test that theory. These measures are not magical; you don't get something for nothing. At the end of today's lecture, I'll talk a little bit about the limitations these measures face. But inside of those limitations, they can be extremely useful. Like I said, we're going to look at a bunch of different kinds of data: votes cast, yes-or-no survey questions answered, Supreme Court justice decisions. And what we're going to do, in essence, is pick the latent characteristics of both the people and the decisions they're making that are most consistent with the observed patterns we see in their decisions. In other words, we assume a certain theory of decision making is true, and then pick the characteristics of the people and the choices, vote choices or Supreme Court cases or whatever, that seem most consistent with the data we've observed. That should evoke some memories of how one picks parameters in a maximum likelihood setting. In this case we're going to use Bayesian methods, although one can also estimate item response models by maximum likelihood. And like I said, the technique we're going to cover is only one particular approach to latent trait recovery or latent trait measurement. There are many others: factor analysis, k-means clustering, all kinds of stuff. This is just one particular take on measurement, but it's a take with some interesting applications that has found widespread use in political science. So without further ado, let's get started.

All right, the first item response theory model we're going to talk about is the Rasch model, also called the one-parameter model. The Rasch model works according to a fairly simple spatial logic that can be linked, and is linked in an article by Clinton, Jackman, and Rivers, to spatial decision making via utility theory. The idea behind the Rasch model is that there's a latent dimension. We can't see this dimension, but it exists and it influences behavior. For this particular example, I'm going to label this dimension ideology. And we've got two kinds of things to place on this scale. The first is decision makers, say, members of Congress. Each congressperson has an ideology, and we're going to denote those with x, like so. Obviously there are far more than three members of Congress, but we're just going to stick with three for this example. And then they have votes that they have to take on various issues.
And each one of these votes corresponds to a cut point, delta. I'll consider two votes in this particular example: delta-1 and delta-2, right there. Now, the way the Rasch model says decisions get made is that each person, say congressman x1 here, looks at their own position and the position of the vote they're deciding on and asks, should I do this or not? A negative distance between one's own ideal point and the location of the vote's cut point means you shouldn't do it, and a positive distance, like the one between delta-1 and x2 here, means you should do it. The greater that positive distance, the more you benefit from saying yes, and the greater the negative distance, the more you benefit from saying no. So the location of the vote forms a cut point where people on one side should say yes and people on the other side should say no. I should note that these deltas here are vote cut points, decisions that have to be made. Now, we could treat this as deterministic cut-point decision making: everyone on the left-hand side says no, everyone on the right-hand side says yes. But decisions are probably probabilistic rather than deterministic, so instead we'll say that your probability of making a decision, for example the probability that you vote yea on a particular vote, is given by some link function. I'm using the normal CDF here, the probit in other words, as a function of your distance to the vote cut point. If that distance is positive, you have a greater chance of voting yea; if it's negative, you have a lesser chance. With no normalization, you can see that right at zero you'd have a 50% chance of saying yes; as the index declines you have a lesser chance of saying yes, and as it increases and becomes positive you have a greater chance. So on this plot the vertical axis is Pr(yes), the horizontal axis is the index x minus delta, and the curve is recognizable as the normal CDF: Pr(yes) = Φ(x − δ). That's the simple Rasch model: you map positions into decision probabilities, where the positions are, first, those of the decision makers, and second, those of the indifference points of the votes. Now I should say that the location of a decision is a little different from the location of the proposal in ideological space. Delta is not the position of the proposal; it's the position of the indifference point, the cut point between the proposal and the alternative. And usually the alternative in a voting situation is the status quo: if a bill fails, the status quo obtains. That cut point could arise from many different pairs of points in space. For example, it could be the case that the bill is located here and the status quo is located over here, and midway in between the two is the cut point where I'm indifferent between them. But that same cut point would exist if the bill were a little bit to the left and the status quo a little bit to the right. In other words, we can have the same cut point for many different pairs of bills and status quos.
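To make that decision rule concrete, here is a minimal sketch in R; the specific numbers are made up for illustration.

```r
# Rasch (1PL) decision rule: the probability of a yea vote is the
# normal CDF (probit link) of the signed distance between a member's
# ideal point x and the vote's cut point delta.
x     <- c(-1.5, 0.2, 1.0)  # hypothetical ideal points for three members
delta <- 0                  # hypothetical cut point for one vote
pnorm(x - delta)            # Pr(yea): below 0.5 left of the cut point, above 0.5 right of it
```

A member sitting exactly at the cut point gets Pr(yea) = 0.5, matching the 50% point just described.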
In terms of the decision model, all that matters is where the cut point is, where people are going to be indifferent between saying yes and saying no to this binary decision. Just don't get confused into thinking that delta is the location of the vote in ideological space; it's the location of the indifference point, the cut point, in ideological space. So that's really all there is to the one-parameter Rasch model. Now, there are different ways of writing this model. I've written it so that x1 and delta1 are separate estimable parameters. Occasionally, people write it in a slightly different way: the probability of a yes is equal to the inverse link of beta times the quantity (x1 minus delta1). That's essentially equivalent, because the likelihood is unchanged by expansions or contractions of the space within certain limits; the beta is not separately identifiable from the scale of the x's and deltas, so the two ways of writing the model are equivalent. This leads naturally to questions of identification, so instead of talking around the issue, let's jump right into identification constraints in the Rasch model. You've seen the Rasch model, and there's a problem with it, not an insurmountable problem, but something that has to be thought about: the Rasch model is technically unidentified. What do I mean by unidentified? I mean the statistical sense of identification, wherein each parameter has to have a unique solution; in a maximum likelihood framework, it would be a unique maximum likelihood solution in any particular data set. Another way of thinking about it: you shouldn't be able to change two parameters at the same time and end up with the same likelihood or probability for all observations in the data set. The Rasch model is unidentified because, in a sense, the underlying spatial model has an identification issue. Consider a case where we've got two different people in the space and one decision, delta-1. The relevant criteria for whether these people choose yea or nay on item delta-1 are the distances between each ideal point and the cut point, the indifference point, for this vote. Now imagine what would happen if everything in the scale were moved over by a constant amount; call that amount alpha. Now everything is in a different location than it was before: x1-prime, x2-prime, and delta-1-prime. But all the distances are the same as they were before. The distance from x1-prime to delta-1-prime is still d1, and the distance from delta-1-prime to x2-prime is still d2 (I'm a very bad artist). In other words, the key arguments that enter into the probit function, which are functions of each person's ideal point and the location of the cut point for this vote, are exactly the same in either configuration, the original locations or the primed locations, for all these items. This causes a problem, because when we run our Bayesian model, or a maximum likelihood version of this model, we need there to be a unique solution to the problem.
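Here's a quick numerical check of that invariance, just a sketch with made-up numbers:

```r
# Shifting every ideal point and every cut point by the same constant
# alpha leaves all the vote probabilities unchanged, so the likelihood
# cannot distinguish the original configuration from the shifted one.
x     <- c(-1, 0.5)   # hypothetical ideal points
delta <- 0.2          # hypothetical cut point
alpha <- 3            # arbitrary shift
all.equal(pnorm(x - delta),
          pnorm((x + alpha) - (delta + alpha)))  # TRUE: identical likelihood contributions
```

A rescaling of the whole space can similarly be absorbed once a discrimination parameter beta is present, which is why some constraint is needed to pin down both the location and the scale.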
Otherwise, what will happen is that the Markov chain or the maximum likelihood algorithm will not be able to discriminate or choose among all these different potential solutions, so it will get lost and you'll get results that don't necessarily make sense. Fortunately, in this one-parameter model, remember, this is a 1PL, a one-parameter logistic model, also known as the Rasch model, we can force identification of the parameters simply by specifying priors on, for example, the individual x locations. What that does is anchor the positions in space. For example, if we say, all right, here's a zero point, and all the x's get priors, say a normal with mean zero and variance one, that's going to force all of the x parameters to stay in that neighborhood as you sample them with the Markov chain Monte Carlo process. It causes them all to fit in that space rather than wandering off into the outer reaches, and it also tends to concentrate them around the prior. That will allow for identification, at least in the one-parameter Rasch model. Another way of going about this, which also works and which we'll make better use of in multi-parameter models, is to fix certain votes or people, deciders, in space: you set a few of them to fixed values. In the one-parameter case, here's what that looks like. Here's the latent dimension we're trying to recover, so this could be ideology, and we don't know where everyone goes in this space. Say I've got 20 people: x1, x2, and so on up to x20. I'm just going to fix x1 right here at, say, 0, and x2 right here at 1. Every other person is then defined relative to these two points. For example, if x3 is somewhere in between x1 and x2, he'll be positioned according to his relative location between them; if x4 is more rightist than x2, he'll go somewhere out beyond it, and so on. Fixing x1 and x2 in space solves the identification problem by locating all the other people relative to x1 and x2 in the one-dimensional space. The rule is that you need enough fixed points to define the space you're trying to recover the latent dimension from; technically, you need to fix a basis for that space. In this case we've got a one-dimensional ideology, so we need two points fixed in space to create a line that everything else is defined around. You'll see that when we do a two-parameter model in two dimensions, we're going to need one additional person, three people fixed, because that's what allows us to define everyone relative to those three people in a two-dimensional space. So the long and short of it is that the Rasch model is technically unidentified, but you can identify it with a few simple assumptions, and if you do that, everything should work fine. Now what I want to do is show you how one might estimate a one-dimensional, one-parameter (1PL or Rasch) IRT model in R. There are a couple of different ways we can think about doing this.
The first way I'm going to show you is by writing an IRT model in WinBUGS, OpenBUGS, or JAGS and then estimating it directly on some data. So what I've done here is load up JAGS; I'm going to estimate this model in JAGS. And I'm going to create some fake vote-decision data. I'm picking 20 ideal points, so 20 people in this example, and 100 votes, and I'm picking the midpoints, the indifference points, for each vote. I'm going to store the probabilities and votes in matrices because I'm going to compare these later to what I'm able to recover out of the vote data: I'll use the vote data to recover ideal points, and I'll check whether I can correctly predict voting probabilities by storing the true probabilities in this pr matrix. So this code just generates the ideal points and vote midpoints and then generates some data very quickly. If I look at the head of the vote matrix, what I have is 100 different votes and 20 different people, with each person's yea or nay vote recorded in the matrix. The probability of voting yea is a function of the distance between the ideal point and the vote midpoint, as given by the normal; that's the pnorm right there, the cumulative normal distribution function. Now I'm going to try to recover the ideal points that I generated using JAGS. Here is a BUGS-language file that I'm going to run in JAGS implementing a simple one-parameter Rasch-type IRT model. All I'm doing is modeling each vote decision in the vote matrix, which I'm calling y here, with a Bernoulli density with probability p[i,j], where the probability is given by a logit link on the difference between theta and delta. Theta is a parameter that attaches to the person, hence it's indexed by i; delta attaches to the vote decision, it's an estimate of the vote midpoint, so it's indexed by j, the vote. We've got n = 20 people and m = 100 votes to cycle through in calculating this likelihood. I'm assigning simple normal priors to the thetas: mean zero and, in this case, precision 0.1. They're fairly diffuse: precision 0.1 means variance 10, so a standard deviation of the square root of 10, a bit over 3. My priors on the vote midpoints are given by common hyperpriors, which I'm setting to be fairly diffuse: the deltas come out of a common distribution with a common mean mu and common precision tau, but I'm not saying what the characteristics of that common distribution are; those get estimated too. So I'll save that and then use the usual JAGS techniques, which we've discussed in previous lectures, to generate some samples. You'll see that the data set isn't too huge, so it doesn't take JAGS too long to get a good number of samples out of my Markov chain. In this case I'm doing a really quick run, a total of 3,000 iterations, discarding 1,000 as burn-in and keeping 2,000. All right, so now I've got my IRT samples, and I'm going to calculate the posterior means for all of the theta parameters, the individual ideology parameters that I'm trying to recover.
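Since the lecture's script isn't reproduced here, the following is a minimal reconstruction of the steps just described, using rjags; the file and object names (irt.bug, irt.samples, and so on) are my assumptions, not necessarily those used on screen.

```r
library(rjags)

# --- Generate fake vote data: 20 people, 100 votes ---
set.seed(42)
n <- 20; m <- 100
ideal.points <- rnorm(n)                             # true ideal points
midpoints    <- rnorm(m)                             # true vote midpoints (cut points)
pr   <- pnorm(outer(ideal.points, midpoints, "-"))   # true Pr(yea), an n x m matrix
vote <- matrix(rbinom(n * m, 1, pr), n, m)           # observed 0/1 votes

# --- One-parameter (Rasch) IRT model in the BUGS language ---
# Note the model uses a logit link while the data were generated with a
# probit; the ideal points are still recovered up to location and scale.
irt.bug <- "
model {
  for (i in 1:n) {
    for (j in 1:m) {
      y[i,j] ~ dbern(p[i,j])
      logit(p[i,j]) <- theta[i] - delta[j]   # person minus vote midpoint
    }
    theta[i] ~ dnorm(0, 0.1)                 # diffuse prior: precision 0.1, variance 10
  }
  for (j in 1:m) {
    delta[j] ~ dnorm(mu.delta, tau.delta)    # common hyperprior on midpoints
  }
  mu.delta  ~ dnorm(0, 0.01)
  tau.delta ~ dgamma(0.01, 0.01)
}
"

# --- Sample: 1,000 burn-in, 2,000 kept, as in the lecture ---
jm <- jags.model(textConnection(irt.bug),
                 data = list(y = vote, n = n, m = m), n.chains = 1)
update(jm, 1000)
irt.samples <- coda.samples(jm, c("theta", "delta"), n.iter = 2000)
```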
If you look at colnames(irt.samples[[1]]), you'll see that it saves a chain not only for each of the 100 vote-midpoint parameters, but also for the thetas. I'm only interested in the thetas for right now, so I'm just going to hive off those last 20 theta chains and take their means to get my estimated posterior-mean ideal points. Now I'll plot those estimated ideal points against the actual ideal points, and as you can see, I'm doing great. The correlation, I bet it's pretty close to 0.95 actually... 0.99, I'm doing great. So my model is accurately recovering the ideal points. You can see the scaling is different; I've got a different scale for the estimated thetas. But remember, the Rasch model is not identified; it only recovers ideal points up to location and scale. I set priors in my BUGS file that identified the model, but there's no guarantee that will produce ideal points on the same scale as the truth. It doesn't matter, though, because as long as my deltas are defined relative to these estimated thetas, all the probabilities will be the same, or at least close enough for government work. So now I'm going to recover those deltas, and if I plot the estimated deltas against the true deltas, you can see I'm getting pretty close there as well; the correlation is pretty high. These are the true midpoints, these are the estimated midpoints. Now I want to see whether I'm accurately recovering probabilities. I'll run through the code I had before to generate the real probabilities, but this time using my posterior-mean estimates to generate posterior predictions of a very simple kind. I'm not really factoring in the uncertainty in my estimates of the thetas and deltas; I'm just taking the posterior means of theta and delta to get a quick-and-dirty sense of whether my estimated probabilities are good guesses about the true probabilities. So I'm creating a matrix of estimated probabilities, making the relevant calculations, and then plotting the true values against the estimates. Here, the true value is on the y-axis and the estimated value is on the x-axis, and you can see we're doing pretty well. Where we're falling down, in so much as we are falling down, is that some of our estimates of zero actually correspond to higher true probabilities. That's probably because there are cases where we don't have enough information; in other words, we don't see enough of the data to recognize that a probability of 0.1 is actually different from a probability of zero. The rarer an event becomes, the more data you have to observe in order to distinguish the rare event from the nonexistent event. Just eyeballing it, that seems to be what's happening here, but in general the model seems to be doing okay.
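Continuing the sketch from above, that recovery check might look like this; again, the variable names are assumptions.

```r
# Posterior-mean recovery check. The coda object's columns mix the
# delta[.] and theta[.] chains, so select the theta chains by name.
post    <- as.matrix(irt.samples)
theta.s <- colMeans(post[, grep("^theta", colnames(post))])
delta.s <- colMeans(post[, grep("^delta", colnames(post))])

plot(theta.s, ideal.points)   # estimated vs. true ideal points
cor(theta.s, ideal.points)    # near 0.99 in the lecture's run

# Quick-and-dirty posterior predictions from the posterior means alone
# (ignoring estimation uncertainty), compared against the true Pr(yea).
pr.hat <- plogis(outer(theta.s, delta.s, "-"))   # inverse logit, matching the model
plot(as.vector(pr.hat), as.vector(pr))           # estimated (x) vs. true (y)
```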
So now I want to show you a quick example of how you can fix certain voters in space and use that as an identifying assumption. What I've done here in irtA, a variant of the file we just ran, is set the first and last values of theta, the individual-specific parameter, equal to values that get sent to the BUGS file, and for those values I'm actually going to send the true values of the ideal points. This is a case where I actually know the true ideal points, so I can set the scales to match exactly. I've also changed the priors on the other parameters to a very flat distribution, so we're taking out any identification that was provided by the prior before, at least for theta. If I run this model and then look at the relationship between estimated and true thetas as I did before, I should find that they're very closely related. And once it gets done here... sample... there we go. Very closely related. You can see the two identifying points, and they're a perfect match. The rest are not exactly perfect matches; a perfect match would lie right along this line, but they're still highly correlated. And all of the identification is being provided by that identifying assumption, by those fixed points, not by the priors as before. So that's an example of how you can provide identification by fixing certain actors in space. That's what's commonly done, for example, in estimating legislative ideal points, where you fix Ted Kennedy on the extreme left and Strom Thurmond on the extreme right of the Senate, and that enables you to identify the rest of the Senate relative to those two actors. I should also say, you don't have to pick the two most extreme actors. You try to pick people who are far enough apart that they provide a good amount of identification, but you don't have to pick the most extreme people in the data set for it to work.

So the one-dimensional model is great, works fine, but for a lot of the things we care about, the characteristics we're after are multidimensional. Ideology is a really good example of this. For instance, in many of the latent variable models of congressional roll-call voting over the years in the United States Congress, it's been found that there are typically two dimensions to voting, two dimensions of ideology recovered from voting, and what those dimensions mean in terms of interpretation differs over the years. So far we've been able, in a fairly simplistic way, to say, well, we recover this one dimension from voting, so it must be something like ideology. But when you recover two dimensions, you have to think about what those dimensions are and how they partition whatever is being used to make the decision. For example, one might be straight-up left-right ideology in the traditional American partisan sense, and another dimension that influences voting might be something like civil rights. This happens, for example, in the 1960s, where there are blocs of voters who differ on economic issues, but also different blocs of voters who differ on civil rights issues. In the modern context, this is typically not exactly what we're after.
Most of the civil rights issues, at least the racial civil rights issues, have already been decided, and the remaining civil rights issues are polarizing but probably collapsible into a larger realm of what we might call social ideological issues. Those are distinct from economic ideological issues, and we might see different blocs of voters in different votes taking different positions because of their positions on these two dimensions. So what would a two-parameter model look like in this case? Well, we'd have people just as before located in this space: x1, x2, and x3, a third person in the space. And there are going to be indifference points for the bills being voted on; here's a bill whose indifference point is at delta-1. Now when somebody goes to make a decision, they're still going to compare themselves to the vote as they did before, but instead of comparing themselves just to delta-1, they're going to ask which side of the indifference line they're on. Delta-1, if I'm interpreting this correctly, is the point at which both options give a utility difference of zero. The cut-point line, though, contains many other indifference points, because it's possible to be indifferent between two policies both of which make you worse off, you dislike them both equally, or both of which you like, but like equally; they can both be good for you. So the cutting line gives all of the ideological positions that are indifferent between the two options in this particular binary vote choice. What we need to do in the multidimensional case is locate delta-1, but also locate that cutting line in space, and then finally locate all of the voters. What this all boils down to is that we're going to estimate a whole bunch of parameters, many more than in the previous case. First let me write down the model and then give you a sense of how it equates to the picture. The probability that a particular individual votes yes, or in favor, and we're just arbitrarily saying one side of the line is yes and one side is no, is again given by some link function; I'm using the probit in this case. But now for every vote we have a slope parameter and a location parameter on each dimension: Pr(yes) = Φ( beta_j1 (x_i1 − delta_j1) + beta_j2 (x_i2 − delta_j2) ). I'm writing the dimension as a second index here; on the board these appear as superscripts, and they are not squared terms, incidentally, just labels for dimension one and dimension two. So beta_j1, x_i1, and delta_j1 are the characteristics on dimension one, and beta_j2, x_i2, and delta_j2 are the characteristics on dimension two. x_i1 is just person i's location on the first scale, which in this case we've labeled economic ideology, and x_i2 is their location on the second scale, social ideology. The same holds for delta. And as before, the distances on both dimensions determine whether a person votes for or against a particular vote choice. But the model may weight those dimensions differently: every vote choice can have different characteristics that make it more or less sensitive to the two ideological dimensions. So consider, for example, a one-line bill that just says all abortions will be banned.
That kind of bill is likely to be mostly social in nature; it doesn't have a lot of economic implications. So we would expect its cut line to have a fairly low slope, in the sense that it discriminates almost strictly on the basis of social ideology. We can imagine tax bills cutting more vertically: so we'll call this one the abortion bill, and this is a pure tax bill, something that really cuts on economic ideology but doesn't sort people of different social ideologies very well. And then of course many other bills, like this prototypical green bill here, have implications for both. We want to identify those characteristics for each bill, and that's where the betas come in. The beta coefficients are particular to each bill, which is why beta carries the bill index j, just as delta does; it indexes the bill, not the person. The slope of the cutting line is governed by the ratio of the two betas for that bill, beta_j1 over beta_j2. The bigger beta_j2 is, the more you have to compensate on dimension two in exchange for movements on dimension one. So if beta_j1 is really big, the numerator is very big, which makes the slope really steep, which means it takes a ton of compensation in the social ideology dimension to make any difference in how someone votes, because this bill is primarily economic. If beta_j2 is really big, the slope is really small, really flat, like the abortion bill, and you need a lot of compensation on the economic ideology dimension to make up for any change in the social ideology dimension, because the bill doesn't have much of an economic component; it's mostly a social bill. So we're going to estimate the beta and delta parameters for each vote: for each decision j, a beta vector and a delta vector, each of dimension D, so if D equals two, we're estimating two of each per vote. And for each individual decision maker i, each legislator if you want to think of it that way, we're going to estimate an x_i, again of dimension D. This is the so-called two-parameter model: you'll notice that down here we're estimating two sets of parameters, the betas and the deltas, for each vote, hence the name. Now, as it turns out, the two-parameter model is also unidentified, for reasons similar to the one-parameter model, but the problem is a bit more severe, in the sense that normalization or mere priors won't necessarily identify the model.
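Before turning to identification, here's a small numerical illustration of that cutting-line geometry; all of the numbers are made up for the example.

```r
# Two-parameter probit model for one vote j: Pr(yea) is the normal CDF
# of a beta-weighted sum of distances on each dimension.
p.yea <- function(x, delta, beta) pnorm(sum(beta * (x - delta)))

# The cutting line solves beta1*(x1 - delta1) + beta2*(x2 - delta2) = 0,
# so its slope in (x1, x2) space is -beta1/beta2.
cut.slope <- function(beta) -beta[1] / beta[2]

cut.slope(c(5, 0.5))   # steep line: a mostly "economic" vote (discriminates on dimension 1)
cut.slope(c(0.5, 5))   # flat line: a mostly "social" vote (discriminates on dimension 2)

# A voter far to the right on the economic dimension is a near-certain
# yea on the economic bill, regardless of social position.
p.yea(x = c(2, 0), delta = c(0, 0), beta = c(5, 0.5))
```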
What you need to do instead is fix a certain number of people in the two-dimensional space. For example, we could fix one person at (-1, 0), another at (1, 0), and a third at (0, 1). This creates a triangle inside the space that allows us to identify all the other people relative to those three particular people. What we're defining by pinning down these three people is, in essence, the scale we're going to use to estimate the IRT model. As long as that scale is fixed, and as long as all of the vote midpoints and cutting lines are defined relative to it, we'll be able to appropriately recover the probabilities. Now, I've got an example of this in my two-dimensional IRT file. What I've done is take that irtA model and add a dimension, so I've now got two dimensions in the latent utility, and I'm defining three of the theta variables via initial values. Those are fed to the algorithm as inits, and the algorithm just fixes them in space and doesn't try to estimate them. Everything else is pretty much as it was in the previous model, except that with two dimensions I've got twice as many hyperprior parameters. I'm also estimating the betas, which I didn't before; you can see those beta coefficients on the thetas there, which give the slope, in some sense, of the cutting line. So I'm estimating those separately now. I could actually multiply the beta into the delta here as well; I've set up the model not to do it that way, but ultimately it doesn't matter, because if I did, the delta parameter would just rescale itself accordingly. So now, going to my R file: here's the two-dimensional example. I've got my beta coefficients set up, and vote midpoints and two-dimensional ideal points established, and then the data I'm generating, fake data, are generated according to the IRT model I just wrote down. If I go ahead and run this, I should get an appropriate vote data set, and if I check head(vote), it looks like I've got a set of votes. Then I estimate the model using JAGS the same way I did before. It takes a little longer to initialize and update because it's a slightly more complicated model, so I'll just pause the video here and pick it back up once estimation has completed. Okay, that's finished estimating. Now I'm going to look at the names of the samples object I created; if I do colnames on the IRT samples, you can see I've recovered a bunch of delta parameters, a bunch of beta parameters, and a bunch of theta parameters. I'm going to focus on the thetas, so I'll extract just those theta estimates, again taking the posterior means of the chains for the estimated thetas. And then I'm going to plot my theta estimates for the second dimension against my theta estimates for the first dimension; a sketch of this two-dimensional setup appears below.
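As with the one-dimensional example, here is a hedged reconstruction of what that two-dimensional model might look like. Rather than passing the fixed positions as initial values, this sketch pins the three people by passing their theta rows as partially observed data (NA entries get sampled), which is a convenient way to accomplish the same thing in JAGS; the names and priors are my assumptions.

```r
# Two-parameter, two-dimensional IRT model. y is assumed to be an
# n x m vote matrix generated from the two-dimensional model above.
irt2.bug <- "
model {
  for (i in 1:n) {
    for (j in 1:m) {
      y[i,j] ~ dbern(p[i,j])
      logit(p[i,j]) <- beta[j,1] * (theta[i,1] - delta[j,1]) +
                       beta[j,2] * (theta[i,2] - delta[j,2])
    }
  }
  for (i in 1:n) {
    theta[i,1] ~ dnorm(0, 0.01)   # deliberately flat: identification comes
    theta[i,2] ~ dnorm(0, 0.01)   # from the fixed people, not the priors
  }
  for (j in 1:m) {
    beta[j,1]  ~ dnorm(0, 0.1)
    beta[j,2]  ~ dnorm(0, 0.1)
    delta[j,1] ~ dnorm(0, 0.1)
    delta[j,2] ~ dnorm(0, 0.1)
  }
}
"

# Pin three people at known coordinates to fix the scale; everyone
# else (the NA rows) is estimated relative to them.
theta.fix <- matrix(NA, n, 2)
theta.fix[1, ] <- c(-1, 0)
theta.fix[2, ] <- c(1, 0)
theta.fix[3, ] <- c(0, 1)

jm2 <- jags.model(textConnection(irt2.bug),
                  data = list(y = vote, n = n, m = m, theta = theta.fix))
update(jm2, 1000)
irt2.samples <- coda.samples(jm2, c("theta", "beta", "delta"), n.iter = 2000)
```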
That plot gets me the two-dimensional estimated ideal points that I've recovered using this process, and now I'm going to plot the true ideal points over top of them using different symbols. Bam. Doing pretty well. You can see I'm doing perfectly on three of them, like that one right there, and that one, and that one. That's not because I'm awesome; it's because those are the ones that were fixed as initial values. You'll notice I had to fix three values, the theta inits right here, and I set them to the true values, so I'm nailing those exactly. All the rest are recovered with greater or lesser accuracy, and if I look at the correlation between the true values and the estimates, I'm getting correlations upward of 0.9, 0.95. Those are pretty good correlations, so I'm doing a pretty good job of recovering these estimates. If you want to see it directly, I can plot the second dimension of the ideal points against the second column of my theta estimate matrix, and you can see I'm recovering those pretty well too, not exactly on the same scale, although pretty close. So my two-dimensional model works; that's kind of nice.

Now, I promised you there would be another way of estimating these models, and there is. What I'm going to do now is show you the pscl package, which was put together by Simon Jackman. The pscl package has a method for estimating IRT models in multiple dimensions, and it also has some interesting data you can use to practice this particular method on. It can help you estimate these models without having to go through the trouble of manually writing a BUGS file. So what I've got here is the pscl package loaded, and there's some data in it. The first bit of data I'm going to use is s109, a data set that Simon Jackman prepared of roll-call votes in the 109th Senate. His particular algorithm uses a data format called a rollcall object: a class in which yes/no voting data is stored. Now, s109 is already a rollcall object, so I just loaded it in. But if I wanted to create a rollcall object, I could do so. For example, to create one out of my previous vote matrix, I could just use the rollcall command: I've created a rollcall object, vote.rc, by supplying the vote matrix, which you can see depicted up here, and specifying that the yeas are ones and the nays are zeros. So now I've got a vote.rc object that I could analyze with Jackman's package if I wanted to; I've already written my own program to do that, though, so instead I'm going to analyze the s109 data. I'll call summary on s109 with verbose set to TRUE to give you a sense of what's in this data set. Scrolling up here, it's kind of a long printout. Here we go: here's a summary by vote. These are all the different votes that were held in the Senate, the yeas and nays for each vote, along with missing codes, people not in the legislature, and so on. You can see a lot of the votes are lopsided, 98 to 0, 99 to 0; these are probably "do we love America" kinds of votes.
And then there are a lot of other votes that are closer: 40 to 60, 41 to 59, and so on. Here are the votes by legislator. For example, you can see that Jeff Sessions, the Republican senator from Alabama, cast 34 yea votes and 297 nay votes; Richard Shelby, the other senator from Alabama, cast 351 yeas and 284 nays; and so on. I actually don't know this, but I assume the Bush entry is actually tie-breaking votes that Cheney cast, although I'm not entirely sure who that is; I don't think President Bush was a senator in the 109th Senate, but what do I know? Anyway, what we're going to do here is fix three senators in this space and then estimate ideal points. We're fixing Kennedy at a (-1, 0) ideal point; Senator Enzi at (1, 0), so that's an extreme-right senator; and Lincoln Chafee, who's a moderate, at (0, -0.5), so he's dead center on the left-right spectrum and offset on the second axis, whatever that may be. The constrain.legis function generates the appropriate starting values, priors, and fixed points necessary to identify this particular model. Then I estimate the model using the ideal function; I'll call up the help file for ideal so we can talk a little bit about it while it's running. Ideal takes an object, which in this case is a rollcall object like s109. You tell it d, the number of latent dimensions you wish to extract from the data set. You feed it some priors; in this case I'm feeding it the constrained priors I created before. If you leave priors as NULL, it will create its own very diffuse priors, but the model may not be identified; in fact, in a two-dimensional model, in many cases it will not be. The start values also come from the constrain.legis output; the nice thing about constrain.legis is that it calculates priors and start values for you, so hopefully the chain gets off to a good start. Setting store.item to TRUE tells the model to remember not only the estimated ideal points for the legislators, but also the locations of the cutting lines, the cutting planes, which we can plot if we wish. Ah, it's done. maxiter just tells it how many MCMC samples to take; burnin is a burn-in just like in any MCMC chain, and thin is thinning, just like in any MCMC chain. I believe these are close to the defaults; I may have shortened the burn-in a bit and lengthened the maximum iterations, but they're pretty close. If I plot the resulting object, which I've stored in id2Constrained here, what you get is a two-dimensional plot highlighted by party: the Democrats are in light blue, the Republicans are in red, and the one independent is in green; that might be Jeffords, I'm not sure. And you can see that there's clear separation of the parties on both dimensions, though the two dimensions clearly aren't capturing the same thing. There are just a ton of votes in this Congress, but if you want to overlay the cutting planes for all of them, you can specify the overlay cutting planes option, and then you'll get zillions of lines; the whole workflow is sketched below.
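Here is a sketch of that pscl workflow, following the example in the package's own documentation; the constraint coordinates match the ones just described, while the iteration settings are illustrative rather than the lecture's exact values.

```r
library(pscl)

# A rollcall object can be built from any 0/1 vote matrix, such as the
# simulated 'vote' matrix from earlier (rows = legislators, cols = votes).
vote.rc <- rollcall(vote, yea = 1, nay = 0)

# The 109th Senate roll-call data that ships with pscl.
data(s109)
summary(s109, verbose = TRUE)

# Fix three senators to identify the two-dimensional model.
cl2 <- constrain.legis(s109,
                       x = list("KENNEDY (D MA)" = c(-1, 0),
                                "ENZI (R WY)"    = c(1, 0),
                                "CHAFEE (R RI)"  = c(0, -0.5)),
                       d = 2)

# Estimate ideal points; constrain.legis supplies both the priors and
# the start values.
id2Constrained <- ideal(s109, d = 2,
                        priors = cl2, startvals = cl2,
                        store.item = TRUE,
                        maxiter = 10000, burnin = 500, thin = 25)

plot(id2Constrained)                               # ideal points, colored by party
plot(id2Constrained, overlayCuttingPlanes = TRUE)  # add the vote cutting lines
```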
The cutting-plane overlay is perhaps a little more informative when there are fewer votes; nevertheless, you can plot it if you like. What's nice about this is that it allows you to do IRT estimation pretty easily; it takes care of a lot of the dirty work for you. And I should say, even though this data set is a legislative data set and this example is very legislative in nature, it's not the case that you have to use legislative data with the ideal estimation command. It will work for any IRT model. In other words, anything you can put into an IRT model, where the responses are binary, one/zero, yea-or-nay type responses, you can use the ideal command to estimate latent dimensions from that data. All you need to do is get used to the legislative terminology, and it can be a very, very useful package for saving yourself some BUGS-coding time.

All right, I want to wrap up this hayride with a brief discussion of the limitations of IRT measures. You may be really excited about the possibility of recovering these latent dimensions from observational data, and it is a very exciting possibility: it allows us to measure many things that we suspect exist, that are important, and that we can't directly observe. But these are not magic bullets. In particular, IRT models assume many things; above all, they assume the truth of a particular theory of decision making in order to recover their estimates. You'll notice that the Rasch model and the two-parameter model recover betas, deltas, and x's on the theory that these things are all related to one another in a very specific way that produces vote outcomes. That means a bunch of things. First, it means the estimates can be sensitive to specification. What I mean is that the estimates are only as good as the model: if the model is a good approximation to the underlying data generating process, the estimates will very likely be good; if the model is a poor approximation to the underlying DGP, the estimates will probably be bad. Consider, for example, the possibility of strategic voting. The one-parameter Rasch model and the two-parameter model both assume that people vote pretty much based on their location relative to the cut point line of the particular bill, vote, or decision. If, however, legislators don't simply vote according to their own preferences, but coalesce with their party, or strategically vote yes or no to influence later votes, or engage in some kind of logrolling, none of that is captured in this model. In fact, you could get misleading indicators of their ideology, because the ideology estimates are recovered strictly from vote behavior; vote patterns are all we see. So by sensitive to specification, I mean that the theory behind the IRT model you're estimating needs to be at least a decent approximation to what you think is actually going on, and if it's not, you could have a problem. It's also important to remember that IRT measures, by their nature, can't be used for everything. For example, we can't use IRT measures to test a spatial model of voting, because they're constructed using a spatial model of voting. You can't predict Y with Y, and you also can't use Y to construct an X that then predicts Y. So you can't use an estimate generated from a theory of spatial voting to test whether spatial voting is a correct description of what's going on.
Along the same lines, you can't, say, predict vote choices using ideology estimates that were generated from IRT models, at least if those particular choices were used to create the ideology variable. Now, I may be able to use voting patterns from previous years to predict future voting patterns; that's a perfectly reasonable thing to do. But I can't, for example, take legislator ideology as recovered by a Bayesian IRT model and use it as a predictor of the congressional voting behavior it was estimated from. That is, in essence, using vote behavior to predict vote behavior, because remember, IRT estimates just are vote behavior, crystallized through the lens of a particular theory. With those limitations in mind, IRT models can be very useful for generating estimates. For example, if we need measures of ideology to test some theory that doesn't itself put the spatial decision-making process in question, and we can assume the IRT measures are valid as a part of the test, then it's perfectly acceptable to use them. You just can't test things for which the IRT measures are themselves open to question, where they're part of the thing being tested, part of what's at stake. All right, that's enough for one week. Thanks a lot, and I will see you next time.