Well, I'm sure by now you all think I'm completely crazy, but that's okay, because I like lecturing on things that very few other people like lecturing on, especially inter-observer agreement. So keep that term in mind: inter-observer agreement. You may have heard this called inter-rater reliability, but we're going to beat that out of you today. Step right up and take your lashings. Seriously, I'm really not a violent person, but on these videos it sure sounds like I am, because these particular issues need to be beaten into your heads, not literally. Anyway, these issues are important, and I never want to hear you refer to this as reliability. There's a reason, and here it is.

Agreement says nothing about accuracy or reliability. When two people agree that something happened, that doesn't mean it's a reliable finding; it just means they agreed. There are a lot of problems with treating agreement as a measure of reliability. Number one: maybe it didn't really happen; maybe one person is just trying to please the other. It's just agreement. And accuracy? Suppose we both agree that calling the dog brown should never happen, and we'll replace the word "brown" with the word "red." So we call the brown dog red: oh look, a red dog. But in reality the dog is still brown. It's not accurate. Stupid example, but think about it in terms of definitions of things. Imagine the police coming up with a very specific definition for stopping, like they have: you stop, you wait three seconds, and then you go. They came up with that specific definition to make sure we all agree on whether a person has stopped. That way they can give you a ticket if you haven't stopped for long enough, or, if they're really doing their job, pull you over and say, "Hey, great job at the stop sign. I'm really impressed. You're setting a good example for the rest of the citizenry," if that's a word. I'm digressing again; I apologize.

So inter-observer agreement says nothing about accuracy or reliability, just about how much two people agree on something, which is really about believability. And if you ask me, believability is every bit as important as accuracy and reliability. Why? Because you could have the most accurate data in the world, the most reliable data in the world, and no one will believe you. That's kind of sad, isn't it? Isn't that the state of science all the time? Science is always screaming about something: this is the most effective way to teach; you should never feed your kids juice before X years, or months, of age; you should never do this; you should never do that. Very accurate, and everybody says, "Yeah, but I don't believe you." They don't believe it because they don't understand the methodology, but that's a different issue. My point, and I garbled it a bit there, is that believability (not validity) is drastically important, as much so as accuracy or reliability.

One more thing before the procedures: the method you use to calculate inter-observer agreement produces different results, folks. This is one of those little gotchas of science. Dang it.
Which method should you use? I can't tell you; you have to decide, and no matter what you decide, you're going to be partly right and partly wrong, and that battle is going to rage in your head. Just do your best. All right, here we go.

The real question is this: what is enough? Do you have to agree 100% of the time? 90%? 75%? 80%? 60%? I don't know; everybody has a different rule. I've seen researchers use 90%; some people use 85%. I've been part of a research team where we were forced to agree, which was really weird, because I disagreed with people, and yet we wouldn't leave the room until we agreed. Finally we got tired of staring at each other's faces and said, fine, we agree it happened. That isn't real agreement; that's forced agreement, and it's really stupid. I don't like it at all. And if you were the faculty member who made me do that, know that you're doing it wrong. Anyway, here we go.

This is a big one, folks: total agreement. It also sucks. Let me explain the setup before I explain the numbers. We're going to use an inter-observer agreement procedure called total agreement, based on totals. You can use this for all sorts of measures: counts of behavior (how many times it happened), durations of behavior (how long it happened for), and latencies of behavior (how long from the stimulus to the response). So it's good for all sorts of things, but this particular procedure still sucks, and here's why.

We have ten five-minute intervals and the number of behaviors that occurred in each interval; each row is a different person. Row one is me, row two is you. I saw it happen 2, 4, 3, 5, 0, 0, 3, 1, 3, 4 times; you saw it 3, 4, 2, 3, 0, 2, 3, 2, 4, 5 times. What do we do? We add up each row and take the smaller total divided by the larger total: my 25 divided by your 28. Look at that, we agreed about 89% of the time.

But here's why this sucks. Look at the next set of numbers. Again, ten five-minute intervals. I saw it 2, 4, 3, 5, 0, 0, 0, 0, 0, 0 times; you saw it 0, 0, 0, 0, 0, 0, 3, 2, 4, 5 times. Notice that at no point, except the middle two intervals where we both recorded nothing, did we actually agree. However, I saw the behavior happen 14 times and you saw it happen 14 times. I'll be darned: 14 over 14 means we agreed 100% of the time. If you believe that, I've got a bridge for you, man, girl, whoever you are. The point, folks, is that we could set this up so that you and I never saw anything the same way at all, ever, and yet our formula says we agreed 100% of the time. This is why I hate total agreement.
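If it helps to see the arithmetic, here's a minimal Python sketch of the total agreement calculation, using the two data sets above as I read them off. The function name `total_agreement` is just my label for this procedure, not a standard library call.

```python
def total_agreement(obs1, obs2):
    """Total (count) IOA: smaller overall total divided by larger."""
    t1, t2 = sum(obs1), sum(obs2)
    bigger = max(t1, t2)
    return 1.0 if bigger == 0 else min(t1, t2) / bigger

# First data set: per-interval counts from the two observers.
me  = [2, 4, 3, 5, 0, 0, 3, 1, 3, 4]   # total: 25
you = [3, 4, 2, 3, 0, 2, 3, 2, 4, 5]   # total: 28
print(total_agreement(me, you))  # 25/28 ~= 0.89

# Second data set: we match on almost nothing interval by interval,
# yet both totals are 14, so the statistic claims perfect agreement.
me  = [2, 4, 3, 5, 0, 0, 0, 0, 0, 0]
you = [0, 0, 0, 0, 0, 0, 3, 2, 4, 5]
print(total_agreement(me, you))  # 14/14 == 1.0
```

Same formula both times; that second result is the bridge I'm selling you.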
Let's move on: exact agreement. This one is nice and strict. Same sort of scenario, same types of behavior measures, but now we're looking interval by interval for exact matches. First interval: I saw 2, you saw 3; no agreement. The next one is an agreement, because we both saw it 4 times. Next, 3 versus 2: no. Next, 5 versus 3: no. Next, 0 and 0: hey, we agree. Next, 0 versus 2: no. Then 3 and 3: yay. And never again for the rest. So 3 out of 10, which is 30%.

Now back up to the previous graph, because this matters. On that second data set, the bottom line, total agreement, said we agreed 100% of the time. Take the exact same data and calculate it with an exact agreement procedure, the total number of exact agreements divided by the total number of intervals where the behavior occurred, and it equals 0% on that second set. We did not agree at all; at no point did we agree on our data, and the number reflects that. That's exact agreement, and strictly speaking it's exact agreement of occurrence, because we only counted intervals where at least one of us recorded the behavior (8 of the 10), ignoring the two where we both recorded zero. There are other ways, like exact agreement of non-occurrence, and all sorts of layers here that you'd get into in a BCBA program, but we're not in one, so we won't. (There's a code sketch at the end of this section that walks through all of these calculations.)

Let's move on: interval agreement. This is good when you're using partial-interval or whole-interval recording, momentary time sampling or just time sampling, or duration-type behaviors. We're using X's rather than numbers this time: an X means that observer scored the behavior in that interval. We agreed in the first interval, not in the second, not in the third, yes in the fourth, fifth, and sixth, not in the seventh, yes in the eighth and ninth, and not in the tenth. So we agreed in 6 out of 10 intervals: 60%. Notice we have agreements of occurrence (we both marked an X) and agreements of non-occurrence (we both left it blank) in there, and we can break this down into occurrence versus non-occurrence agreement, which we're going to do right now. (This is what happens when you don't reread your lecture five seconds before you record it.)

This time we'll count inter-observer agreement for occurrence (the top half) and non-occurrence (the bottom half) separately. For occurrence, we only look at intervals where one or both of us marked an X: intervals 1, 2, 3, 4, 7, 8, 9, and 10, so eight intervals. Of those eight, we agreed in four: intervals 1, 4, 8, and 9. So we agreed 50% of the time on occurrence. For non-occurrence, we look at intervals where one or both of us recorded nothing: intervals 2, 3, 5, 6, 7, and 10, so six intervals. We only agreed twice, in intervals 5 and 6, because we both saw nothing there. So we agreed about 33% of the time on non-occurrence.

Those are some very simple ways to calculate inter-observer agreement. Remember, this is not reliability, but it does speak to believability, so historically we want these numbers pretty high before we really trust our data.
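And here, in the same spirit as before, is a minimal Python sketch of the rest: exact agreement (with the occurrence-only variant), interval agreement, and the occurrence/non-occurrence breakdowns. The function names are mine. One caveat: the lecture only tells you which intervals we agreed in, not who held the X on the four disagreement intervals, so the two boolean lists below are one assignment that's consistent with the tallies (both X in intervals 1, 4, 8, 9; both blank in 5 and 6; exactly one X in 2, 3, 7, 10).

```python
def exact_agreement(obs1, obs2, occurrence_only=False):
    """Exact count-per-interval IOA: intervals where both observers
    recorded the same count, over the intervals considered. With
    occurrence_only=True, intervals where both recorded zero are
    dropped first (exact agreement of occurrence)."""
    pairs = list(zip(obs1, obs2))
    if occurrence_only:
        pairs = [(a, b) for a, b in pairs if a > 0 or b > 0]
    return sum(1 for a, b in pairs if a == b) / len(pairs)

def interval_agreement(x1, x2):
    """Interval-by-interval IOA over yes/no (X) records."""
    return sum(1 for a, b in zip(x1, x2) if a == b) / len(x1)

def occurrence_agreement(x1, x2):
    """Only intervals where at least one observer marked an X."""
    pairs = [(a, b) for a, b in zip(x1, x2) if a or b]
    return sum(1 for a, b in pairs if a and b) / len(pairs)

def nonoccurrence_agreement(x1, x2):
    """Only intervals where at least one observer marked nothing."""
    pairs = [(a, b) for a, b in zip(x1, x2) if not (a and b)]
    return sum(1 for a, b in pairs if not a and not b) / len(pairs)

# Exact agreement on the first count data set: 3 matches in 10.
me  = [2, 4, 3, 5, 0, 0, 3, 1, 3, 4]
you = [3, 4, 2, 3, 0, 2, 3, 2, 4, 5]
print(exact_agreement(me, you))                        # 0.3

# Second data set: 100% by total agreement, but 0 exact matches
# in the 8 intervals where at least one of us saw the behavior.
me  = [2, 4, 3, 5, 0, 0, 0, 0, 0, 0]
you = [0, 0, 0, 0, 0, 0, 3, 2, 4, 5]
print(exact_agreement(me, you, occurrence_only=True))  # 0.0

# Interval (X) data, one assignment consistent with the lecture.
me_x  = [True, True,  False, True, False, False, True,  True, True, False]
you_x = [True, False, True,  True, False, False, False, True, True, True]
print(interval_agreement(me_x, you_x))       # 6/10 = 0.6
print(occurrence_agreement(me_x, you_x))     # 4/8  = 0.5
print(nonoccurrence_agreement(me_x, you_x))  # 2/6  ~= 0.33
```

Notice how much the answer moves around on the very same observations, from 100% down to 0%, depending on which formula you pick. That's the point of this whole lecture.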
How high? We want these above 85%. Some people say 90, some say 80; I picked an arbitrary number right in the middle, 85%. Why? Honestly, I don't know; I probably read it in a paper somewhere, sometime, roughly 730,000 years ago, because I'm getting old and have been doing this stuff for a while. There's no hard requirement here, but a higher bar, while harder to hit, also makes your data more believable. This speaks to a bigger issue, which is that we should always have two people observing the same thing at the same time, independently, so that we end up with a believable record of the behavior. And I also understand, because I work in the real world, that that might not always be possible. Anyway, we'll come back for more on something else later. Take care. See y'all.